license: mit

Phi-2 Orange

A two-step finetune of Phi-2.

First using a collection of broad training data:

And then a DPO finetune using:

Evaluations

Evaluations were run using mlabonne's useful Colab notebook, llm-autoeval. Also check out the alternative leaderboard at Yet_Another_LLM_Leaderboard.

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| phi-2-orange | 33.29 | 71.39 | 49.9 | 37.14 | 47.93 |
| phi-2-dpo | 30.39 | 71.68 | 50.75 | 34.9 | 46.93 |
| dolphin-2_6-phi-2 | 33.12 | 69.85 | 47.39 | 37.2 | 46.89 |
| phi-2 | 27.98 | 70.8 | 44.43 | 35.21 | 44.61 |
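
The Average column appears to be the unweighted mean of the four benchmark scores. That is an inference from the numbers, not something llm-autoeval is confirmed to do here. A minimal sketch that recomputes it from the table values (the names `scores`, `reported`, and `averages` are illustrative only):

```python
# Benchmark scores copied from the table, in the order:
# AGIEval, GPT4All, TruthfulQA, Bigbench.
scores = {
    "phi-2-orange": [33.29, 71.39, 49.9, 37.14],
    "phi-2-dpo": [30.39, 71.68, 50.75, 34.9],
    "dolphin-2_6-phi-2": [33.12, 69.85, 47.39, 37.2],
    "phi-2": [27.98, 70.8, 44.43, 35.21],
}

# Average column as reported in the table.
reported = {
    "phi-2-orange": 47.93,
    "phi-2-dpo": 46.93,
    "dolphin-2_6-phi-2": 46.89,
    "phi-2": 44.61,
}

# Assumption: the average is a plain arithmetic mean with equal weights.
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}

for name, avg in averages.items():
    print(f"{name}: computed {avg:.3f}, reported {reported[name]}")
```

Under that assumption, each recomputed mean agrees with the reported value to within rounding.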