---
license: mit
datasets:
- Open-Orca/SlimOrca-Dedup
- migtissera/Synthia-v1.3
- LDJnr/Verified-Camel
- LDJnr/Pure-Dove
- LDJnr/Capybara
- meta-math/MetaMathQA
- Intel/orca_dpo_pairs
- argilla/ultrafeedback-binarized-preferences-cleaned
---
# Phi-2 Orange
A two-step finetune of Phi-2, with a bit of zest.
An updated model with higher evals is available at [rhysjones/phi-2-orange-v2](https://huggingface.co/rhysjones/phi-2-orange-v2), if you wish to test it.
## Training details
A first finetune using a broad collection of training data:
- Open-Orca/SlimOrca-Dedup
- migtissera/Synthia-v1.3
- LDJnr/Verified-Camel
- LDJnr/Pure-Dove
- LDJnr/Capybara
- meta-math/MetaMathQA
And then a DPO finetune (a rough sketch follows below) using:
- Intel/orca_dpo_pairs
- argilla/ultrafeedback-binarized-preferences-cleaned
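For illustration only, here is a minimal sketch of what such a DPO step could look like using the `trl` library. The checkpoint path, hyperparameters, and dataset handling are placeholders, not the settings actually used to train Phi-2 Orange, and a recent `trl` release is assumed:

```python
# Illustrative sketch only: paths and hyperparameters are placeholders,
# not the author's actual training setup. Assumes a recent trl release.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_checkpoint = "path/to/first-step-sft-checkpoint"  # hypothetical path
model = AutoModelForCausalLM.from_pretrained(sft_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(sft_checkpoint)

# Intel/orca_dpo_pairs ships question/chosen/rejected columns; DPOTrainer
# expects prompt/chosen/rejected, so rename accordingly.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.rename_column("question", "prompt")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="phi-2-orange-dpo", beta=0.1),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```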
## Run within Ollama
If you're using Ollama, you can download and run the model with:

```
ollama run rhysjones/phi-2-orange
```
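If you prefer calling the model programmatically, Ollama also exposes a local REST API (default port 11434). A minimal sketch using Python's `requests` library; the prompt here is just an example:

```python
# Minimal sketch: queries the locally running Ollama server's REST API.
# Assumes the model has been pulled and the requests library is installed.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "rhysjones/phi-2-orange",
        "prompt": "Why is the sky blue?",
        "stream": False,
    },
)
print(resp.json()["response"])
```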
## Prompt Format
Phi-2 Orange uses ChatML as its prompt format, with or without a system instruction.
To prompt with a system instruction (use whatever system prompt you like):
```
<|im_start|>system
You are a helpful assistant for Python which outputs in Markdown format.<|im_end|>
<|im_start|>user
Write a function to calculate the Fibonacci sequence<|im_end|>
<|im_start|>assistant
```
You can also omit the system prompt if you wish:
```
<|im_start|>user
Why is the sky blue?<|im_end|>
<|im_start|>assistant
```
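As an illustration, the template can also be applied programmatically with the `transformers` library. This sketch assumes `torch` is installed (and `accelerate` for `device_map="auto"`) and that the model's tokenizer ships a ChatML chat template (inspect `tokenizer.chat_template` to confirm):

```python
# Minimal sketch, not an official usage example. Assumes the tokenizer
# provides a ChatML chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rhysjones/phi-2-orange"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant for Python which outputs in Markdown format."},
    {"role": "user", "content": "Write a function to calculate the Fibonacci sequence"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```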
## Evaluations
Evaluations were done using mlabonne's useful Colab notebook [llm-autoeval](https://github.com/mlabonne/llm-autoeval). Also check out the alternative leaderboard at [Yet_Another_LLM_Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| phi-2-orange | 33.37 | 71.33 | 49.87 | 37.3 | 47.97 |
| phi-2-dpo | 30.39 | 71.68 | 50.75 | 34.9 | 46.93 |
| dolphin-2_6-phi-2 | 33.12 | 69.85 | 47.39 | 37.2 | 46.89 |
| phi-2 | 27.98 | 70.8 | 44.43 | 35.21 | 44.61 |