metadata
license: apache-2.0
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
language:
- en
tags:
- distilabel
- dpo
- rlaif
- rlhf
⚗️ distilabeled Marcoro14 7B Slerp
Benchmark results
For benchmarking, we used the well-known "Nous" (or "Teknium") benchmark suite. Below is an overview of the results, including our first experiment with a less ambitious dataset filtering (removing ties and score>5).
To run the benchmark, we used another awesome contribution from Maxime: LLM AutoEval. Check it out!
Model | AGIEval | GPT4ALL | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
argilla/distilabeled-Marcoro14-7B-slerp | 45.4 | 76.47 | 65.46 | 47.19 | 58.63 |
Marcoro14-7B-slerp | 44.66 | 76.24 | 64.15 | 45.64 | 57.67 |
argilla/distilabeled-Hermes-2.5-Mistral-7B | 44.64 | 73.35 | 55.96 | 42.21 | 54.04 |
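
As a quick sanity check on the table, the sketch below recomputes the Average column. It assumes that Average is the unweighted mean of the four benchmark scores, rounded to two decimals (the reported numbers are consistent with that assumption). The `scores` dictionary and the `average` helper are illustrative names, not part of LLM AutoEval.

```python
# Scores copied from the table above, in the order:
# AGIEval, GPT4ALL, TruthfulQA, Bigbench.
scores = {
    "argilla/distilabeled-Marcoro14-7B-slerp": [45.4, 76.47, 65.46, 47.19],
    "Marcoro14-7B-slerp": [44.66, 76.24, 64.15, 45.64],
    "argilla/distilabeled-Hermes-2.5-Mistral-7B": [44.64, 73.35, 55.96, 42.21],
}


def average(values):
    """Unweighted mean rounded to two decimals (assumed definition of 'Average')."""
    return round(sum(values) / len(values), 2)


# Recompute the Average column for every model in the table.
averages = {model: average(values) for model, values in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```

Running it should reproduce the Average column, which makes it easy to spot a transcription error if more rows are added later.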