---
license: apache-2.0
datasets:
  - argilla/distilabel-intel-orca-dpo-pairs
language:
  - en
tags:
  - distilabel
  - dpo
  - rlaif
  - rlhf
---

⚗️ distilabeled Marcoro14 7B Slerp

Benchmark results

We benchmarked with the well-known "Nous" (or "Teknium") benchmark suite. The table below gives an overview, including our first experiment with a less ambitious dataset filtering (removing ties and score>5).

For running the benchmark we used another awesome contribution from Maxime: LLM AutoEval, check it out!

| Model | AGIEval | GPT4ALL | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| argilla/distilabeled-Marcoro14-7B-slerp | 45.4 | 76.47 | 65.46 | 47.19 | 58.63 |
| Marcoro14-7B-slerp | 44.66 | 76.24 | 64.15 | 45.64 | 57.67 |
| argilla/distilabeled-Hermes-2.5-Mistral-7B | 44.64 | 73.35 | 55.96 | 42.21 | 54.04 |
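
The Average column appears to be the plain arithmetic mean of the four benchmark scores. The sketch below rests on that assumption and recomputes it from the numbers in the table. The `scores` dictionary and the `average` helper are illustrative names and are not part of LLM AutoEval.

```python
# Scores copied from the table above, in column order:
# AGIEval, GPT4ALL, TruthfulQA, Bigbench.
scores = {
    "argilla/distilabeled-Marcoro14-7B-slerp": [45.4, 76.47, 65.46, 47.19],
    "Marcoro14-7B-slerp": [44.66, 76.24, 64.15, 45.64],
    "argilla/distilabeled-Hermes-2.5-Mistral-7B": [44.64, 73.35, 55.96, 42.21],
}


def average(values):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(values) / len(values), 2)


# Recompute the Average column for every model in the table.
averages = {model: average(values) for model, values in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```

Under this reading, the distilabeled model's average is roughly one point higher than that of the base Marcoro14-7B-slerp.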