Question: did you use beta=0.1 (the default in the alignment handbook)?
#1 opened by eengad
BTW I ran MT-bench and got:
gemma-2b-zephyr-dpo 4.347826
gemma-2b-zephyr-sft 4.215625
Here is the run: https://wandb.ai/llm_surgery/gemma-zephyr/runs/lbqi9kvq
Nope, beta=0.01. I think the default is 0.05 in the new recipe.
tcapelle changed discussion status to closed
tcapelle changed discussion status to open
The idea here was to use the "original recipe", and in that recipe beta is 0.01:
> https://github.com/huggingface/alignment-handbook/blob/ff618a4d13a2c77cf97479fac8af2c576619062a/recipes/zephyr-7b-beta/dpo/config_full.yaml#L16