Surprise result!
Discussion #1, opened by sometimesanotion
@sthenno, @CultriX, I think you'll want to see this. I made this merge because I felt Lamarck hadn't integrated DeepSeek R1 enough, and I expected a model_stock merge to make the MUSR score pop. That's not what happened. Most scores fell slightly toward the average, but look at the MATH score.
It appears that R1 and Qwenvergence v9 (and hence DRT) clash on MUSR, but the model_stock merge shows where they are synergistic on MATH.
Amazing! But I saw a lot of confusion in MATH. See: https://huggingface.co/bamec66557/Qwen-2.5-14B-MINUS/discussions/1#6792f65509f4f9090f0c62bd