Surprise result!

#1
by sometimesanotion - opened

@sthenno, @CultriX, I think you'll want to see this. I made this merge because I felt Lamarck hadn't integrated DeepSeek R1 enough, and I expected a model_stock merge to make the MUSR score pop. That's not what happened. Most scores fell slightly towards the average, but look at the MATH.
[Image: newplot (1).png]

It appears that R1 and Qwenvergence v9 (hence DRT) are clashing on MUSR, but a model_stock shows where they are synergistic on MATH.
