A bigger and harder pain point for reasoning models is switching modes.
We now have powerful models capable of either System 1 or System 2 thinking, but not both, much less switching between the two. Humans, by contrast, do this quite easily.
ChatGPT and others push the burden of switching between models onto the user. I suppose that's the best we have for now.
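As a thought experiment, here is a minimal sketch of what automatic mode switching could look like: a crude heuristic router in front of two hypothetical models. The `needs_deliberation` heuristic, the `call_model` stub, and the model names are illustrative assumptions, not anyone's actual routing logic.

```python
# Minimal sketch of an automatic "mode switch": route a prompt to either a fast
# (System 1) model or a slower reasoning (System 2) model instead of asking the
# user to pick. Model names and call_model() are placeholders.

REASONING_HINTS = ("prove", "step by step", "derive", "debug", "optimize", "why")

def needs_deliberation(prompt: str) -> bool:
    """Crude heuristic: long or analysis-flavored prompts go to the reasoning model."""
    text = prompt.lower()
    return len(text.split()) > 80 or any(hint in text for hint in REASONING_HINTS)

def call_model(model_name: str, prompt: str) -> str:
    """Stub standing in for whatever chat/completions API you actually use."""
    return f"[{model_name}] response to: {prompt[:40]}..."

def route(prompt: str) -> str:
    model = "reasoning-model" if needs_deliberation(prompt) else "fast-model"
    return call_model(model, prompt)

if __name__ == "__main__":
    print(route("What's the capital of France?"))
    print(route("Prove that the sum of two even integers is even, step by step."))
```

In practice the router itself would likely be a small classifier or the fast model deciding when to escalate, but the point stands: the switch should happen behind the interface, not in the user's model picker.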
reacted to AdinaY's post with 🔥 about 24 hours ago
It's beating Claude 3.7 on (competitive) programming, a domain where Anthropic has historically been really strong, and it's getting close to o1-mini/R1 on olympiad-level coding with just 7B parameters!
And the best part is that we're open-sourcing everything about it: the training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3
I'm currently experimenting with the SFT dataset Lunzima/alpaca_like_dataset to further boost the performance of NQLSG-Qwen2.5-14B-MegaFusion-v9.x. This includes data sourced from DeepSeek-R1 and other cleaned results (excluding CoTs). Additionally, datasets that could enhance the model's performance in math and programming/code, as well as datasets dedicated to specific uses like Swahili, are part of the mix. @sometimesanotion @sthenno @wanlige
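For context, a rough sketch of how such a mix could be assembled with the Hugging Face `datasets` library follows. Only `Lunzima/alpaca_like_dataset` comes from the post; the other dataset ids, the `train` split, the mixing weights, and the column names are assumptions for illustration.

```python
# Hedged sketch of an SFT data mix like the one described above.
# The math/code dataset ids are hypothetical placeholders.
from datasets import load_dataset, interleave_datasets

base = load_dataset("Lunzima/alpaca_like_dataset", split="train")  # split assumed
math = load_dataset("some-org/math-sft", split="train")            # placeholder id
code = load_dataset("some-org/code-sft", split="train")            # placeholder id

# NOTE: interleave_datasets expects the datasets to share the same features,
# e.g. an Alpaca-style instruction/input/output schema (assumed here).
mixed = interleave_datasets(
    [base, math, code],
    probabilities=[0.6, 0.2, 0.2],  # weight the base mix most heavily
    seed=42,
)

# Drop any chain-of-thought column if present, matching "excluding CoTs" above.
cot_cols = [c for c in mixed.column_names if c.lower() in {"cot", "reasoning"}]
mixed = mixed.remove_columns(cot_cols)
print(mixed)
```

Whether to interleave with fixed probabilities or simply concatenate and shuffle is a judgment call; interleaving keeps the proportions stable even when the source datasets differ a lot in size.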