Less is More for Reasoning (LIMO): a 32B model fine-tuned with 817 examples can beat o1-preview on math reasoning! 🤯
Do we really need o1's huge RL procedure to see reasoning emerge? It seems not.
Researchers from Shanghai Jiao Tong University just demonstrated that carefully selected examples can boost math performance in large language models using SFT, with no huge datasets or RL procedures needed.
Their procedure allows Qwen2.5-32B-Instruct to jump from 6.5% to 57% on AIME and from 59% to 95% on MATH, while using only 1% of the training data required by previous approaches.
⚡ The Less-is-More Reasoning Hypothesis:
➣ Minimal but precise examples that showcase optimal reasoning patterns matter more than sheer quantity
➣ Pre-trained knowledge, combined with sufficient computation at inference time, is what unlocks these math skills
➡️ Core techniques:
➣ High-quality reasoning chains with self-verification steps
➣ 817 handpicked problems that encourage deeper reasoning (see the sketch after this list)
➣ Enough inference-time computation to allow extended reasoning
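
For anyone who wants to try something similar: here is a minimal sketch of what such an SFT run could look like using TRL's `SFTTrainer`. The example data, field names, output path, and hyperparameters are placeholders of mine, not the authors' exact setup.

```python
# Minimal sketch: plain SFT on a small, hand-curated set of long reasoning traces.
# Everything below (example text, hyperparameters, output path) is illustrative,
# not the configuration used in the LIMO paper.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical curated data: ~800 problems, each paired with a long
# chain-of-thought solution that includes explicit self-verification steps.
curated_examples = [
    {
        "text": (
            "Problem: Find all positive integers n such that ...\n"
            "Solution: Let's reason step by step. ...\n"
            "Verification: substituting back confirms the result. The answer is 42."
        )
    },
    # ... ~816 more hand-picked examples
]
dataset = Dataset.from_list(curated_examples)

config = SFTConfig(
    output_dir="qwen2.5-32b-limo-style-sft",  # illustrative output path
    num_train_epochs=3,                       # assumed, not taken from the paper
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # the base model mentioned above
    args=config,
    train_dataset=dataset,              # SFTTrainer picks up the "text" column
)
trainer.train()
```

The point is that there is no special training machinery here: the leverage comes from the quality of the 817 reasoning traces and from letting the model generate long solutions at inference time.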
💪 Efficiency gains:
➣ Only 817 examples instead of 100k+
➣ 40.5% absolute improvement across 10 diverse benchmarks, outperforming models trained on 100x more data
This really challenges the notion that SFT leads to memorization rather than generalization! And opens up reasoning to GPU-poor researchers 🙌
Read the full paper here 👉 LIMO: Less is More for Reasoning (2502.03387)