## Model Details

### Model Description
This is a 32B reasoning model preference-optimized on top of Sky-T1-32B-Preview to significantly reduce generation lengths while maintaining accuracy. Its performance is on par with the o1-preview model in both math and coding, while generation lengths are reduced by up to 57% relative to Sky-T1-32B-Preview. Please see our [blog post](https://novasky-ai.github.io/posts/reduce-overthinking) for more details.
- Developed by: NovaSky Team from Sky Computing Lab at UC Berkeley.
- Finetuned from model: Sky-T1-32B-Preview (base model: Qwen/Qwen2.5-32B).
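A minimal usage sketch with the Hugging Face `transformers` library is shown below; the prompt and generation settings are illustrative assumptions, not official recommendations:

```python
# Minimal inference sketch (assumes a GPU setup with enough memory for a 32B
# model; max_new_tokens and the prompt are illustrative choices).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovaSky-AI/Sky-T1-32B-Flash"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=4096)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```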
## Training Details

### Training Data
10K preference pairs in math and coding domains, generated by Sky-T1-32B-Preview.
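The exact pair-construction recipe is described in the blog post; as a rough illustration (field names and the example prompt are hypothetical), each pair contrasts a preferred concise response with a dispreferred longer response to the same problem, both sampled from Sky-T1-32B-Preview:

```python
# Hypothetical preference-pair record; see the blog post for the actual
# construction. Both responses come from Sky-T1-32B-Preview on the same prompt,
# and the shorter correct one is preferred.
pair = {
    "prompt": "Find all integer solutions of x^2 - y^2 = 17.",
    "chosen": "<concise correct solution sampled from Sky-T1-32B-Preview>",
    "rejected": "<much longer solution to the same problem>",
}
```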
### Training Procedure
We perform Simple Preference Optimization (SimPO) with a batch size of 96, a learning rate of 5e-7, a gamma of 0.3, and a beta of 2.0.
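For reference, SimPO's length-normalized objective (Meng et al., 2024) can be sketched as follows; this illustrates the loss itself, not the LLaMA-Factory implementation used for our runs:

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, chosen_lens, rejected_logps, rejected_lens,
               beta=2.0, gamma=0.3):
    """SimPO loss sketch: beta scales the length-normalized (average per-token)
    log-probability of each response, and gamma is the target reward margin.
    `*_logps` are summed log-probs under the policy; `*_lens` are response
    lengths in tokens."""
    reward_chosen = beta * chosen_logps / chosen_lens
    reward_rejected = beta * rejected_logps / rejected_lens
    # Push the chosen reward above the rejected reward by at least gamma.
    return -F.logsigmoid(reward_chosen - reward_rejected - gamma).mean()

# Example with dummy values (summed log-probs are negative):
loss = simpo_loss(torch.tensor([-120.0]), torch.tensor([300.0]),
                  torch.tensor([-250.0]), torch.tensor([500.0]))
```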
#### Speeds
We use LLaMA-Factory for training. On 8x H100 GPUs, the SimPO training takes ~2.5 hours with DeepSpeed ZeRO-3 Offload.
## Evaluation
| Benchmark | Metric | Sky-T1-32B-Preview | Sky-T1-32B-Flash | Qwen2.5-32B-Instruct | QwQ-32B-Preview | DeepSeek-R1-Distill-Qwen-32B |
|---|---|---|---|---|---|---|
| Math500 | Acc | 88.6 | 88.6 | 76.2 | 89.2 | 90.8 |
| | Avg Len | 2124 | 1417 (-33%) | 522 | 2089 | 2010 |
| AIME24 | Acc | 43.3 | 43.3 | 16.7 | 50.0 | 66.7 |
| | Avg Len | 6881 | 4365 (-37%) | 970 | 7379 | 9173 |
| LCB Easy | Acc | 87.4 | 89.0 | 84.6 | 90.7 | 91.2 |
| | Avg Len | 3415 | 2265 (-34%) | 414 | 3255 | 2775 |
| LCB Medium | Acc | 56.8 | 56.3 | 40.8 | 56.3 | 76.7 |
| | Avg Len | 8263 | 4389 (-47%) | 535 | 6742 | 6324 |
| LCB Hard | Acc | 17.9 | 17.9 | 9.8 | 17.1 | 38.2 |
| | Avg Len | 14564 | 6199 (-57%) | 618 | 10450 | 10448 |
| MMLU | Acc | 82.4 | 81.7 | 80.1 | 85.2 | 82.1 |
| | Avg Len | 1087 | 799 (-17%) | 312 | 1041 | 774 |
| GPQA Diamond | Acc | 56.8 | 56.6 | 45.5 | 52.5 | 62.6 |
| | Avg Len | 3503 | 2148 (-39%) | 600 | 3302 | 5108 |

Acc is accuracy (%); Avg Len is the average generation length, with parenthesized percentages showing the reduction relative to Sky-T1-32B-Preview. LCB = LiveCodeBench.
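For clarity, each parenthesized percentage is computed against the Sky-T1-32B-Preview column; for example, for the LCB Hard row:

```python
# Length reduction of Sky-T1-32B-Flash vs. Sky-T1-32B-Preview (LCB Hard row).
preview_len, flash_len = 14564, 6199
reduction = 1 - flash_len / preview_len
print(f"-{reduction:.0%}")  # prints "-57%"
```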
## Acknowledgement

We would like to thank Lambda Labs and Anyscale for providing the compute resources used in this work.
## Citation

Please consider citing our blog post if you find it useful for your research. Thank you!
```bibtex
@misc{reduce_overthinking_2025,
  author       = {NovaSky Team},
  title        = {Think Less, Achieve More: Cut Reasoning Costs by 50\% Without Sacrificing Accuracy},
  howpublished = {\url{https://novasky-ai.github.io/posts/reduce-overthinking}},
  note         = {Accessed: 2025-01-23},
  year         = {2025}
}
```