Sky-T1-32B-Flash / README.md
NovaSkyAI's picture
Update README.md
0dccf55 verified
metadata
library_name: transformers
datasets:
  - BAAI/TACO
  - tasksource/PRM800K
language:
  - en
base_model:
  - Qwen/Qwen2.5-32B-Instruct
  - NovaSky-AI/Sky-T1-32B-Preview
license: apache-2.0

Model Details

Model Description

This is a 32B reasoning model preference optimized on top of Sky-T1-32B-Preview to significantly reduce generation lengths while maintaining accuracy. The performance is on par with o1-preview model in both math and coding, while reducing generation lengths by up to 57% relative to Sky-T1-32B-Preview. Please see our blog post for more details.

  • Developed by: NovaSky Team from Sky Computing Lab at UC Berkeley.

Training Details

Training Data

10K preference pairs in math and coding domains, generated by Sky-T1-32B-Preview.

Training Procedure

We perform Simple Policy Optimization (SimPO) with a batch size of 96, learning rate of 5e-7, gamma of 0.3, and beta of 2.0.

Speeds

We use Llama-Factory for training. On 8xH100, the SimPO training takes ~2.5 hours with DeepSpeed Zero-3 Offload.

Evaluation

Sky-T1-32B-Preview Sky-T1-32B-Flash Qwen2.5-32B-Instruct QwQ-32B- Base DeepSeek-R1-Distill-Qwen-32B
Math500 Acc 88.6 88.6 76.2 89.2 90.8
Avg Len 2124 1417 (-33%) 522 2089 2010
AIME24 Acc 43.3 43.3 16.7 50 66.7
Avg Len 6881 4365 (-37%) 970 7379 9173
LCB Easy Acc 87.4 89 84.6 90.7 91.2
Avg Len 3415 2265 (-34%) 414 3255 2775
LCB Medium Acc 56.8 56.3 40.8 56.3 76.7
Avg Len 8263 4389 (-47%) 535 6742 6324
LCB Hard Acc 17.9 17.9 9.8 17.1 38.2
Avg Len 14564 6199 (-57%) 618 10450 10448
MMLU Acc 82.4 81.7 80.1 85.2 82.1
Avg Len 1087 799 (-17%) 312 1041 774
GPQA Diamond Acc 56.8 56.6 45.5 52.5 62.6
Avg Len 3503 2148 (-39%) 600 3302 5108

Acknowledgement

We would like to thanks the compute resources from Lambda Lab and AnyScale.

License

Apache-2.0

Citation

Please considering citing our blog post if you found it useful for your research. Thank you!

@misc{reduce_overthinking_2025,
  author       = {NovaSky Team},
  title        = {Think Less, Achieve More: Cut Reasoning Costs by 50% Without Sacrificing Accuracy},
  howpublished = {https://novasky-ai.github.io/posts/reduce-overthinking},
  note         = {Accessed: 2025-01-23},
  year         = {2025}
}