Model Details

Model Description

This is a 32B reasoning model, preference-optimized on top of Sky-T1-32B-Preview to significantly reduce generation lengths while maintaining accuracy. Its performance is on par with the o1-preview model in both math and coding, while generation lengths are reduced by up to 57% relative to Sky-T1-32B-Preview. Please see our blog post for more details.

  • Developed by: NovaSky Team from Sky Computing Lab at UC Berkeley.
  • Base model: Qwen/Qwen2.5-32B (32.8B parameters, BF16).
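
The model loads like any other Hugging Face causal language model. Below is a minimal usage sketch with the transformers library; the prompt and generation settings are our illustrative assumptions, not settings prescribed by the NovaSky team.

```python
# Minimal usage sketch for Sky-T1-32B-Flash with Hugging Face transformers.
# The prompt and generation parameters are illustrative, not official settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovaSky-AI/Sky-T1-32B-Flash"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are distributed in BF16
    device_map="auto",           # shard across available GPUs
)

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```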

Training Details

Training Data

10K preference pairs in math and coding domains, generated by Sky-T1-32B-Preview.
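
The card does not publish the exact record layout, so the sketch below is an assumption: a conventional preference-pair schema (prompt, chosen, rejected) consistent with the stated goal of preferring concise correct reasoning over longer traces. The field names and example contents are hypothetical.

```python
# Hypothetical preference-pair record; field names and contents are assumptions
# for illustration, not the released dataset's actual schema.
example_pair = {
    "prompt": "What is the sum of the first 100 positive integers?",
    # Preferred: a correct, concise chain of thought.
    "chosen": "Pair terms: 1+100, 2+99, ... gives 50 pairs of 101, so 50*101 = 5050.",
    # Dispreferred: a correct but much longer, repetitive chain of thought,
    # since generation-length reduction is the optimization target.
    "rejected": "Let me think step by step... (a far longer derivation reaching 5050)",
}
```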

Training Procedure

We perform Simple Preference Optimization (SimPO) with a batch size of 96, a learning rate of 5e-7, a gamma of 0.3, and a beta of 2.0.
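
For reference, SimPO is a reference-free, length-normalized preference objective. The sketch below implements the published SimPO loss with the beta and gamma values quoted above; the helper itself is our illustration, not the team's training code.

```python
# SimPO loss (Meng et al., 2024): reference-free, length-normalized DPO variant.
# Illustrative sketch, not the NovaSky training code.
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,    # summed token log-probs of chosen responses
    rejected_logps: torch.Tensor,  # summed token log-probs of rejected responses
    chosen_lens: torch.Tensor,     # token lengths of chosen responses
    rejected_lens: torch.Tensor,   # token lengths of rejected responses
    beta: float = 2.0,             # reward scale quoted in this card
    gamma: float = 0.3,            # target reward margin quoted in this card
) -> torch.Tensor:
    # Length-normalized implicit reward: (beta / |y|) * log pi(y | x).
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Push the chosen-vs-rejected margin beyond gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```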

Speeds

We use LLaMA-Factory for training. On 8 H100 GPUs, the SimPO training takes ~2.5 hours with DeepSpeed ZeRO-3 Offload.
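
A configuration along the following lines could reproduce this setup. It is written as the Python dict that would normally be serialized to a LLaMA-Factory YAML file; every key name is an assumption based on LLaMA-Factory's documented DPO/SimPO options and may differ across versions. It is not the NovaSky team's released config.

```python
# Hypothetical LLaMA-Factory SimPO configuration (normally a YAML file).
# All key names are assumptions and may differ across LLaMA-Factory versions.
config = {
    "model_name_or_path": "NovaSky-AI/Sky-T1-32B-Preview",
    "stage": "dpo",               # SimPO is exposed through the DPO stage
    "pref_loss": "simpo",         # reference-free, length-normalized loss
    "pref_beta": 2.0,             # beta quoted in this card
    "simpo_gamma": 0.3,           # gamma quoted in this card
    "learning_rate": 5.0e-7,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 12,  # 8 GPUs x 1 x 12 = effective batch 96
    "deepspeed": "examples/deepspeed/ds_z3_offload_config.json",
    "bf16": True,
}
```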

Evaluation

All results report accuracy (Acc) and average generation length (Avg Len); percentages in parentheses are Sky-T1-32B-Flash's length reductions relative to Sky-T1-32B-Preview.

| Benchmark | Metric | Sky-T1-32B-Preview | Sky-T1-32B-Flash | Qwen2.5-32B-Instruct | QwQ-32B-Preview | DeepSeek-R1-Distill-Qwen-32B |
| --- | --- | --- | --- | --- | --- | --- |
| Math500 | Acc | 88.6 | 88.6 | 76.2 | 89.2 | 90.8 |
| Math500 | Avg Len | 2124 | 1417 (-33%) | 522 | 2089 | 2010 |
| AIME24 | Acc | 43.3 | 43.3 | 16.7 | 50 | 66.7 |
| AIME24 | Avg Len | 6881 | 4365 (-37%) | 970 | 7379 | 9173 |
| LCB Easy | Acc | 87.4 | 89 | 84.6 | 90.7 | 91.2 |
| LCB Easy | Avg Len | 3415 | 2265 (-34%) | 414 | 3255 | 2775 |
| LCB Medium | Acc | 56.8 | 56.3 | 40.8 | 56.3 | 76.7 |
| LCB Medium | Avg Len | 8263 | 4389 (-47%) | 535 | 6742 | 6324 |
| LCB Hard | Acc | 17.9 | 17.9 | 9.8 | 17.1 | 38.2 |
| LCB Hard | Avg Len | 14564 | 6199 (-57%) | 618 | 10450 | 10448 |
| MMLU | Acc | 82.4 | 81.7 | 80.1 | 85.2 | 82.1 |
| MMLU | Avg Len | 1087 | 799 (-17%) | 312 | 1041 | 774 |
| GPQA Diamond | Acc | 56.8 | 56.6 | 45.5 | 52.5 | 62.6 |
| GPQA Diamond | Avg Len | 3503 | 2148 (-39%) | 600 | 3302 | 5108 |
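
As a sanity check on how the parenthesized reductions are derived, the snippet below reproduces them from the Avg Len columns, assuming reduction = 1 - flash_len / preview_len (our illustration of the arithmetic).

```python
# Reproduce the parenthesized length reductions from the table:
# reduction = 1 - (Sky-T1-32B-Flash avg len) / (Sky-T1-32B-Preview avg len).
avg_len = {
    "Math500": (2124, 1417),
    "AIME24": (6881, 4365),
    "LCB Hard": (14564, 6199),
}
for bench, (preview, flash) in avg_len.items():
    print(f"{bench}: -{1 - flash / preview:.0%}")
# Math500: -33%, AIME24: -37%, LCB Hard: -57%  (matches the table)
```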

Acknowledgement

We would like to thank Lambda Lab and AnyScale for providing the compute resources.

Citation

Please consider citing our blog post if you find it useful for your research. Thank you!

@misc{reduce_overthinking_2025,
  author       = {NovaSky Team},
  title        = {Think Less, Achieve More: Cut Reasoning Costs by 50% Without Sacrificing Accuracy},
  howpublished = {https://novasky-ai.github.io/posts/reduce-overthinking},
  note         = {Accessed: 2025-01-23},
  year         = {2025}
}