Introduction
DistilQwen2.5-R1 Series: Advanced Reasoning Models
Overview
As large language models (LLMs) evolve toward deep reasoning capabilities, deploying them in resource-constrained environments (e.g., mobile devices, edge computing) remains challenging. The DistilQwen2.5-R1 series addresses this by transferring reasoning capabilities from ultra-large models (e.g., DeepSeek-R1) to compact models through innovative distillation techniques, achieving high performance while reducing computational costs.
Key Innovations
1. Cognitive Trajectory Adaptation Framework
   - Challenge: Discrepancies in reasoning paths between large and small models (e.g., small models struggle to comprehend large models' high-level problem-solving logic)
   - Solutions:
     - Phase 1: CoT Data Optimization
       - Difficulty grading of large model reasoning chains (simple/medium/hard) via LLM-as-a-Judge
       - Adaptive adjustments: Expand simple chains and simplify complex chains to create medium-difficulty datasets digestible by small models
     - Phase 2: Preference Optimization
       - Generate contrastive data pairs containing correct/incorrect reasoning paths
       - Apply DPO algorithm with tailored configurations to enhance reasoning path discrimination
2. Performance Highlights
   - DistilQwen2.5-R1-7B outperforms comparable distilled models (e.g., OpenThinker-7B) across multiple benchmarks
   - Successfully transfers high-order reasoning patterns originally dependent on large model parameter scales
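The Phase 1 routing described above (grade each teacher chain, then expand simple chains and simplify hard ones) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `CoTExample`, `adapt_chains`, and the `toy_*` functions are hypothetical names, and the toy judge counts reasoning steps only as a stand-in for a real LLM-as-a-Judge call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical labels mirroring the simple/medium/hard grading described above.
SIMPLE, MEDIUM, HARD = "simple", "medium", "hard"


@dataclass
class CoTExample:
    question: str
    chain: str  # teacher reasoning chain, steps separated by blank lines


def adapt_chains(
    examples: List[CoTExample],
    judge: Callable[[CoTExample], str],
    expand: Callable[[CoTExample], CoTExample],
    simplify: Callable[[CoTExample], CoTExample],
) -> List[CoTExample]:
    """Route each chain by judged difficulty: expand simple, simplify hard, keep medium."""
    adapted = []
    for ex in examples:
        level = judge(ex)
        if level == SIMPLE:
            adapted.append(expand(ex))
        elif level == HARD:
            adapted.append(simplify(ex))
        elif level == MEDIUM:
            adapted.append(ex)
        else:
            raise ValueError(f"unexpected difficulty label: {level!r}")
    return adapted


# Toy stand-ins (assumptions): a real setup would call an LLM for all three roles.
def toy_judge(ex: CoTExample) -> str:
    steps = ex.chain.split("\n\n")
    if len(steps) < 2:
        return SIMPLE
    if len(steps) > 4:
        return HARD
    return MEDIUM


def toy_expand(ex: CoTExample) -> CoTExample:
    # Add one explicit verification step to a chain that is too terse.
    return CoTExample(ex.question, ex.chain + "\n\nVerify the result against the question.")


def toy_simplify(ex: CoTExample) -> CoTExample:
    # Keep only the first four steps of a chain that is too long.
    steps = ex.chain.split("\n\n")
    return CoTExample(ex.question, "\n\n".join(steps[:4]))


examples = [
    CoTExample("q1", "one step"),
    CoTExample("q2", "a\n\nb\n\nc"),
    CoTExample("q3", "a\n\nb\n\nc\n\nd\n\ne\n\nf"),
]
adapted = adapt_chains(examples, toy_judge, toy_expand, toy_simplify)
```

With these toy functions, every adapted chain is graded medium when passed through `toy_judge` again, which is the property the data optimization phase aims for.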
Technical Advantages
- Dynamic data optimization narrows cognitive trajectory discrepancies between large and small models
- Two-stage training balances reasoning accuracy and computational efficiency
- Enables complex task reasoning in edge computing environments
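For the preference optimization stage, the sketch below shows what a contrastive pair and the standard per-pair DPO loss look like in plain Python. It is illustrative only: `PreferencePair` and `dpo_loss` are hypothetical names, and `beta=0.1` is an assumed value, since the "tailored configurations" used for this series are not specified here.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # reasoning path that reaches the correct answer
    rejected: str  # reasoning path that reaches an incorrect answer


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from the model card
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response under
    the policy being trained or under the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(z)) equals softplus(-z); this form is numerically stable.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


pair = PreferencePair(
    prompt="What is 12 * 12?",
    chosen="12 * 12 = 12 * 10 + 12 * 2 = 144",
    rejected="12 * 12 = 12 * 10 + 2 = 122",
)

# The loss falls as the policy favors the chosen path more than the reference does.
loss_neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
loss_better = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

When the policy and the reference agree, the margin is zero and the loss equals log 2. Shifting probability toward the chosen path lowers it.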
Quick Start
The following snippet uses apply_chat_template to show how to load the tokenizer and model and how to generate a response.
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to move the input tensors onto

# Load the model weights and the matching tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    "alibaba-pai/DistilQwen2.5-R1-32B",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("alibaba-pai/DistilQwen2.5-R1-32B")

prompt = "Give me a short introduction to large language model."
messages = [
{"role": "system", "content": "Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:"},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,  # passes input_ids together with the attention mask
    max_new_tokens=2048,
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
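The system prompt asks the model to wrap its reasoning and its final answer in `<|begin_of_thought|>` / `<|end_of_thought|>` and `<|begin_of_solution|>` / `<|end_of_solution|>` markers. A minimal sketch of splitting a decoded response on those markers is shown below. `ParsedResponse` and `split_response` are hypothetical helper names. The sketch assumes the markers survive decoding as plain text, and it returns `None` for a section whose markers are missing (for example when generation stops at `max_new_tokens`).

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    thought: Optional[str]
    solution: Optional[str]


# Marker strings are taken from the system prompt above.
_THOUGHT = re.compile(r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", re.DOTALL)
_SOLUTION = re.compile(r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL)


def split_response(text: str) -> ParsedResponse:
    """Extract the Thought and Solution sections; a missing section becomes None."""
    thought = _THOUGHT.search(text)
    solution = _SOLUTION.search(text)
    return ParsedResponse(
        thought.group(1).strip() if thought else None,
        solution.group(1).strip() if solution else None,
    )


sample = (
    "<|begin_of_thought|>\nStep 1: recall the definition.\n\n"
    "Step 2: summarize it.\n<|end_of_thought|>\n"
    "<|begin_of_solution|>\nA large language model is a neural network "
    "trained on text.\n<|end_of_solution|>"
)
parsed = split_response(sample)
truncated = split_response("<|begin_of_thought|>\nStep 1: still thinking")
```

In practice one would call `split_response(response)` on the decoded output and show only `parsed.solution` to end users, keeping `parsed.thought` for inspection.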
Evaluation
We compared the DistilQwen2.5-R1 series with leading reasoning models across four benchmarks:
7B Model Comparison
Model | Training Data Size | AIME2024 | MATH-500 | GPQA Diamond | LiveCodeBench V2 |
---|---|---|---|---|---|
DeepSeek-R1-Distill-Qwen-7B | 800k | 55.5 | 92.8 | 49.1 | - |
Bespoke-Stratos-7B | 17k | 20.0 | 82.0 | 37.8 | 36.1 |
OpenThinker-7B | 114k | 31.3 | 83.0 | 42.4 | 39.9 |
DistilQwen2.5-R1-7B | 105k | 43.33 | 88.4 | 42.93 | 46.38 |
32B Model Comparison
Model | Training Data Size | AIME2024 | MATH-500 | GPQA Diamond | LiveCodeBench V2 |
---|---|---|---|---|---|
DeepSeek-R1-Distill-Qwen-32B | 800k | 72.6 | 94.3 | 62.1 | - |
Sky-T1-32B-Preview | 17k | 43.3 | 86.4 | 56.8 | - |
OpenThinker-32B | 114k | 66.0 | 90.6 | 61.6 | 68.9 |
DistilQwen2.5-R1-32B | 105k | 70.0 | 93.8 | 62.12 | 65.95 |
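As a quick check on the tables above, the short script below computes the per-benchmark margin of DistilQwen2.5-R1-7B over OpenThinker-7B and the training-data ratio against the DeepSeek-R1-Distill-Qwen series. The scores and data sizes are copied from the tables. The helper name `score_deltas` is hypothetical.

```python
# Column order follows the tables above.
BENCHMARKS = ("AIME2024", "MATH-500", "GPQA Diamond", "LiveCodeBench V2")

# Scores copied from the 7B comparison table.
scores_7b = {
    "OpenThinker-7B": (31.3, 83.0, 42.4, 39.9),
    "DistilQwen2.5-R1-7B": (43.33, 88.4, 42.93, 46.38),
}


def score_deltas(model: str, baseline: str, table: dict) -> dict:
    """Per-benchmark difference (model minus baseline), rounded to two decimals."""
    return {
        name: round(m - b, 2)
        for name, m, b in zip(BENCHMARKS, table[model], table[baseline])
    }


deltas = score_deltas("DistilQwen2.5-R1-7B", "OpenThinker-7B", scores_7b)

# Training data sizes from the tables: 800k (DeepSeek distill) vs. 105k (DistilQwen2.5-R1).
data_ratio = 800_000 / 105_000

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
print(f"training data ratio: {data_ratio:.1f}x")
```

All four deltas are positive, which matches the claim that the 7B model outperforms OpenThinker-7B on these benchmarks.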
Key highlights:
- DistilQwen2.5-R1-32B comes within about 3 points of DeepSeek-R1-Distill-Qwen-32B on AIME2024, MATH-500, and GPQA Diamond while using roughly 7.6× less training data (105k vs. 800k examples)
- Maintains open-source training lineage using filtered OpenThoughts subsets
- At 7B, leads LiveCodeBench V2 among the compared models (46.38); at 32B, OpenThinker-32B scores higher (68.9 vs. 65.95)