How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients Paper • 2504.10766 • Published 7 days ago • 39
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Paper • 2504.07964 • Published 12 days ago • 61
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? Paper • 2504.06514 • Published 13 days ago • 39
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning Paper • 2504.05520 • Published 14 days ago • 9 • 2
Difficulty Estimation Math Datasets Collection We perform difficulty estimation on popular math datasets. • 5 items • Updated 13 days ago • 1