Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published 10 days ago • 64
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 12 days ago • 110
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment Paper • 2408.06266 • Published Aug 12, 2024 • 10
Qwen2 Collection Qwen2 language models: pretrained and instruction-tuned models in 5 sizes (0.5B, 1.5B, 7B, 57B-A14B, and 72B). • 39 items • Updated Nov 28, 2024 • 360