Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published 4 days ago • 39
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Paper • 2501.16411 • Published 7 days ago • 17
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published 5 days ago • 45
Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts Paper • 2501.14334 • Published 10 days ago • 15
GuardReasoner: Towards Reasoning-based LLM Safeguards Paper • 2501.18492 • Published 4 days ago • 74
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published 6 days ago • 29
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published 13 days ago • 83
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published 11 days ago • 33
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 21 days ago • 89
TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space Paper • 2501.12224 • Published 13 days ago • 46
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 13 days ago • 81
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 12 days ago • 55