WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published 4 days ago • 15
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 12 days ago • 55
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 12 days ago • 78
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 12 days ago • 284
PaSa: An LLM Agent for Comprehensive Academic Paper Search Paper • 2501.10120 • Published 17 days ago • 42
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 20 days ago • 271
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published 20 days ago • 52
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 21 days ago • 89
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 24 days ago • 59
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 27 days ago • 84
Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366 • Published 25 days ago • 86
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published 26 days ago • 90
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 26 days ago • 252
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 27 days ago • 68
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Paper • 2501.03124 • Published 28 days ago • 14
Test-time Computing: from System-1 Thinking to System-2 Thinking Paper • 2501.02497 • Published 29 days ago • 41
Personalized Graph-Based Retrieval for Large Language Models Paper • 2501.02157 • Published about 1 month ago • 28