Temporal Preference Optimization for Long-Form Video Understanding • Paper • 2501.13919 • Published 1 day ago • 14
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published 2 days ago • 161
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding • Paper • 2501.13106 • Published 2 days ago • 61
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos • Paper • 2501.09781 • Published 8 days ago • 20
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong • Paper • 2501.09775 • Published 9 days ago • 26
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation • Paper • 2501.09755 • Published 8 days ago • 33
Towards Best Practices for Open Datasets for LLM Training • Paper • 2501.08365 • Published 10 days ago • 47
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following • Paper • 2501.08187 • Published 10 days ago • 24
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks • Paper • 2501.08326 • Published 10 days ago • 31
Enhancing Automated Interpretability with Output-Centric Feature Descriptions • Paper • 2501.08319 • Published 10 days ago • 10
OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training • Paper • 2501.08197 • Published 10 days ago • 7
Potential and Perils of Large Language Models as Judges of Unstructured Textual Data • Paper • 2501.08167 • Published 10 days ago • 6
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them • Paper • 2501.08292 • Published 10 days ago • 16
MiniMax-01: Scaling Foundation Models with Lightning Attention • Paper • 2501.08313 • Published 10 days ago • 268
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature • Paper • 2501.07171 • Published 12 days ago • 48
The Lessons of Developing Process Reward Models in Mathematical Reasoning • Paper • 2501.07301 • Published 11 days ago • 85
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding • Paper • 2501.05452 • Published 15 days ago • 15
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? • Paper • 2501.05510 • Published 15 days ago • 37
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution • Paper • 2501.05040 • Published 16 days ago • 14