Towards Visual Text Grounding of Multimodal Large Language Model Paper • 2504.04974 • Published 7 days ago • 9
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 4 days ago • 18
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published 4 days ago • 42
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 13 days ago • 71
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published 5 days ago • 9
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published 7 days ago • 77
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 7 days ago • 158
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) Paper • 2504.03151 • Published 11 days ago • 12
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models Paper • 2504.04718 • Published 8 days ago • 37
Slow-Fast Architecture for Video Multi-Modal Large Language Models Paper • 2504.01328 • Published 13 days ago • 8
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models Paper • 2504.03641 • Published 10 days ago • 13
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Paper • 2504.02605 • Published 11 days ago • 43
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published 14 days ago • 21
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Paper • 2504.02782 • Published 11 days ago • 54
PaperBench: Evaluating AI's Ability to Replicate AI Research Paper • 2504.01848 • Published 12 days ago • 34