Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published 1 day ago
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published 4 days ago
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 4 days ago
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published 8 days ago
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published 8 days ago
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images Paper • 2504.08727 • Published 7 days ago
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement Paper • 2504.07934 • Published 8 days ago
Compass Control: Multi Object Orientation Control for Text-to-Image Generation Paper • 2504.06752 • Published 9 days ago
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 17 days ago
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published 8 days ago
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published 10 days ago
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 10 days ago
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models Paper • 2504.04718 • Published 12 days ago
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning Paper • 2504.02949 • Published 15 days ago