SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement • Paper • 2504.07934 • Published 7 days ago • 14
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models • Paper • 2504.06148 • Published 9 days ago • 12
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models • Paper • 2503.20198 • Published 22 days ago • 4