Towards Visual Text Grounding of Multimodal Large Language Model Paper • 2504.04974 • Published 4 days ago • 3
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Paper • 2504.07964 • Published about 18 hours ago • 12
MM-IFEngine: Towards Multimodal Instruction Following Paper • 2504.07957 • Published about 18 hours ago • 24
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement Paper • 2504.07934 • Published about 19 hours ago • 7
MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations Paper • 2504.07830 • Published about 21 hours ago • 12
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published about 18 hours ago • 30
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published about 18 hours ago • 23
HoloPart: Generative 3D Part Amodal Segmentation Paper • 2504.07943 • Published about 19 hours ago • 21
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 9 days ago • 23
RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception Paper • 2504.05287 • Published 4 days ago • 2
Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling Paper • 2504.05410 • Published 4 days ago • 1
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting Paper • 2504.05541 • Published 4 days ago • 10
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published 2 days ago • 6
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion Paper • 2504.04010 • Published 6 days ago • 7
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding Paper • 2504.06719 • Published 2 days ago • 6
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility Paper • 2504.07086 • Published 2 days ago • 13