V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published 14 days ago
VideoMind Collection VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning • 8 items • Updated 22 days ago
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published Mar 17
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published 28 days ago
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning Paper • 2503.13444 • Published Mar 17
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary Paper • 2503.09402 • Published Mar 12