FlowReasoner: Reinforcing Query-Level Meta-Agents Paper • 2504.15257 • Published about 21 hours ago • 30
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published 21 days ago • 21
Efficient Inference for Large Reasoning Models: A Survey Paper • 2503.23077 • Published 24 days ago • 46
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation Paper • 2503.19622 • Published 28 days ago • 30
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations Paper • 2502.14881 • Published Feb 14 • 1
VLM^2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues Paper • 2502.12084 • Published Feb 17 • 30
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published Feb 13 • 28
Running on Zero 1.95k Chat With Janus-Pro-7B 🌍 A unified multimodal understanding and generation model.