OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? Paper • 2501.05510 • Published Jan 9, 2025 • 37
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 139
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 79
Contrastive Localized Language-Image Pre-Training Paper • 2410.02746 • Published Oct 3, 2024 • 34
LLaVA-Critic: Learning to Evaluate Multimodal Models Paper • 2410.02712 • Published Oct 3, 2024 • 35
LLaVA-Video Collection Models focused on video understanding (previously known as LLaVA-NeXT-Video). • 6 items • Updated Oct 5, 2024 • 57
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 34
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models Paper • 2402.07865 • Published Feb 12, 2024 • 13
Aligning Large Multimodal Models with Factually Augmented RLHF Paper • 2309.14525 • Published Sep 25, 2023 • 30
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023 • 33
DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations Paper • 2307.07047 • Published Jul 13, 2023 • 15