MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs Paper • 2411.15296 • Published Nov 22 • 19
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published Oct 21 • 58
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs Paper • 2410.12405 • Published Oct 16 • 13
LLaVA-Video Collection Models focus on video understanding (previously known as LLaVA-NeXT-Video). • 6 items • Updated Oct 5 • 55
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Paper • 2410.05363 • Published Oct 7 • 44
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24 • 41
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16 • 43
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Paper • 2406.17770 • Published Jun 25 • 18
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6 • 72
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published Jun 20 • 34
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published Jun 20 • 32