plmsmile's Collections: benchmarks
BLINK: Multimodal Large Language Models Can See but Not Perceive • arXiv:2404.12390
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension • arXiv:2404.16790
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots • arXiv:2405.07990
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding • arXiv:2406.09411
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark • arXiv:2406.05967
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos • arXiv:2406.08407
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation • arXiv:2406.09961
Needle In A Multimodal Haystack • arXiv:2406.07230
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text • arXiv:2406.08418
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages • arXiv:2406.10118
VideoGUI: A Benchmark for GUI Automation from Instructional Videos • arXiv:2406.10227
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs • arXiv:2406.11833
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning • arXiv:2406.12742
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models • arXiv:2406.11230
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding • arXiv:2406.14515
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models • arXiv:2406.16338
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs • arXiv:2406.18521
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? • arXiv:2407.01284
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation • arXiv:2407.00468
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding • arXiv:2407.01791
HEMM: Holistic Evaluation of Multimodal Foundation Models • arXiv:2407.03418