HumanEval-V Benchmark Viewer
A simple data viewer for the HumanEval-V benchmark.
Visual-Centric Coding Tasks for Large Multimodal Models
📄 Paper • 🏠 Home Page • 💻 GitHub Repository • 🏆 Leaderboard • 🤗 Dataset • 🤗 Dataset Viewer
HumanEval-V is a novel and lightweight benchmark designed to evaluate the visual understanding and reasoning capabilities of Large Multimodal Models (LMMs) through coding tasks. The dataset comprises 108 entry-level Python programming challenges, adapted from platforms like CodeForces and Stack Overflow. Each task includes visual context that is indispensable to the problem, requiring models to perceive, reason, and generate Python code solutions accordingly.
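To make the task format concrete, the sketch below shows one way to load the benchmark with the Hugging Face `datasets` library and inspect a single task. The dataset identifier, split name, and field names are assumptions; follow the 🤗 Dataset link above for the exact ones.

```python
# Minimal sketch: loading and inspecting a HumanEval-V task.
# The dataset id, split, and field names below are assumptions --
# check the 🤗 Dataset page linked above for the actual values.
from datasets import load_dataset

dataset = load_dataset("HumanEval-V/HumanEval-V-Benchmark", split="test")  # hypothetical id/split

task = dataset[0]
print(task.keys())  # inspect which fields each task actually provides

# Each task pairs indispensable visual context with a Python coding problem,
# so a typical entry would expose something like an image plus a code stub:
# task["image"].show()                 # the visual context (e.g. a PIL image)
# print(task["function_signature"])    # the code the model must complete
```

A model under evaluation would receive the image together with the coding prompt and be asked to produce a Python solution, which is then checked against the task's tests.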
Key features: