---
title: README
emoji: 💻
colorFrom: green
colorTo: red
sdk: streamlit
pinned: false
---
### HumanEval-V: A Lightweight Visual Understanding and Reasoning Benchmark for Evaluating LMMs through Coding Tasks

<p align="center"> <a href="https://arxiv.org/abs/2410.12381">📄 Paper</a> • <a href="https://humaneval-v.github.io">🏠 Home Page</a> • <a href="https://github.com/HumanEval-V/HumanEval-V-Benchmark">💻 GitHub Repository</a> • <a href="https://humaneval-v.github.io/#leaderboard">🏆 Leaderboard</a> • <a href="https://huggingface.co/datasets/HumanEval-V/HumanEval-V-Benchmark">🤗 Dataset</a> • <a href="https://huggingface.co/spaces/HumanEval-V/HumanEval-V-Benchmark-Viewer">🤗 Dataset Viewer</a> </p>

HumanEval-V is a novel, lightweight benchmark designed to evaluate the visual understanding and reasoning capabilities of Large Multimodal Models (LMMs) through coding tasks. The dataset comprises 108 entry-level Python programming challenges adapted from platforms such as Codeforces and Stack Overflow. Each task includes visual context that is indispensable to solving the problem: models must perceive the image, reason about it, and generate a Python solution accordingly.

Key features:
- Visual coding tasks that cannot be solved without understanding the accompanying image.
- Entry-level difficulty, making the benchmark well suited to assessing the baseline performance of foundational LMMs.
- Handcrafted test cases that measure code correctness with the execution-based pass@k metric.
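pass@k is the probability that at least one of k sampled solutions for a task passes all of its test cases. The sketch below implements the unbiased estimator introduced with the original HumanEval benchmark (Chen et al., 2021), `1 - C(n-c, k) / C(n, k)` per task averaged over tasks, where `n` samples are generated and `c` of them pass. It assumes HumanEval-V scores pass@k the same way; the function names and sample counts are hypothetical and not taken from the HumanEval-V codebase.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task (hypothetical helper).

    n: number of code samples generated for the task
    c: number of those samples that pass all test cases
    k: evaluation budget, with 1 <= k <= n
    """
    if not 0 <= c <= n:
        raise ValueError("require 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a passing one.
    if n - c < k:
        return 1.0
    # 1 - P(all k samples drawn without replacement fail).
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-task pass@k over (n, c) pairs, one pair per task (hypothetical)."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Three hypothetical tasks, six samples each, with 6, 3 and 0 passing samples.
    scores = [(6, 6), (6, 3), (6, 0)]
    print(benchmark_pass_at_k(scores, 1))  # 0.5
    print(benchmark_pass_at_k(scores, 3))  # approximately 0.65
```

With k = 1 the estimator reduces to c / n, the fraction of passing samples; computing the ratio of binomial coefficients rather than sampling subsets keeps the estimate exact and deterministic for a given set of results.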