SPLIT_INFO = ['Validation Split', 'Test Split']
MODEL_INFO = ['Model', 'Large Language Model']
DEFAULT_SPLIT = 'Validation Split'
DATA_TITLE_TYPE = ["markdown", "markdown", "number", "number"]
CSV_DIR = "file/result_egoplan_bench.csv"
COLUMN_NAMES = MODEL_INFO + SPLIT_INFO
TABLE_INTRODUCTION = "In the table below, we summarize the model performance on the validation and test splits."
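# A minimal sketch (not part of the original file) of how the table constants
# above are typically consumed in the Gradio app. `build_leaderboard_table` is
# a hypothetical helper, and it assumes the CSV at CSV_DIR has headers that
# match COLUMN_NAMES exactly; the actual app code may differ.
def build_leaderboard_table():
    import gradio as gr
    import pandas as pd

    # Load the results file and keep only the leaderboard columns, in display order.
    df = pd.read_csv(CSV_DIR)[COLUMN_NAMES]
    # Render model/LLM names as markdown (so they can be links) and scores as numbers.
    return gr.Dataframe(
        value=df.values.tolist(),
        headers=COLUMN_NAMES,
        datatype=DATA_TITLE_TYPE,
        interactive=False,
    )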
LEADERBOARD_INTRODUCTION = """# EgoPlan-Bench Leaderboard 🏆
Welcome to the EgoPlan-Bench Leaderboard! This leaderboard ranks Multimodal Large Language Models (MLLMs) by their performance on EgoPlan-Bench, which evaluates their planning abilities in real-world, egocentric scenarios. EgoPlan-Bench features realistic tasks, diverse action plans, and intricate visual observations, providing a challenging testbed for MLLMs. Explore the leaderboard to track the progress of MLLMs towards human-level planning! 🛫
Join our evaluation by sending an email 📧 ([email protected])! You may also check out the [Code](https://github.com/ChenYi99/EgoPlan), [Paper](https://arxiv.org/pdf/2312.06722), and [Project page](https://chenyi99.github.io/ego_plan/) for more details 🤗
"""
LEADERBOARD_INFO = """
The pursuit of artificial general intelligence (AGI) has been accelerated by Multimodal Large Language Models (MLLMs), which exhibit superior reasoning and generalization capabilities, as well as proficiency in processing multimodal inputs. A crucial milestone in the evolution of AGI is the attainment of human-level planning, a fundamental ability for making informed decisions in complex environments and solving a wide range of real-world problems. Despite the impressive advancements in MLLMs, a question remains: **How far are current MLLMs from achieving human-level planning?**
To shed light on this question, we introduce EgoPlan-Bench, a comprehensive benchmark for evaluating the planning abilities of MLLMs in real-world scenarios from an egocentric perspective, mirroring human perception. EgoPlan-Bench emphasizes the evaluation of MLLMs' planning capabilities, featuring realistic tasks, diverse action plans, and intricate visual observations. Our rigorous evaluation of a wide range of MLLMs reveals that EgoPlan-Bench poses significant challenges, highlighting substantial room for improvement before MLLMs achieve human-level task planning. To facilitate this advancement, we further present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench. We have made all code, data, and a maintained benchmark leaderboard available to advance future research.
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""@article{chen2023egoplan,
title={EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning},
author={Yi Chen and Yuying Ge and Yixiao Ge and Mingyu Ding and Bohao Li and Rui Wang and Ruifeng Xu and Ying Shan and Xihui Liu},
journal={arXiv preprint arXiv:2312.06722},
year={2023}
}"""