---
title: Agent Papers
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.19.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: LLM Agent Research Collection
---
# Large Language Model Agent Papers Explorer
This is a companion application for the paper "Large Language Model Agent: A Survey on Methodology, Applications and Challenges" (arXiv:2503.21460).
## About
The application provides an interactive interface to explore papers from our comprehensive survey on Large Language Model (LLM) agents. It allows you to search and filter papers across key categories including agent construction, collaboration mechanisms, evolution, tools, security, benchmarks, and applications.
## Key Features
- Paper Search: Find papers by keywords, titles, summaries, or publication venues
- Category Filtering: Browse papers by sections/categories
- Year Filtering: Filter papers by publication year
- Sorting Options: Sort papers by year, title, or section
- Paper Statistics: View distributions of papers across categories and years
- Direct Links: Access original papers through direct links to their sources
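The search, filtering, and sorting features listed above can be reproduced with a few lines of Gradio and pandas. The sketch below is only illustrative and assumes a hypothetical DataFrame with `title`, `summary`, `section`, `year`, and `url` columns; the actual `app.py` may be structured differently.

```python
# Minimal illustrative sketch, NOT the actual app.py: assumes papers live in a
# pandas DataFrame with hypothetical columns title/summary/section/year/url.
import gradio as gr
import pandas as pd

papers = pd.DataFrame([
    {"title": "Example Agent Paper", "summary": "An example entry.",
     "section": "Construction", "year": 2024, "url": "https://example.org"},
])

def search_papers(query, section, year):
    df = papers
    if query:  # keyword match against title or summary
        mask = (df["title"].str.contains(query, case=False)
                | df["summary"].str.contains(query, case=False))
        df = df[mask]
    if section != "All":  # category filter
        df = df[df["section"] == section]
    if year != "All":     # publication year filter
        df = df[df["year"] == int(year)]
    return df.sort_values(["year", "title"], ascending=[False, True])

demo = gr.Interface(
    fn=search_papers,
    inputs=[
        gr.Textbox(label="Keywords"),
        gr.Dropdown(["All", "Construction", "Collaboration"], value="All", label="Section"),
        gr.Dropdown(["All", "2024", "2025"], value="All", label="Year"),
    ],
    outputs=gr.Dataframe(label="Matching papers"),
)

if __name__ == "__main__":
    demo.launch()
```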
## Collection Overview
Our paper collection spans multiple categories:
- Introduction: Survey papers and foundational works introducing LLM agents
- Construction: Papers on building and designing agents
- Collaboration: Multi-agent systems and communication methods
- Evolution: Learning and improvement of agents over time
- Tools: Integration of external tools with LLM agents
- Security: Safety, alignment, and ethical considerations
- Datasets & Benchmarks: Evaluation frameworks and resources
- Applications: Domain-specific uses in science, medicine, etc.
## Related Resources

## How to Contribute
If you have a paper that you believe should be included in our collection:
- Check if the paper is already in our database
- Submit your paper at this form or email us at [email protected]
- Include the paper's title, authors, abstract, URL, publication venue, and year
- Suggest a section/category for the paper
## Citation
If you find our survey helpful, please consider citing our work:
```bibtex
@article{agentsurvey2025,
  title={Large Language Model Agent: A Survey on Methodology, Applications and Challenges},
  author={Junyu Luo and Weizhi Zhang and Ye Yuan and Yusheng Zhao and Junwei Yang and Yiyang Gu and Bohan Wu and Binqi Chen and Ziyue Qiao and Qingqing Long and Rongcheng Tu and Xiao Luo and Wei Ju and Zhiping Xiao and Yifan Wang and Meng Xiao and Chenwu Liu and Jingyang Yuan and Shichang Zhang and Yiqiao Jin and Fan Zhang and Xian Wu and Hanqing Zhao and Dacheng Tao and Philip S. Yu and Ming Zhang},
  journal={arXiv preprint arXiv:2503.21460},
  year={2025}
}
```
## Local Development
To run this application locally:
- Clone this repository
- Install the required dependencies with `pip install -r requirements.txt`
- Run the application with `python app.py`
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Start the configuration
Most of the variables to change for a default leaderboard are in `src/env.py` (replace the path for your leaderboard) and `src/about.py` (for tasks).
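In the standard leaderboard template, the tasks in `src/about.py` are typically declared as a small dataclass plus an `Enum`; the snippet below is only a hedged sketch of that pattern with placeholder names, not the actual contents of the file.

```python
# Illustrative sketch of a task declaration in src/about.py.
# Benchmark and metric names are placeholders, not this repository's real values.
from dataclasses import dataclass
from enum import Enum

@dataclass
class Task:
    benchmark: str   # key expected under "results" in each results JSON file
    metric: str      # metric key to read for that benchmark
    col_name: str    # column header displayed in the leaderboard table

class Tasks(Enum):
    task0 = Task("task_name", "metric_name", "Task 1")
    task1 = Task("task_name2", "metric_name", "Task 2")
```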
Results files should have the following format and be stored as JSON files:
```json
{
    "config": {
        "model_dtype": "torch.float16", # or torch.bfloat16 or 8bit or 4bit
        "model_name": "path of the model on the hub: org/model",
        "model_sha": "revision on the hub",
    },
    "results": {
        "task_name": {
            "metric_name": score,
        },
        "task_name2": {
            "metric_name": score,
        }
    }
}
```
Request files are created automatically by this tool.
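For reference, a results file matching the format above could be produced with a short script like the one below; the model name, revision, and scores are placeholders.

```python
# Illustrative only: writes a results file in the format shown above.
import json

result = {
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "org/model",  # placeholder hub id
        "model_sha": "main",        # placeholder revision
    },
    "results": {
        "task_name": {"metric_name": 0.512},
        "task_name2": {"metric_name": 0.734},
    },
}

with open("results_org_model.json", "w") as f:
    json.dump(result, f, indent=2)
```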
If you encounter a problem on the space, don't hesitate to restart it to remove the created `eval-queue`, `eval-queue-bk`, `eval-results` and `eval-results-bk` folders.
## Code logic for more complex edits
You'll find:
- the main table's column names and properties in `src/display/utils.py`
- the logic to read all results and request files, then convert them into dataframe lines, in `src/leaderboard/read_evals.py` and `src/populate.py` (a simplified sketch of this step follows the list)
- the logic to allow or filter submissions in `src/submission/submit.py` and `src/submission/check_validity.py`
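As a rough mental model of that read/populate step (not the actual implementation in `src/leaderboard/read_evals.py` or `src/populate.py`), results files can be flattened into table rows roughly like this:

```python
# Illustrative sketch only: flattens results JSON files into table rows.
# Directory layout and column names are assumptions, not the repository's code.
import json
from pathlib import Path

import pandas as pd

def load_results(results_dir: str) -> pd.DataFrame:
    rows = []
    for path in Path(results_dir).glob("**/*.json"):
        data = json.loads(path.read_text())
        row = {
            "model": data["config"]["model_name"],
            "revision": data["config"].get("model_sha", "main"),
        }
        # One column per task, taking the first metric reported for it.
        for task, metrics in data["results"].items():
            row[task] = next(iter(metrics.values()))
        rows.append(row)
    return pd.DataFrame(rows)

# e.g. df = load_results("eval-results")
```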