---
title: Agent Papers
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.19.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: LLM Agent Research Collection
---

Large Language Model Agent Papers Explorer

This is a companion application for the paper "Large Language Model Agent: A Survey on Methodology, Applications and Challenges" (arXiv:2503.21460).

About

The application provides an interactive interface to explore papers from our comprehensive survey on Large Language Model (LLM) agents. It allows you to search and filter papers across key categories including agent construction, collaboration mechanisms, evolution, tools, security, benchmarks, and applications.

Screenshot of the application

Key Features

  • Paper Search: Find papers by keywords, titles, summaries, or publication venues (see the sketch after this list)
  • Category Filtering: Browse papers by sections/categories
  • Year Filtering: Filter papers by publication year
  • Sorting Options: Sort papers by year, title, or section
  • Paper Statistics: View distributions of papers across categories and years
  • Direct Links: Access original papers through direct links to their sources
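
For illustration, the search-and-filter flow behind these features can be sketched with pandas and Gradio. Everything below (the column names, sample rows, and the filter_papers helper) is hypothetical rather than the app's actual code:

```python
import gradio as gr
import pandas as pd

# Hypothetical schema and sample rows; app.py may organize its data differently.
papers = pd.DataFrame([
    {"title": "Example Agent Paper A", "section": "Construction", "year": 2024},
    {"title": "Example Agent Paper B", "section": "Tools", "year": 2023},
])

def filter_papers(query, section, year):
    """Apply the keyword, section, and year filters to the paper table."""
    df = papers
    if query:
        df = df[df["title"].str.contains(query, case=False)]
    if section != "All":
        df = df[df["section"] == section]
    if year != "All":
        df = df[df["year"] == int(year)]
    return df

with gr.Blocks() as demo:
    query = gr.Textbox(label="Search")
    section = gr.Dropdown(["All", "Construction", "Tools"], value="All", label="Section")
    year = gr.Dropdown(["All", "2023", "2024"], value="All", label="Year")
    table = gr.Dataframe(value=papers)
    # Re-run the filter whenever any control changes.
    for control in (query, section, year):
        control.change(filter_papers, inputs=[query, section, year], outputs=table)

demo.launch()
```

The wiring pattern is the point here: each control change re-runs a single filter function and refreshes the displayed table.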

Collection Overview

Our paper collection spans multiple categories:

  • Introduction: Survey papers and foundational works introducing LLM agents
  • Construction: Papers on building and designing agents
  • Collaboration: Multi-agent systems and communication methods
  • Evolution: Learning and improvement of agents over time
  • Tools: Integration of external tools with LLM agents
  • Security: Safety, alignment, and ethical considerations
  • Datasets & Benchmarks: Evaluation frameworks and resources
  • Applications: Domain-specific uses in science, medicine, etc.

Related Resources

  • Survey paper: "Large Language Model Agent: A Survey on Methodology, Applications and Challenges" (arXiv:2503.21460)

How to Contribute

If you have a paper that you believe should be included in our collection:

  1. Check if the paper is already in our database
  2. Submit your paper through the submission form or email us at [email protected]
  3. Include the paper's title, authors, abstract, URL, publication venue, and year
  4. Suggest a section/category for the paper

Citation

If you find our survey helpful, please consider citing our work:

```bibtex
@article{agentsurvey2025,
  title={Large Language Model Agent: A Survey on Methodology, Applications and Challenges},
  author={Junyu Luo and Weizhi Zhang and Ye Yuan and Yusheng Zhao and Junwei Yang and Yiyang Gu and Bohan Wu and Binqi Chen and Ziyue Qiao and Qingqing Long and Rongcheng Tu and Xiao Luo and Wei Ju and Zhiping Xiao and Yifan Wang and Meng Xiao and Chenwu Liu and Jingyang Yuan and Shichang Zhang and Yiqiao Jin and Fan Zhang and Xian Wu and Hanqing Zhao and Dacheng Tao and Philip S. Yu and Ming Zhang},
  journal={arXiv preprint arXiv:2503.21460},
  year={2025}
}
```

Local Development

To run this application locally:

  1. Clone this repository
  2. Install the required dependencies with pip install -r requirements.txt
  3. Run the application with python app.py

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Configuration

Most of the variables you need to change for a default leaderboard are in src/env.py (replace the paths with those of your leaderboard) and src/about.py (for the tasks).
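
For reference, the default leaderboard template declares its tasks in src/about.py roughly as follows; treat this as a sketch, since the names and fields may differ in your copy:

```python
from dataclasses import dataclass
from enum import Enum

@dataclass
class Task:
    benchmark: str   # key of the task in the results files
    metric: str      # key of the metric to display
    col_name: str    # column name shown in the leaderboard table

# Illustrative entries; replace with your own tasks.
class Tasks(Enum):
    task0 = Task("anli_r1", "acc", "ANLI")
    task1 = Task("logiqa", "acc_norm", "LogiQA")
```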

Results files should have the following format and be stored as JSON files:

```json
{
    "config": {
        "model_dtype": "torch.float16", # or torch.bfloat16 or 8bit or 4bit
        "model_name": "path of the model on the hub: org/model",
        "model_sha": "revision on the hub",
    },
    "results": {
        "task_name": {
            "metric_name": score,
        },
        "task_name2": {
            "metric_name": score,
        }
    }
}
```
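
As an illustration, a file in this format can be flattened into one row per (task, metric) score. This is a minimal sketch; the actual parsing lives in src/leaderboard/read_evals.py and may differ:

```python
import json
from pathlib import Path

def load_result_rows(path):
    """Flatten one results JSON file into a list of row dicts."""
    data = json.loads(Path(path).read_text())
    config = data["config"]
    rows = []
    for task_name, metrics in data["results"].items():
        for metric_name, score in metrics.items():
            rows.append({
                "model": config["model_name"],
                "revision": config["model_sha"],
                "dtype": config["model_dtype"],
                "task": task_name,
                "metric": metric_name,
                "score": score,
            })
    return rows
```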

Request files are created automatically by this tool.

If you encounter problems on the Space, don't hesitate to restart it to remove the created eval-queue, eval-queue-bk, eval-results, and eval-results-bk folders.

Code logic for more complex edits

You'll find:

  • the main table's column names and properties in src/display/utils.py
  • the logic to read all results and request files, then convert them into dataframe rows, in src/leaderboard/read_evals.py and src/populate.py
  • the logic to allow or filter submissions in src/submission/submit.py and src/submission/check_validity.py
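
For instance, one check in the spirit of src/submission/check_validity.py is verifying that a submitted model actually exists on the Hub. This is a hedged sketch using huggingface_hub, not the file's actual code:

```python
from huggingface_hub import HfApi
from huggingface_hub.utils import RepositoryNotFoundError, RevisionNotFoundError

def model_exists_on_hub(model_id, revision="main"):
    """Return True if the submitted org/model repo is reachable at the given revision."""
    try:
        HfApi().model_info(model_id, revision=revision)
        return True
    except (RepositoryNotFoundError, RevisionNotFoundError):
        return False
```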