---
title: Agent Papers
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.19.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: LLM Agent Research Collection
---

Large Language Model Agent Papers Explorer

This is a companion application for the paper "Large Language Model Agent: A Survey on Methodology, Applications and Challenges" (arXiv:2503.21460).

About

The application provides an interactive interface to explore papers from our comprehensive survey on Large Language Model (LLM) agents. It allows you to search and filter papers across key categories including agent construction, collaboration mechanisms, evolution, tools, security, benchmarks, and applications.

Screenshot of the application

Key Features

  • Paper Search: Find papers by keywords, titles, summaries, or publication venues (see the sketch after this list)
  • Category Filtering: Browse papers by sections/categories
  • Year Filtering: Filter papers by publication year
  • Sorting Options: Sort papers by year, title, or section
  • Paper Statistics: View distributions of papers across categories and years
  • Direct Links: Access original papers through direct links to their sources
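
For illustration, the search-and-filter flow behind these features can be sketched with pandas and Gradio. Everything below (the column names, sample rows, and the filter_papers helper) is hypothetical rather than the app's actual code:

```python
import gradio as gr
import pandas as pd

# Hypothetical schema and sample rows; app.py may organize its data differently.
papers = pd.DataFrame([
    {"title": "Example Agent Paper A", "section": "Construction", "year": 2024},
    {"title": "Example Agent Paper B", "section": "Tools", "year": 2023},
])

def filter_papers(query, section, year):
    """Apply the keyword, section, and year filters to the paper table."""
    df = papers
    if query:
        df = df[df["title"].str.contains(query, case=False)]
    if section != "All":
        df = df[df["section"] == section]
    if year != "All":
        df = df[df["year"] == int(year)]
    return df

with gr.Blocks() as demo:
    query = gr.Textbox(label="Search")
    section = gr.Dropdown(["All", "Construction", "Tools"], value="All", label="Section")
    year = gr.Dropdown(["All", "2023", "2024"], value="All", label="Year")
    table = gr.Dataframe(value=papers)
    # Re-run the filter whenever any control changes.
    for control in (query, section, year):
        control.change(filter_papers, inputs=[query, section, year], outputs=table)

demo.launch()
```

The wiring pattern is the point here: each control change re-runs a single filter function and refreshes the displayed table.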

Collection Overview

Our paper collection spans multiple categories:

  • Introduction: Survey papers and foundational works introducing LLM agents
  • Construction: Papers on building and designing agents
  • Collaboration: Multi-agent systems and communication methods
  • Evolution: Learning and improvement of agents over time
  • Tools: Integration of external tools with LLM agents
  • Security: Safety, alignment, and ethical considerations
  • Datasets & Benchmarks: Evaluation frameworks and resources
  • Applications: Domain-specific uses in science, medicine, etc.

Related Resources

  • Survey paper: "Large Language Model Agent: A Survey on Methodology, Applications and Challenges" (arXiv:2503.21460)

How to Contribute

If you have a paper that you believe should be included in our collection:

  1. Check if the paper is already in our database
  2. Submit your paper through the submission form or email us at [email protected]
  3. Include the paper's title, authors, abstract, URL, publication venue, and year
  4. Suggest a section/category for the paper

Citation

If you find our survey helpful, please consider citing our work:

```bibtex
@article{agentsurvey2025,
  title={Large Language Model Agent: A Survey on Methodology, Applications and Challenges},
  author={Junyu Luo and Weizhi Zhang and Ye Yuan and Yusheng Zhao and Junwei Yang and Yiyang Gu and Bohan Wu and Binqi Chen and Ziyue Qiao and Qingqing Long and Rongcheng Tu and Xiao Luo and Wei Ju and Zhiping Xiao and Yifan Wang and Meng Xiao and Chenwu Liu and Jingyang Yuan and Shichang Zhang and Yiqiao Jin and Fan Zhang and Xian Wu and Hanqing Zhao and Dacheng Tao and Philip S. Yu and Ming Zhang},
  journal={arXiv preprint arXiv:2503.21460},
  year={2025}
}
```

Local Development

To run this application locally:

  1. Clone this repository
  2. Install the required dependencies with pip install -r requirements.txt
  3. Run the application with python app.py

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Configuration

Most of the variables you need to change for a default leaderboard are in src/env.py (replace the paths with those of your leaderboard) and src/about.py (for the tasks).
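
For reference, the default leaderboard template declares its tasks in src/about.py roughly as follows; treat this as a sketch, since the names and fields may differ in your copy:

```python
from dataclasses import dataclass
from enum import Enum

@dataclass
class Task:
    benchmark: str   # key of the task in the results files
    metric: str      # key of the metric to display
    col_name: str    # column name shown in the leaderboard table

# Illustrative entries; replace with your own tasks.
class Tasks(Enum):
    task0 = Task("anli_r1", "acc", "ANLI")
    task1 = Task("logiqa", "acc_norm", "LogiQA")
```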

Results files should have the following format and be stored as JSON files:

```json
{
    "config": {
        "model_dtype": "torch.float16", # or torch.bfloat16 or 8bit or 4bit
        "model_name": "path of the model on the hub: org/model",
        "model_sha": "revision on the hub",
    },
    "results": {
        "task_name": {
            "metric_name": score,
        },
        "task_name2": {
            "metric_name": score,
        }
    }
}
```
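
As an illustration, a file in this format can be flattened into one row per (task, metric) score. This is a minimal sketch; the actual parsing lives in src/leaderboard/read_evals.py and may differ:

```python
import json
from pathlib import Path

def load_result_rows(path):
    """Flatten one results JSON file into a list of row dicts."""
    data = json.loads(Path(path).read_text())
    config = data["config"]
    rows = []
    for task_name, metrics in data["results"].items():
        for metric_name, score in metrics.items():
            rows.append({
                "model": config["model_name"],
                "revision": config["model_sha"],
                "dtype": config["model_dtype"],
                "task": task_name,
                "metric": metric_name,
                "score": score,
            })
    return rows
```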

Request files are created automatically by this tool.

If you encounter problems on the Space, don't hesitate to restart it to remove the created eval-queue, eval-queue-bk, eval-results, and eval-results-bk folders.

Code logic for more complex edits

You'll find:

  • the main table's column names and properties in src/display/utils.py
  • the logic to read all results and request files, then convert them into dataframe rows, in src/leaderboard/read_evals.py and src/populate.py
  • the logic to allow or filter submissions in src/submission/submit.py and src/submission/check_validity.py
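
For instance, one check in the spirit of src/submission/check_validity.py is verifying that a submitted model actually exists on the Hub. This is a hedged sketch using huggingface_hub, not the file's actual code:

```python
from huggingface_hub import HfApi
from huggingface_hub.utils import RepositoryNotFoundError, RevisionNotFoundError

def model_exists_on_hub(model_id, revision="main"):
    """Return True if the submitted org/model repo is reachable at the given revision."""
    try:
        HfApi().model_info(model_id, revision=revision)
        return True
    except (RepositoryNotFoundError, RevisionNotFoundError):
        return False
```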