---
title: LLM-Perf Leaderboard
emoji: 🏆🏋️
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 4.26.0
app_file: app.py
pinned: true
license: apache-2.0
tags: [llm perf leaderboard, llm performance leaderboard, llm, performance, leaderboard]
---

# LLM-Perf Leaderboard

## 📝 About

The 🤗 LLM-Perf Leaderboard 🏋️ is a leaderboard at the intersection of quality and performance.
Its aim is to benchmark the performance (latency, throughput, memory and energy)
of Large Language Models (LLMs) across different hardware, backends and optimizations
using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark).

Anyone from the community can request a new base model or hardware/backend/optimization
configuration for automated benchmarking:

- Model evaluation requests should be made in the
  [🤗 Open LLM Leaderboard 🏆](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard);
  we scrape the [list of canonical base models](https://github.com/huggingface/optimum-benchmark/blob/main/llm_perf/utils.py) from there.
- Hardware/backend/optimization configuration requests should be made in the
  [🤗 LLM-Perf Leaderboard 🏋️](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) or
  [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) repository (where the code is hosted).

## ✍️ Details

- To avoid communication-dependent results, only one GPU is used.
- Score is the average evaluation score obtained from the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
- LLMs run with a batch size of 1 and a prompt size of 256, generating 64 tokens, for at least 10 iterations and 10 seconds.
- Energy consumption is measured in kWh using CodeCarbon, taking into account the GPU, CPU, RAM and location of the machine.
- We measure three types of memory: Max Allocated Memory, Max Reserved Memory and Max Used Memory. The first two are reported by PyTorch and the last one is observed using PyNVML.
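
As an illustration of the "at least 10 iterations and 10 seconds" rule above, here is a minimal sketch of such a measurement loop in plain Python. It is not the code Optimum-Benchmark uses: the names `measure_latencies`, `throughput_tokens_per_s` and `run_once` are hypothetical, and deriving throughput as generated tokens divided by mean latency is an assumption made for this example.

```python
import time
from statistics import mean
from typing import Callable, List


def measure_latencies(
    run_once: Callable[[], None],
    min_iterations: int = 10,
    min_duration_s: float = 10.0,
) -> List[float]:
    """Call run_once repeatedly until BOTH floors are met:
    at least min_iterations calls and at least min_duration_s elapsed."""
    latencies: List[float] = []
    start = time.perf_counter()
    while (
        len(latencies) < min_iterations
        or (time.perf_counter() - start) < min_duration_s
    ):
        t0 = time.perf_counter()
        # Hypothetical workload, e.g. one generate() call. On a GPU, a device
        # synchronization would be needed here for the timing to be meaningful.
        run_once()
        latencies.append(time.perf_counter() - t0)
    return latencies


def throughput_tokens_per_s(
    latencies: List[float], batch_size: int = 1, new_tokens: int = 64
) -> float:
    """Assumed definition: tokens generated per call divided by mean latency."""
    return batch_size * new_tokens / mean(latencies)
```

With the defaults, a call such as `measure_latencies(my_generate_fn)` keeps iterating for about 10 seconds even if each call is fast, and still completes 10 calls even if each one is slow.
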

All of our benchmarks are run by a single script,
[benchmark_cuda_pytorch.py](https://github.com/huggingface/optimum-benchmark/blob/llm-perf/llm-perf/benchmark_cuda_pytorch.py),
which uses [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) to guarantee reproducibility and consistency.
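
The three memory figures described in the details above could be read along the following lines. This is a sketch, not the leaderboard's implementation: `read_gpu_memory_mb` and `bytes_to_mb` are hypothetical helpers, reporting in MB (10^6 bytes) is an assumption, and the PyNVML value is a single snapshot, so a true maximum would need repeated polling while the benchmark runs. Calling `read_gpu_memory_mb` requires `torch`, `pynvml` and a CUDA GPU.

```python
from typing import Dict


def bytes_to_mb(num_bytes: int) -> float:
    """Convert bytes to megabytes (assumption: 1 MB = 10**6 bytes)."""
    return num_bytes / 1e6


def read_gpu_memory_mb(device_index: int = 0) -> Dict[str, float]:
    """Read PyTorch peak statistics and the current device-wide usage."""
    # Imported lazily so the module can be loaded on machines without a GPU.
    import pynvml
    import torch

    # Device-wide memory in use, as seen by the NVIDIA driver (snapshot).
    # Note: NVML indices can differ from PyTorch indices when
    # CUDA_VISIBLE_DEVICES is set.
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        used_bytes = pynvml.nvmlDeviceGetMemoryInfo(handle).used
    finally:
        pynvml.nvmlShutdown()

    return {
        # Peak memory occupied by tensors in this process.
        "max_allocated_mb": bytes_to_mb(torch.cuda.max_memory_allocated(device_index)),
        # Peak memory held by PyTorch's caching allocator in this process.
        "max_reserved_mb": bytes_to_mb(torch.cuda.max_memory_reserved(device_index)),
        "used_mb": bytes_to_mb(used_bytes),
    }
```

The PyTorch figures only cover the current process, while the PyNVML figure covers everything on the device, which is one reason the three numbers can differ.
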

## 🏃 How to run locally

To run the LLM-Perf Leaderboard locally on your machine, follow these steps:

### 1. Clone the Repository

First, clone the repository to your local machine:

```bash
git clone https://github.com/huggingface/optimum-benchmark.git
cd optimum-benchmark
```

### 2. Install the Required Dependencies

Install the necessary Python packages listed in the `requirements.txt` file:

`pip install -r requirements.txt`

### 3. Run the Application

You can run the Gradio application in one of the following ways:

- Option 1: Using Python

  `python app.py`

- Option 2: Using the Gradio CLI (includes hot reload)

  `gradio app.py`

### 4. Access the Application

Once the application is running, you can access it locally in your web browser at http://127.0.0.1:7860/
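
If the page does not load, a quick way to tell whether anything is listening on that address is a small socket check like the sketch below. The helper name `is_listening` is hypothetical, and the host and port are taken from the URL above; it only confirms that a TCP connection succeeds, not that the Gradio app itself is healthy.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 7860, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

Calling `is_listening()` with no arguments checks the default address from the step above. If it returns `False` while the app is running, the server may have bound to a different port.
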