# Backend - Open LLM Leaderboard 🏆
FastAPI backend for the Open LLM Leaderboard. This service is part of a larger architecture that includes a React frontend. For complete project installation, see the [main README](../README.md).
## ✨ Features
- 📊 REST API for LLM models leaderboard management
- 🗳️ Voting and ranking system
- 🔄 HuggingFace Hub integration
- 🚀 Caching and performance optimizations
## 🏗 Architecture
```mermaid
flowchart TD
    Client(["**Frontend**<br><br>React Application"]) --> API["**API Server**<br><br>FastAPI REST Endpoints"]
    subgraph Backend
        API --> Core["**Core Layer**<br><br>• Middleware<br>• Cache<br>• Rate Limiting"]
        Core --> Services["**Services Layer**<br><br>• Business Logic<br>• Data Processing"]
        subgraph Services Layer
            Services --> Models["**Model Service**<br><br>• Model Submission<br>• Evaluation Pipeline"]
            Services --> Votes["**Vote Service**<br><br>• Vote Management<br>• Data Synchronization"]
            Services --> Board["**Leaderboard Service**<br><br>• Rankings<br>• Performance Metrics"]
        end
        Models --> Cache["**Cache Layer**<br><br>• In-Memory Store<br>• Auto Invalidation"]
        Votes --> Cache
        Board --> Cache
        Models --> HF["**HuggingFace Hub**<br><br>• Models Repository<br>• Datasets Access"]
        Votes --> HF
        Board --> HF
    end
    style Client fill:#f9f,stroke:#333,stroke-width:2px
    style Models fill:#bbf,stroke:#333,stroke-width:2px
    style Votes fill:#bbf,stroke:#333,stroke-width:2px
    style Board fill:#bbf,stroke:#333,stroke-width:2px
    style HF fill:#bfb,stroke:#333,stroke-width:2px
```
## 🛠️ HuggingFace Datasets
The application uses several datasets on the HuggingFace Hub:
### 1. Requests Dataset (`{HF_ORGANIZATION}/requests`)
- **Operations**:
  - 📤 `POST /api/models/submit`: Adds a JSON file for each new model submission
  - 📥 `GET /api/models/status`: Reads the files to get each model's status
- **Format**: One JSON file per model with submission details
- **Updates**: On each new model submission
### 2. Votes Dataset (`{HF_ORGANIZATION}/votes`)
- **Operations**:
  - 📤 `POST /api/votes/{model_id}`: Adds a new vote
  - 📥 `GET /api/votes/model/{provider}/{model}`: Reads model votes
  - 📥 `GET /api/votes/user/{user_id}`: Reads user votes
- **Format**: JSONL with one vote per line
- **Sync**: Bidirectional between local cache and Hub
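As a minimal sketch of the JSONL layout, votes could be written and read as shown here. The field names (`model_id`, `vote_type`, `user_id`, `timestamp`) are borrowed from the API schemas in this README and are an assumption; the stored files may use a different schema.

```python
import json
from typing import Iterable


def dump_votes(votes: Iterable[dict]) -> str:
    """Serialize votes as JSONL: one JSON object per line."""
    return "".join(json.dumps(vote, sort_keys=True) + "\n" for vote in votes)


def load_votes(text: str) -> list[dict]:
    """Parse JSONL text back into a list of votes, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical records; the field names are assumptions, not a confirmed schema.
sample = [
    {"model_id": "org/model-a", "vote_type": "up", "user_id": "alice",
     "timestamp": "2024-01-01T00:00:00Z"},
    {"model_id": "org/model-b", "vote_type": "up", "user_id": "bob",
     "timestamp": "2024-01-02T00:00:00Z"},
]
jsonl_text = dump_votes(sample)
```

Because each line is a complete JSON object, a new vote can be appended to the file without rewriting the existing lines.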
### 3. Contents Dataset (`{HF_ORGANIZATION}/contents`)
- **Operations**:
  - 📥 `GET /api/leaderboard`: Reads raw data
  - 📥 `GET /api/leaderboard/formatted`: Reads and formats data
- **Format**: Main dataset containing all scores and metrics
- **Updates**: Automatic after model evaluations
### 4. Official Providers Dataset (`{HF_ORGANIZATION}/official-providers`)
- **Operations**:
  - 📥 Read-only access for highlighted models
- **Format**: List of models selected by maintainers
- **Updates**: Manual by maintainers
## 🛠 Local Development
### Prerequisites
- Python 3.9+
- [Poetry](https://python-poetry.org/docs/#installation)
### Standalone Installation (without Docker)
```bash
# Install dependencies
poetry install
# Setup configuration
cp .env.example .env
# Start development server
poetry run uvicorn app.asgi:app --host 0.0.0.0 --port 7860 --reload
```
The server will be available at http://localhost:7860.
## ⚙️ Configuration
| Variable | Description | Default |
| ------------ | ------------------------------------ | ----------- |
| ENVIRONMENT | Environment (development/production) | development |
| HF_TOKEN | HuggingFace authentication token | - |
| PORT | Server port | 7860 |
| LOG_LEVEL | Logging level (INFO/DEBUG/WARNING) | INFO |
| CORS_ORIGINS | Allowed CORS origins | ["*"] |
| CACHE_TTL | Cache Time To Live in seconds | 300 |
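The sketch below shows one way these variables could be read with the defaults from the table. The `Settings` class and `load_settings` function are hypothetical names, and parsing `CORS_ORIGINS` as a JSON array is an assumption based on the default shown above.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    environment: str = "development"
    hf_token: Optional[str] = None
    port: int = 7860
    log_level: str = "INFO"
    cors_origins: list = field(default_factory=lambda: ["*"])
    cache_ttl: int = 300


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, using the table's defaults."""
    return Settings(
        environment=env.get("ENVIRONMENT", "development"),
        hf_token=env.get("HF_TOKEN"),
        port=int(env.get("PORT", "7860")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # Assumption: CORS_ORIGINS is stored as a JSON array string.
        cors_origins=json.loads(env.get("CORS_ORIGINS", '["*"]')),
        cache_ttl=int(env.get("CACHE_TTL", "300")),
    )
```

Passing the environment as a mapping keeps the loader easy to test: `load_settings({"PORT": "8000"})` overrides only the port.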
## 🔧 Middleware
The backend uses several middleware layers for optimal performance and security:
- **CORS Middleware**: Handles Cross-Origin Resource Sharing
- **GZIP Middleware**: Compresses responses larger than 500 bytes
- **Rate Limiting**: Prevents API abuse
- **Caching**: In-memory caching with automatic invalidation
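This README does not say which algorithm the rate limiter uses. As an illustration only, a sliding-window limiter is one common way to cap requests per client; the class name and parameters below are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Return True and record the request if the client is under its limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A middleware built on this idea would call `allow()` with a client identifier such as the IP address and reject the request (typically with HTTP 429) when it returns `False`.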
## 📝 Logging
The application uses a structured logging system with:
- Formatted console output
- Different log levels per component
- Request/Response logging
- Performance metrics
- Error tracking
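A setup along these lines can be sketched with the standard `logging` module. The component names and levels below are hypothetical and only illustrate the "different log levels per component" idea; the actual configuration may differ.

```python
import logging
import sys

# Hypothetical component names and levels, for illustration only.
COMPONENT_LEVELS = {
    "app.api": logging.INFO,
    "app.services": logging.DEBUG,
    "app.core.cache": logging.WARNING,
}


def setup_logging(default_level: int = logging.INFO) -> None:
    """Attach one formatted console handler and set per-component levels."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)-8s | %(name)s | %(message)s")
    )
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(default_level)
    for name, level in COMPONENT_LEVELS.items():
        logging.getLogger(name).setLevel(level)


setup_logging()
```

Loggers without an explicit level inherit the root level, so only components that need more or less detail have to be listed.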
## 📁 File Structure
```
backend/
├── app/                 # Source code
│   ├── api/             # Routes and endpoints
│   │   └── endpoints/   # Endpoint handlers
│   ├── core/            # Configurations
│   ├── services/        # Business logic
│   └── utils/           # Utilities
└── tests/               # Tests
```
## 📚 API
Swagger documentation is available at http://localhost:7860/docs while the server is running.
### Main Endpoints & Data Structures
#### Leaderboard
- `GET /api/leaderboard/formatted` - Formatted data with computed fields and metadata
```typescript
Response {
  models: [{
    id: string, // eval_name
    model: {
      name: string, // fullname
      sha: string, // Model sha
      precision: string, // e.g. "fp16", "int8"
      type: string, // e.g. "fine-tuned-on-domain-specific-dataset"
      weight_type: string,
      architecture: string,
      average_score: number,
      has_chat_template: boolean
    },
    evaluations: {
      ifeval: {
        name: "IFEval",
        value: number, // Raw score
        normalized_score: number
      },
      bbh: {
        name: "BBH",
        value: number,
        normalized_score: number
      },
      math: {
        name: "MATH Level 5",
        value: number,
        normalized_score: number
      },
      gpqa: {
        name: "GPQA",
        value: number,
        normalized_score: number
      },
      musr: {
        name: "MUSR",
        value: number,
        normalized_score: number
      },
      mmlu_pro: {
        name: "MMLU-PRO",
        value: number,
        normalized_score: number
      }
    },
    features: {
      is_not_available_on_hub: boolean,
      is_merged: boolean,
      is_moe: boolean,
      is_flagged: boolean,
      is_official_provider: boolean
    },
    metadata: {
      upload_date: string,
      submission_date: string,
      generation: string,
      base_model: string,
      hub_license: string,
      hub_hearts: number,
      params_billions: number,
      co2_cost: number // CO₂ cost in kg
    }
  }]
}
```
- `GET /api/leaderboard` - Raw data from the HuggingFace dataset
```typescript
Response {
  models: [{
    eval_name: string,
    Precision: string,
    Type: string,
    "Weight type": string,
    Architecture: string,
    Model: string,
    fullname: string,
    "Model sha": string,
    "Average ⬆️": number,
    "Hub License": string,
    "Hub ❤️": number,
    "#Params (B)": number,
    "Available on the hub": boolean,
    Merged: boolean,
    MoE: boolean,
    Flagged: boolean,
    "Chat Template": boolean,
    "CO₂ cost (kg)": number,
    "IFEval Raw": number,
    IFEval: number,
    "BBH Raw": number,
    BBH: number,
    "MATH Lvl 5 Raw": number,
    "MATH Lvl 5": number,
    "GPQA Raw": number,
    GPQA: number,
    "MUSR Raw": number,
    MUSR: number,
    "MMLU-PRO Raw": number,
    "MMLU-PRO": number,
    "Maintainer's Highlight": boolean,
    "Upload To Hub Date": string,
    "Submission Date": string,
    Generation: string,
    "Base Model": string
  }]
}
```
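To make the relationship between the two payloads concrete, here is a partial sketch that reshapes one raw row into the formatted structure. The mappings for `id`, `name`, `sha`, and the raw score follow the comments in the formatted schema; treating the un-suffixed benchmark column as `normalized_score` and negating `"Available on the hub"` are assumptions, and `format_row` is a hypothetical name.

```python
from typing import Any

# (key in `evaluations`, display name, raw-data column name)
BENCHMARKS = [
    ("ifeval", "IFEval", "IFEval"),
    ("bbh", "BBH", "BBH"),
    ("math", "MATH Level 5", "MATH Lvl 5"),
    ("gpqa", "GPQA", "GPQA"),
    ("musr", "MUSR", "MUSR"),
    ("mmlu_pro", "MMLU-PRO", "MMLU-PRO"),
]


def format_row(raw: dict[str, Any]) -> dict[str, Any]:
    """Reshape one raw leaderboard row into part of the formatted structure."""
    return {
        "id": raw["eval_name"],
        "model": {
            "name": raw["fullname"],
            "sha": raw["Model sha"],
            "precision": raw["Precision"],
        },
        "evaluations": {
            key: {
                "name": label,
                "value": raw[f"{column} Raw"],
                # Assumption: the column without "Raw" holds the normalized score.
                "normalized_score": raw[column],
            }
            for key, label, column in BENCHMARKS
        },
        "features": {
            # Assumption: this flag is the negation of the raw column.
            "is_not_available_on_hub": not raw["Available on the hub"],
            "is_merged": raw["Merged"],
            "is_moe": raw["MoE"],
            "is_flagged": raw["Flagged"],
        },
    }
```

The remaining fields (`metadata`, `average_score`, and so on) would follow the same pattern and are omitted to keep the sketch short.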
#### Models
- `GET /api/models/status` - Get all models grouped by status
```typescript
Response {
  pending: [{
    name: string,
    submitter: string,
    revision: string,
    wait_time: string,
    submission_time: string,
    status: "PENDING" | "EVALUATING" | "FINISHED",
    precision: string
  }],
  evaluating: Array<Model>,
  finished: Array<Model>
}
```
- `GET /api/models/pending` - Get pending models only
- `POST /api/models/submit` - Submit model
```typescript
Request {
  user_id: string,
  model_id: string,
  base_model?: string,
  precision?: string,
  model_type: string
}
Response {
  status: string,
  message: string
}
```
- `GET /api/models/{model_id}/status` - Get model status
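As a sketch of how a client might build the `POST /api/models/submit` body, the dataclass below mirrors the request schema and leaves out optional fields that were not set. The class name and the sample values (including `"pretrained"` as a model type) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SubmitRequest:
    user_id: str
    model_id: str
    model_type: str
    base_model: Optional[str] = None
    precision: Optional[str] = None

    def to_json(self) -> str:
        """Serialize the request, omitting optional fields that were not set."""
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(payload)


# Hypothetical values for illustration.
body = SubmitRequest(
    user_id="alice", model_id="org/model-a", model_type="pretrained"
).to_json()
```

The resulting string can be sent as the JSON body of the request with any HTTP client.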
#### Votes
- `POST /api/votes/{model_id}` - Vote
```typescript
Request {
  vote_type: "up" | "down",
  user_id: string // HuggingFace username
}
Response {
  success: boolean,
  message: string
}
```
- `GET /api/votes/model/{provider}/{model}` - Get model votes
```typescript
Response {
  total_votes: number,
  up_votes: number,
  down_votes: number
}
```
- `GET /api/votes/user/{user_id}` - Get user votes
```typescript
Response Array<{
  model_id: string,
  vote_type: string,
  timestamp: string
}>
```
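The per-model summary can be derived from individual vote records. The sketch below assumes each record has `model_id` and `vote_type` fields and that `total_votes` is the sum of up and down votes; `summarize_votes` is a hypothetical helper, not the service's actual code.

```python
from collections import Counter
from typing import Iterable


def summarize_votes(votes: Iterable[dict], model_id: str) -> dict:
    """Count up and down votes for one model."""
    counts = Counter(v["vote_type"] for v in votes if v["model_id"] == model_id)
    return {
        # Assumption: the total is simply up votes plus down votes.
        "total_votes": counts["up"] + counts["down"],
        "up_votes": counts["up"],
        "down_votes": counts["down"],
    }
```

For example, two `"up"` votes and one `"down"` vote for the same model yield `total_votes` of 3, `up_votes` of 2, and `down_votes` of 1.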
## 🔒 Authentication
The backend uses HuggingFace token-based authentication for secure API access. Make sure to:
1. Set your HF_TOKEN in the .env file
2. Include the token in API requests via Bearer authentication
3. Keep your token secure and never commit it to version control
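A client following these steps could attach the token as shown below, using only the standard library. `build_request` is a hypothetical helper, and the base URL is the local development address from this README.

```python
import os
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:7860"  # local development server from this README


def build_request(path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Create a request carrying the token as a Bearer credential."""
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}"},
    )


# Passing the request to urllib.request.urlopen() would send it.
# A placeholder token is used here so that no real credential appears in code.
request = build_request("/api/models/status", token="hf_placeholder")
```

Reading the token from the environment, rather than hard-coding it, keeps it out of version control.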
## 🚀 Performance
The backend implements several optimizations:
- In-memory caching with configurable TTL (Time To Live)
- Batch processing for model evaluations
- Rate limiting for API endpoints
- Efficient database queries with proper indexing
- Automatic cache invalidation for votes
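The caching behavior described above (a configurable TTL plus explicit invalidation) can be sketched as a small in-memory class. `TTLCache` is a hypothetical name, and the 300-second default matches `CACHE_TTL` in the configuration table; the backend's real cache may be implemented differently.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        """Store a value that expires `ttl` seconds from now."""
        self._store[key] = (self._clock() + self.ttl, value)

    def invalidate(self, key: str) -> None:
        """Remove an entry early, for example after a new vote is recorded."""
        self._store.pop(key, None)
```

Injecting the clock makes expiry easy to test without sleeping; in normal use the default `time.monotonic` is enough.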