ozayezerceli committed 07c9dc6: Update README.md
Parent: 29e9498
README.md
CHANGED
@@ -10,5 +10,30 @@ pinned: false
 license: mit
 short_description: Example Leaderboard
 ---
+This Space provides an interactive leaderboard for comparing language model performance across various benchmarks and custom tasks.
 
-
+## Features
+- Automated model evaluation using lm-evaluation-harness
+- Support for standard and custom benchmarks
+- Interactive visualization of results
+- Daily automated evaluations
+- Easy submission of new models and custom tasks
+
+## Usage
+1. Visit the Space to view current leaderboard
+2. Submit new models for evaluation
+3. Create custom evaluation tasks
+4. Track performance trends over time
+
+## Custom Task Format
+```json
+{
+  "examples": [
+    {
+      "input": "question or prompt",
+      "ideal": "expected answer",
+      "metrics": ["accuracy", "f1"]
+    }
+  ]
+}
+```
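The custom task format added in this commit can be sanity-checked before submission. Below is a minimal sketch of such a check in Python; `validate_custom_task` is a hypothetical helper written for illustration, not part of this Space's actual code, and it only enforces the shape shown in the README's JSON example.

```python
import json

# Keys the README's custom task format requires on every example.
REQUIRED_KEYS = {"input", "ideal", "metrics"}


def validate_custom_task(raw: str) -> list[dict]:
    """Parse a custom-task JSON string and check each example's shape.

    Hypothetical helper: mirrors the schema from the README, nothing more.
    """
    task = json.loads(raw)
    examples = task.get("examples")
    if not isinstance(examples, list) or not examples:
        raise ValueError("'examples' must be a non-empty list")
    for i, ex in enumerate(examples):
        missing = REQUIRED_KEYS - ex.keys()
        if missing:
            raise ValueError(f"example {i} is missing keys: {sorted(missing)}")
        if not isinstance(ex["metrics"], list):
            raise ValueError(f"example {i}: 'metrics' must be a list")
    return examples


# The exact example from the README's Custom Task Format section.
raw = """
{
  "examples": [
    {
      "input": "question or prompt",
      "ideal": "expected answer",
      "metrics": ["accuracy", "f1"]
    }
  ]
}
"""
examples = validate_custom_task(raw)
```

A check like this catches malformed submissions (missing keys, non-list `metrics`) before an evaluation run is queued.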