# Data files for the ML.ENERGY Leaderboard

This directory holds all the data for the leaderboard table.

## Parameters

There are two types of parameters: (1) those that become radio buttons on the leaderboard and (2) those that become columns in the leaderboard table.
Models are always placed in rows.

Currently, there are only two parameters that become radio buttons: GPU model (e.g., V100, A40, A100) and task (e.g., chat, chat-concise, instruct, and instruct-concise).
These are defined in the `schema.yaml` file.
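
For illustration only, `schema.yaml` might look something like the sketch below; the exact keys and layout are assumptions, not the actual file contents:

```yaml
# Hypothetical sketch of schema.yaml -- the real keys and layout may differ.
gpu:
  - V100
  - A40
  - A100
task:
  - chat
  - chat-concise
  - instruct
  - instruct-concise
```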

Each combination of radio button values has its own CSV file in this directory.
For instance, benchmark results for the *chat* task run on an *A100* GPU live in `A100_chat_benchmark.csv`. The leaderboard Gradio application constructs this file name dynamically by looking at `schema.yaml`, and reads the file in as a Pandas DataFrame.
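
The lookup logic might resemble the following Python sketch; the function name and the `schema.yaml` keys follow the hypothetical sketch above and are not the application's actual code:

```python
import pandas as pd
import yaml  # PyYAML

# Load the radio button parameters and their possible values.
with open("schema.yaml") as f:
    schema = yaml.safe_load(f)

def read_benchmark(gpu: str, task: str) -> pd.DataFrame:
    """Read the benchmark CSV for one combination of radio button values."""
    # The "gpu" and "task" keys are assumptions matching the sketch above.
    assert gpu in schema["gpu"] and task in schema["task"]
    return pd.read_csv(f"{gpu}_{task}_benchmark.csv")

# For example, the chat task benchmarked on an A100 GPU:
df = read_benchmark("A100", "chat")
```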

Parameters that become columns in the table are put directly in the benchmark CSV files, e.g., `batch_size` and `datatype`.
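
A benchmark CSV could therefore start like this hypothetical snippet, where every column other than `batch_size` and `datatype`, the model name, and all values are placeholder assumptions:

```csv
model,batch_size,datatype,energy,throughput
lmsys/vicuna-7b-v1.3,1,fp16,123.4,5.6
```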

## Adding new models

1. Add your model to `models.json` (see the hypothetical example after this list).
    - The model's JSON key should be its unique codename, e.g., its Hugging Face Hub model name. It is usually not very human-readable.
    - `url` should point to a page where people can obtain the model's weights, e.g., the Hugging Face Hub.
    - `nickname` should be a short, human-readable string that identifies the model.
    - `params` should be the model's parameter count in billions, rounded to an integer.

1. Add NLP dataset evaluation scores to `score.csv` (a matching example row also follows this list).
    - `model` is the model's JSON key in `models.json`.
    - `arc` is the accuracy on the [ARC challenge](https://allenai.org/data/arc) dataset.
    - `hellaswag` is the accuracy on the [HellaSwag](https://allenai.org/data/hellaswag) dataset.
    - `truthfulqa` is the accuracy on the [TruthfulQA](https://github.com/sylinrl/TruthfulQA) MC2 dataset.
    - We obtain these metrics using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). See [here](https://github.com/ml-energy/leaderboard/tree/master/pegasus#nlp-benchmark) for specific instructions.

1. Add benchmarking results to the CSV files, e.g., `A100_chat_benchmark.csv`. The name of each CSV file should make evident which setting its results correspond to.
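
To make the steps above concrete, here is a hypothetical `models.json` entry. It assumes the top-level object maps each codename to its fields; the model and every field value are placeholders that follow the field descriptions in step 1:

```json
{
  "lmsys/vicuna-7b-v1.3": {
    "url": "https://huggingface.co/lmsys/vicuna-7b-v1.3",
    "nickname": "Vicuna 7B v1.3",
    "params": 7
  }
}
```

And a matching hypothetical `score.csv` row, with the header taken from step 2 and placeholder accuracy values, not real results:

```csv
model,arc,hellaswag,truthfulqa
lmsys/vicuna-7b-v1.3,0.50,0.77,0.49
```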