Jae-Won Chung committed
Commit 4e9ddf9
Parent(s): ce6d832

Benchmarking with Pegasus (#7)
Files changed:
- README.md +4 -0
- models.txt +0 -20
- pegasus/README.md +60 -0
- pegasus/benchmark.yaml +32 -0
- pegasus/hosts.yaml +19 -0
- pegasus/setup-nodes.yaml +7 -0
- scripts/benchmark.py +5 -5
README.md
CHANGED
@@ -33,6 +33,10 @@ $ docker run -it \
 
 ## Running the benchmark
 
+We run benchmarks on multiple nodes and GPUs using [Pegasus](https://github.com/jaywonchung/pegasus). Take a look at [`pegasus/`](/pegasus) for details.
+
+You can still run benchmarks without Pegasus like this:
+
 ```console
 # Inside the container
 $ cd /workspace/leaderboard
models.txt
DELETED
@@ -1,20 +0,0 @@
-/data/leaderboard/weights/metaai/llama-7B
-/data/leaderboard/weights/metaai/llama-13B
-/data/leaderboard/weights/lmsys/vicuna-7B
-/data/leaderboard/weights/lmsys/vicuna-13B
-/data/leaderboard/weights/tatsu-lab/alpaca-7B
-/data/leaderboard/weights/BAIR/koala-7b
-/data/leaderboard/weights/BAIR/koala-13b
-/data/leaderboard/weights/BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth
-camel-ai/CAMEL-13B-Combined-Data
-databricks/dolly-v2-12b
-FreedomIntelligence/phoenix-inst-chat-7b
-h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
-lmsys/fastchat-t5-3b-v1.0
-Neutralzz/BiLLa-7B-SFT
-nomic-ai/gpt4all-13b-snoozy
-openaccess-ai-collective/manticore-13b-chat-pyg
-OpenAssistant/oasst-sft-1-pythia-12b
-project-baize/baize-v2-7B
-StabilityAI/stablelm-tuned-alpha-7b
-togethercomputer/RedPajama-INCITE-7B-Chat
pegasus/README.md
ADDED
@@ -0,0 +1,60 @@
+# Running benchmarks on multiple GPU nodes with Pegasus
+
+[Pegasus](https://github.com/jaywonchung/pegasus) is an SSH-based multi-node command runner.
+Different models have different verbosity, and benchmarking takes vastly different amounts of time.
+Therefore, we want an automated piece of software that drains a queue of benchmarking jobs (one job per model) on a set of GPUs.
+
+## Setup
+
+### Install Pegasus
+
+Pegasus needs to keep SSH connections with all the nodes in order to queue up and run jobs over SSH.
+So you should install and run Pegasus on a computer that you can keep awake.
+
+If you already have Rust set up:
+
+```console
+$ cargo install pegasus-ssh
+```
+
+Otherwise, you can set up Rust [here](https://www.rust-lang.org/tools/install), or just download Pegasus release binaries [here](https://github.com/jaywonchung/pegasus/releases/latest).
+
+### Necessary setup for each node
+
+Every node must have two things:
+
+1. This repository cloned under `~/workspace/leaderboard`.
+   - If you want a different path, search and replace in `setup-nodes.yaml`.
+2. Model weights under `/data/leaderboard/weights`.
+   - If you want a different path, search and replace in `setup-nodes.yaml` and `benchmark.yaml`.
+
+### Specify node names for Pegasus
+
+Modify `hosts.yaml` with your nodes. See the file for an example.
+
+- `hostname`: List the hostnames you would use in order to `ssh` into the node, e.g. `jaywonchung@gpunode01`.
+- `gpu`: We want to create one Docker container for each GPU. List the indices of the GPUs you would like to use on these hosts.
+
+### Set up Docker containers on your nodes with Pegasus
+
+This builds our Docker image and spawns one container per GPU (named `leaderboard%d`) on every node.
+
+```console
+$ cd pegasus
+$ cp setup-nodes.yaml queue.yaml
+$ pegasus b
+```
+
+`b` stands for broadcast. Every command is run once on all (`hostname`, `gpu`) combinations.
+
+## Benchmark
+
+Now use Pegasus to run benchmarks for all the models across all nodes.
+
+```console
+$ cd pegasus
+$ cp benchmark.yaml queue.yaml
+$ pegasus q
+```
+
+`q` stands for queue. Each command is run once on the next available (`hostname`, `gpu`) combination.
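As an optional sanity check (our addition, not part of the instructions above): after `pegasus b` finishes, each node should be running one container per GPU index listed in `hosts.yaml`, following the `leaderboard%d` naming convention from `setup-nodes.yaml`. On a hypothetical 4-GPU node you would expect something like the following (order may vary):

```console
$ docker ps --filter name=leaderboard --format '{{.Names}}'
leaderboard0
leaderboard1
leaderboard2
leaderboard3
```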
pegasus/benchmark.yaml
ADDED
@@ -0,0 +1,32 @@
+# This YAML dictionary will expand into 20 (models) x 4 (tasks) = 80 job commands,
+# where {{ model }} and {{ task }} are filled in with all possible combinations.
+# {{ gpu }} is defined in `hosts.yaml`, and will be filled in when Pegasus
+# determines the specific node and gpu the generated job command will run on.
+- command:
+    - docker exec leaderboard{{ gpu }} python scripts/benchmark.py --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json --model-path {{ model }} --task {{ task }}
+  model:
+    - /data/leaderboard/weights/metaai/llama-7B
+    - /data/leaderboard/weights/metaai/llama-13B
+    - /data/leaderboard/weights/lmsys/vicuna-7B
+    - /data/leaderboard/weights/lmsys/vicuna-13B
+    - /data/leaderboard/weights/tatsu-lab/alpaca-7B
+    - /data/leaderboard/weights/BAIR/koala-7b
+    - /data/leaderboard/weights/BAIR/koala-13b
+    - /data/leaderboard/weights/BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth
+    - camel-ai/CAMEL-13B-Combined-Data
+    - databricks/dolly-v2-12b
+    - FreedomIntelligence/phoenix-inst-chat-7b
+    - h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
+    - lmsys/fastchat-t5-3b-v1.0
+    - Neutralzz/BiLLa-7B-SFT
+    - nomic-ai/gpt4all-13b-snoozy
+    - openaccess-ai-collective/manticore-13b-chat-pyg
+    - OpenAssistant/oasst-sft-1-pythia-12b
+    - project-baize/baize-v2-7B
+    - StabilityAI/stablelm-tuned-alpha-7b
+    - togethercomputer/RedPajama-INCITE-7B-Chat
+  task:
+    - chat
+    - chat-concise
+    - instruct
+    - instruct-concise
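To make the expansion concrete, here is a minimal Python sketch of the cross product described in the comments above. This is illustrative only: it assumes PyYAML is available and uses naive string replacement rather than Pegasus's actual template engine.

```python
from itertools import product

import yaml  # assumption: PyYAML is installed

with open("benchmark.yaml") as f:
    jobs = yaml.safe_load(f)

commands = []
for job in jobs:
    template = job["command"][0]
    # Fill in {{ model }} and {{ task }} with every combination.
    # {{ gpu }} is deliberately left in place; Pegasus fills it in when
    # a (hostname, gpu) slot from hosts.yaml becomes free.
    for model, task in product(job["model"], job["task"]):
        commands.append(
            template.replace("{{ model }}", model).replace("{{ task }}", task)
        )

print(len(commands))  # 20 models x 4 tasks = 80 job commands
```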
pegasus/hosts.yaml
ADDED
@@ -0,0 +1,19 @@
+# Example:
+# Four nodes (node01 to node04), one container per GPU.
+# node01 and node02 have four GPUs, and hence four containers.
+# node03 and node04 have just two GPUs, and hence two containers.
+# With this configuration, 2 * 4 + 2 * 2 = 12 jobs will run in parallel.
+- hostname:
+    - node01
+    - node02
+  gpu:
+    - 0
+    - 1
+    - 2
+    - 3
+- hostname:
+    - node03
+    - node04
+  gpu:
+    - 0
+    - 1
pegasus/setup-nodes.yaml
ADDED
@@ -0,0 +1,7 @@
+# The first item builds our Docker image on each node once.
+# The second item spawns one Docker container per GPU.
+# {{ gpu }} is defined in `hosts.yaml`, and will be filled in when Pegasus
+# determines the specific node and gpu the generated job command will run on.
+# We check {{ gpu }} = 0 to ensure that the image is only built once on each node.
+- if [ {{ gpu }} = 0 ]; then cd workspace/leaderboard && docker build -t ml-energy:latest .; fi
+- docker run -dit --name leaderboard{{ gpu }} --gpus '"device={{ gpu }}"' -v /data/leaderboard:/data/leaderboard -v $HOME/workspace/leaderboard:/workspace/leaderboard ml-energy:latest bash
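For concreteness, substituting `{{ gpu }} = 0` by hand, the two queue items above render to roughly the following (illustrative; Pegasus performs this substitution for each (`hostname`, `gpu`) pair):

```console
$ if [ 0 = 0 ]; then cd workspace/leaderboard && docker build -t ml-energy:latest .; fi
$ docker run -dit --name leaderboard0 --gpus '"device=0"' \
    -v /data/leaderboard:/data/leaderboard \
    -v $HOME/workspace/leaderboard:/workspace/leaderboard \
    ml-energy:latest bash
```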
scripts/benchmark.py
CHANGED
@@ -19,21 +19,21 @@ from zeus.monitor import ZeusMonitor
 SYSTEM_PROMPTS = {
     "chat": (
         "A chat between a human user (prompter) and an artificial intelligence (AI) assistant. "
-        "The assistant gives helpful, detailed, and polite answers to the user's questions."
+        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
     ),
     "chat-concise": (
         "A chat between a human user (prompter) and an artificial intelligence (AI) assistant. "
         "The assistant gives helpful, detailed, and polite answers to the user's questions. "
-        "The
+        "The assistant's answers are very concise. "
     ),
     "instruct": (
         "Below is an instruction that describes a task. "
-        "Write a response that appropriately completes the request."
+        "Write a response that appropriately completes the request. "
     ),
     "instruct-concise": (
         "Below is an instruction that describes a task. "
-        "Write a response that appropriately completes the request."
-        "The response should be concise
+        "Write a response that appropriately completes the request. "
+        "The response should be very concise. "
     ),
 }
 
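The trailing spaces this commit adds to each fragment matter because Python implicitly concatenates adjacent string literals: without them, sentences run together (e.g. the old "instruct-concise" prompt contained `request.The response`). After this change, for example:

```python
>>> SYSTEM_PROMPTS["instruct-concise"]
'Below is an instruction that describes a task. Write a response that appropriately completes the request. The response should be very concise. '
```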