Push to Docker automatically

Commit: aa739dd (committed by Jae-Won Chung)
Parent(s): b9c6dec

Files changed:
- .github/workflows/push_docker.yaml  +50 -0
- .github/workflows/push_spaces.yaml  +1 -1
- Dockerfile  +5 -6
- LEADERBOARD.md  +6 -6
- README.md  +11 -12
- requirements-benchmark.txt  +1 -1
.github/workflows/push_docker.yaml
ADDED
@@ -0,0 +1,50 @@
+name: Push Docker image
+
+on:
+  push:
+    branches:
+      - master
+    paths:
+      - '.github/workflows/push_docker.yaml'
+      - 'pegasus/**'
+      - 'scripts/**'
+      - 'sharegpt/**'
+      - 'Dockerfile'
+      - 'LICENSE'
+      - 'requirements-benchmark.txt'
+      - '.gitignore'
+
+concurrency:
+  group: ${{ github.ref }}-dhpush
+  cancel-in-progress: true
+
+jobs:
+  build_and_push:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v3
+      - name: Docker Hub login
+        uses: docker/login-action@v2
+        with:
+          username: ${{ secrets.DOCKER_HUB_USERNAME }}
+          password: ${{ secrets.DOCKER_HUB_TOKEN }}
+      - name: Generate image metadata
+        id: meta
+        uses: docker/metadata-action@v4
+        with:
+          images: mlenergy/leaderboard
+          tags: latest
+      - name: Setup Docker Buildx
+        uses: docker/setup-buildx-action@v2
+      - name: Build and push to Docker Hub
+        uses: docker/build-push-action@v3
+        with:
+          context: .
+          file: Dockerfile
+          builder: ${{ steps.buildx.outputs.name }}
+          push: true
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+          cache-from: type=registry,ref=mlenergy/leaderboard:buildcache
+          cache-to: type=registry,ref=mlenergy/leaderboard:buildcache,mode=max
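One subtlety in the workflow above: `builder: ${{ steps.buildx.outputs.name }}` can only resolve if the Buildx setup step carries `id: buildx`; without the id, the expression evaluates to an empty string and the default builder is used. A sketch (not the committed file) of the intended pattern, with the registry cache backend spelled out:

```yaml
# Sketch: give the setup step an id so steps.buildx.outputs.name resolves
# to the created builder's name.
- name: Setup Docker Buildx
  id: buildx
  uses: docker/setup-buildx-action@v2
- name: Build and push to Docker Hub
  uses: docker/build-push-action@v3
  with:
    context: .
    file: Dockerfile
    builder: ${{ steps.buildx.outputs.name }}
    push: true
    tags: mlenergy/leaderboard:latest
    # Registry cache backend: layers are stored in a separate :buildcache
    # image and reused across CI runs; mode=max caches all layers.
    cache-from: type=registry,ref=mlenergy/leaderboard:buildcache
    cache-to: type=registry,ref=mlenergy/leaderboard:buildcache,mode=max
```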
.github/workflows/push_spaces.yaml
CHANGED
@@ -12,7 +12,7 @@ on:
       - 'requirements.txt'
 
 concurrency:
-  group: ${{ github.ref }}-
+  group: ${{ github.ref }}-hfpush
   cancel-in-progress: true
 
 jobs:
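Concurrency groups are matched by the literal group string, so two workflows that both trigger on pushes to `master` need distinct suffixes (`-dhpush` for the Docker push, `-hfpush` here); otherwise a run of one workflow would cancel an in-progress run of the other. A minimal sketch of the pattern:

```yaml
# Sketch: a per-workflow suffix gives each workflow its own cancellation
# scope even though both key on the same github.ref.
concurrency:
  group: ${{ github.ref }}-hfpush  # unique per workflow
  cancel-in-progress: true         # a new push cancels the stale run
```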
Dockerfile
CHANGED
@@ -1,7 +1,5 @@
 FROM nvidia/cuda:11.7.1-devel-ubuntu20.04
 
-WORKDIR /workspace
-
 # Basic installs
 ARG DEBIAN_FRONTEND=noninteractive
 ENV TZ='America/Detroit'
@@ -21,14 +19,15 @@ RUN mkdir -p /root/.local \
     && ln -sf /root/.local/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh
 
 # Install PyTorch and Zeus
-RUN pip install torch==2.0.1
+RUN pip install torch==2.0.1
 
 # Install requirements for benchmarking
 ADD . /workspace/leaderboard
-RUN cd leaderboard \
-    && pip install -r requirements-benchmark.txt
-    && cd ..
+RUN cd /workspace/leaderboard \
+    && pip install -r requirements-benchmark.txt
 
+# Where all the weights downloaded from Hugging Face Hub will go to
 ENV TRANSFORMERS_CACHE=/data/leaderboard/hfcache
 
+# So that docker exec container python scripts/benchmark.py will work
 WORKDIR /workspace/leaderboard
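The `ENV TRANSFORMERS_CACHE` line works because Hugging Face libraries consult that environment variable when deciding where to store downloaded weights, falling back to a per-user default when it is unset. A toy sketch of that env-var-with-default pattern (the default path shown is an assumption, not read from the library):

```python
import os

# Sketch: resolve the cache directory the way HF-style libraries do --
# honor TRANSFORMERS_CACHE if set, else fall back to a home-dir default.
def cache_dir(default: str = "~/.cache/huggingface/hub") -> str:
    return os.environ.get("TRANSFORMERS_CACHE", os.path.expanduser(default))

# Inside the container, the Dockerfile's ENV makes this the result:
os.environ["TRANSFORMERS_CACHE"] = "/data/leaderboard/hfcache"
print(cache_dir())  # /data/leaderboard/hfcache
```

Since `/data/leaderboard` is a bind mount (see the README changes below in this commit), weights downloaded inside the container persist on the host across container restarts.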
LEADERBOARD.md
CHANGED
@@ -2,8 +2,9 @@ The goal of the ML.ENERGY Leaderboard is to give people a sense of how much **en
 
 ## How is energy different?
 
-…
-…
+The energy consumption of running inference on a model depends on factors such as architecture, size, and GPU model.
+However, even if we run models with the exact same architecture and size on the same GPU, the average energy consumption **per prompt** is different because different models have **different verbosity**.
+That is, when asked the same thing, different models answer in different lengths.
 
 ## Metrics
 
@@ -62,11 +63,10 @@ A chat between a human user (prompter) and an artificial intelligence (AI) assis
 
 ## Upcoming
 
-- Compare against more optimized inference runtimes, like TensorRT.
-…
-- Other model/sampling parameters
+- Compare energy numbers against more optimized inference runtimes, like TensorRT.
+- More GPU types
 - More models
-…
+- Other model/sampling parameters
 
 # License
 
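The verbosity argument added to LEADERBOARD.md can be made concrete with a toy calculation (all numbers below are hypothetical, not leaderboard measurements): under a constant per-token energy cost, energy per prompt scales directly with response length.

```python
# Toy illustration: two models with identical per-token energy can still
# differ in average energy per prompt, purely because one answers more
# verbosely than the other. All numbers are made up.
ENERGY_PER_TOKEN_J = 0.5  # assumed same architecture, size, and GPU

def energy_per_prompt(avg_response_tokens: int) -> float:
    """Energy in Joules for one prompt under a constant per-token cost."""
    return ENERGY_PER_TOKEN_J * avg_response_tokens

terse = energy_per_prompt(120)    # terse model: 120 tokens per answer
verbose = energy_per_prompt(300)  # verbose model: 300 tokens per answer
print(terse, verbose)  # 60.0 150.0
```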
README.md
CHANGED
@@ -28,19 +28,20 @@ The actual leaderboard is here: https://ml.energy/leaderboard.
 
 ### Docker container
 
+We have our pre-built Docker image published with the tag `mlenergy/leaderboard:latest` ([Dockerfile](/Dockerfile)).
+
 ```console
-$ git clone https://github.com/ml-energy/leaderboard.git
-$ cd leaderboard
-$ docker build -t ml-energy:latest .
-# Replace /data/leaderboard with your data directory.
 $ docker run -it \
-    --name …
-    --gpus …
-    -v /data/…
+    --name leaderboard0 \
+    --gpus '"device=0"' \
+    -v /path/to/your/data/dir:/data/leaderboard \
     -v $(pwd):/workspace/leaderboard \
-    …
+    mlenergy/leaderboard:latest bash
 ```
 
+The container internally expects weights to be inside `/data/leaderboard/weights` (e.g., `/data/leaderboard/weights/lmsys/vicuna-7B`), and sets the Hugging Face cache directory to `/data/leaderboard/hfcache`.
+If needed, the repository should be mounted to `/workspace/leaderboard` to override the copy of the repository inside the container.
+
 ## Running the benchmark
 
 We run benchmarks using multiple nodes and GPUs using [Pegasus](https://github.com/jaywonchung/pegasus). Take a look at [`pegasus/`](/pegasus) for details.
@@ -48,8 +49,6 @@ We run benchmarks using multiple nodes and GPUs using [Pegasus](https://github.c
 You can still run benchmarks without Pegasus like this:
 
 ```console
-…
-$ …
-$ python scripts/benchmark.py --model-path /data/leaderboard/weights/lmsys/vicuna-13B --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json
-$ python scripts/benchmark.py --model-path databricks/dolly-v2-12b --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json
+$ docker exec leaderboard0 python scripts/benchmark.py --model-path /data/leaderboard/weights/lmsys/vicuna-13B --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json
+$ docker exec leaderboard0 python scripts/benchmark.py --model-path databricks/dolly-v2-12b --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json
 ```
requirements-benchmark.txt
CHANGED
@@ -1,4 +1,4 @@
-zeus-ml
+zeus-ml==0.4.0
 fschat==0.2.14
 rwkv==0.7.5
 einops
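Pinning `zeus-ml==0.4.0` matters here because the Docker image is now rebuilt automatically on every push: an unpinned dependency could silently change between CI runs. A small sketch (`unpinned` is a hypothetical helper, not part of this repository) of checking a requirements file for lines without an exact `==` pin:

```python
import re

# Hypothetical helper: return requirement names lacking an exact '==' pin,
# the kind of drift that pinning zeus-ml==0.4.0 guards against.
def unpinned(requirements: str) -> list[str]:
    names = []
    for line in requirements.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if "==" not in line:
            # Take the bare name before any specifier/extras/marker chars.
            names.append(re.split(r"[<>=!~\[;@ ]", line)[0])
    return names

reqs = "zeus-ml==0.4.0\nfschat==0.2.14\nrwkv==0.7.5\neinops\n"
print(unpinned(reqs))  # ['einops']
```

Note that `einops` remains unpinned even after this commit, so image builds can still pick up new `einops` releases.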