updating README
README.md
CHANGED
@@ -90,8 +90,6 @@ The inference code to run the model can be found in our [github repo](https://githu
 
 ### Quick Start Inference on GPU
 
-[This tutorial](https://github.com/huggingface/transformers-bloom-inference) from Huggingface will be the base layer for running our model. The tutorial is intended for BLOOM; however, since our model is based on BLOOM, we can repurpose it.
-
 First create a Python virtual environment for these packages
 
 ```
@@ -100,12 +98,34 @@ source bloomchat_venv/bin/activate
 pip install --upgrade pip
 ```
 
-
-NOTE: Things that we had to modify in order for BLOOMChat to work:
-- Install transformers version 4.27.0
-  - `pip install transformers==4.27.0`
-- Change the model name from `bigscience/bloom` to `sambanovasystems/BLOOMChat-176B-v1`
+<!-- Please follow this section [Inference solutions for BLOOM 176B](https://github.com/huggingface/transformers-bloom-inference#bloom-inference-via-command-line) in the Huggingface Tutorial for environment set up and stop before the [BLOOM inference via command-line
+](https://github.com/huggingface/transformers-bloom-inference#bloom-inference-via-command-line) section. -->
+
+```
+pip install flask flask_api gunicorn pydantic accelerate huggingface_hub>=0.9.0 deepspeed>=0.7.3 deepspeed-mii==0.0.2
+```
+And then
+```
+pip install transformers==4.27.0
+```
+
+You will see messages like this
+```
+ERROR: deepspeed-mii 0.0.2 has requirement transformers==4.21.2, but you'll have transformers 4.27.0 which is incompatible.
+Installing collected packages: transformers
+  Found existing installation: transformers 4.21.2
+    Uninstalling transformers-4.21.2:
+      Successfully uninstalled transformers-4.21.2
+Successfully installed transformers-4.27.0
+```
+
+Now let's git clone the [huggingface/transformers-bloom-inference](https://github.com/huggingface/transformers-bloom-inference) repo.
+```
+git clone https://github.com/huggingface/transformers-bloom-inference.git
+cd transformers-bloom-inference/
+```
+And then you need to modify two files in this [transformers-bloom-inference](https://github.com/huggingface/transformers-bloom-inference) repo:
 
 - Modifying `inference_server/models/hf_accelerate.py`
   - This is because for our testing of this repo we used 4 80GB A100 GPUs and would run into memory issues
 - Modifying `inference_server/cli.py`
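Those two patches are spelled out further down in the README (the `index fc903d5..5450236 100644` context visible in the next hunk appears to come from that embedded diff). As a rough standalone sketch of what the `hf_accelerate.py` change is after, here is one way to cap per-GPU memory when sharding a large checkpoint with `accelerate`; the 0.85 headroom factor, the bf16 dtype, and loading through `AutoModelForCausalLM` directly are illustrative assumptions, not the actual patch:

```
# Illustrative sketch only, not the patch from the README: cap the per-GPU budget so
# device_map="auto" leaves headroom for activations/KV cache on 4x 80GB A100s.
import torch
import transformers
from accelerate.utils import get_max_memory
from transformers import AutoModelForCausalLM

# Matches the pinned install from the step above.
assert transformers.__version__ == "4.27.0"

# get_max_memory() reports the full capacity of each visible GPU (int keys) and the CPU;
# scale the GPU entries down so the sharded weights do not fill the cards completely.
max_memory = get_max_memory()
max_memory = {dev: int(mem * 0.85) if isinstance(dev, int) else mem for dev, mem in max_memory.items()}

model = AutoModelForCausalLM.from_pretrained(
    "sambanovasystems/BLOOMChat-176B-v1",  # model name used in place of bigscience/bloom
    device_map="auto",                     # let accelerate shard the weights across the GPUs
    torch_dtype=torch.bfloat16,            # assumption: matches the bf16 run command below
    max_memory=max_memory,                 # reduced per-device budget to avoid OOM at generation time
)
```

Whatever form the real patch takes, the point this sketch illustrates is the same: without a reduced `max_memory`, `device_map="auto"` will fill each card with weights, and the run can then go out of memory once generation starts allocating activations.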
@@ -168,6 +188,7 @@ index fc903d5..5450236 100644
 print_rank_0("Generated tokens:", response.num_generated_tokens[0])
 
 ```
+And now you are good to go!
 
 Running command for bf16, NO sampling
 ```