jayr014 committed
Commit 7917a1f · 1 Parent(s): 39abe33

updating README

Files changed (1)
  1. README.md +28 -7
README.md CHANGED
@@ -90,8 +90,6 @@ The inference code to run the model can be found in our [github repo](https://githu
 
 ### Quick Start Inference on GPU
 
-[This tutorial](https://github.com/huggingface/transformers-bloom-inference) from Huggingface will be the base layer for running our model. The tutorial is intended for BLOOM; however, since our model is based on BLOOM, we can repurpose it.
-
 First create a python virtual environment for these packages
 
 ```
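Note: the setup block that opens at the end of this hunk is split across the hunk boundary; only `source bloomchat_venv/bin/activate` (quoted in the next hunk header) and `pip install --upgrade pip` are visible. Pieced together, the full block presumably reads as follows, where the `python3 -m venv` line is an assumption inferred from the venv name rather than something shown in the diff:

```
python3 -m venv bloomchat_venv      # assumed creation command, not shown in the diff
source bloomchat_venv/bin/activate  # visible in the next hunk header
pip install --upgrade pip           # visible in the next hunk
```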
@@ -100,12 +98,34 @@ source bloomchat_venv/bin/activate
 pip install --upgrade pip
 ```
 
-For setup instructions follow the Huggingface tutorial.
+<!-- Please follow the section [Inference solutions for BLOOM 176B](https://github.com/huggingface/transformers-bloom-inference#bloom-inference-via-command-line) in the Huggingface tutorial for environment setup, and stop before the [BLOOM inference via command-line](https://github.com/huggingface/transformers-bloom-inference#bloom-inference-via-command-line) section. -->
+
+```
+pip install flask flask_api gunicorn pydantic accelerate "huggingface_hub>=0.9.0" "deepspeed>=0.7.3" deepspeed-mii==0.0.2
+```
+Then pin the transformers version that BLOOMChat needs
+```
+pip install transformers==4.27.0
+```
+
+You will see messages like this
+```
+ERROR: deepspeed-mii 0.0.2 has requirement transformers==4.21.2, but you'll have transformers 4.27.0 which is incompatible.
+Installing collected packages: transformers
+Found existing installation: transformers 4.21.2
+Uninstalling transformers-4.21.2:
+Successfully uninstalled transformers-4.21.2
+Successfully installed transformers-4.27.0
+```
+
+Now let's git clone the [huggingface/transformers-bloom-inference](https://github.com/huggingface/transformers-bloom-inference) repo.
+```
+git clone https://github.com/huggingface/transformers-bloom-inference.git
+cd transformers-bloom-inference/
+```
+Then you need to modify two files in this [transformers-bloom-inference](https://github.com/huggingface/transformers-bloom-inference) repo:
 
-NOTE: Things that we had to modify in order for BLOOMChat to work:
-- Install transformers version 4.27.0
-  - `pip install transformers==4.27.0`
-- Change the model name from `bigscience/bloom` to `sambanovasystems/BLOOMChat-176B-v1`
 - Modifying `inference_server/models/hf_accelerate.py`
   - This is because in our testing of this repo we used 4 80GB A100 GPUs and would otherwise run into memory issues
 - Modifying `inference_server/cli.py`
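As a quick sanity check that is not part of the original README: after the installs above, you can confirm the pin took effect. The `deepspeed-mii` error shown in the hunk is expected, since deepspeed-mii 0.0.2 pins transformers==4.21.2 while this model needs 4.27.0.

```
python -c "import transformers; print(transformers.__version__)"  # should print 4.27.0
```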
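The actual patch to `inference_server/models/hf_accelerate.py` is not shown in these hunks. As a hypothetical sketch of the kind of change that avoids out-of-memory errors on 4 80GB A100 GPUs, one can cap per-device memory with accelerate's `max_memory` map when loading the model; the cap values and placement below are assumptions, not the commit's code:

```
# Hypothetical sketch only; the commit's real hf_accelerate.py patch is not shown here.
# Capping per-GPU memory leaves headroom for activations when accelerate
# shards a 176B-parameter model across 4 x 80GB A100s.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "sambanovasystems/BLOOMChat-176B-v1",
    device_map="auto",           # let accelerate place layers across the 4 GPUs
    max_memory={0: "70GiB", 1: "70GiB", 2: "70GiB", 3: "70GiB", "cpu": "200GiB"},  # assumed caps
    torch_dtype=torch.bfloat16,  # matches the bf16 run command below
)
```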
@@ -168,6 +188,7 @@ index fc903d5..5450236 100644
 print_rank_0("Generated tokens:", response.num_generated_tokens[0])
 
 ```
+And now you are good to go!
 
 Running command for bf16, NO sampling
 ```
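The run-command block above is cut off by the diff context, so the exact bf16 invocation is not visible in this commit view. For orientation, the upstream `inference_server.cli` entry point is typically invoked as below; the model name comes from this README, while the remaining flags follow the upstream transformers-bloom-inference examples and the `max_new_tokens` value is an assumption:

```
python -m inference_server.cli --model_name sambanovasystems/BLOOMChat-176B-v1 \
    --model_class AutoModelForCausalLM --dtype bf16 \
    --deployment_framework hf_accelerate \
    --generate_kwargs '{"do_sample": false, "max_new_tokens": 256}'
```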
 