updating README
README.md
CHANGED
@@ -90,8 +90,6 @@ The inference code to run the model can be found in our [github repo](https://githu
 
 ### Quick Start Inference on GPU
 
-[This tutorial](https://github.com/huggingface/transformers-bloom-inference) from Huggingface will be the base layer for running our model. The tutorial is intended for BLOOM; however, since our model is based on BLOOM, we can repurpose it.
-
 First create a Python virtual environment for these packages
 
 ```
@@ -100,12 +98,34 @@ source bloomchat_venv/bin/activate
 pip install --upgrade pip
 ```
 
-
-NOTE: Things that we had to modify in order for BLOOMChat to work:
-- Install transformers version 4.27.0
-  - `pip install transformers==4.27.0`
-- Change the model name from `bigscience/bloom` to `sambanovasystems/BLOOMChat-176B-v1`
+<!-- Please follow this section [Inference solutions for BLOOM 176B](https://github.com/huggingface/transformers-bloom-inference#bloom-inference-via-command-line) in the Huggingface Tutorial for environment set up and stop before the [BLOOM inference via command-line
+](https://github.com/huggingface/transformers-bloom-inference#bloom-inference-via-command-line) section. -->
+
+```
+pip install flask flask_api gunicorn pydantic accelerate huggingface_hub>=0.9.0 deepspeed>=0.7.3 deepspeed-mii==0.0.2
+```
+And then
+```
+pip install transformers==4.27.0
+```
+
+You will see messages like this
+```
+ERROR: deepspeed-mii 0.0.2 has requirement transformers==4.21.2, but you'll have transformers 4.27.0 which is incompatible.
+Installing collected packages: transformers
+  Found existing installation: transformers 4.21.2
+    Uninstalling transformers-4.21.2:
+      Successfully uninstalled transformers-4.21.2
+Successfully installed transformers-4.27.0
+```
+
+Now let's git clone the [huggingface/transformers-bloom-inference](https://github.com/huggingface/transformers-bloom-inference) repo.
+```
+git clone https://github.com/huggingface/transformers-bloom-inference.git
+cd transformers-bloom-inference/
+```
+And then you need to modify two files in this [transformers-bloom-inference](https://github.com/huggingface/transformers-bloom-inference) repo:
 
 - Modifying `inference_server/models/hf_accelerate.py`
   - This is because for our testing of this repo we used 4 80GB A100 GPUs and would run into memory issues
 - Modifying `inference_server/cli.py`
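Those two patches are spelled out further down in the README (the `index fc903d5..5450236 100644` context visible in the next hunk appears to come from that embedded diff). As a rough standalone sketch of what the `hf_accelerate.py` change is after, here is one way to cap per-GPU memory when sharding a large checkpoint with `accelerate`; the 0.85 headroom factor, the bf16 dtype, and loading through `AutoModelForCausalLM` directly are illustrative assumptions, not the actual patch:

```
# Illustrative sketch only, not the patch from the README: cap the per-GPU budget so
# device_map="auto" leaves headroom for activations/KV cache on 4x 80GB A100s.
import torch
import transformers
from accelerate.utils import get_max_memory
from transformers import AutoModelForCausalLM

# Matches the pinned install from the step above.
assert transformers.__version__ == "4.27.0"

# get_max_memory() reports the full capacity of each visible GPU (int keys) and the CPU;
# scale the GPU entries down so the sharded weights do not fill the cards completely.
max_memory = get_max_memory()
max_memory = {dev: int(mem * 0.85) if isinstance(dev, int) else mem for dev, mem in max_memory.items()}

model = AutoModelForCausalLM.from_pretrained(
    "sambanovasystems/BLOOMChat-176B-v1",  # model name used in place of bigscience/bloom
    device_map="auto",                     # let accelerate shard the weights across the GPUs
    torch_dtype=torch.bfloat16,            # assumption: matches the bf16 run command below
    max_memory=max_memory,                 # reduced per-device budget to avoid OOM at generation time
)
```

Whatever form the real patch takes, the point this sketch illustrates is the same: without a reduced `max_memory`, `device_map="auto"` will fill each card with weights, and the run can then go out of memory once generation starts allocating activations.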
@@ -168,6 +188,7 @@ index fc903d5..5450236 100644
 print_rank_0("Generated tokens:", response.num_generated_tokens[0])
 
 ```
+And now you are good to go!
 
 Running command for bf16, NO sampling
 ```