updating readme
README.md CHANGED
````diff
@@ -96,6 +96,8 @@ NOTE: Things that we had to modify in order for BLOOMChat to work:
 - Change the model name from `bigscience/bloom` to `sambanovasystems/BLOOMChat-176B-v1`
 - Modifying `inference_server/models/hf_accelerate.py`
   - This is because, in our testing of this repo, we used 4 80GB A100 GPUs and would run into memory issues
+- Modifying `inference_server/cli.py`
+  - This is because the model was trained using specific `<human>:` and `<bot>:` tags
 
 Modifications for `inference_server/models/hf_accelerate.py`:
 
@@ -112,6 +114,18 @@ class HFAccelerateModel(Model):
         kwargs["max_memory"] = reduce_max_memory_dict
 ```
 
+Modifications for `inference_server/cli.py`:
+
+```python
+def main() -> None:
+    ...
+    while True:
+        input_text = input("Input text: ")
+
+        input_text = input_text.strip()
+        modified_input_text = f"<human>: {input_text}\n<bot>:"
+```
+
 Running command for int8 (suboptimal performance, but fast inference time):
 ```
 python -m inference_server.cli --model_name sambanovasystems/BLOOMChat-176B-v1 --model_class AutoModelForCausalLM --dtype int8 --deployment_framework hf_accelerate --generate_kwargs '{"do_sample": false, "temperature": 0.8, "repetition_penalty": 1.2, "top_p": 0.9, "max_new_tokens": 512}'
````
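The diff shows only the last line of the `hf_accelerate.py` change (`kwargs["max_memory"] = reduce_max_memory_dict`); the body that builds the dict is not part of this excerpt. Below is a minimal, hypothetical sketch of that kind of edit, assuming Accelerate's `max_memory` convention of mapping device ids to size strings: the helper name `reduce_max_memory`, the headroom value, and the call site are illustrative assumptions, not the repo's actual code.

```python
import torch

# Hypothetical sketch only: the real body of this change in
# inference_server/models/hf_accelerate.py is not shown in the diff above.
def reduce_max_memory(headroom_gib: int = 10) -> dict:
    """Build a max_memory dict that reserves headroom on every GPU.

    Capping each A100 below its full 80 GiB leaves room for activations
    and intermediate buffers, which is the kind of OOM the "memory
    issues" note above describes for a 176B model on 4 GPUs.
    """
    total_gib = torch.cuda.get_device_properties(0).total_memory // 1024**3
    per_gpu = f"{total_gib - headroom_gib}GiB"
    return {gpu_id: per_gpu for gpu_id in range(torch.cuda.device_count())}

# Matching the context line in the diff:
# reduce_max_memory_dict = reduce_max_memory()
# kwargs["max_memory"] = reduce_max_memory_dict
```

Whatever the exact numbers, the intent is the same: give the model-loading kwargs a `max_memory` budget so the automatic device map never fills any GPU to the brim.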
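For the `cli.py` hunk, the tag format is shown directly in the diff; here is a quick illustration of the prompt string the modified loop would hand to the model (the sample input is invented):

```python
# Illustration of the prompt built by the modified CLI loop above.
input_text = "What is the capital of France?"

input_text = input_text.strip()
modified_input_text = f"<human>: {input_text}\n<bot>:"
print(modified_input_text)
# Prints:
# <human>: What is the capital of France?
# <bot>:
```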