jayr014 committed
Commit 6da99f4 · Parent: 5eb0df6

adding in disclaimer for RDU tutorial

Files changed (1): README.md +5 -1
README.md CHANGED
@@ -86,6 +86,10 @@ model = AutoModelForCausalLM.from_pretrained("sambanovasystems/BLOOMChat-176B-v1
 
 ### Tutorial on using the model for text generation
 
+As this model was trained on SambaNova's Reconfigurable Dataflow Unit (RDU), which is not accessible to everyone, we have provided a tutorial on how to use this model on GPUs.
+
+For those interested in running models on RDUs, [please contact us](https://sambanova.ai/getstarted).
+
 [This tutorial](https://github.com/huggingface/transformers-bloom-inference) from Hugging Face will be the base layer for running our model. The tutorial is intended for BLOOM; however, since our model is based on BLOOM, we can repurpose it.
 
 For setup instructions, follow the Hugging Face tutorial.
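
For readers following the new GPU tutorial, a minimal loading sketch may help before diving into the inference-server setup. This is an illustrative sketch added for context, not part of the commit: it assumes a multi-GPU node with `transformers` and `accelerate` installed (at bf16, 176B parameters need roughly 350 GB of aggregate GPU memory).

```
# Illustrative sketch (not from the tutorial): loading BLOOMChat-176B-v1
# with plain transformers. device_map="auto" lets accelerate shard the
# weights across all visible GPUs; bf16 is the recommended precision.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/BLOOMChat-176B-v1")
model = AutoModelForCausalLM.from_pretrained(
    "sambanovasystems/BLOOMChat-176B-v1",
    torch_dtype=torch.bfloat16,  # recommended precision
    device_map="auto",           # shard across available GPUs
)
```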
@@ -135,7 +139,7 @@ Running command for int8 (suboptimal performance, but fast inference time):
 ```
 python -m inference_server.cli --model_name sambanovasystems/BLOOMChat-176B-v1 --model_class AutoModelForCausalLM --dtype int8 --deployment_framework hf_accelerate --generate_kwargs '{"do_sample": false, "temperature": 0.8, "repetition_penalty": 1.2, "top_p": 0.9, "max_new_tokens": 512}'
 ```
-**DISCLAIMER:** When using int8, the results will be subpar compared to bf16 as the model is being [quantized](https://huggingface.co/blog/hf-bitsandbytes-integration#introduction-to-model-quantization).
+**DISCLAIMER:** When using int8, the results will be subpar compared to bf16, as the model is being [quantized](https://huggingface.co/blog/hf-bitsandbytes-integration#introduction-to-model-quantization).
 
 ### Suggested Inference Parameters
 - Temperature: 0.8
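
To make the disclaimer concrete, here is a hedged sketch of what the `--dtype int8` path does under the hood: the same `from_pretrained` call with 8-bit loading through bitsandbytes. It assumes `bitsandbytes` is installed; memory use drops to roughly half of bf16, at the output-quality cost noted above.

```
# Sketch (assumes bitsandbytes is installed): int8 loading, the mechanism
# behind --dtype int8 above. Weights are quantized to 8 bits at load time,
# roughly halving memory versus bf16 at some cost in output quality.
from transformers import AutoModelForCausalLM

model_int8 = AutoModelForCausalLM.from_pretrained(
    "sambanovasystems/BLOOMChat-176B-v1",
    load_in_8bit=True,   # bitsandbytes int8 quantization
    device_map="auto",
)
```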
 
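The suggested parameters above are what the `generate_kwargs` in the CLI command encode. Continuing from the loading sketch earlier, a short generation example with those values might look as follows; the `<human>:`/`<bot>:` prompt tags are an assumed BLOOMChat-style chat format, and sampling is enabled here (unlike `"do_sample": false` in the CLI example) so that temperature and top-p actually take effect.

```
# Sketch: generation with the suggested parameters (temperature 0.8,
# repetition_penalty 1.2, top_p 0.9). Reuses `tokenizer` and `model`
# from the loading sketch above.
prompt = "<human>: What does int8 quantization trade away?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,            # enable sampling so temperature/top_p apply
    temperature=0.8,
    repetition_penalty=1.2,
    top_p=0.9,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```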