Update README.md
> alternative section title: how to get this monster to run inference on free colab runtimes

Via [this PR](https://github.com/huggingface/transformers/pull/20341), LLM.int8 is now supported for `long-t5` models.

- per **initial tests**, the summarization quality seems to hold while using _significantly_ less memory! \*
- a version of this model quantized to int8 is [already on the hub here](https://huggingface.co/pszemraj/long-t5-tglobal-xl-16384-book-summary-8bit), so if you're going to use the 8-bit version anyway, you can start there for a download of only ~3.5 GB! (see the loading sketch after the snippet below)

First, make sure you have the latest versions of the relevant packages:
```bash
pip install -U transformers bitsandbytes accelerate
```
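Since LLM.int8 runs its quantized matmuls on a CUDA GPU, it can be worth confirming the (free) Colab runtime actually has one before downloading anything. This quick check is an optional addition, not from the original card:

```python
import torch

# bitsandbytes' int8 kernels need a CUDA device (e.g. the free Colab T4)
assert torch.cuda.is_available(), "no CUDA GPU found; switch to a GPU runtime"
print(torch.cuda.get_device_name(0))
```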
load in 8-bit (_magic completed by `bitsandbytes` behind the scenes_):
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# NOTE: the middle of this snippet was truncated in the source; the repo id
# and keyword arguments below are reconstructed from the surrounding context
model_name = "pszemraj/long-t5-tglobal-xl-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # int8 quantization via bitsandbytes
    device_map="auto",   # requires `accelerate`
)
```
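If you'd rather start from the pre-quantized checkpoint linked in the list above, loading should look essentially the same. This is a sketch under that assumption (the kwargs mirror the snippet above and are not taken verbatim from that repo's card):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# the ~3.5 GB pre-quantized checkpoint mentioned earlier
model_name_8bit = "pszemraj/long-t5-tglobal-xl-16384-book-summary-8bit"
tokenizer = AutoTokenizer.from_pretrained(model_name_8bit)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name_8bit,
    load_in_8bit=True,   # assumed: same int8 flags as for the base model
    device_map="auto",
)
```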
The 8-bit loading code above is already present in the Colab demo linked at the top of the model card.
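For completeness, here is a minimal end-to-end generation sketch; the input text, token limits, and beam-search settings are illustrative placeholders rather than the card's recommended values:

```python
long_document = "..."  # paste a long article or book chapter here

# this checkpoint accepts inputs up to 16384 tokens
inputs = tokenizer(
    long_document,
    return_tensors="pt",
    truncation=True,
    max_length=16384,
).to(model.device)

summary_ids = model.generate(
    **inputs,
    max_length=512,           # illustrative cap on summary length
    num_beams=4,              # beam search, as discussed earlier in the card
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```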
\* More rigorous metrics-based research comparing beam-search summarization with and without LLM.int8 will take place over time.
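In the meantime, if you want to eyeball the memory side of that claim yourself, loaded `transformers` models expose a footprint helper:

```python
# approximate size of the loaded int8 weights (get_memory_footprint returns bytes)
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")
```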