---
tags:
- text-generation
- 8bit
- 8-bit
- quantization
- compression
inference: False
license: apache-2.0
---

# ethzanalytics/gpt-j-6B-8bit-sharded

This is a sharded version of `hivemind/gpt-j-6B-8bit` for low-RAM loading, e.g., on free Colab runtimes :)

- shards are <= 1000 MB each
- a demo notebook showing how to use it [is here](https://colab.research.google.com/gist/pszemraj/1c0b32173df5b1efbdb7a2358ed4195b/generate-text-with-an-llm-sharded-on-huggingface.ipynb)


[![colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/pszemraj/1c0b32173df5b1efbdb7a2358ed4195b/generate-text-with-an-llm-sharded-on-huggingface.ipynb)

**Please refer to the [original model card for hivemind/gpt-j-6B-8bit](https://huggingface.co/hivemind/gpt-j-6B-8bit) for all details.**
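
If you want to verify the shard layout yourself, `huggingface_hub` can list the repo files; a minimal sketch (the substring filter is just illustrative):

```python
from huggingface_hub import list_repo_files

# the checkpoint is split into pytorch_model-*.bin shards plus an index file
for f in list_repo_files("ethzanalytics/gpt-j-6B-8bit-sharded"):
    if "pytorch_model" in f:
        print(f)
```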

## Usage


> **NOTE:** Prior to loading the model, you need to "patch" it to be compatible with the 8-bit quantized weights. See the original model card linked above for the patch code.

Install `transformers`, `accelerate`, and `bitsandbytes` if needed:
```sh
pip install transformers accelerate bitsandbytes
```
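
`bitsandbytes` needs a CUDA GPU, so it is worth sanity-checking the runtime before loading; a quick, purely illustrative check:

```python
import torch
import bitsandbytes  # noqa: F401 -- verifies the install; not used directly here

# the 8-bit kernels in bitsandbytes require a CUDA device
assert torch.cuda.is_available(), "8-bit GPT-J needs a CUDA GPU"
print(torch.cuda.get_device_name(0))
```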
Patch the model, then load it with `device_map="auto"`:

```python
from transformers import AutoTokenizer

# --- patch code goes here ---
# copy the 8-bit patch from the original hivemind/gpt-j-6B-8bit model card;
# it defines the GPTJForCausalLM class used below

tokenizer = AutoTokenizer.from_pretrained("ethzanalytics/gpt-j-6B-8bit-sharded")

# device_map="auto" lets accelerate place the shards across available devices
model = GPTJForCausalLM.from_pretrained(
    "ethzanalytics/gpt-j-6B-8bit-sharded",
    device_map="auto",
)
```
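
Once the model is loaded, generation goes through the standard `transformers` API; for example (the prompt and sampling settings here are arbitrary):

```python
prompt = "In a shocking finding, scientists discovered"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```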

Take a look at the demo notebook for a complete walkthrough.