---
library_name: transformers
license: llama3.2
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---
# This model has been xMADified!
This repository contains [`meta-llama/Llama-3.2-1B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) quantized from 16-bit floats to 4-bit integers using xMAD.ai's proprietary technology.
# How to Run This Model
Loading this xMADified model's checkpoint requires less than 2 GiB of VRAM, so it can run efficiently on most laptop GPUs.
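If you want to verify headroom before loading, the sketch below (an addition to this card, using standard PyTorch CUDA utilities) prints the total VRAM of the first GPU:
```python
import torch

# Report total VRAM on the first CUDA device; the ~2 GiB figure is the
# checkpoint size quoted above, not a hard library requirement.
if torch.cuda.is_available():
    total_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
    print(f"GPU 0 total memory: {total_gib:.1f} GiB")
else:
    print("No CUDA device detected; this checkpoint targets GPU inference.")
```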
**Package prerequisites**: Run the following commands to install the required packages.
```bash
pip install -q --upgrade transformers accelerate optimum
pip install -q --no-build-isolation auto-gptq
```
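To sanity-check the installation (optional; this snippet uses only the Python standard library), print the installed versions:
```python
from importlib.metadata import version

# Confirm the prerequisites from the pip commands above are installed.
for pkg in ("transformers", "accelerate", "optimum", "auto-gptq"):
    print(pkg, version(pkg))
```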
**Sample Inference Code**
```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "xmadai/Llama-3.2-1B-Instruct-xMADai-4bit"

# Chat-style prompt in the standard role/content message format.
prompt = [
    {"role": "system", "content": "You are a helpful assistant that responds as a pirate."},
    {"role": "user", "content": "What's Deep Learning?"},
]

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Render the chat template and tokenize in one step; return_dict=True
# yields a mapping that can be **-unpacked into generate().
inputs = tokenizer.apply_chat_template(
    prompt,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")

# Load the 4-bit GPTQ checkpoint; device_map="auto" places it on the GPU.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
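If the checkpoint ships a standard GPTQ `quantization_config` (an assumption; `auto_gptq` is the loading path this card documents), recent `transformers` releases with `optimum` installed can also load it directly:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xmadai/Llama-3.2-1B-Instruct-xMADai-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# transformers detects the GPTQ quantization_config in the checkpoint
# and dispatches to the optimum/auto-gptq backend installed above.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```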
Other xMADified models and their GPU memory requirements are listed below.
Model | GPU Memory Requirement
--- | ---
Llama-3.2-3B-Instruct-xMADai-4bit | 6.5 GB → 3.5 GB
Llama-3.2-1B-Instruct-xMADai-4bit | 2.5 GB → 2 GB
Llama-3.1-405B-Instruct-xMADai-4bit | 258.14 GB → 250 GB
Llama-3.1-8B-Instruct-xMADai-4bit | 16 GB → 7 GB
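To measure the actual footprint on your own hardware rather than relying on the table, a minimal check with PyTorch's allocator statistics, run right after the loading code above:
```python
import torch

# Peak GPU memory allocated since process start (weights plus buffers
# created during loading), in GiB for comparison with the table.
print(f"Peak CUDA memory: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```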
For additional xMADified models, access to fine-tuning, and general questions, please contact us at [email protected] and join our waiting list. |