JonahYixMAD commited on
Commit
ef2d27e
•
1 Parent(s): 8edf197

Update README.md

Files changed (1)
  1. README.md +1 -3
README.md CHANGED
@@ -51,13 +51,11 @@ outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
  print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
  ```
 
- Other xMADified models and their GPU memory requirements are listed below.
-
  Model | GPU Memory Requirement
  --- | ---
  Llama-3.2-3B-Instruct-xMADai-4bit | 6.5 GB → 3.5 GB
  Llama-3.2-1B-Instruct-xMADai-4bit | 2.5 → 2 GB
- Llama-3.1-405B-Instruct-xMADai-4bit | 258.14 GB → 250 GB
+ Llama-3.1-405B-Instruct-xMADai-4bit | 800 GB (16 H100s) → 250 GB (8 V100)
  Llama-3.1-8B-Instruct-xMADai-4bit | 16 → 7 GB
 
  For additional xMADified models, access to fine-tuning, and general questions, please contact us at [email protected] and join our waiting list.
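For context, the hunk above only shows the tail of the README's usage snippet (the `model.generate` and `tokenizer.batch_decode` lines). A minimal sketch of the full flow those lines belong to, assuming a standard Hugging Face Transformers setup and a placeholder repo id (the actual xMADai model path is not shown in this diff), might look like:

```python
# Minimal sketch, assuming a standard Transformers setup.
# The repo id below is a hypothetical placeholder, not the actual xMADai model path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xmadai/Llama-3.2-3B-Instruct-xMADai-4bit"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain 4-bit quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# These two lines correspond to the snippet shown in the diff above.
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```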