JonahYixMAD committed
Commit 8edf197 • Parent(s): 84907ce
Update README.md

README.md CHANGED
@@ -51,4 +51,13 @@ outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
 print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 ```
 
+Other xMADified models and their GPU memory requirements are listed below.
+
+Model | GPU Memory Requirement
+--- | ---
+Llama-3.2-3B-Instruct-xMADai-4bit | 6.5 GB → 3.5 GB
+Llama-3.2-1B-Instruct-xMADai-4bit | 2.5 GB → 2 GB
+Llama-3.1-405B-Instruct-xMADai-4bit | 258.14 GB → 250 GB
+Llama-3.1-8B-Instruct-xMADai-4bit | 16 GB → 7 GB
+
 For additional xMADified models, access to fine-tuning, and general questions, please contact us at [email protected] and join our waiting list.
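
For context, a minimal sketch of how one of the models listed in the new table might be loaded and run with Transformers, mirroring the generation snippet already in the README. The repo id `xmadai/Llama-3.2-1B-Instruct-xMADai-4bit`, the example prompt, and the `device_map` setting are assumptions, not part of this commit.

```python
# Hypothetical sketch: loading one of the xMADified 4-bit models from the table above.
# The repo id below is an assumed Hugging Face path; a 4-bit checkpoint typically also
# needs a quantization backend (e.g. auto-gptq / optimum) installed alongside transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xmadai/Llama-3.2-1B-Instruct-xMADai-4bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mirrors the generation call shown in the README's existing example.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```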