xmadai/Llama-3.2-1B-Instruct-xMADai-4bit

Text Generation

Inference Endpoints

4-bit precision


JonahYixMAD committed 30 days ago

Commit 80b512e (1 parent: ef2d27e)

Update README.md

Files changed (1)

README.md +0 -7


@@ -51,11 +51,4 @@ outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
 print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 ```
-Model | GPU Memory Requirement
---- | ---
-Llama-3.2-3B-Instruct-xMADai-4bit | 6.5 GB → 3.5 GB
-Llama-3.2-1B-Instruct-xMADai-4bit | 2.5 → 2 GB
-Llama-3.1-405B-Instruct-xMADai-4bit | 800 GB (16 H100s) → 250 GB (8 V100)
-Llama-3.1-8B-Instruct-xMADai-4bit | 16 → 7 GB
 For additional xMADified models, access to fine-tuning, and general questions, please contact us at [email protected] and join our waiting list.
