---
license: llama3.1
---
Llama 3.1 405B Quants
- IQ1_S: 86.8 GB
- IQ1_M: 95.1 GB
- IQ2_XXS: 109.0 GB
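
To put the file sizes above in perspective, here is a rough sketch that converts them to an effective bits-per-weight figure. It assumes a round 405 billion parameters and that "GB" means 10^9 bytes; neither is stated in this README, so treat the results as approximate.

```python
# Assumption: parameter count rounded to 405e9 (not stated in this README).
PARAM_COUNT = 405e9

# File sizes as listed above; assumed to be decimal gigabytes (1 GB = 1e9 bytes).
SIZES_GB = {"IQ1_S": 86.8, "IQ1_M": 95.1, "IQ2_XXS": 109.0}


def bits_per_weight(size_gb, params=PARAM_COUNT):
    """Average number of bits stored per model parameter for a file of size_gb."""
    return size_gb * 1e9 * 8 / params


for name, size_gb in SIZES_GB.items():
    print(f"{name}: {bits_per_weight(size_gb):.2f} bits/weight")
```

Under these assumptions the three quants land at roughly 1.7 to 2.2 bits per weight. The figure is an average over the whole file, including metadata and any tensors kept at higher precision.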
Quantized from the BF16 version here:
https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/
which was converted from Llama 3.1 405B Instruct:
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct
llama.cpp version: b3459. Work to fully support this model in llama.cpp is still ongoing. Some reports say the model works fine with a context size of 8192. If it does not, you can also try changing the RoPE frequency base as described in: https://www.reddit.com/r/LocalLLaMA/comments/1ectacp/until_the_rope_scaling_is_fixed_in_gguf_for/
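
As a minimal sketch of the two workarounds above, the helper below builds a `llama-cli` command line with the context size set to 8192 and, optionally, an overridden RoPE frequency base. The `-m`, `-c`, and `--rope-freq-base` flags are llama.cpp options; the helper name `build_llama_cli_args` and the model path are hypothetical, and no frequency base value is filled in here because it should be taken from the linked thread.

```python
import shlex


def build_llama_cli_args(model_path, ctx_size=8192, rope_freq_base=None,
                         binary="llama-cli"):
    """Build an argument list for llama.cpp's CLI (hypothetical helper).

    rope_freq_base is only passed through when given; take the value
    from the linked Reddit thread rather than guessing one.
    """
    args = [binary, "-m", model_path, "-c", str(ctx_size)]
    if rope_freq_base is not None:
        args += ["--rope-freq-base", str(rope_freq_base)]
    return args


# Hypothetical file path; replace with the quant you downloaded.
cmd = build_llama_cli_args("path/to/405b-IQ1_S.gguf")
print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd)` once the path points at a real file and `llama-cli` is on your `PATH`.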
imatrix file: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/blob/main/405imatrix.dat
Let me know if you need bigger quants.