etemiz/Llama-3.1-405B-Inst-GGUF


etemiz committed on Jul 27

Commit d9494fa • 1 Parent(s): 24a37dc

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -12,7 +12,7 @@ which is converted from Llama 3.1 405B:
 https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct
-llama.cpp version b3459. There is ongoing work in llama.cpp to support this model. If you use context = 8192 there are some reports that say this model works fine. If not, you can also try chaing the Frequency Base as described in: https://www.reddit.com/r/LocalLLaMA/comments/1ectacp/until_the_rope_scaling_is_fixed_in_gguf_for/
+llama.cpp version b3459. There is ongoing work in llama.cpp to support this model. If you use context = 8192 there are some reports that say this model works fine. If not, you can also try changing the Frequency Base as described in: https://www.reddit.com/r/LocalLLaMA/comments/1ectacp/until_the_rope_scaling_is_fixed_in_gguf_for/
 imatrix file https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/blob/main/405imatrix.dat
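
The two workarounds named in the README line (context = 8192, or a changed Frequency Base) can be sketched as a command line for llama.cpp. This is a rough sketch under assumptions, not the author's procedure: the binary name `llama-cli`, the helper `build_llama_cli_args`, and the model filename are hypothetical, and the Frequency Base value is left as a parameter because the value recommended in the linked Reddit thread is not reproduced here. The `-m`, `-c` and `--rope-freq-base` options are llama.cpp command-line options for the model path, context size and RoPE base frequency.

```python
from typing import List, Optional


def build_llama_cli_args(
    model_path: str,
    ctx_size: int = 8192,
    rope_freq_base: Optional[float] = None,
) -> List[str]:
    """Build an argument list for a llama.cpp CLI run (hypothetical helper).

    ctx_size defaults to 8192, the context size the README reports as working.
    rope_freq_base is only passed when given; take the value from the Reddit
    thread linked in the README, it is deliberately not hard-coded here.
    """
    # "llama-cli" is an assumed binary name; adjust to your llama.cpp build.
    args = ["llama-cli", "-m", model_path, "-c", str(ctx_size)]
    if rope_freq_base is not None:
        args += ["--rope-freq-base", str(rope_freq_base)]
    return args


if __name__ == "__main__":
    # Placeholder model filename; substitute the GGUF file you downloaded.
    print(" ".join(build_llama_cli_args("model.gguf")))
```

Passing the resulting list to `subprocess.run` would launch the run. Keeping the Frequency Base optional reflects the README's ordering: try the 8192 context first, and override the base only if that does not work.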