May I please publish an ollamified version of your IQ2_XS model to ollama.com?

#1 by LaughterOnWater - opened

It's smaller than the Llama 3.1 Instruct 70B models currently available on ollama.com and could prove useful for people wanting to run it on a 3090 (see the publish-flow sketch after this message).
Could you do the same with IQ2_S?

Cheers,

Chris
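
For anyone who wants to reproduce this, "ollamifying" a GGUF and publishing it typically amounts to a one-line Modelfile plus `ollama create` and `ollama push`. Here is a minimal sketch driven from Python; the file name and repository name are hypothetical placeholders, and it assumes the ollama CLI is installed and you are signed in to ollama.com:

```python
# Minimal sketch of the GGUF -> ollama.com publish flow.
# GGUF and REPO below are hypothetical placeholders.
import subprocess
from pathlib import Path

GGUF = "Meta-Llama-3.1-70B-Instruct-IQ2_XS.gguf"  # hypothetical local file
REPO = "youruser/llama3.1-70b-iq2_xs"             # hypothetical ollama.com name

# The Modelfile just points ollama at the local GGUF weights.
Path("Modelfile").write_text(f"FROM ./{GGUF}\n")

# Build a local model from the Modelfile, then push it to your namespace.
subprocess.run(["ollama", "create", REPO, "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "push", REPO], check=True)
```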

No problem, do whatever you want with these! No need to credit me if you don't feel like it.

But beware of the quality degradation!

Q4_K_M: PPL = 3.9306 +/- 0.02097
IQ2_S:  PPL = 6.1807 +/- 0.03676
IQ2_XS: PPL = 6.4942 +/- 0.03892
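
For readers new to these numbers: PPL is perplexity, the exponentiated mean negative log-likelihood of an evaluation text (llama.cpp measures it with its llama-perplexity tool); lower is better. A toy Python rendering of the definition, not the exact llama.cpp harness:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    # PPL = exp(-mean log p(token)); lower means the model is
    # less "surprised" by the evaluation text.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy check: assigning every token probability 1/4 gives PPL = 4.0,
# so the jump from ~3.9 to ~6.5 above is a substantial quality loss.
print(perplexity([math.log(0.25)] * 8))  # -> 4.0
```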

If you are serious about running Llama 3.1 70B on an RTX 3090, I strongly recommend using this AQLM+PV SOTA quant; it will probably be better quality and still fit on the GPU (rough footprint math below). Or, even better, play with the new Mistral Small 22B or Qwen 2.5 32B at higher precision; I hear they really punch above their weight.

https://www.reddit.com/r/LocalLLaMA/comments/1fkbumy/just_replaced_llama_31_70b_iq2s_for_qwen_25_32b/
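
To make the "fits on a 3090" point concrete, here is back-of-the-envelope math for the weight footprint alone. The bits-per-weight figures are approximate llama.cpp values I am assuming, not numbers from this thread, and real usage needs KV-cache and runtime overhead on top:

```python
# Rough weight footprint for a 70B model at a given bits-per-weight (bpw).
# The bpw values below are approximate llama.cpp figures (an assumption);
# the KV cache and runtime overhead need VRAM on top of this.
def weights_gb(params_billion: float, bpw: float) -> float:
    return params_billion * 1e9 * bpw / 8 / 1e9  # bits -> bytes -> GB

for name, bpw in [("Q4_K_M", 4.85), ("IQ2_S", 2.50), ("IQ2_XS", 2.31)]:
    print(f"{name}: ~{weights_gb(70, bpw):4.1f} GB of a 3090's 24 GB")
# Q4_K_M: ~42.4 GB (doesn't fit), IQ2_S: ~21.9 GB, IQ2_XS: ~20.2 GB
```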

LaughterOnWater changed discussion status to closed
