May I please publish an ollamified version of your IQ2_XS model to ollama.com?

#1 by LaughterOnWater - opened

It's smaller than the Llama 3.1 Instruct 70B models currently available on ollama.com and could prove useful for people wanting to run it on a 3090 (see the publish-flow sketch after this message).
Could you do the same with IQ2_S?

Cheers,

Chris
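
For anyone who wants to reproduce this, "ollamifying" a GGUF and publishing it typically amounts to a one-line Modelfile plus `ollama create` and `ollama push`. Here is a minimal sketch driven from Python; the file name and repository name are hypothetical placeholders, and it assumes the ollama CLI is installed and you are signed in to ollama.com:

```python
# Minimal sketch of the GGUF -> ollama.com publish flow.
# GGUF and REPO below are hypothetical placeholders.
import subprocess
from pathlib import Path

GGUF = "Meta-Llama-3.1-70B-Instruct-IQ2_XS.gguf"  # hypothetical local file
REPO = "youruser/llama3.1-70b-iq2_xs"             # hypothetical ollama.com name

# The Modelfile just points ollama at the local GGUF weights.
Path("Modelfile").write_text(f"FROM ./{GGUF}\n")

# Build a local model from the Modelfile, then push it to your namespace.
subprocess.run(["ollama", "create", REPO, "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "push", REPO], check=True)
```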

No problem, do whatever you want with these! No need to credit me if you don't feel like it.

But beware of the quality degradation!

Q4_K_M: PPL = 3.9306 +/- 0.02097
IQ2_S:  PPL = 6.1807 +/- 0.03676
IQ2_XS: PPL = 6.4942 +/- 0.03892
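
For readers new to these numbers: PPL is perplexity, the exponentiated mean negative log-likelihood of an evaluation text (llama.cpp measures it with its llama-perplexity tool); lower is better. A toy Python rendering of the definition, not the exact llama.cpp harness:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    # PPL = exp(-mean log p(token)); lower means the model is
    # less "surprised" by the evaluation text.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy check: assigning every token probability 1/4 gives PPL = 4.0,
# so the jump from ~3.9 to ~6.5 above is a substantial quality loss.
print(perplexity([math.log(0.25)] * 8))  # -> 4.0
```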

If you are serious about running Llama 3.1 70B on an RTX 3090, I strongly recommend using this AQLM+PV SOTA quant; it will probably be better quality and still fit on the GPU (rough footprint math below). Or, even better, play with the new Mistral Small 22B or Qwen 2.5 32B at higher precision; I hear they really punch above their weight.

https://www.reddit.com/r/LocalLLaMA/comments/1fkbumy/just_replaced_llama_31_70b_iq2s_for_qwen_25_32b/
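
To make the "fits on a 3090" point concrete, here is back-of-the-envelope math for the weight footprint alone. The bits-per-weight figures are approximate llama.cpp values I am assuming, not numbers from this thread, and real usage needs KV-cache and runtime overhead on top:

```python
# Rough weight footprint for a 70B model at a given bits-per-weight (bpw).
# The bpw values below are approximate llama.cpp figures (an assumption);
# the KV cache and runtime overhead need VRAM on top of this.
def weights_gb(params_billion: float, bpw: float) -> float:
    return params_billion * 1e9 * bpw / 8 / 1e9  # bits -> bytes -> GB

for name, bpw in [("Q4_K_M", 4.85), ("IQ2_S", 2.50), ("IQ2_XS", 2.31)]:
    print(f"{name}: ~{weights_gb(70, bpw):4.1f} GB of a 3090's 24 GB")
# Q4_K_M: ~42.4 GB (doesn't fit), IQ2_S: ~21.9 GB, IQ2_XS: ~20.2 GB
```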

LaughterOnWater changed discussion status to closed
