IQ3_XS quant not visible in sidebar
Oh, ok!
I've honestly noticed IQ3_XS/IQ3_XXS missing from a lot of the sidebars in your repos, e.g. https://huggingface.co/bartowski/Qwen2.5-32B-Instruct-GGUF, https://huggingface.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF, https://huggingface.co/bartowski/gemma-2-27b-it-GGUF (IQ3_XXS is visible, but IQ3_XS is not; just 3 random examples)
It's weird. Thank you for looking into this
oh that's even stranger, I assumed they didn't show up anywhere!
Oh wait, I think IQ3_XS is missing everywhere, but IQ3_XXS shows up
I think IQ3_XS doesn't show up for other people either: https://huggingface.co/mradermacher/ThinkPhi1.1-Tensors-i1-GGUF, https://huggingface.co/mradermacher/Qwenvergence-14B-v11-i1-GGUF
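If anyone wants to double-check, here's a quick sketch (just an illustration using the huggingface_hub Python package; any of the repos linked above works) that lists the files actually present in a repo, independent of what the sidebar shows:

```python
# List the GGUF files actually present in a repo, independent of the
# sidebar. Works for public repos without a token.
from huggingface_hub import list_repo_files

repo = "bartowski/gemma-2-27b-it-GGUF"  # any repo from the links above
files = list_repo_files(repo)

# "IQ3_XS" is not a substring of "IQ3_XXS", so plain matching is safe.
for quant in ("IQ3_XS", "IQ3_XXS"):
    print(quant, "->", [f for f in files if quant in f])
```

If the IQ3_XS files show up here but not in the sidebar, the problem is in the Hub's quant detection, not in the uploads.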
Seems like HF has a strange aversion to IQ3_XS
Can you make an IQ3_XXS version of this so it shows up and 16GB VRAM people can use it? Thanks!
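For what it's worth, a rough back-of-envelope (a sketch only; the bits-per-weight figures are approximate llama.cpp values, and the ~24B parameter count is my assumption about this model) shows why IQ3_XXS matters at 16GB:

```python
# Rough GGUF size: billions of parameters × bits-per-weight / 8 ≈ GB.
# bpw figures are approximate llama.cpp values; 24B is an assumed size.
def approx_size_gb(params_b: float, bpw: float) -> float:
    return params_b * bpw / 8

print(f"24B @ IQ3_XXS (~3.06 bpw): ~{approx_size_gb(24, 3.06):.1f} GB")
print(f"24B @ Q4_K_M  (~4.85 bpw): ~{approx_size_gb(24, 4.85):.1f} GB")
```

Roughly 9 GB vs 15 GB, so IQ3_XXS leaves headroom for context on a 16GB card while Q4_K_M barely fits the weights alone.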
I personally don't understand the usefulness of quantization as extreme as IQ3_XXS. A model with fewer parameters but at an adequate quantization (>= Q4_K_M) will produce much better answers.
No. The larger the model, the less true that is. That said, it might hold for 24B, not sure. But a 70B at IQ3_XXS is good, and Mistral Large 123B at IQ3_XXS is very good.
Another reason is that there are gaps. E.g. Llama 3 only comes in 8B and 70B (and, OK, 405B). A 70B at IQ3_XXS, if you can run it, is much better than even a full-precision 8B, and there is no in-between size to step into.
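To put rough numbers on that gap (same back-of-envelope formula and approximate bpw figures as above, sketch only):

```python
# Billions of parameters × bits-per-weight / 8 ≈ size in GB.
# bpw figures are approximate llama.cpp values.
for label, params_b, bpw in [
    ("70B @ IQ3_XXS (~3.06 bpw)", 70, 3.06),
    ("8B  @ F16     (16 bpw)   ", 8, 16.0),
]:
    print(f"{label}: ~{params_b * bpw / 8:.1f} GB")
```

About 27 GB vs 16 GB, so once you outgrow the 8B, the heavily quantized 70B is the only step up.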
> A 70B at IQ3_XXS, if you can run it, is much better than even a full-precision 8B.

And Gemma 2 27B will be better than both of them.