IQ3_XS quant not visible in sidebar
Oh, ok!
I've honestly noticed IQ3_XS/IQ3_XXS missing from a lot of the sidebars in your repos, e.g. https://huggingface.co/bartowski/Qwen2.5-32B-Instruct-GGUF, https://huggingface.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF, https://huggingface.co/bartowski/gemma-2-27b-it-GGUF (IQ3_XXS is visible, but IQ3_XS is not; just 3 random examples)
It's weird. Thank you for looking into this
oh that's even stranger, I assumed they didn't show up anywhere!
Oh wait, I think IQ3_XS is missing everywhere, but IQ3_XXS shows up
I think IQ3_XS doesn't show up for other people either: https://huggingface.co/mradermacher/ThinkPhi1.1-Tensors-i1-GGUF, https://huggingface.co/mradermacher/Qwenvergence-14B-v11-i1-GGUF
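If anyone wants to double-check, here's a quick sketch (just an illustration using the huggingface_hub Python package; any of the repos linked above works) that lists the files actually present in a repo, independent of what the sidebar shows:

```python
# List the GGUF files actually present in a repo, independent of the
# sidebar. Works for public repos without a token.
from huggingface_hub import list_repo_files

repo = "bartowski/gemma-2-27b-it-GGUF"  # any repo from the links above
files = list_repo_files(repo)

# "IQ3_XS" is not a substring of "IQ3_XXS", so plain matching is safe.
for quant in ("IQ3_XS", "IQ3_XXS"):
    print(quant, "->", [f for f in files if quant in f])
```

If the IQ3_XS files show up here but not in the sidebar, the problem is in the Hub's quant detection, not in the uploads.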
Seems like HF has a strange aversion to IQ3_XS
Can you make an IQ3_XXS version of this so it shows up and 16GB VRAM people can use it? Thanks!
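For what it's worth, a rough back-of-envelope (a sketch only; the bits-per-weight figures are approximate llama.cpp values, and the ~24B parameter count is my assumption about this model) shows why IQ3_XXS matters at 16GB:

```python
# Rough GGUF size: billions of parameters × bits-per-weight / 8 ≈ GB.
# bpw figures are approximate llama.cpp values; 24B is an assumed size.
def approx_size_gb(params_b: float, bpw: float) -> float:
    return params_b * bpw / 8

print(f"24B @ IQ3_XXS (~3.06 bpw): ~{approx_size_gb(24, 3.06):.1f} GB")
print(f"24B @ Q4_K_M  (~4.85 bpw): ~{approx_size_gb(24, 4.85):.1f} GB")
```

Roughly 9 GB vs 15 GB, so IQ3_XXS leaves headroom for context on a 16GB card while Q4_K_M barely fits the weights alone.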
I personally don't understand the usefulness of quantization as extreme as IQ3_XXS. A model with fewer parameters but at an adequate quantization (>= Q4_K_M) will produce much better answers.
No. The larger the model, the less true that is. That said, it might hold for 24B, not sure. But a 70B at IQ3_XXS is good, and Mistral Large 123B at IQ3_XXS is very good.
Another reason is that there are gaps. E.g. Llama 3 only comes in 8B and 70B (and, OK, 405B). A 70B at IQ3_XXS, if you can run it, is much better than even a full-precision 8B, and there is no in-between size to step into.
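To put rough numbers on that gap (same back-of-envelope formula and approximate bpw figures as above, sketch only):

```python
# Billions of parameters × bits-per-weight / 8 ≈ size in GB.
# bpw figures are approximate llama.cpp values.
for label, params_b, bpw in [
    ("70B @ IQ3_XXS (~3.06 bpw)", 70, 3.06),
    ("8B  @ F16     (16 bpw)   ", 8, 16.0),
]:
    print(f"{label}: ~{params_b * bpw / 8:.1f} GB")
```

About 27 GB vs 16 GB, so once you outgrow the 8B, the heavily quantized 70B is the only step up.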
> A 70B at IQ3_XXS, if you can run it, is much better than even a full-precision 8B.

And Gemma 2 27B will be better than both of them.