Publish quantization code

#2
by nobita3921 - opened

Hi @gaunernst ,
AutoAWQ still does not support AWQ quantization for the Gemma-3 family of models. Could you publish your quantization code so that users can quantize the model with a custom config, e.g. the CUDA backend instead of GEMM, or int8 instead of int4 (roughly along the lines of the sketch below)?
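For reference, this is the kind of AutoAWQ flow I would like to adapt. It is only a minimal sketch with a placeholder model path and output directory; the quant_config is where a different backend ("version") or bit-width ("w_bit") would be set, assuming AutoAWQ accepted them for Gemma-3:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "google/gemma-3-12b-it"   # base model to quantize (placeholder)
quant_path = "gemma-3-12b-it-awq"      # output directory (placeholder)

# Custom quantization config: this is where a different kernel backend
# ("version") or bit-width ("w_bit") would be requested.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the unquantized model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run AWQ calibration + quantization, then save the quantized checkpoint
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)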
Thank you.

wdym by "AutoAWQ still does not support AWQ for Gemma-3 family models"?

I know that there are problems with HF + AutoAWQ when running BF16 models, but this checkpoint should work with vLLM, since vLLM supports AutoAWQ checkpoints in BF16.

vllm serve gaunernst/gemma-3-12b-it-int4-awq
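If you would rather use the offline Python API than the server, something along these lines should also work. This is only a sketch; the prompt and sampling settings are arbitrary, and vLLM can usually infer the quantization scheme from the checkpoint's config without the explicit quantization="awq" argument:

from vllm import LLM, SamplingParams

# Load the AWQ checkpoint; quantization="awq" just makes the choice explicit.
llm = LLM(model="gaunernst/gemma-3-12b-it-int4-awq", quantization="awq")

sampling = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain AWQ quantization in one paragraph."], sampling)
print(outputs[0].outputs[0].text)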

The conversion code is here: https://huggingface.co/gaunernst/gemma-3-12b-it-int4-awq/blob/main/convert_flax.py (note that this is simply a conversion from the Flax checkpoint)

Also note that this is different from the official QAT GGUFs here: https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b. To use those official GGUFs in AutoAWQ format, check this collection: https://huggingface.co/collections/gaunernst/gemma-3-qat-int4-from-gguf-67f2a6bb7c26a18a9714dd54

@gaunernst thank you so much.

nobita3921 changed discussion status to closed
