Publish quantization code
Hi @gaunernst,
AutoAWQ still does not support AWQ for the Gemma-3 family of models. Could you publish your quantization code so that users can quantize the model with a custom config, e.g. the CUDA backend instead of GEMM, or int8 instead of int4?
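For example, what I have in mind is roughly the standard AutoAWQ flow with a custom `quant_config` (just a sketch; the model path and config values below are placeholders, and this is exactly the path that currently does not work for Gemma-3):

```python
# Sketch of the intended workflow: quantize with AutoAWQ using a custom quant_config.
# Placeholder paths/values; whether other backends (e.g. a CUDA/GEMV kernel instead of
# GEMM) or w_bit=8 are actually supported is part of what is being asked here.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "google/gemma-3-12b-it"      # placeholder source checkpoint
quant_path = "gemma-3-12b-it-awq-custom"  # placeholder output directory

quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,          # would like the option of 8 here
    "version": "GEMM",   # would like the option of a different backend here
}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```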
Thank you.
wdym by "AutoAWQ still does not support AWQ for Gemma-3 family models"?
I know that there are problems with HF+AutoAWQ running BF16 models, but this checkpoint should work with vLLM, since vLLM supports BF16 AutoAWQ:
```
vllm serve gaunernst/gemma-3-12b-it-int4-awq
```
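Once the server is up, you can query it through the OpenAI-compatible endpoint that vLLM exposes (a minimal sketch, assuming the default `localhost:8000` and the `openai` Python client):

```python
# Query the vLLM OpenAI-compatible server started by the command above.
# Assumes the default host/port (localhost:8000); adjust if yours differ.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="gaunernst/gemma-3-12b-it-int4-awq",
    messages=[{"role": "user", "content": "Give me a one-sentence summary of AWQ."}],
)
print(response.choices[0].message.content)
```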
The conversion code is here: https://huggingface.co/gaunernst/gemma-3-12b-it-int4-awq/blob/main/convert_flax.py (note that this is simply a conversion from the Flax checkpoint)
Also note that this is different from the official GGUFs here: https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b. To use those official GGUFs in AutoAWQ format, you can check this collection: https://huggingface.co/collections/gaunernst/gemma-3-qat-int4-from-gguf-67f2a6bb7c26a18a9714dd54