quantize parameters

#1
by rastegar - opened

Hi, could you share your quantization parameters? I have a finetuned model that I'm trying to quantize with AWQ and EXL2, and I need the best-performing config for quantizing my 32B model.
Thanks in advance.

Also, please share the minimum hardware requirements for quantizing this 32B model into AWQ.

TBH, this model was made a long time ago. As far as I can recall, my parameters are listed below.

Hardware requirements:

AutoAWQ first loads the model into system memory, then quantizes it layer by layer in VRAM, which means the whole model (fp16/bf16 weights) has to fit in your system memory. You will likely need 64 GB+ of RAM, given that the original model occupies approximately 64 GB and your system itself consumes some memory. As for VRAM, I didn't pay much attention to it; at least 16 GB, I'd guess.
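For reference, a typical AutoAWQ run looks like the sketch below, following the upstream example script. The model path is a placeholder, and the `quant_config` values are the library's common 4-bit defaults, not necessarily the author's exact settings, so treat them as a starting point.

```python
from typing import Any, Dict

# Common AutoAWQ settings for a 4-bit GEMM quant (assumed defaults, not
# confirmed by the author).
quant_config: Dict[str, Any] = {
    "zero_point": True,   # asymmetric (zero-point) quantization
    "q_group_size": 128,  # group size for the 4-bit weights
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # GEMM kernels; "GEMV" favors batch-1 decoding
}


def quantize(model_path: str, quant_path: str) -> None:
    """Load fp16 weights into system RAM, then quantize layer by layer on GPU."""
    # Imports are local so the sketch can be read without awq installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    # No device_map here on purpose: AutoAWQ keeps the full-precision
    # weights in CPU memory and moves one layer at a time to VRAM.
    model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```

This mirrors why ~64 GB of RAM is needed for a 32B model: 32B parameters at 2 bytes each is roughly 64 GB before the OS and the calibration run take their share.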

I have a 3090 with 24 GB of VRAM and 128 GB of system memory. When I run AutoAWQ with the wikitext dataset, I always get a panic error at 98%.
I will check with your suggested dataset and update with the result.

It's not about the calibration dataset; it's about your loading strategy. Are you loading your model fully into VRAM? I suggest loading the model into system memory and doing computation only on the GPU, just as the AutoAWQ sample code does.
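The two loading strategies can be contrasted as below; the model path is a placeholder and the "wrong" call is shown only as a comment. Only the low-memory path matches what the AutoAWQ sample code does.

```python
def load_for_quantization(model_path: str):
    """Load a model for AWQ quantization without filling VRAM up front."""
    # Local import so the sketch can be read without awq installed.
    from awq import AutoAWQForCausalLM

    # Wrong for a 24 GB card: pushing all ~64 GB of fp16 weights onto the
    # GPU before quantization even starts, e.g.
    #
    #   model = AutoAWQForCausalLM.from_pretrained(model_path, device_map="cuda:0")
    #
    # Right: keep the fp16 weights in system RAM; AutoAWQ then moves one
    # decoder layer at a time to the GPU during model.quantize().
    return AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
```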
