quantize parameters

#1
by rastegar - opened

Hi, could you share your quantization parameters? I have a finetuned model that I'm trying to quantize with AWQ and EXL2, and I need the best-performing config for quantizing my 32B model.
Thanks in advance.

Also, please share the minimum hardware requirements for quantizing this 32B model into AWQ.

TBH, this model was made a long time ago. As far as I can recall, my parameters are listed below.

Hardware requirements:

AutoAWQ first loads the model into system memory, then quantizes it layer by layer in VRAM, which means the whole model (fp16/bf16 weights) has to fit in your system memory. You will likely need 64 GB+ of RAM, given that the original model occupies approximately 64 GB and your system itself consumes some memory. As for VRAM, I didn't pay much attention to it; at least 16 GB, I'd guess.
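For reference, a typical AutoAWQ run looks like the sketch below, following the upstream example script. The model path is a placeholder, and the `quant_config` values are the library's common 4-bit defaults, not necessarily the author's exact settings, so treat them as a starting point.

```python
from typing import Any, Dict

# Common AutoAWQ settings for a 4-bit GEMM quant (assumed defaults, not
# confirmed by the author).
quant_config: Dict[str, Any] = {
    "zero_point": True,   # asymmetric (zero-point) quantization
    "q_group_size": 128,  # group size for the 4-bit weights
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # GEMM kernels; "GEMV" favors batch-1 decoding
}


def quantize(model_path: str, quant_path: str) -> None:
    """Load fp16 weights into system RAM, then quantize layer by layer on GPU."""
    # Imports are local so the sketch can be read without awq installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    # No device_map here on purpose: AutoAWQ keeps the full-precision
    # weights in CPU memory and moves one layer at a time to VRAM.
    model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```

This mirrors why ~64 GB of RAM is needed for a 32B model: 32B parameters at 2 bytes each is roughly 64 GB before the OS and the calibration run take their share.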

I have a 3090 with 24 GB of VRAM and 128 GB of system memory. When I run AutoAWQ with the wikitext dataset, I always get a panic error at 98%.
I will check with your suggested dataset and update with the result.

It's not about the calibration dataset; it's about your loading strategy. Are you loading your model fully into VRAM? I suggest loading the model into system memory and doing computation only on the GPU, just as the AutoAWQ sample code does.
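The two loading strategies can be contrasted as below; the model path is a placeholder and the "wrong" call is shown only as a comment. Only the low-memory path matches what the AutoAWQ sample code does.

```python
def load_for_quantization(model_path: str):
    """Load a model for AWQ quantization without filling VRAM up front."""
    # Local import so the sketch can be read without awq installed.
    from awq import AutoAWQForCausalLM

    # Wrong for a 24 GB card: pushing all ~64 GB of fp16 weights onto the
    # GPU before quantization even starts, e.g.
    #
    #   model = AutoAWQForCausalLM.from_pretrained(model_path, device_map="cuda:0")
    #
    # Right: keep the fp16 weights in system RAM; AutoAWQ then moves one
    # decoder layer at a time to the GPU during model.quantize().
    return AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
```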
