What's the difference between the "padded" and the original version?
#1
by
Nexesenex
- opened
The padded version has its weights padded with zeros so that it can use tensor parallelism for multi-GPU inference after quantization with GPTQ. Without the padding, this would result in an error. You can read more at the bottom of this page: https://qwen.readthedocs.io/en/latest/quantization/gptq.html
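To make the idea concrete, here is a rough toy sketch in plain Python. It is not the actual script from the Qwen docs. The helper names (`padded_size`, `pad_mlp`, `mlp`), the tiny sizes, and the ReLU activation are all assumptions for illustration only. It assumes the constraint is that the MLP intermediate size must be a multiple of the quantization group size times the number of GPUs, and it shows that appending zero rows and columns leaves the layer output unchanged.

```python
def padded_size(size, group_size, tp_size):
    """Round `size` up to the next multiple of group_size * tp_size."""
    multiple = group_size * tp_size
    return -(-size // multiple) * multiple  # ceiling division, then scale back up


def mlp(x, w_up, w_down):
    """Toy MLP: up projection, ReLU (a stand-in activation), down projection."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_up]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_down]


def pad_mlp(w_up, w_down, new_size):
    """Zero-pad the intermediate dimension of both projections to `new_size`."""
    extra = new_size - len(w_up)
    d_in = len(w_up[0])
    padded_up = w_up + [[0.0] * d_in for _ in range(extra)]   # extra zero rows
    padded_down = [row + [0.0] * extra for row in w_down]     # extra zero columns
    return padded_up, padded_down


# Hypothetical toy sizes: intermediate size 6, group size 2, 2 GPUs.
w_up = [[0.1 * (i + j) for j in range(3)] for i in range(6)]     # 6 x 3
w_down = [[0.2 * (i - j) for j in range(6)] for i in range(3)]   # 3 x 6
x = [1.0, -2.0, 0.5]

new_size = padded_size(len(w_up), group_size=2, tp_size=2)
p_up, p_down = pad_mlp(w_up, w_down, new_size)

# The padded layer splits evenly across GPUs and gives the same output.
same_output = mlp(x, w_up, w_down) == mlp(x, p_up, p_down)
```

The extra hidden units only ever see zero weights, so they produce zero and contribute nothing to the output. That is why the padded checkpoint should behave like the original while having dimensions that divide evenly across GPUs.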
CalamitousFelicitousness
changed discussion status to
closed