What's the difference between the "padded" and the original version?

#1
by Nexesenex - opened

The padded version has its weights padded with zeros so that, after quantization with GPTQ, the model can still be split across GPUs with tensor parallelism for multi-GPU inference. Without the padding, trying to run the quantized model that way results in an error. You can read more at the bottom of this page: https://qwen.readthedocs.io/en/latest/quantization/gptq.html
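To make the idea concrete, here is a rough sketch in plain Python of what the zero padding does. It is not the script used for this repo. It assumes the constraint is that the padded dimension must be a multiple of the GPTQ group size times the number of GPUs; the helper names (`padded_size`, `pad_columns`, `matvec`) and the toy sizes are made up for illustration.

```python
def padded_size(size: int, group_size: int, tp_size: int) -> int:
    """Smallest value >= size that is a multiple of group_size * tp_size.

    Assumption: each GPU must receive a whole number of quantization groups.
    """
    multiple = group_size * tp_size
    return -(-size // multiple) * multiple  # integer ceiling division


def pad_columns(weight, new_cols):
    """Append zero columns to a row-major matrix stored as a list of lists."""
    return [row + [0.0] * (new_cols - len(row)) for row in weight]


def matvec(weight, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


# Hypothetical toy layer: 2 outputs, 6 inputs, group size 2, 4 GPUs.
weight = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
          [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]]
hidden = [1.0] * 6

new_cols = padded_size(len(hidden), group_size=2, tp_size=4)
padded_weight = pad_columns(weight, new_cols)
# The extra inputs are zero too (in a real model the matching layer
# would be padded with zero rows, so it produces zeros there).
padded_hidden = hidden + [0.0] * (new_cols - len(hidden))

original_out = matvec(weight, hidden)
padded_out = matvec(padded_weight, padded_hidden)
```

Since the extra weights are all zeros, `padded_out` matches `original_out`, so the padding only changes the tensor shapes and not the model's outputs.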

CalamitousFelicitousness changed discussion status to closed
