Why is this exactly the same size as the 8-bit one?
#1 by dagelf - opened
I'm guessing there is a mistake...?
In our research, to quickly validate the effectiveness of various quantization methods, we only performed fake quantization with SmoothQuant, without storing the weights in a real 4-bit format. The checkpoints we saved and uploaded are therefore the same size as the fp16 model. 💪
We will continue to improve our work to make quantization testing as realistic as possible, with software and hardware support. More work is on the way! 🤗
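For anyone else puzzled by the file size, here is a minimal sketch of why fake quantization does not shrink a checkpoint. It assumes a plain symmetric per-tensor scheme, not the SmoothQuant pipeline used for this model, and the names `fake_quantize` and `pack_int4` are hypothetical helpers made up for illustration. The weights are rounded to a 4-bit grid and immediately converted back to floats, so they still serialize as fp16; only real nibble packing reduces the byte count.

```python
import struct


def fake_quantize(weights, num_bits=4):
    """Round weights to a signed integer grid, then map them back to floats.

    Assumption: symmetric per-tensor scaling (illustrative only, not the
    scheme used for the uploaded checkpoint).
    """
    qmax = 2 ** (num_bits - 1) - 1   # 7 for 4-bit
    qmin = -(2 ** (num_bits - 1))    # -8 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Integer codes on the 4-bit grid, clamped to the representable range.
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    # Dequantize: the values carry quantization error but are floats again.
    dequantized = [c * scale for c in codes]
    return dequantized, codes, scale


def pack_int4(codes):
    """Real 4-bit storage: pack two signed 4-bit codes into each byte."""
    nibbles = [c & 0xF for c in codes]   # two's-complement nibble
    if len(nibbles) % 2:
        nibbles.append(0)                # pad to an even count
    return bytes(
        (nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2)
    )


weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.44, 0.02]
fake, codes, scale = fake_quantize(weights)

# "e" is the struct format code for IEEE 754 half precision (fp16).
fp16_original = struct.pack(f"<{len(weights)}e", *weights)
fp16_fake = struct.pack(f"<{len(fake)}e", *fake)
int4_packed = pack_int4(codes)

print(len(fp16_original), len(fp16_fake), len(int4_packed))  # 16 16 4
```

The fake-quantized tensor takes the same 16 bytes as the original fp16 tensor, while the packed 4-bit codes take 4 bytes (plus the scale, which a real format would also need to store).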
That makes sense, I guess I could've looked :-D Thank you for the clarification!
dagelf changed discussion status to closed