Performance reduction from using 8bit or 4bit quantized model
#58
opened by michaelomahony
I am using the 8-bit quantized model implemented with bitsandbytes. Does anybody know how much of a performance (output quality) reduction is expected from using the 8-bit or 4-bit versions of the model?
How about from using float16 vs. bfloat16?
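For context on the two halves of the question, here is a small back-of-the-envelope sketch (not an answer about actual model quality, which depends on the model and task): it computes the range/precision tradeoff between float16 and bfloat16 from their bit layouts, and the approximate weight memory per numeric format. The 7B parameter count is just an illustrative assumption, and the byte counts ignore quantization constants and activations.

```python
def float_format_stats(exp_bits, man_bits):
    """Largest finite value and machine epsilon for an IEEE-style float layout."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 2 - bias      # all-ones exponent encodes inf/nan
    largest = (2 - 2 ** -man_bits) * 2.0 ** max_exp
    eps = 2.0 ** -man_bits                  # gap between 1.0 and the next value
    return largest, eps

# float16 has 5 exponent / 10 mantissa bits; bfloat16 has 8 / 7.
fp16_max, fp16_eps = float_format_stats(exp_bits=5, man_bits=10)
bf16_max, bf16_eps = float_format_stats(exp_bits=8, man_bits=7)

print(f"float16:  max ≈ {fp16_max:.5g}, eps = {fp16_eps}")   # max is 65504
print(f"bfloat16: max ≈ {bf16_max:.5g}, eps = {bf16_eps}")   # fp32-like range

# Rough weight memory per format for a hypothetical 7B-parameter model.
bytes_per_param = {"float32": 4.0, "float16/bfloat16": 2.0, "int8": 1.0, "int4": 0.5}
for fmt, b in bytes_per_param.items():
    print(f"{fmt}: ~{b * 7e9 / 2**30:.1f} GiB of weights")
```

The takeaway from the numbers: bfloat16 keeps float32's exponent range (so overflow is rarely an issue) but has coarser per-value precision than float16, while 8-bit and 4-bit quantization mainly buy memory savings, with a quality cost that has to be measured empirically per model.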