Performance reduction from using 8bit or 4bit quantized model
#58
opened by michaelomahony
I am using the 8-bit quantized model implemented with bitsandbytes. Does anybody know how much of a performance (output quality) reduction is expected from using the 8-bit or 4-bit versions of the model?
How about from using float16 vs. bfloat16?
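For context on the two halves of the question, here is a small back-of-the-envelope sketch (not an answer about actual model quality, which depends on the model and task): it computes the range/precision tradeoff between float16 and bfloat16 from their bit layouts, and the approximate weight memory per numeric format. The 7B parameter count is just an illustrative assumption, and the byte counts ignore quantization constants and activations.

```python
def float_format_stats(exp_bits, man_bits):
    """Largest finite value and machine epsilon for an IEEE-style float layout."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 2 - bias      # all-ones exponent encodes inf/nan
    largest = (2 - 2 ** -man_bits) * 2.0 ** max_exp
    eps = 2.0 ** -man_bits                  # gap between 1.0 and the next value
    return largest, eps

# float16 has 5 exponent / 10 mantissa bits; bfloat16 has 8 / 7.
fp16_max, fp16_eps = float_format_stats(exp_bits=5, man_bits=10)
bf16_max, bf16_eps = float_format_stats(exp_bits=8, man_bits=7)

print(f"float16:  max ≈ {fp16_max:.5g}, eps = {fp16_eps}")   # max is 65504
print(f"bfloat16: max ≈ {bf16_max:.5g}, eps = {bf16_eps}")   # fp32-like range

# Rough weight memory per format for a hypothetical 7B-parameter model.
bytes_per_param = {"float32": 4.0, "float16/bfloat16": 2.0, "int8": 1.0, "int4": 0.5}
for fmt, b in bytes_per_param.items():
    print(f"{fmt}: ~{b * 7e9 / 2**30:.1f} GiB of weights")
```

The takeaway from the numbers: bfloat16 keeps float32's exponent range (so overflow is rarely an issue) but has coarser per-value precision than float16, while 8-bit and 4-bit quantization mainly buy memory savings, with a quality cost that has to be measured empirically per model.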