What is the recommended quantization that combines speed with quality?

#12
by Blakus - opened

Hello! I have 24 GB of VRAM, so I can use the q_8 Wan… But do you know whether smaller quantizations help with speed? For language models, repositories often list all the quantizations ranked from worst to best, and among them there is usually one recommended as the best balance of speed and quality.
Does that apply to this case as well? Thanks!

Actually, you might need to test them one by one, since it differs by model and by the final structure of the quantized files. Someone found that q8_0 loads faster, but that doesn't seem to hold for all models.
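Since the advice above is to test each quantization yourself, here is a minimal timing harness to make that comparison reproducible. It is a sketch, not a definitive benchmark: `load_fn` is a placeholder for whatever loader or pipeline call you actually use, and the file names are hypothetical examples.

```python
import time

def benchmark(quant_files, load_fn, repeats=3):
    """Return {filename: best wall-clock seconds over `repeats` runs}.

    quant_files: paths to the quantized model files to compare.
    load_fn: callable taking a path; substitute your real load/inference call.
    """
    results = {}
    for path in quant_files:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            load_fn(path)  # load (or load + run one generation) with this quant
            best = min(best, time.perf_counter() - start)
        results[path] = best
    return results

if __name__ == "__main__":
    # Dummy loader that just sleeps, to show the harness working;
    # the .gguf names here are illustrative, not from this repo.
    timings = benchmark(["q4_k_m.gguf", "q8_0.gguf"], lambda p: time.sleep(0.01))
    for name, secs in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name}: {secs:.3f}s")
```

Taking the best of a few repeats reduces noise from disk caching; run each quant with the same prompt and settings so the comparison is fair.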
