What is the recommended quantization that combines speed with quality?

#12
by Blakus - opened

Hello! I have 24 GB of VRAM, so I can use the q_8 Wan… But do you know whether smaller quantizations help with speed? For language models, repositories often list all the quantizations ranked from worst to best, and among them there is usually one recommended as the best balance of speed and quality.
Does that apply to this case as well? Thanks!

Actually, you might need to test them one by one, since it differs by model and by the final structure of the quantized files. Someone found that q8_0 loads faster, but that doesn't seem to hold for all models.
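Since the advice above is to test each quantization yourself, here is a minimal timing harness to make that comparison reproducible. It is a sketch, not a definitive benchmark: `load_fn` is a placeholder for whatever loader or pipeline call you actually use, and the file names are hypothetical examples.

```python
import time

def benchmark(quant_files, load_fn, repeats=3):
    """Return {filename: best wall-clock seconds over `repeats` runs}.

    quant_files: paths to the quantized model files to compare.
    load_fn: callable taking a path; substitute your real load/inference call.
    """
    results = {}
    for path in quant_files:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            load_fn(path)  # load (or load + run one generation) with this quant
            best = min(best, time.perf_counter() - start)
        results[path] = best
    return results

if __name__ == "__main__":
    # Dummy loader that just sleeps, to show the harness working;
    # the .gguf names here are illustrative, not from this repo.
    timings = benchmark(["q4_k_m.gguf", "q8_0.gguf"], lambda p: time.sleep(0.01))
    for name, secs in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name}: {secs:.3f}s")
```

Taking the best of a few repeats reduces noise from disk caching; run each quant with the same prompt and settings so the comparison is fair.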
