4-bit Quantization (GPTQ or GGUF)

#5 opened by ThetaCursed

Are there plans to release this model in a 4-bit quantized version?

AIDC-AI org

Currently, we do not have plans to develop a quantized version in the short term. However, we are working on training smaller models (e.g., 2~3B) to better meet different user needs and application scenarios.

That's sad to hear, because a 4-bit quantized version of this model should fit in 12 GB of VRAM.
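As a rough back-of-the-envelope check (a sketch only; the parameter count below is an assumption for illustration, not the actual size of this model), 4-bit weights plus typical quantization overhead land well under 12 GB:

```python
# Rough VRAM estimate for 4-bit weights (illustrative; the parameter
# count is an assumption, not the actual size of this model).
params = 9e9            # assumed ~9B parameters
bytes_per_param = 0.5   # 4 bits = 0.5 bytes per weight
overhead = 1.2          # ~20% for scales, norms, and layers kept in higher precision
weights_gb = params * bytes_per_param * overhead / 1e9
print(f"approx. weight memory: {weights_gb:.1f} GB")  # ~5.4 GB, leaving headroom for activations / KV cache
```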

Why release smaller models when you could 4-bit quantize this one and let people run it locally, considering that the best-selling video card of all time is the GeForce RTX 3060 12 GB?

If you don't have someone who can handle this, then at least leave some instructions on how to do it; I will do it myself and share the result with the whole community.
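For reference, here is a minimal sketch of the usual GPTQ route through the Transformers integration (it requires the optimum and auto-gptq packages, a GPU, and enough memory for calibration). The model ID is a placeholder, and the bit width and calibration dataset are assumptions, not the authors' recipe; whether this works directly depends on the model's architecture being supported by the GPTQ integration.

```python
# Minimal GPTQ 4-bit quantization sketch via the Transformers integration.
# Assumptions: "AIDC-AI/<model-id>" is a placeholder repo name, and the
# bits/dataset settings are illustrative choices, not an official recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "AIDC-AI/<model-id>"  # replace with the actual repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(
    bits=4,           # 4-bit weights
    dataset="c4",     # calibration data (assumed choice)
    tokenizer=tokenizer,
)

# Quantization is performed while the model is loaded.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=gptq_config,
    device_map="auto",
)

# Save (or push_to_hub) the quantized checkpoint so others can use it.
model.save_pretrained("model-4bit-gptq")
tokenizer.save_pretrained("model-4bit-gptq")
```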

AIDC-AI org

Thank you for the suggestion. Considering the community's feedback on the quantized version, we have decided to dedicate our efforts to developing it. We will strive to complete it within a month.
