Can this be quantized with bitsandbytes?

#6
by Permahuman - opened

Atm it's a huge safetensors model, and GGUF/llama.cpp support likely won't be coming for quite some time. Can we use bitsandbytes to quantize this and run it at 4-bit? And will it still be functional?
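In case anyone wants to reproduce the attempt, here is a minimal sketch of a naive 4-bit load through transformers' `BitsAndBytesConfig`. The model id and the Auto class are placeholder assumptions, since it isn't clear which class (if any) supports this architecture:

```python
# A minimal sketch of a naive 4-bit load via bitsandbytes + transformers.
# The model id and the Auto class are assumptions; an unsupported multimodal
# architecture would fail right here, which may be what people are hitting.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear layers to 4-bit on load
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the usual default
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at runtime
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "org/model-id",                         # hypothetical placeholder id
    quantization_config=bnb_config,
    device_map="auto",
)
```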

Just naively tried and it won't work.

I wonder how in the world to even diagnose how to get this quantized. This model could be a huge game changer if we can get its size down to 4-bit. Qwen is already balling, but this would put a cherry on top of their ice cream heap.

> Just naively tried and it won't work.

Thank you for trying. I am reluctant to download it and try to quantize it myself, as it will take over 12 hours on my slow internet connection to acquire the model. Cheers.

Is bnb even that good? I tried load_in_4bit with Llama 3 and the quality is clearly worse than GPTQ/AWQ/GGUF...

> Is bnb even that good? I tried load_in_4bit with Llama 3 and the quality is clearly worse than GPTQ/AWQ/GGUF...

I don't know, honestly. I just thought it would be a much quicker route to quantization than waiting for llama.cpp to implement GGUF support. Maybe AWQ or GPTQ would be faster than going through the llama.cpp repo?
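If someone wants to test the AWQ route, a minimal sketch with the AutoAWQ library follows. The model path is a placeholder, and AutoAWQ is built around text-only causal LMs, so there's no guarantee it handles a multimodal checkpoint like this one:

```python
# A minimal sketch of 4-bit AWQ quantization with the AutoAWQ library.
# The model path is a hypothetical placeholder; AutoAWQ targets text-only
# causal LMs, so a multimodal checkpoint may not load at all.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "org/model-id"   # hypothetical placeholder
quant_path = "model-awq-4bit"

# Standard 4-bit AWQ settings
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrate and quantize (AutoAWQ supplies a default calibration dataset)
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```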

> Is bnb even that good? I tried load_in_4bit with Llama 3 and the quality is clearly worse than GPTQ/AWQ/GGUF...

> I don't know, honestly. I just thought it would be a much quicker route to quantization than waiting for llama.cpp to implement GGUF support. Maybe AWQ or GPTQ would be faster than going through the llama.cpp repo?

lol !

> Just naively tried and it won't work.

It would need to use the llama.cpp surgery approach: each component needs to be extracted first, same as LLaVA. The image processor and then the audio processor would be split out, either as bin files and then converted to GGUF, so in the end there would be three GGUFs: one for the image encoder, one for the audio encoder, and one for the LLM, like the LLaVA models! A sketch of that flow is below.
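For reference, a hedged sketch of the existing LLaVA flow, driven from Python. The script names come from llama.cpp's examples/llava directory and may differ between versions, all paths are placeholders, and no audio-encoder equivalent of these scripts exists yet, which is the missing piece for this model:

```python
# A sketch of the LLaVA-style surgery flow in llama.cpp, driven from Python.
# Script names are from llama.cpp's examples/llava directory and may differ
# between versions; all paths are hypothetical placeholders. There is no
# audio-encoder equivalent of these scripts yet.
import subprocess

MODEL_DIR = "/path/to/local/checkpoint"     # hypothetical
VISION_ENCODER = "/path/to/vision/encoder"  # hypothetical

def run(args):
    """Run one conversion step, failing loudly if it errors."""
    subprocess.run(args, check=True)

# 1. Split the multimodal projector out of the combined checkpoint.
run(["python", "examples/llava/llava_surgery.py", "-m", MODEL_DIR])

# 2. Convert the vision encoder plus projector to its own GGUF.
run([
    "python", "examples/llava/convert_image_encoder_to_gguf.py",
    "-m", VISION_ENCODER,
    "--llava-projector", f"{MODEL_DIR}/llava.projector",
    "--output-dir", MODEL_DIR,
])

# 3. Convert the remaining language model to GGUF as usual.
run(["python", "convert_hf_to_gguf.py", MODEL_DIR])
```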
