Oh, you were the guy!
So you did apparently use LoRA weights in int4 quantization?
Yes, this uses LoRA, as mentioned in the README. I am currently testing LoRA on 30B to see if there is an improvement; I will upload an int4 version if there is.
I have not done quantization before, but to my understanding there is a script which can probably be adjusted to work with any model that works with Hugging Face Transformers. I think Galactica 30B would also be very handy to have. Do you think it is possible to quantize? Does quantization happen in RAM or in VRAM?
Thinking the same about nllb-moe-54b
Both RAM and VRAM. Approx. 90 GB RAM at peak and 10 GB VRAM (for 30B), but you can also use swap. If you link me those models I can look into them, but if there's no GPTQ conversion script I'd have to create my own.
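For anyone wondering what such a conversion looks like in practice, here is a minimal sketch. It assumes the auto-gptq library rather than the exact script used here, and assumes it recognizes Galactica's OPT-style layers; the output folder name and calibration text are placeholders.

```python
# Rough sketch of int4 GPTQ quantization of a Hugging Face model via auto-gptq.
# Assumptions: auto-gptq supports this architecture; "galactica-30b-4bit-128g"
# and the calibration sample are hypothetical placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/galactica-30b"
out_dir = "galactica-30b-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# A single calibration sample for illustration; a real run uses a proper
# calibration set (e.g. a few hundred text chunks).
examples = [tokenizer("The Transformer architecture relies on self-attention.")]

quantize_config = BaseQuantizeConfig(
    bits=4,         # int4 weights
    group_size=128,
    desc_act=False,
)

# The full-precision weights sit in system RAM and layers are quantized on the
# GPU one at a time, which is why peak RAM is far larger than the VRAM needed.
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)
model.save_quantized(out_dir)
```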
Good to know, thanks. So that's easily possible on a beefy workstation.
There is no script yet for them. As far as I understand, the existing script can be adjusted for Hugging Face Transformers models.
The links are:
https://huggingface.co/facebook/nllb-moe-54b
https://huggingface.co/facebook/galactica-30b
The former is huge. Galactica would be very handy to fit in VRAM, which currently isn't possible for a lot of setups (rough loading sketch below).
NLLB is more like an asset: currently state of the art for translation work, but unusably huge.
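For reference, once an int4 GPTQ checkpoint exists, it should fit on a single GPU: 30B parameters at 4 bits is roughly 15 GB of weights before overhead. A sketch of loading it, again assuming auto-gptq and a hypothetical local folder name:

```python
# Sketch of loading an int4 GPTQ checkpoint onto one GPU. Assumes auto-gptq;
# "galactica-30b-4bit-128g" is a hypothetical folder from the quantization step.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

quantized_dir = "galactica-30b-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-30b")
model = AutoGPTQForCausalLM.from_quantized(quantized_dir, device="cuda:0")

prompt = "The Schwarzschild radius of a black hole is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```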
If you modify the conversion script to support them (individually), then I can run it on my server. I don't really have the time to dig into the scripts and change them around right now.