Any plan for 8bit version?
Hi! Thank you very much for sharing. Is there any plan for an 8-bit version? Many thanks :)
Thanks for taking an interest! @jm4n21
Neural Magic already got there first: neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic
Unless you mean int8 weights with 16-bit activations (w8a16); do those run well? AFAIK vLLM's Machete kernel (which handles w8a16) targets Hopper, but Hopper doesn't do too well with integer quants.
Beyond floating-point 8-bit quants:
There are static integer W8A8 quants too: noneUsername/Mistral-Small-24B-Instruct-2501-W8A8
Or another dynamic W8A8 quant (should be more accurate, but slower): EliasOenal/Mistral-Small-24B-Instruct-2501-W8A8-dynamic
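For reference, any of these should load straight into vLLM, since the quantization scheme is picked up from the repo config; a minimal sketch (the prompt and sampling settings are just placeholders):

```python
from vllm import LLM, SamplingParams

# No extra quantization flags needed: vLLM reads the scheme from the
# repo's config. Swap in any of the repos mentioned above.
llm = LLM(model="neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```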
But let me know if you meant int8 weights with bf16/f16 activations - I'll make one next weekend.
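If I do, it would roughly follow the llm-compressor oneshot flow; this is a rough sketch only, and the W8A16 scheme name plus the calibration settings are my assumptions, not a tested recipe:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Sketch of a w8a16 quant (int8 weights, 16-bit activations).
# Scheme name and calibration settings are assumptions, not a tested recipe.
recipe = GPTQModifier(scheme="W8A16", targets="Linear", ignore=["lm_head"])

oneshot(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    dataset="open_platypus",  # generic calibration set
    recipe=recipe,
    output_dir="Mistral-Small-24B-Instruct-2501-W8A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```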
Hi @JustJaro
Thank you for your detailed suggestions.
The reason I was asking is that FP8-dynamic models are typically loaded via vLLM. However, I have specific reasons for preferring Hugging Face's loading method, so GPTQ models are probably more suitable for my use case.
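To illustrate the loading path I mean, here's a minimal sketch; the GPTQ repo id is a placeholder, since I haven't found a GPTQ quant of this model yet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id - no GPTQ quant of this model was named in the
# thread. Requires `optimum` and a GPTQ backend (e.g. gptqmodel).
model_id = "someuser/Mistral-Small-24B-Instruct-2501-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The GPTQ quantization config is read from the repo's config.json.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```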
Please let me know if I've misunderstood anything.
Many thanks for your help :)