Quantized Versions of jais-13b-chat
Hello,
I'm using the "jais-13b-chat" model and find it very useful. To reduce memory requirements, could you consider providing 4-bit and 8-bit quantized versions? This would greatly help deployments in resource-constrained environments.
Thanks for considering,
Noureddine
You can use bitsandbytes directly on jais.
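For reference, here is a minimal sketch of on-the-fly quantization through transformers with bitsandbytes. The hub id is an assumption (the jais repo has been published under different organizations), and you need the bitsandbytes and accelerate packages installed:

```python
# Minimal sketch: on-the-fly 8-bit quantization of jais via bitsandbytes.
# "inceptionai/jais-13b-chat" is an assumed hub id; adjust to the repo you use.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "inceptionai/jais-13b-chat"

# 8-bit config; for 4-bit, use load_in_4bit=True instead (optionally with
# bnb_4bit_compute_dtype=torch.float16 and bnb_4bit_quant_type="nf4").
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # jais ships custom modeling code
)
```

This quantizes the full-precision weights at load time, so you still download the fp16/fp32 checkpoint; a pre-quantized repo only saves download size and disk space.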
There is this quantized version (https://huggingface.co/mouaff25/jais-13b-chat-8bit), but it did not work for me: the model loaded, but I got a tensor mismatch error.
It works on an A100:
https://colab.research.google.com/drive/1QLihIVHOnWrz5P7XER4mn13YuGAbnPDq?usp=sharing
I've just pushed an 8-bit quantized version; feel free to check it out: 'drakkola/jais-13b-chat-8bit'.
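A pre-quantized bitsandbytes checkpoint like this can be loaded directly, without a quantization config. A minimal sketch, using the repo id from the post above; the prompt is simplified for illustration (jais-13b-chat expects its own prompt template):

```python
# Minimal sketch: loading a pre-quantized 8-bit jais checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "drakkola/jais-13b-chat-8bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,  # jais uses custom modeling code
)

# Simplified prompt; in practice, format it per the jais-13b-chat model card.
prompt = "What is the capital of the UAE?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```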