Issue with loading 4-bit quantized model on Apple M1 Pro
#45 opened by waxsum8
I have been facing an issue loading the gemma-2b-it model with a 4-bit quantization config on an Apple M1 Pro.
Code:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization via bitsandbytes
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
Error:
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
I have tried installing the latest versions of accelerate, bitsandbytes, and transformers, but I am still facing the issue with the quantized version.
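For anyone debugging the same thing, here is a minimal sanity check (my own sketch, not part of the error message) to verify that the interpreter running the script can actually see accelerate and bitsandbytes, since the ImportError is raised when transformers' availability checks fail:

import importlib.util
from transformers.utils import is_accelerate_available, is_bitsandbytes_available

# Confirm both packages are importable from this exact Python environment
print("accelerate found:", importlib.util.find_spec("accelerate") is not None)
print("bitsandbytes found:", importlib.util.find_spec("bitsandbytes") is not None)

# The availability checks transformers consults before enabling bitsandbytes quantization
print("is_accelerate_available():", is_accelerate_available())
print("is_bitsandbytes_available():", is_bitsandbytes_available())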
Hmm, I also tried this and am facing the same issue. For some reason it seems `is_accelerate_available()` is failing despite accelerate being installed. It looks like an issue with the transformers library directly.
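Not a fix for the quantized path, but if the underlying problem is that bitsandbytes cannot run on Apple silicon (it has historically required a CUDA GPU), a possible workaround is to load the model unquantized on the MPS backend. The snippet below is only a sketch; the float16 dtype, the device selection, and the sample prompt are my assumptions, not part of the original report:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load without bitsandbytes; half precision keeps the memory footprint manageable on an M1 Pro
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Use the Apple-silicon GPU if PyTorch was built with MPS support, otherwise fall back to CPU
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = model.to(device)

inputs = tokenizer("Write a haiku about the sea.", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))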
Hi Google, I have opened a Space using this model (you can check it by going to my profile), but the problem is that the model shows errors when I give it a longer prompt. Please fix this problem.