Issue with loading 4-bit quantized model on Apple M1 pro

#45
by waxsum8 - opened

I have been facing an issue loading the gemma-2b-it model with a 4-bit quantization config on an Apple M1 Pro.

Code:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model_id = "google/gemma-2b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)

Error:

ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

I have tried installing the latest versions of accelerate, bitsandbytes, and transformers, but I am still facing this issue when loading the quantized model.

Hmm, I also tried this and am facing the same issue. For some reason "is_accelerate_available()" seems to be failing despite accelerate being installed. It looks like an issue with the transformers library directly.
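
A quick way to see which availability check is actually failing is to call the helper flags that transformers itself uses (a minimal sketch; on macOS the bitsandbytes check is usually the one that returns False):

from transformers.utils import is_accelerate_available, is_bitsandbytes_available

# Both flags must be True for BitsAndBytesConfig to be usable.
print("accelerate available:  ", is_accelerate_available())
print("bitsandbytes available:", is_bitsandbytes_available())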

Google org

Hi @waxsum8, bitsandbytes is not supported on macOS devices. Please refer to this similar issue.
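
On Apple Silicon you can still load the model without bitsandbytes, for example in half precision on the MPS backend. This is a minimal sketch of a possible workaround, not an official recommendation; the dtype and device choices here are assumptions:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in float16 instead of 4-bit; "mps" runs on the Apple GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.to("mps")

inputs = tokenizer("Write me a haiku about the sea.", return_tensors="pt").to("mps")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))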

Hi Google, I have opened a Space using this model; you can check it by going to my profile. The problem is that the model shows errors when I give it a larger prompt. Please fix this problem.
