Not able to load via transformers
Hi bro, I'm a newbie to QLoRA. I tried the code below and it raises an OSError. Can you tell me how to load and use this model with Python?
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TheBloke/guanaco-65B-GPTQ")
model = AutoModelForCausalLM.from_pretrained("TheBloke/guanaco-65B-GPTQ")
```
```
OSError                                   Traceback (most recent call last)
in <cell line: 5>()
      3 tokenizer = AutoTokenizer.from_pretrained("TheBloke/guanaco-65B-GPTQ")
      4
----> 5 model = AutoModelForCausalLM.from_pretrained("TheBloke/guanaco-65B-GPTQ")

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   2553                 )
   2554             else:
-> 2555                 raise EnvironmentError(
   2556                     f"{pretrained_model_name_or_path} does not appear to have a file named"
   2557                     f" {_add_variant(WEIGHTS_NAME, variant)}, {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or"

OSError: TheBloke/guanaco-65B-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
```
You can't load GPTQ models with regular transformers; you need AutoGPTQ:

```
pip install auto-gptq
```

Here is example code:
```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"
model_basename = "Guanaco-65B-GPTQ-4bit.act-order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
prompt_template = f'''### Instruction: {prompt}
### Response:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
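For context on the original OSError: the repo only contains a quantized `.safetensors` checkpoint (plus config and tokenizer files), not a `pytorch_model.bin`, so plain `AutoModelForCausalLM.from_pretrained` finds nothing it knows how to load. If you want to confirm that yourself, here is a minimal sketch using `huggingface_hub` (which is already installed as a dependency of transformers):

```python
from huggingface_hub import list_repo_files

# List the files hosted in the GPTQ repo: expect a quantized *.safetensors
# checkpoint plus config/tokenizer files, and no pytorch_model.bin.
files = list_repo_files("TheBloke/guanaco-65B-GPTQ")
for name in files:
    print(name)

print("has pytorch_model.bin:", "pytorch_model.bin" in files)
```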
Thank you bro!
First of all, thanks a lot for your work!
I encountered an issue which is directly caused by the following code:
```python
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        device_map="auto",
        trust_remote_code=True,
        device="cuda",
        use_triton=use_triton,
        quantize_config=None)
```
It first warns me:
```
WARNING 2023-07-03 22:36:45,587-1d: CUDA extension not installed.
....
WARNING 2023-07-03 22:36:58,012-1d: The safetensors archive passed at /home/mydir/.cache/huggingface/hub/models--TheBloke--guanaco-65B-GPTQ/snapshots/c1a31c76e7228a13bc542b25243b912f12e39c87/Guanaco-65B-GPTQ-4bit.act-order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
```
After a huge amount of information about device_map, it raises the following error:
```
C++ Traceback (most recent call last):
No stack trace in paddle, may be caused by external reasons.

Error Message Summary:
FatalError: Access to an undefined portion of a memory object is detected by the operating system.
[TimeInfo: *** Aborted at 1688395081 (unix time) try "date -d @1688395081" if you are using GNU date ***]
[SignalInfo: *** SIGBUS (@0x7fbce9c3dff0) received by PID 424101 (TID 0x7fbea6e7e740) from PID 18446744073336512496 ***]
```

![image.png](https://cdn-uploads.huggingface.co/production/uploads/6033ae93b5883695ce9d0918/LsYSFyg909xJjyhnCxeCj.png)
I'm pretty sure that I have my CUDA toolkit installed; do you have any clue about the problem?
Again, thanks for your work, and I hope for your reply.
Firstly, just to check: are you running this on a system with an Nvidia GPU available, with at least 48GB of VRAM?
If so, the first problem is that the CUDA extension is not installed. Please try re-installing auto-gptq with:
```
pip3 uninstall -y auto-gptq
GITHUB_ACTIONS=true pip3 install auto-gptq
```
Not sure about the rest; let's see if installing AutoGPTQ with the CUDA module available fixes that first.
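If you want to sanity-check the environment before re-running, here is a rough sketch (note the `autogptq_cuda` import is an assumption: the name of the compiled extension module differs between auto-gptq versions, so treat a failed import only as a hint):

```python
import torch

# Confirm an Nvidia GPU is visible and report its memory
# (the 65B GPTQ model needs on the order of 48GB of VRAM, per the note above).
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")

# Check whether AutoGPTQ's compiled CUDA kernels import cleanly.
# NOTE: 'autogptq_cuda' is an assumed module name; it varies across versions.
try:
    import autogptq_cuda  # noqa: F401
    print("AutoGPTQ CUDA extension: importable")
except ImportError:
    print("AutoGPTQ CUDA extension: NOT importable (re-install as above)")
```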
Thank you so much