I'm Having trouble running inference on Colab

#1
by ali-issa - opened

When trying to run this code:

# Use a pipeline as a high-level helper

from transformers import pipeline

# Load model directly

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Trelis/Llama-2-7b-chat-hf-function-calling-GPTQ")
model = AutoModelForCausalLM.from_pretrained("Trelis/Llama-2-7b-chat-hf-function-calling-GPTQ")

I am receiving this error:

Trelis/Llama-2-7b-chat-hf-function-calling-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

Howdy!

What packages have you installed? Also, this model is saved as safetensors.

Here is what I used:

import os

from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "Trelis/Llama-2-7b-chat-hf-function-calling-GPTQ"
model_basename = "gptq_model-4bit-128g"

# Load the safetensors weights directly onto the GPU
os.environ["SAFETENSORS_FAST_GPU"] = "1"

use_triton = False

# Load the quantized model from the repo's safetensors file
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)
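
For reference, here is a minimal generation sketch on top of the model loaded above. The tokenizer is loaded from the same repo via transformers, and the prompt is just an illustration, not the model's function-calling template:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "What can you help me with?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")

# AutoGPTQ models expose the usual generate API
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))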

You also need to pip install auto-gptq, which is fastest using the pre-built wheels. Check out this notebook for a full example.
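
In a Colab cell the install step is roughly the following (the exact wheel URL depends on your CUDA and torch versions, so check the notebook for the one that matches your runtime):

!pip install auto-gptq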

RonanMcGovern changed discussion status to closed
RonanMcGovern changed discussion status to open
ali-issa changed discussion title from I;m Having trouble running inference on Colab to I'm Having trouble running inference on Colab

Great! I appreciate your response.

ali-issa changed discussion status to closed
