load granite model with pipeline

by lenadan - opened Oct 22, 2024

Oct 22, 2024

I tried to follow your code example and use pipeline to load granite-3.0-8b-instruct, but I'm getting an error that indicates the tokenizer is not initialized.
I debugged the code, and it seems that the tokenizer is missing from TOKENIZER_MAPPING_NAMES, which is defined in tokenization_auto.py.
Could you advise?

gabegoodhart

IBM Granite org Oct 22, 2024

@lenadan Thanks for your interest! Can you share a code snippet of how you loaded the model with a pipeline?

lenadan

Oct 22, 2024

Sure. I actually used the code you've provided:

from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="/my_local_path/granite-3.0-8b-instruct")
pipe(messages)

The model was cloned locally by running: git clone https://huggingface.co/ibm-granite/granite-3.0-8b-instruct

These are my dependencies:
transformers==4.45.2
ibm_watsonx_ai==1.1.16

And this is the full error I got:

gabegoodhart

IBM Granite org Oct 22, 2024

Thanks! This is definitely a gap and we'll work to get a fix into transformers soon. In the meantime, you can manually pass a tokenizer to the pipeline initialization:

from transformers import AutoTokenizer, pipeline
model_id = "/my_local_path/granite-3.0-8b-instruct"
tok = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline(task="text-generation", model=model_id, tokenizer=tok)
print(pipe("Hello world!"))

Interesting aside: It looks like you're pulling the example code from the "Use this model" dropdown. That is actually auto-populated by Hugging Face and not something we wrote. Thanks for pointing out this gap!

lenadan

Oct 22, 2024

Thanks, I had no idea this dropdown was auto-populated. It would be nice if you could make the basic pipeline API work (without the need to provide the tokenizer), because it would enable users who use pipelines to switch to Granite without changing anything in their code. I'll keep experimenting with Granite and check every once in a while whether there's an update regarding this issue.
Thanks for your quick reply!

gabegoodhart

IBM Granite org Oct 22, 2024

Yes, 100% agree. The simplest fix seems to be adding "tokenizer_class" to the config.json (I've verified it works locally). We'll work to get that set in all of the models. Thanks again for pointing this out!
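
For anyone who wants to try that fix on a local clone before it lands upstream, here is a minimal sketch. It assumes the clone's tokenizer_config.json already carries a "tokenizer_class" entry and simply copies that value into config.json rather than guessing a class name. The helper name add_tokenizer_class is hypothetical and not part of transformers.

```python
import json
from pathlib import Path


def add_tokenizer_class(model_dir: str) -> str:
    """Copy "tokenizer_class" from tokenizer_config.json into config.json.

    Hypothetical helper for a local model clone. Assumes both JSON files
    exist in model_dir and that tokenizer_config.json names the class.
    """
    model_path = Path(model_dir)
    config_file = model_path / "config.json"
    tokenizer_config_file = model_path / "tokenizer_config.json"

    tokenizer_config = json.loads(tokenizer_config_file.read_text(encoding="utf-8"))
    tokenizer_class = tokenizer_config.get("tokenizer_class")
    if tokenizer_class is None:
        raise KeyError("tokenizer_config.json has no 'tokenizer_class' entry")

    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Leave an existing value alone; only fill in the key when it is missing.
    config.setdefault("tokenizer_class", tokenizer_class)
    config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config["tokenizer_class"]


if __name__ == "__main__":
    # Path taken from the thread; adjust it to wherever the model was cloned.
    print(add_tokenizer_class("/my_local_path/granite-3.0-8b-instruct"))
```

After patching, the original two-argument pipeline call should be worth retrying. If it still fails, passing the tokenizer explicitly as shown earlier in the thread remains the fallback.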
