All outputs from the llava-hf/llava-v1.6-mistral-7b-hf model show as <unk> when prompted with an image. Could you clarify whether there is an issue or I'm doing something incorrectly?

#39
by Abdrabu - opened

"The model responds to the question: 'What is depicted in this image?' Below is my code:
```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image

# Load the processor and model
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
model.to("cuda:0")

# Prepare the image and text prompt using the appropriate template
image = Image.open("image.jpg")

# Define the chat history and format the prompt correctly.
# Each item in "content" should be a list of dictionaries with types ("text", "image")
conversation = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {"type": "image"},
    ],
}]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")

# Generate a response to the prompt
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```
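
To narrow down whether the <unk> tokens are introduced when the prompt is encoded or during generation itself, here is a minimal diagnostic sketch. It assumes the `processor`, `model`, and `inputs` from the snippet above are already in scope, and it does not presume any particular cause of the issue:

```python
# Diagnostic sketch, assuming `processor`, `model`, and `inputs`
# from the snippet above are in scope.
unk_id = processor.tokenizer.unk_token_id
print("unk_token_id:", unk_id)

# If the encoded prompt already contains <unk>, the problem is on the
# tokenizer/processor side rather than in generation.
prompt_ids = inputs["input_ids"][0]
print("unk tokens in encoded prompt:", int((prompt_ids == unk_id).sum()))
print("prompt round-trip:", processor.decode(prompt_ids, skip_special_tokens=False)[:200])

# Decode the generation *without* skipping special tokens so any <unk>
# (and other special tokens) remain visible in the output.
output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0], skip_special_tokens=False))
```

If the prompt round-trips cleanly, a dtype mismatch on `pixel_values` is also worth ruling out with a float16 checkpoint, e.g. by casting the processed inputs with `inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0", torch.float16)` before calling `generate`; whether that is the actual cause here is only a guess.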

Abdrabu changed discussion title from "All result output is <unk> from the prompt what is shown in the image" to "All outputs from the llava-hf/llava-v1.6-mistral-7b-hf model show as <unk> when prompted with an image. Could you clarify whether there is an issue or I'm doing something incorrectly?"
