All the outputs from the llava-hf/llava-v1.6-mistral-7b-hf model are showing as `<unk>` when prompted with an image. Could you kindly clarify whether there is an issue with the model, or whether I'm doing something incorrectly?
"The model responds to the question: 'What is depicted in this image?' Below is my code:
```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image
# Load the processor and model
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", torch_dtype=torch.float16, low_cpu_mem_usage=True)
model.to("cuda:0")
# Prepare the image and text prompt using the appropriate template
image = Image.open("image.jpg")
# Define the chat history and format the prompt correctly;
# "content" is a list of dictionaries, each with a "type" ("text" or "image")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")
# Generate a response to the prompt
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```
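In case it helps, here is a quick sanity check on my inputs and raw outputs. This is a minimal sketch; I am assuming the processor exposes its tokenizer as `processor.tokenizer` (standard for Hugging Face processors) and that the chat template inserts an `<image>` placeholder into the prompt string:

```python
# Confirm the chat template actually inserted the image placeholder
print(prompt)  # should contain "<image>" somewhere in the string

# Count how many input ids already map to the unknown token; a nonzero
# count would point at a tokenization problem rather than generation
unk_id = processor.tokenizer.unk_token_id
if unk_id is not None:
    num_unk = (inputs["input_ids"] == unk_id).sum().item()
    print(f"unk tokens in input: {num_unk}")

# Decode without skipping special tokens to inspect the raw generated ids
print(processor.decode(output[0], skip_special_tokens=False))
```

If the placeholder is missing from the prompt or the inputs already contain unknown ids, the problem would seem to be on the preprocessing side rather than in `generate`.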