All the outputs from the llava-hf/llava-v1.6-mistral-7b-hf model are showing as `<unk>` when prompted with an image. Could you kindly clarify whether there is an issue with the model, or whether I'm doing something incorrectly?
"The model responds to the question: 'What is depicted in this image?' Below is my code:
```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image
# Load the processor and model
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", torch_dtype=torch.float16, low_cpu_mem_usage=True)
model.to("cuda:0")
# Prepare the image and text prompt using the appropriate template
image = Image.open("image.jpg")
# Define the chat history and format the prompt correctly;
# "content" is a list of dictionaries, each with a "type" ("text" or "image")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")
# Generate a response to the prompt
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```
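In case it helps, here is a quick sanity check on my inputs and raw outputs. This is a minimal sketch; I am assuming the processor exposes its tokenizer as `processor.tokenizer` (standard for Hugging Face processors) and that the chat template inserts an `<image>` placeholder into the prompt string:

```python
# Confirm the chat template actually inserted the image placeholder
print(prompt)  # should contain "<image>" somewhere in the string

# Count how many input ids already map to the unknown token; a nonzero
# count would point at a tokenization problem rather than generation
unk_id = processor.tokenizer.unk_token_id
if unk_id is not None:
    num_unk = (inputs["input_ids"] == unk_id).sum().item()
    print(f"unk tokens in input: {num_unk}")

# Decode without skipping special tokens to inspect the raw generated ids
print(processor.decode(output[0], skip_special_tokens=False))
```

If the placeholder is missing from the prompt or the inputs already contain unknown ids, the problem would seem to be on the preprocessing side rather than in `generate`.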