Multiple local images

#9
by chiauho - opened

Hi, thanks for the code in the answer to the "local image input" question. It works for one image. Could it also work for, say, three images? For example, if I am doing RAG with a model like RAGMultiModalModel and it returns three relevant pages of a document for a query, how can I pass these three images (loaded with Image.open) to Pixtral? Should I build a list of images and do something like this:

chat = [
    {
        "role": "user",
        "content": [
            {"type": "text", "content": "What are the causes of eczema?"},
            {"type": "image"},
        ],
    }
]

prompt = processor.apply_chat_template(chat)
inputs = processor(text=prompt, images=image_list, return_tensors="pt").to(model.device)
generate_ids = model.generate(**inputs, max_new_tokens=500)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

Add one {"type": "image"} entry to the "content" list for each image you want to pass, and hand the processor a list of PIL images. That should do the trick.
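To make that concrete, here is a minimal sketch that builds the chat with one image placeholder per image and then reuses the same processor and model calls as the snippet in the question. The helper names build_chat and ask_with_images are made up for illustration, and the sketch assumes processor and model are already loaded as in the single-image example; it has not been run against Pixtral here.

```python
def build_chat(question, num_images):
    """Build a single-turn chat with one {"type": "image"} entry per image.

    Hypothetical helper; the message layout mirrors the snippet in the question.
    """
    content = [{"type": "text", "content": question}]
    content += [{"type": "image"} for _ in range(num_images)]
    return [{"role": "user", "content": content}]


def ask_with_images(processor, model, question, images, max_new_tokens=500):
    """Run one generation over a list of PIL images.

    Assumes `processor` and `model` were loaded as in the single-image example.
    """
    # One placeholder per image, so the prompt and the image list line up.
    chat = build_chat(question, len(images))
    prompt = processor.apply_chat_template(chat)
    inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
    generate_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(
        generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]
```

With three retrieved pages, image_list could be built as [Image.open(p) for p in page_paths], where page_paths is a placeholder for wherever your retriever saved the pages, and then passed as ask_with_images(processor, model, "What are the causes of eczema?", image_list).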
