image/png

Idefics3Llama Fine-tuned using QLoRA on VQAv2

  • This is the Idefics3Llama model QLoRA fine-tuned on a very small part of VQAv2 dataset.

  • Find the fine-tuning notebook here.

Usage

You can load and use this model as follows.


from transformers import Idefics3ForConditionalGeneration, AutoProcessor

peft_model_id = "merve/idefics3llama-vqav2"
base_model_id = "HuggingFaceM4/Idefics3-8B-Llama3"
processor = AutoProcessor.from_pretrained(base_model_id)
model = Idefics3ForConditionalGeneration.from_pretrained(base_model_id)
model.load_adapter(peft_model_id).to("cuda")

This model was conditioned on a prompt "Answer briefly.".

from PIL import Image
import requests
from transformers.image_utils import load_image

DEVICE = "cuda"

image = load_image("https://huggingface.co/spaces/merve/OWLSAM2/resolve/main/buddha.JPG")


messages = [
          {
              "role": "user",
              "content": [
                  {"type": "text", "text": "Answer briefly."},
                  {"type": "image"},
                  {"type": "text", "text": "Which country is this located in?"}
              ]
          }
      ]

text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt", padding=True).to("cuda")

We can infer.

generated_ids = model.generate(**inputs, max_new_tokens=500)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_texts)

##['User: Answer briefly.<row_1_col_1><row_1_col_2><row_1_col_3><row_1_col_4>\n<row_2_col_1>
# <row_2_col_2><row_2_col_3><row_2_col_4>\n<row_3_col_1><row_3_col_2><row_3_col_3>
# <row_3_col_4>\n\n<global-img>Which country is this located in?\nAssistant: thailand\nAssistant: thailand']
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no pipeline_tag.

Dataset used to train merve/idefics3-llama-vqav2