Fine-tuned version of MiniCPM-V 2.6 on GQA-it

This model is MiniCPM-V 2.6 fine-tuned on GQA-it for Italian Visual Question Answering. The base model is built on SigLip-400M and Qwen2-7B, for a total of 8B parameters.

Usage

For advanced usage, see the base model repository: https://github.com/OpenBMB/MiniCPM-V.

For more details about the dataset, see: https://github.com/crux82/gqa-it

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned weights; trust_remote_code is needed because the repo contains custom code.
model = AutoModel.from_pretrained('sag-uniroma2/MiniCPM-V-2_6-gqa-it-finetuned', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
# The tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

# Path to a local image file.
img = "n346247.jpg"
image = Image.open(img).convert('RGB')

# English: "Is there a fire hydrant on the grass?"
question = "C'è un idrante sull'erba?"
msgs = [{'role': 'user', 'content': [image, question]}]

# The image is passed inside msgs, so the image argument stays None.
answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
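
To ask several questions about the same image, the call above can be wrapped in a small helper. This is only a sketch: `build_msgs` and `ask_questions` are hypothetical names, not part of the model's API, and the only model call used is the `model.chat(image=None, msgs=..., tokenizer=...)` form shown above.

```python
def build_msgs(image, question):
    """Build a single-turn message list in the format used in the usage example."""
    return [{'role': 'user', 'content': [image, question]}]


def ask_questions(model, tokenizer, image, questions):
    """Ask each question independently about the same image.

    Returns a dict mapping each question to the model's answer.
    Assumes model.chat accepts the keyword arguments used in the usage example.
    """
    answers = {}
    for question in questions:
        answers[question] = model.chat(
            image=None,
            msgs=build_msgs(image, question),
            tokenizer=tokenizer,
        )
    return answers
```

With the `model`, `tokenizer`, and `image` objects from the usage example, `ask_questions(model, tokenizer, image, ["C'è un idrante sull'erba?", "Ci sono dei telefoni?"])` would return one answer per question. Each question is sent as a separate single-turn conversation, so earlier answers do not influence later ones.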

GQA-it

Italian Question Answering on Image Scene Graphs

GQA-it is a large-scale Italian dataset for Visual Question Answering based on the balanced version of GQA.

GQA-it contains more than 1 million Italian question/answer pairs over 80K images. The Italian text was obtained by applying Neural Machine Translation.

Most importantly, a Test set of 3,000 question-answer pairs has been manually validated to provide a valuable benchmark in Italian.
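
One simple way to score model answers against a validated test set like this is normalized exact match. The sketch below is an assumption, not the official GQA-it evaluation protocol: the function names and the normalization steps (lowercasing, trimming whitespace and trailing punctuation) are illustrative choices.

```python
import unicodedata


def normalize(answer: str) -> str:
    """Lowercase, trim, and drop trailing punctuation (an assumed normalization)."""
    text = unicodedata.normalize("NFC", answer).strip().lower()
    return text.rstrip(".!?").strip()


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal the reference answer after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

For example, `exact_match_accuracy(["Destra.", "no"], ["destra", "sì"])` counts the first answer as correct and the second as wrong. Short GQA-style answers make exact match a reasonable first check, although it will not credit synonyms or paraphrases.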

Example

| Language | Question | Answer |
|----------|----------|--------|
| En | Is the remote to the right or to the left of the book? | right |
| It | Il telecomando è a destra o a sinistra del libro? | destra |
| En | How thick is the book to the left of the remote? | thick |
| It | Quanto è spesso il libro a sinistra del telecomando? | spesso |
| En | What device is to the left of the calculator made of plastic? | charger |
| It | Quale dispositivo si trova a sinistra della calcolatrice di plastica? | caricabatterie |
| En | What's the charger made of? | plastic |
| It | Di cosa è fatto il caricabatterie? | plastica |
| En | Are there any phones? | no |
| It | Ci sono dei telefoni? | no |

Citation

TODO