---
language:
- it
base_model:
- openbmb/MiniCPM-V-2_6
library_name: transformers
tags:
- vision
- vqa-italian
- visual-question-answering-italian
---
<h1>Fine-tuned version of MiniCPM-V 2.6 on GQA-it</h1>
This is a version of MiniCPM-V 2.6 fine-tuned on GQA-it for Italian Visual Question Answering.
The original model is built on SigLIP-400M and Qwen2-7B, with 8B parameters in total.
# Usage
For advanced usage, see the original base model repository: https://github.com/OpenBMB/MiniCPM-V.
For more details about the dataset, see: https://github.com/crux82/gqa-it
```python
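import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned weights; trust_remote_code is required because
# MiniCPM-V ships its own modeling code.
model = AutoModel.from_pretrained(
    'sag-uniroma2/MiniCPM-V-2_6-gqa-it-finetuned',
    trust_remote_code=True,
    attn_implementation='sdpa',
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()  # inference mode on a CUDA GPU

# The tokenizer is taken from the original base model.
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

# Replace with the path to your own image.
image = Image.open('n346247.jpg').convert('RGB')
question = "C'è un idrante sull'erba?"  # "Is there a fire hydrant on the grass?"

# The image is passed inside the message content, so `image` is left as None.
msgs = [{'role': 'user', 'content': [image, question]}]
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)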
```
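The snippet above answers a single question. To ask several questions about the same image, a minimal sketch is to call `model.chat` once per question, each in its own single-turn conversation. `ask_questions` is a hypothetical helper, not part of the MiniCPM-V API; it only assumes that `model.chat(image=None, msgs=..., tokenizer=...)` behaves as in the usage example.

```python
from typing import Any, List


def ask_questions(model: Any, tokenizer: Any, image: Any, questions: List[str]) -> List[str]:
    """Ask each question about `image` in a separate single-turn chat.

    Hypothetical helper: it relies only on the `model.chat` call shown
    in the usage example above.
    """
    answers = []
    for question in questions:
        # One independent conversation per question, so earlier answers
        # are not fed back into later prompts.
        msgs = [{'role': 'user', 'content': [image, question]}]
        answers.append(model.chat(image=None, msgs=msgs, tokenizer=tokenizer))
    return answers
```

For example, `ask_questions(model, tokenizer, image, [question])` reproduces the single call from the usage example. Keeping each question in a fresh conversation matches the single-turn question/answer format of GQA-it.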
# GQA-it
## Italian Question Answering on Image Scene Graphs
GQA-it is a **large-scale Italian dataset for Visual Question Answering** based on the balanced version of [GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html).
GQA-it contains more than **1 million Italian question/answer pairs over 80K images**, obtained by translating GQA with Neural Machine Translation.
Most importantly, a **test set of 3,000 question/answer pairs has been manually validated to provide a reliable Italian benchmark**.
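Since the manually validated test set is meant as a benchmark, a common way to score VQA predictions on GQA-style data is exact-match accuracy. The sketch below is not the official GQA evaluation script: it assumes predictions and gold answers are plain dicts keyed by question id (a hypothetical layout, not necessarily the GQA-it file format), and `normalize` applies an assumed normalization (lowercasing and trimming surrounding whitespace and punctuation).

```python
import string
from typing import Dict


def normalize(answer: str) -> str:
    """Lowercase and trim surrounding whitespace/punctuation (assumed normalization)."""
    return answer.strip().lower().strip(string.punctuation + " ")


def exact_match_accuracy(predictions: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of gold question ids whose prediction matches after normalization.

    Both arguments map a (hypothetical) question id to an answer string;
    a missing prediction counts as wrong.
    """
    if not gold:
        raise ValueError("gold answers must not be empty")
    correct = sum(
        1
        for qid, answer in gold.items()
        if qid in predictions and normalize(predictions[qid]) == normalize(answer)
    )
    return correct / len(gold)


# Illustrative ids; the gold answers come from the example table below.
gold = {"q1": "destra", "q2": "plastica", "q3": "no"}
preds = {"q1": "Destra.", "q2": "legno", "q3": "no"}
print(exact_match_accuracy(preds, gold))  # 2 of 3 answers match
```

Exact match is strict: a synonym or an inflected form (e.g. a plural) counts as wrong, which is one reason a manually validated test set matters.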
## Example
![Example GQA image n90294](n90294.jpg)
| Language | Question | Answer |
| --- | :---: | :---: |
| En | Is the remote to the right or to the left of the book? | right |
| It | _Il telecomando è a destra o a sinistra del libro?_ | _destra_ |
| En | How thick is the book to the left of the remote? | thick |
| It | _Quanto è spesso il libro a sinistra del telecomando?_ | _spesso_ |
| En | What device is to the left of the calculator made of plastic? | charger |
| It | _Quale dispositivo si trova a sinistra della calcolatrice di plastica?_ | _caricabatterie_ |
| En | What's the charger made of? | plastic |
| It | _Di cosa è fatto il caricabatterie?_ | _plastica_ |
| En | Are there any phones? | no |
| It | _Ci sono dei telefoni?_ | _no_ |
# Citation
```
TODO
```