---
language:
- it
base_model:
- openbmb/MiniCPM-V-2_6
library_name: transformers
tags:
- vision
- vqa-italian
- visual-question-answering-italian
---


<h1>Fine-tuned version of MiniCPM-V 2.6 on GQA-it</h1>

This model is a fine-tuned version of MiniCPM-V 2.6, trained on GQA-it for Italian Visual Question Answering.
The base model is built on SigLip-400M and Qwen2-7B, for a total of 8B parameters.

# Usage
For advanced usage, see the base model repository: https://github.com/OpenBMB/MiniCPM-V.

For more details about the dataset, see: https://github.com/crux82/gqa-it

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned weights; trust_remote_code is needed for the custom MiniCPM-V model code.
model = AutoModel.from_pretrained(
    'sag-uniroma2/MiniCPM-V-2_6-gqa-it-finetuned',
    trust_remote_code=True,
    attn_implementation='sdpa',
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()

# The tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

# Path to a local image file.
img = "n346247.jpg"
image = Image.open(img).convert('RGB')

question = "C'è un idrante sull'erba?"  # "Is there a fire hydrant on the grass?"
# The image travels inside the message content, so the `image` argument stays None.
msgs = [{'role': 'user', 'content': [image, question]}]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```
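To ask several questions about the same image, the call pattern shown in the usage snippet can be wrapped in a small loop. The following is a minimal sketch: `ask_questions` is a hypothetical helper, not part of the model's API, and it assumes that `model.chat` accepts the same `image`, `msgs`, and `tokenizer` keyword arguments used in the usage snippet.

```python
def ask_questions(model, tokenizer, image, questions):
    """Ask each question about one image and collect the answers.

    Hypothetical helper: it only repeats the single-turn call pattern
    from the usage snippet, one independent conversation per question.
    """
    answers = {}
    for question in questions:
        # Same message layout as in the usage snippet: image first, then the text.
        msgs = [{'role': 'user', 'content': [image, question]}]
        answers[question] = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
    return answers
```

With the `model`, `tokenizer`, and `image` objects from the usage snippet, calling `ask_questions(model, tokenizer, image, ["C'è un idrante sull'erba?"])` would return a dictionary that maps each question to the model's answer.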

# GQA-it
## Italian Question Answering on Image Scene Graphs

GQA-it is a **large-scale Italian dataset for Visual Question Answering** based on the balanced version of [GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html).

GQA-it contains more than **1 million question-answer pairs in Italian over 80K images**, obtained by applying Neural Machine Translation.

Most importantly, a **test set of 3,000 question-answer pairs has been manually validated to provide a valuable benchmark in Italian**.
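Since the answers in GQA-it are short (often a single word), one simple way to score predictions against such a test set is exact match after light normalization. The sketch below is an assumption for illustration, not necessarily the evaluation protocol used by the dataset authors; `normalize` and `exact_match_accuracy` are hypothetical helper names.

```python
import string


def normalize(answer: str) -> str:
    """Lowercase, trim surrounding whitespace/punctuation, collapse inner spaces."""
    text = answer.strip().lower()
    text = text.strip(string.punctuation + " ")
    return " ".join(text.split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after normalization.

    Assumed metric for illustration; the official scoring may differ.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Here `predictions` would be the strings returned by the model and `references` the gold answers from the test set, in the same order.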


## Example
![](n90294.jpg)

| Language | Question | Answer |
| --- | :---: | :---: |
| En | Is the remote to the right or to the left of the book? | right |
| It | _Il telecomando è a destra o a sinistra del libro?_ | _destra_ |
| En | How thick is the book to the left of the remote? | thick |
| It | _Quanto è spesso il libro a sinistra del telecomando?_ | _spesso_ |
| En | What device is to the left of the calculator made of plastic? | charger |
| It | _Quale dispositivo si trova a sinistra della calcolatrice di plastica?_ | _caricabatterie_ |
| En | What's the charger made of? | plastic |
| It | _Di cosa è fatto il caricabatterie?_ | _plastica_ |
| En | Are there any phones? | no |
| It | _Ci sono dei telefoni?_ | _no_ |

# Citation
```
TODO
```