---
language:
- it
base_model:
- openbmb/MiniCPM-V-2_6
library_name: transformers
tags:
- vision
- vqa-italian
- visual-question-answering-italian
---

<h1>Fine-tuned version of MiniCPM-V 2.6 on GQA-it</h1>

This model is MiniCPM-V 2.6 fine-tuned on GQA-it for Italian Visual Question Answering (VQA).

The original model is built on SigLip-400M and Qwen2-7B, with 8B parameters in total.

# Usage

For advanced usage, see the base model repository: https://github.com/OpenBMB/MiniCPM-V.

For more details about the dataset, see: https://github.com/crux82/gqa-it

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned model; trust_remote_code is needed for the model's custom code.
model = AutoModel.from_pretrained(
    'sag-uniroma2/MiniCPM-V-2_6-gqa-it-finetuned',
    trust_remote_code=True,
    attn_implementation='sdpa',
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()

# The tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

# Path to a local image file.
img = "n346247.jpg"
image = Image.open(img).convert('RGB')

# "Is there a fire hydrant on the grass?"
question = "C'è un idrante sull'erba?"
# The image is passed inside the message content, so `image` stays None below.
msgs = [{'role': 'user', 'content': [image, question]}]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```

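To ask several questions about the same image, the `model.chat` call shown above can be wrapped in a small helper. This is a minimal sketch: `build_msgs` and `ask_all` are hypothetical names introduced here for illustration, and `chat_fn` is assumed to be any callable that maps a message list to an answer string.

```python
from typing import Any, Callable, Dict, List


def build_msgs(image: Any, question: str) -> List[Dict[str, Any]]:
    """Build a single-turn message list in the format used in the usage example."""
    return [{'role': 'user', 'content': [image, question]}]


def ask_all(chat_fn: Callable[[List[Dict[str, Any]]], str],
            image: Any, questions: List[str]) -> Dict[str, str]:
    """Ask each question about the same image independently (no shared history)."""
    return {q: chat_fn(build_msgs(image, q)) for q in questions}


# Hypothetical usage with the model and tokenizer loaded as in the usage example:
# answers = ask_all(
#     lambda m: model.chat(image=None, msgs=m, tokenizer=tokenizer),
#     image,
#     ["C'è un idrante sull'erba?", "Ci sono dei telefoni?"],
# )
```

Each question is sent as its own single-turn conversation, so earlier answers do not influence later ones.
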
# GQA-it

## Italian Question Answering on Image Scene Graphs

GQA-it is a **large-scale Italian dataset for Visual Question Answering** based on the balanced version of [GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html).

GQA-it contains more than **1 million question/answer pairs in Italian over 80K images** obtained by applying Neural Machine Translation.

Most importantly, a **Test set of 3,000 question-answer pairs has been manually validated to provide a valuable benchmark in Italian**.

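As a rough illustration of how model answers could be scored against the validated test set, the sketch below computes exact-match accuracy. The normalization (lowercasing, trimming whitespace and trailing punctuation) and the function names are assumptions made here, not the official GQA-it evaluation protocol.

```python
from typing import Sequence


def normalize(answer: str) -> str:
    """Lowercase and strip surrounding whitespace and trailing punctuation."""
    return answer.strip().lower().rstrip('.!?').strip()


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("no prediction/reference pairs given")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(predictions)
```

Short answers such as _destra_ or _plastica_ make exact match a natural first metric, though it will penalize correct answers phrased differently from the reference.
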
## Example

![](n90294.jpg)

| Language | Question | Answer |
| --- | :---: | :---: |
| En | Is the remote to the right or to the left of the book? | right |
| It | _Il telecomando è a destra o a sinistra del libro?_ | _destra_ |
| En | How thick is the book to the left of the remote? | thick |
| It | _Quanto è spesso il libro a sinistra del telecomando?_ | _spesso_ |
| En | What device is to the left of the calculator made of plastic? | charger |
| It | _Quale dispositivo si trova a sinistra della calcolatrice di plastica?_ | _caricabatterie_ |
| En | What's the charger made of? | plastic |
| It | _Di cosa è fatto il caricabatterie?_ | _plastica_ |
| En | Are there any phones? | no |
| It | _Ci sono dei telefoni?_ | _no_ |

# Citation

```
TODO
```