---
language:
- it
base_model:
- openbmb/MiniCPM-V-2_6
library_name: transformers
tags:
- vision
- vqa-italian
- visual-question-answering-italian
---


<h1>Fine-tuned version of MiniCPM-V 2.6 on GQA-it</h1>

This is a version of MiniCPM-V 2.6 fine-tuned on GQA-it, designed for Italian Visual Question Answering.

# Usage
Check out the GitHub repository for more insights and code: https://github.com/crux82/XXXXXX. For advanced usage, see the base model repository: https://github.com/OpenBMB/MiniCPM-V.

For more details about the dataset, visit: https://github.com/crux82/gqa-it

```python
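import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned weights; trust_remote_code runs the custom model code shipped with the repository
model = AutoModel.from_pretrained('sag-uniroma2/MiniCPM-V-2_6-gqa-it-finetuned', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
# The tokenizer is loaded from the base model repository
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

# Path to a local image file
img = "n346247.jpg"
image = Image.open(img).convert('RGB')

# "Is there a fire hydrant on the grass?"
question = "C'è un idrante sull'erba?"
# The image travels inside the message content, so the image argument below stays None
msgs = [{'role': 'user', 'content': [image, question]}]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)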
```
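
To ask several independent questions about the same image, the call above can be wrapped in a small helper. The following is only a sketch: `build_msgs` and `ask_many` are hypothetical names introduced here for illustration, not part of the model's API, and the code assumes only that `model.chat` accepts the same `image`, `msgs`, and `tokenizer` arguments used in the snippet above.

```python
from typing import Any, List


def build_msgs(image: Any, question: str) -> List[dict]:
    """Build a single-turn message list in the same shape as the snippet above.

    Hypothetical helper: the image and the question share one user turn.
    """
    return [{'role': 'user', 'content': [image, question]}]


def ask_many(model: Any, tokenizer: Any, image: Any, questions: List[str]) -> List[str]:
    """Ask each question about the image in a separate, independent chat call.

    Hypothetical helper: no conversation history is carried between questions.
    """
    answers = []
    for question in questions:
        answer = model.chat(
            image=None,  # the image is passed inside msgs, as in the snippet above
            msgs=build_msgs(image, question),
            tokenizer=tokenizer,
        )
        answers.append(answer)
    return answers
```

With the `model`, `tokenizer`, and `image` objects from the previous snippet, `ask_many(model, tokenizer, image, [question])` would return a list with one answer per question.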

# Citation
```
TODO
```