---
language:
- it
base_model:
- openbmb/MiniCPM-V-2_6
library_name: transformers
tags:
- vision
- vqa-italian
- visual-question-answering-italian
---
<h1>Fine-tuned version of MiniCPM-V 2.6 on GQA-it</h1>
This is a version of MiniCPM-V 2.6 fine-tuned on GQA-it and designed for Italian Visual Question Answering.
# Usage
Check out the GitHub repository for more insights and code: https://github.com/crux82/XXXXXX. For advanced usage, see the base model repository: https://github.com/OpenBMB/MiniCPM-V.
For more details about the dataset, visit: https://github.com/crux82/gqa-it
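```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned weights; trust_remote_code is needed because the model code ships with the Hub repository
model = AutoModel.from_pretrained(
    'sag-uniroma2/MiniCPM-V-2_6-gqa-it-finetuned',
    trust_remote_code=True,
    attn_implementation='sdpa',
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()  # inference mode on a CUDA GPU

# The tokenizer is loaded from the base model repository
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

img = "n346247.jpg"  # path to a local image file
image = Image.open(img).convert('RGB')
question = "C'è un idrante sull'erba?"  # "Is there a fire hydrant on the grass?"

# The image travels inside the message content, so the `image` argument stays None
msgs = [{'role': 'user', 'content': [image, question]}]
answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
)
print(answer)
```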
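To ask several questions about the same image, the `model.chat` call above can be wrapped in a small helper. The following is only a minimal sketch: `build_msgs` and `ask_questions` are hypothetical names that are not part of the model repository, and the helper assumes nothing beyond a `chat(image=..., msgs=..., tokenizer=...)` method like the one used above. Each question is sent as an independent single-turn conversation.

```python
from typing import Any, Dict, List


def build_msgs(image: Any, question: str) -> List[Dict[str, Any]]:
    """Build a single-turn message list in the same format as the usage snippet."""
    return [{'role': 'user', 'content': [image, question]}]


def ask_questions(model: Any, tokenizer: Any, image: Any, questions: List[str]) -> Dict[str, str]:
    """Ask each question independently about the same image (hypothetical helper).

    `model` only needs a `chat(image=..., msgs=..., tokenizer=...)` method.
    Returns a dict mapping each question to the model's answer.
    """
    answers = {}
    for question in questions:
        # A fresh message list per question, so earlier answers do not leak into later ones
        answers[question] = model.chat(
            image=None,
            msgs=build_msgs(image, question),
            tokenizer=tokenizer,
        )
    return answers
```

With the `model`, `tokenizer`, and `image` objects from the snippet above, a call such as `ask_questions(model, tokenizer, image, ["C'è un idrante sull'erba?"])` would return a dictionary keyed by question.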
# Citation
```
TODO
```