Model Card for LLaVA-NDiNO_pt_short_long

Model description

LLaVA-NDiNO is a family of Large Vision Language Models (LVLMs) that have been trained for the Italian language.

The model was trained by instruction-tuning LLaVA-NDiNO_pt on an Italian machine-translated version of LLaVA Conversation 58k.

The code used for training is available in the repository linked below, together with the main details of the model:

  • Repository: https://github.com/swapUniba/LLaVA-NDiNO

  • Developed by: Elio Musacchio, Lucia Siciliani, Pierpaolo Basile, Giovanni Semeraro

  • Funded by: PNRR project FAIR - Future AI Research

  • Compute infrastructure: Leonardo supercomputer

  • Model type: LLaMA 3 + CLIP

  • Language(s) (NLP): Italian

  • License: Llama 3 Community License

  • Finetuned from model: swap-uniba/LLaVA-NDiNO_pt

Example Usage

import torch
import requests

from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration, set_seed

model_name = "swap-uniba/LLaVA-NDiNO_pt_long"

processor = LlavaNextProcessor.from_pretrained(model_name)
model = LlavaNextForConditionalGeneration.from_pretrained(model_name, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True, device_map="auto") 

url = "https://www.barnorama.com/wp-content/uploads/2016/12/03-Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw)

chat_template = "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"

conversation = [
    {
        "role": "user",
        "content": "<image>\nCosa c'è di strano in questa immagine?"
    },
]

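# Render the conversation into a single prompt string using the template above
prompt = processor.apply_chat_template(conversation, chat_template=chat_template, add_generation_prompt=True)
# Pass text and image by keyword and move the tensors to the model's device
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

set_seed(42)
output = model.generate(**inputs, max_new_tokens=4096)

# Decode only the newly generated tokens, skipping the prompt and special tokens
print(processor.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))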
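
To make the prompt format easier to inspect, the following is a minimal plain-Python sketch of what the chat template in the example appears to render. The function name render_llama3_prompt is hypothetical, and the BOS token value "<|begin_of_text|>" is an assumption based on the Llama 3 tokenizer defaults; check processor.tokenizer.bos_token for the actual value.

```python
# Assumption: the Llama 3 default BOS token; verify with processor.tokenizer.bos_token
BOS_TOKEN = "<|begin_of_text|>"


def render_llama3_prompt(messages, add_generation_prompt=True, bos_token=BOS_TOKEN):
    """Hypothetical helper mirroring the Jinja chat template from the example."""
    parts = []
    for index, message in enumerate(messages):
        # Each turn is a role header, the trimmed content, and an end-of-turn marker
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()
            + "<|eot_id|>"
        )
        # The BOS token is prepended to the first turn only
        if index == 0:
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        # An open assistant header cues the model to produce its reply
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "<image>\nCosa c'è di strano in questa immagine?"},
]
print(render_llama3_prompt(conversation))
```

Comparing this string with the output of processor.apply_chat_template is a quick way to confirm that the template is applied as expected before calling generate.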

Citation

@inproceedings{musacchioLLaVANDiNO,
  title={LLaVA-NDiNO: Empowering LLMs with Multimodality for the Italian Language},
  author={Musacchio, Elio and Siciliani, Lucia and Basile, Pierpaolo and Semeraro, Giovanni},
  booktitle={Proceedings of the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI 2024) co-located with 23rd International Conference of the Italian Association for Artificial Intelligence (AI*IA 2024)},
  year={2024}
}