---
library_name: transformers
tags:
- art
datasets:
- ColumbiaNLP/V-FLUTE
language:
- en
metrics:
- f1
---

# Model Card for LLaVA-v1.5 Fine-Tuned on V-FLUTE and e-ViL

This is a checkpoint of the model from the paper [V-FLUTE: Visual Figurative Language Understanding with Textual Explanations](https://arxiv.org/abs/2405.01474).

Specifically, it is the best-performing model fine-tuned on a combination of the V-FLUTE and e-ViL (e-SNLI-VE) datasets, with early stopping based on the V-FLUTE validation set.

## Model Details

### Model Description

See more on LLaVA 1.5 here: https://github.com/haotian-liu/LLaVA

V-FLUTE dataset: https://huggingface.co/datasets/ColumbiaNLP/V-FLUTE

V-FLUTE paper: https://arxiv.org/abs/2405.01474

Citation:

```
@misc{saakyan2024understandingfigurativemeaningexplainable,
      title={Understanding Figurative Meaning through Explainable Visual Entailment},
      author={Arkadiy Saakyan and Shreyas Kulkarni and Tuhin Chakrabarty and Smaranda Muresan},
      year={2024},
      eprint={2405.01474},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.01474},
}
```

- **Developed by:** Arkadiy Saakyan (ColumbiaNLP)
- **Model type:** Vision-Language Model
- **Language(s) (NLP):** English
- **Finetuned from model:** LLaVA-v1.5

### Model Sources

- **Repository:** https://github.com/asaakyan/V-FLUTE
- **Paper:** https://arxiv.org/abs/2405.01474

## Uses

The model's intended use is limited to interpreting multimodal figurative inputs such as metaphors, similes, idioms, sarcasm, and humor.

### Out-of-Scope Use

The model may not perform well on general instruction-following use cases outside this scope.

## Bias, Risks, and Limitations

The V-FLUTE dataset and its source datasets may contain biases, especially those drawn from user-generated content (MemeCap and MuSE).

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

## How to Get Started with the Model

Install LLaVA as described here: https://github.com/asaakyan/LLaVA/tree/6f595efcf2699884f18957ee603986cebfaa9df7

```python
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava_mod import eval_model

# Paths to the base model and to the fine-tuned LoRA checkpoint.
model_base = "llava-v1.5-7b"
model_path = "llava-v1.5-7b-evil-vflue-v2-lora"
model_name = get_model_name_from_path(model_path)
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=model_base,
    model_name=model_name,
    load_4bit=False
)

prompt = """Does the illustration affirm or contest the claim "Feeling motivated and energetic after only cleaning a room minimally."? Provide your argument and choose a label: entailment or contradiction."""
image_path = "path/to/images"  # placeholder: directory that contains the input image
image_file = f"{image_path}/27.png"

# Bundle the loaded components and generation settings into a simple
# attribute object and pass it to eval_model.
infer_args = type('Args', (), {
    "model_name": model_name,
    "model": model,
    "tokenizer": tokenizer,
    "image_processor": image_processor,
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 3,
    "max_new_tokens": 512
})()
output = eval_model(infer_args)
print(output)
```
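
The prompt above asks the model for an argument plus a label, and the card's metadata lists F1 as the metric. The following is a minimal sketch of how one might build that prompt for other claims, pull the label out of the generated text, and score predictions. The helper names (`build_prompt`, `extract_label`, `macro_f1`) are hypothetical and not part of LLaVA or the V-FLUTE code. The sketch assumes the label word appears literally in the output and treats the last mention as the final answer; the scoring used in the paper may differ.

```python
import re

# Wording copied from the example prompt in this card; {claim} is the only slot.
PROMPT_TEMPLATE = (
    'Does the illustration affirm or contest the claim "{claim}"? '
    "Provide your argument and choose a label: entailment or contradiction."
)

LABELS = ("entailment", "contradiction")


def build_prompt(claim: str) -> str:
    """Fill the card's example prompt wording with an arbitrary claim."""
    return PROMPT_TEMPLATE.format(claim=claim)


def extract_label(output: str):
    """Return the last label word found in the output, or None if none is found.

    Assumption: the model states its label using one of the two label words.
    """
    matches = re.findall(r"\b(entailment|contradiction)\b", output.lower())
    return matches[-1] if matches else None


def macro_f1(gold, pred):
    """Macro-averaged F1 over the two labels.

    Predictions that could not be parsed (None) simply count as errors.
    """
    scores = []
    for label in LABELS:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With the snippet above, `extract_label(output)` could be applied to the string returned by `eval_model`, and `macro_f1` to lists of gold and extracted labels.
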

## Training Details

See the [training scripts](https://github.com/asaakyan/LLaVA/tree/6f595efcf2699884f18957ee603986cebfaa9df7/scripts/vflute) or the [V-FLUTE repository](https://github.com/asaakyan/V-FLUTE).

### Training Data

https://huggingface.co/datasets/ColumbiaNLP/V-FLUTE

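
As a rough sketch, the dataset linked above could be pulled from the Hub with the `datasets` library and inspected by split. The helper names `load_vflute` and `split_sizes` are hypothetical. The sketch assumes `datasets` is installed and network access is available, and it makes no assumption about which splits or columns the dataset exposes.

```python
def load_vflute():
    """Download V-FLUTE from the Hugging Face Hub.

    Requires `pip install datasets` and network access.
    """
    # Imported lazily so that split_sizes below works without the library.
    from datasets import load_dataset

    return load_dataset("ColumbiaNLP/V-FLUTE")


def split_sizes(dataset_dict):
    """Map each split name to its number of examples."""
    return {name: len(split) for name, split in dataset_dict.items()}
```

Calling `split_sizes(load_vflute())` would then report how many examples each split contains.
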

## Model Card Contact

[email protected]