---
license: mit
---
|
|
|
# Probing Visual Language Priors in VLMs |
|
|
|
|
|
## ImageDPO Finetuned Model |
|
|
|
This page provides the **ImageDPO** finetuned checkpoint of LLaVA-v1.5-13B used in [Probing Visual Language Priors in VLMs](https://arxiv.org/abs/2501.00569). We provide both the **LoRA parameters** and the **merged model weights**.
|
|
|
## Usage |
|
|
|
First, install the [LLaVA-v1.5 codebase](https://github.com/LLaVA-VL/LLaVA-Plus-Codebase). |
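A typical setup might look like the following sketch (the repository URL is from the link above; the editable-install step assumes the codebase ships a standard `pyproject.toml`/`setup.py`, so check its README for exact extras):

```shell
# Clone the LLaVA-Plus codebase (URL from the link above)
git clone https://github.com/LLaVA-VL/LLaVA-Plus-Codebase.git
cd LLaVA-Plus-Codebase

# Editable install of the package and its dependencies;
# exact extras/flags may differ -- follow the repo's README
pip install -e .
```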
|
|
|
Run the following command to try the model:
|
|
|
```bash
python -m llava.eval.run_llava \
    --model-path ViLP/LLaVA-v1.5-13b-ImageDPO \
    --image-file 'images/llava_logo.png' \
    --query 'Please caption this image.' \
    --conv-mode llava_v1
```
|
|
|
|
|
## Citation Information |
|
|
|
Please consider citing the ***ViLP*** paper if you find our resources helpful!
|
|
|
```bibtex
@article{luo2024probing,
  title={Probing Visual Language Priors in VLMs},
  author={Luo, Tiange and Cao, Ang and Lee, Gunhee and Johnson, Justin and Lee, Honglak},
  journal={arXiv preprint arXiv:2501.00569},
  year={2024}
}
```