---
license: llama2
language:
- en
- zh
tags:
- multimodal
datasets:
- liuhaotian/LLaVA-Pretrain
base_model:
- lmsys/vicuna-7b-v1.5
pipeline_tag: image-text-to-text
library_name: transformers
---
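## Usage

A minimal loading sketch is shown below, assuming the checkpoint is compatible with the Transformers `image-text-to-text` pipeline declared in the metadata above; the repository id and the example image URL are placeholders, not this model's actual identifiers.

```python
# Minimal sketch, assuming compatibility with the Transformers
# image-text-to-text pipeline (transformers >= 4.47).
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="path/to/this-checkpoint",  # placeholder: replace with the actual repo id
)

# Chat-style input: one user turn containing an image and a text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/view.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=64)
print(outputs[0]["generated_text"])
```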
## Citation
If you find this model useful, please cite the following paper:
```bibtex
@article{huang2024deciphering,
  title={Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate},
  author={Huang, Qidong and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Wang, Jiaqi and Lin, Dahua and Zhang, Weiming and Yu, Nenghai},
  journal={arXiv preprint arXiv:2410.07167},
  year={2024}
}
```