---
license: llama2
language:
- en
- zh
tags:
- multimodal
datasets:
- liuhaotian/LLaVA-Pretrain
base_model:
- lmsys/vicuna-7b-v1.5
pipeline_tag: image-text-to-text
library_name: transformers
---
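## **Usage**

The metadata above declares `library_name: transformers` and `pipeline_tag: image-text-to-text`. Below is a minimal usage sketch under those assumptions; it is not part of the original card, the repository id and image URL are placeholders, and the exact prompt/chat format depends on this checkpoint's processor and chat template.

```python
# Minimal sketch (assumption: the checkpoint loads with the transformers
# "image-text-to-text" pipeline declared in the metadata above).
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="your-org/your-model-id",  # placeholder: replace with this repository's id
)

# Chat-style input: one user turn containing an image and a text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/demo.jpg"},  # placeholder image URL
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=64, return_full_text=False)
print(outputs)  # e.g. [{'generated_text': '...'}]
```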
## **Citation**
If you find this model useful, please cite the following paper:
```bibtex
@article{huang2024deciphering,
  title={Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate},
  author={Huang, Qidong and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Wang, Jiaqi and Lin, Dahua and Zhang, Weiming and Yu, Nenghai},
  journal={arXiv preprint arXiv:2410.07167},
  year={2024}
}
```