|
--- |
|
inference: false |
|
library_name: transformers |
|
--- |
|
|
|
# VW-LMM Model Card |
|
|
|
This repo contains the weights of VW-LMM-Vicuna-7b, proposed in the paper "Multi-modal Auto-regressive Modeling via Visual Words".
|
|
|
For detailed usage and chat templates, please refer to our project repo: https://github.com/pengts/VW-LMM
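As a quick start, the checkpoint can be loaded with the Transformers library. This is a minimal sketch: the repo id `pengts/VW-LMM-Vicuna-7b` and the `trust_remote_code` requirement are assumptions here; consult the project repo above for the exact checkpoint name, chat template, and image preprocessing.

```python
# Minimal loading sketch for VW-LMM-Vicuna-7b with Hugging Face Transformers.
# NOTE: the repo id below is an assumption -- check the project repo
# (https://github.com/pengts/VW-LMM) for the exact checkpoint name.

MODEL_ID = "pengts/VW-LMM-Vicuna-7b"  # assumed Hugging Face Hub repo id


def load_model(model_id: str = MODEL_ID):
    """Download the tokenizer and model weights and return both."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,  # assumed: the repo may ship custom model code
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_model()
```

For multi-modal inference (image plus text), follow the chat template and visual preprocessing described in the project repo; plain `generate` calls on text alone will not exercise the visual-words pathway.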
|
|
|
## Model details |
|
|
|
**Model type:** |
|
VW-LMM is an open-source multi-modal chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.

It is an auto-regressive language model based on the transformer architecture.
|
|
|
**Paper:**
|
https://arxiv.org/abs/2403.07720 |
|
|
|
**Code:**
|
https://github.com/pengts/VW-LMM |
|
|
|
## License |
|
Llama 2 is licensed under the LLAMA 2 Community License, |
|
Copyright (c) Meta Platforms, Inc. All Rights Reserved. |
|
|
|
## Citation |
|
|
|
If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:. |
|
|
|
```bibtex
@misc{peng2024multimodal,
      title={Multi-modal Auto-regressive Modeling via Visual Words},
      author={Tianshuo Peng and Zuchao Li and Lefei Zhang and Hai Zhao and Ping Wang and Bo Du},
      year={2024},
      eprint={2403.07720},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```