---
license: apache-2.0
language:
- en
---

# AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
[[Project Page](https://aligngpt-vl.github.io/)] [[Paper](https://arxiv.org/abs/2405.14129)] [[Demo](http://47.116.173.89:7870/)] [[Model](https://huggingface.co/nlpzhaof)]

Authors: [Fei Zhao*](https://scholar.google.com/citations?user=V01xzWQAAAAJ&hl=zh-CN), Taotian Pang*, Chunhui Li, [Zhen Wu](https://scholar.google.com/citations?user=IoGlgtoAAAAJ&hl=zh-CN), Junjie Guo, Shangyu Xing, [Xinyu Dai](https://scholar.google.com/citations?user=zpWB1CgAAAAJ&hl=zh-CN)

## News and Updates

- [5/24] 🔥 We released **AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability**. Check out the [paper](https://arxiv.org/abs/2405.14129) and [demo](http://47.116.173.89:7870/).

## Model Zoo

| Model | LLM | Vision Backbone | Pre-training | Instruct-tuning |
|----------|----------|-----------|---|---|
| AlignGPT-7B | [Vicuna 7B](https://huggingface.co/lmsys/vicuna-7b-v1.5) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | [aligngpt-7b-pretrain](https://huggingface.co/nlpzhaof/aligngpt-7b-pretrain/tree/main) | [aligngpt-7b](https://huggingface.co/nlpzhaof/aligngpt-7b/tree/main) |
| AlignGPT-13B | [Vicuna 13B](https://huggingface.co/lmsys/vicuna-13b-v1.5) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | [aligngpt-13b-pretrain](https://huggingface.co/nlpzhaof/aligngpt-13b-pretrain/tree/main) | [aligngpt-13b](https://huggingface.co/nlpzhaof/aligngpt-13b/tree/main) |
| AlignGPT-LLaMA2 | [LLaMA-2-7B-Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | To be released | To be released |
| AlignGPT-LLaMA3 | [LLaMA-3-8B-Base](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | To be released | To be released |
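The released checkpoints in the table above can be fetched directly from the Hugging Face Hub. Below is a minimal sketch assuming the `huggingface_hub` Python client (not part of this repository); the `repo_id` is the instruct-tuned 7B repository listed in the table. Running inference on the downloaded weights additionally requires the AlignGPT code from the project page.

```python
# Minimal sketch: download AlignGPT weights from the Hugging Face Hub.
# Assumes `pip install huggingface_hub`; the repo_id comes from the
# Model Zoo table above. Inference itself needs the AlignGPT codebase.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="nlpzhaof/aligngpt-7b",  # instruction-tuned 7B checkpoint
    revision="main",                 # the branch linked in the table
)
print(f"Checkpoint files downloaded to: {local_dir}")
```
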
## Performance

| Model | VQAv2 | GQA | VizWiz | SQA | T-VQA | POPE | MME | MM-Bench | MM-Bench-CN | SEED | LLaVA-Bench-Wild | MM-Vet |
|----------|---|---|---|---|---|---|---|---|---|---|---|---|
| AlignGPT-7B | 79.1 | 62.9 | 54.2 | 68.5 | 58.4 | 86.0 | 1527.4 | 67.3 | 59.9 | 66.5 | 68.4 | 30.8 |
| AlignGPT-13B | 80.0 | 63.6 | 56.4 | 70.3 | 60.2 | 86.2 | 1572.0 | 69.5 | 63.7 | 67.8 | 75.2 | 35.6 |

## Citation

If you find AlignGPT useful for your research and applications, please cite using this BibTeX:

```bibtex
@misc{zhao2024aligngpt,
      title={AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability},
      author={Fei Zhao and Taotian Pang and Chunhui Li and Zhen Wu and Junjie Guo and Shangyu Xing and Xinyu Dai},
      year={2024},
      eprint={2405.14129},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License

[Code License](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE) [Data License](https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)

The data and checkpoints are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA, Vicuna, and GPT-4. The dataset is licensed CC BY-NC 4.0 (allowing only non-commercial use), and models trained on the dataset should not be used outside of research purposes.
|