Create README.md
README.md
ADDED
@@ -0,0 +1,50 @@
---
license: apache-2.0
language:
- en
---

# AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
[[Project Page](https://aligngpt-vl.github.io/)] [[Paper](https://arxiv.org/abs/2405.14129)] [[Demo](http://47.116.173.89:7870/)] [[Model](https://huggingface.co/nlpzhaof)]

Authors: [Fei Zhao*](https://scholar.google.com/citations?user=V01xzWQAAAAJ&hl=zh-CN), Taotian Pang*, Chunhui Li, [Zhen Wu](https://scholar.google.com/citations?user=IoGlgtoAAAAJ&hl=zh-CN), Junjie Guo, Shangyu Xing, [Xinyu Dai](https://scholar.google.com/citations?user=zpWB1CgAAAAJ&hl=zh-CN)

## News and Updates

- [5/24] 🔥 We released **AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability**. Check out the [paper](https://arxiv.org/abs/2405.14129) and [demo](http://47.116.173.89:7870/).

## Model Zoo

| Model | LLM | Vision Backbone | Pre-training | Instruct-tuning |
|---|---|---|---|---|
| AlignGPT-7B | [Vicuna 7B](https://huggingface.co/lmsys/vicuna-7b-v1.5) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | [aligngpt-7b-pretrain](https://huggingface.co/nlpzhaof/aligngpt-7b-pretrain/tree/main) | [aligngpt-7b](https://huggingface.co/nlpzhaof/aligngpt-7b/tree/main) |
| AlignGPT-13B | [Vicuna 13B](https://huggingface.co/lmsys/vicuna-13b-v1.5) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | [aligngpt-13b-pretrain](https://huggingface.co/nlpzhaof/aligngpt-13b-pretrain/tree/main) | [aligngpt-13b](https://huggingface.co/nlpzhaof/aligngpt-13b/tree/main) |
| AlignGPT-LLaMA2 | [LLaMA-2-7B-Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | To be released | To be released |
| AlignGPT-LLaMA3 | [LLaMA-3-8B-Base](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | To be released | To be released |

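The released checkpoints in the table above are ordinary Hugging Face Hub repositories, so they can likely be fetched with `huggingface_hub`. The sketch below is a minimal illustration, not part of the AlignGPT codebase: the repo IDs are copied from the table, while the `CHECKPOINTS` mapping, the stage labels `"pretrain"` and `"instruct"`, and both helper functions are hypothetical names introduced here. It assumes `huggingface_hub` is installed for the download step.

```python
"""Sketch: map an AlignGPT model name to its Hub repo and download it."""

# Repo IDs copied from the Model Zoo table. The dict layout, the stage
# labels, and the helper names are illustrative assumptions.
CHECKPOINTS = {
    ("AlignGPT-7B", "pretrain"): "nlpzhaof/aligngpt-7b-pretrain",
    ("AlignGPT-7B", "instruct"): "nlpzhaof/aligngpt-7b",
    ("AlignGPT-13B", "pretrain"): "nlpzhaof/aligngpt-13b-pretrain",
    ("AlignGPT-13B", "instruct"): "nlpzhaof/aligngpt-13b",
}


def checkpoint_repo(model: str, stage: str = "instruct") -> str:
    """Return the Hub repo ID for a released checkpoint, or raise KeyError."""
    try:
        return CHECKPOINTS[(model, stage)]
    except KeyError:
        raise KeyError(
            f"No released checkpoint for {model!r} at stage {stage!r}"
        ) from None


def download_checkpoint(model: str, stage: str = "instruct") -> str:
    """Download the checkpoint snapshot and return its local directory."""
    # Imported lazily so the lookup helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=checkpoint_repo(model, stage))


if __name__ == "__main__":
    # Only resolves the repo ID; call download_checkpoint(...) to fetch files.
    print(checkpoint_repo("AlignGPT-13B", "pretrain"))
```

Models listed as "To be released" have no entry in the mapping, so looking them up raises a `KeyError` rather than guessing a repository name.
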
## Performance

| Model | VQAv2 | GQA | VizWiz | SQA | T-VQA | POPE | MME | MM-Bench | MM-Bench-CN | SEED | LLaVA-Bench-Wild | MM-Vet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AlignGPT-7B | 79.1 | 62.9 | 54.2 | 68.5 | 58.4 | 86.0 | 1527.4 | 67.3 | 59.9 | 66.5 | 68.4 | 30.8 |
| AlignGPT-13B | 80.0 | 63.6 | 56.4 | 70.3 | 60.2 | 86.2 | 1572.0 | 69.5 | 63.7 | 67.8 | 75.2 | 35.6 |

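To read the two rows side by side, the per-benchmark gain of AlignGPT-13B over AlignGPT-7B can be computed directly from the reported scores. The sketch below only does that arithmetic: the numbers are copied from the table, and the variable and function names are illustrative. Because the benchmarks use different scales (MME, for example, is reported on a much larger range than the others), the differences are not comparable across columns.

```python
# Scores copied from the Performance table; all names here are illustrative.
BENCHMARKS = [
    "VQAv2", "GQA", "VizWiz", "SQA", "T-VQA", "POPE", "MME",
    "MM-Bench", "MM-Bench-CN", "SEED", "LLaVA-Bench-Wild", "MM-Vet",
]
ALIGNGPT_7B = [79.1, 62.9, 54.2, 68.5, 58.4, 86.0, 1527.4, 67.3, 59.9, 66.5, 68.4, 30.8]
ALIGNGPT_13B = [80.0, 63.6, 56.4, 70.3, 60.2, 86.2, 1572.0, 69.5, 63.7, 67.8, 75.2, 35.6]


def score_gains(small, large):
    """Return {benchmark: large - small}, rounded to one decimal place."""
    return {
        name: round(b - a, 1)
        for name, a, b in zip(BENCHMARKS, small, large)
    }


gains = score_gains(ALIGNGPT_7B, ALIGNGPT_13B)
for name, delta in gains.items():
    # Signed difference in each benchmark's own units.
    print(f"{name:>16}: {delta:+.1f}")
```

On these reported numbers the 13B model scores higher on every listed benchmark.
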
## Citation

If you find AlignGPT useful for your research and applications, please cite using this BibTeX:
```
@misc{zhao2024aligngpt,
      title={AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability},
      author={Fei Zhao and Taotian Pang and Chunhui Li and Zhen Wu and Junjie Guo and Shangyu Xing and Xinyu Dai},
      year={2024},
      eprint={2405.14129},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License

[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE) [![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)

The data and checkpoints are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA, Vicuna, and GPT-4. The dataset is licensed under CC BY-NC 4.0 (non-commercial use only), and models trained on it should not be used outside of research purposes.