Update README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ pipeline_tag: image-text-to-text
 `xGen-MM` is a series of the latest foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research. This series advances upon the successful designs of the `BLIP` series, incorporating fundamental enhancements that ensure a more robust and superior foundation. These models have been trained at scale on high-quality image caption datasets and interleaved image-text data.
 
 In the v1.5 (08/2024) release, we present a series of XGen-MM models including:
-- [🤗 xGen-MM-instruct-interleave (our main instruct model)](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-
+- [🤗 xGen-MM-instruct-interleave (our main instruct model)](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5): `xgen-mm-phi3-mini-instruct-interleave-r-v1.5`
   - This model has higher overall scores than [xGen-MM-instruct](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-singleimg-r-v1.5) on both single-image and multi-image benchmarks.
 - [🤗 xGen-MM-base](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-base-r-v1.5): `xgen-mm-phi3-mini-base-r-v1.5`
 - [🤗 xGen-MM-instruct](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-singleimg-r-v1.5): `xgen-mm-phi3-mini-instruct-singleimg-r-v1.5`

@@ -65,14 +65,14 @@ The instruct model is fine-tuned on a mixture of around 1 million samples from m
 
 <p>
 <figure class="half">
-<a href="
-<a href="
+<a href="https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5/blob/main/examples/example-1.png"><img src="./examples/example-1.png"></a>
+<a href="https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5/blob/main/examples/example-2.png"><img src="./examples/example-2.png"></a>
 </figure>
 </p>
 
 <p>
 <figure>
-<a href="
+<a href="https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5/blob/main/examples/sft-examples.png"><img src="./examples/sft-examples.png"></a>
 </figure>
 </p>

@@ -105,12 +105,14 @@ We thank the authors for their open-source implementations.
 
 # Citation
 ```
-@
-author
-title
-
-
-
+@misc{blip3-xgenmm,
+  author = {Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, Shrikant Kendre, Jieyu Zhang, Can Qin, Shu Zhang, Chia-Chih Chen, Ning Yu, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke, Huan Wang, Yejin Choi, Ludwig Schmidt, Zeyuan Chen, Silvio Savarese, Juan Carlos Niebles, Caiming Xiong, Ran Xu},
+  title = {BLIP-3: A Family of Open Large Multimodal Models},
+  year = {2024},
+  eprint = {2408.08872},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.CV},
+  url = {https://arxiv.org/abs/2408.08872},
 }
 ```
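The checkpoints listed in the updated README are hosted on the Hugging Face Hub. As a rough, unofficial sketch (not the snippet from the model card), loading the main instruct checkpoint with 🤗 Transformers could look like the following; it assumes the repository ships custom modeling code that resolves through `trust_remote_code=True`.

```python
# Hypothetical loading sketch for the main instruct checkpoint listed above.
# Assumes the repo's custom modeling code is picked up via trust_remote_code=True;
# see the model card in the repository for the authoritative usage example.
from transformers import AutoImageProcessor, AutoModelForVision2Seq, AutoTokenizer

model_id = "Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5"

model = AutoModelForVision2Seq.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, use_fast=False)
image_processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)
```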