HugoLaurencon committed
Commit 2f0f4fc · 1 parent: 8683d64
Update README.md

README.md CHANGED
@@ -124,7 +124,7 @@ The model is trained on the following data mixture of openly accessible English
 
 **Wkipedia** is the multimodal equivalent of the encyclopedia. We used the English dump of Wikipedia created on February 20th, 2023.
 
-**LAION** is a collection of image-text pairs collected from web pages from Common Crawl and texts are obtained using the alternative texts of each image. We deduplicated it following [this paper](https://arxiv.org/abs/2303.12733).
+**LAION** is a collection of image-text pairs collected from web pages from Common Crawl and texts are obtained using the alternative texts of each image. We deduplicated it, following [this paper](https://arxiv.org/abs/2303.12733).
 
 **PMD** is a collection of publicly-available image-text pair datasets. The dataset contains pairs from Conceptual Captions, Conceptual Captions 12M, WIT, Localized Narratives, RedCaps, COCO, SBU Captions, Visual Genome and a subset of YFCC100M dataset. Due to a server failure at the time of the pre-processing, we did not include SBU captions.
 