HuggingFaceM4/idefics-80b

Leyo committed on Jul 11, 2023

Commit fa62bff · 1 Parent(s): 73fda5b

fix source num tokens

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -117,8 +117,8 @@ The model is trained on the following data mixture of openly accessible English
 |-------------|-----------------------------------------|---------------------------|---------------------------|--------|-----------------------------------------|
 | [OBELISC](https://huggingface.co/datasets/HuggingFaceM4/OBELISC)     | Unstructured Multimodal Web Documents    | 114.906B                      | TODO                      | 1      | 73.85%                                  |
 | [Wikipedia](https://huggingface.co/datasets/wikipedia)   | Unstructured Multimodal Web Documents    | 3.192B                     | TODO                      | 3      | 6.15%                                  |
-| [LAION](https://huggingface.co/datasets/laion/laion2B-en)       | Image-Text Pairs                         | 1.636B                      | TODO                      | 1      | 17.18%
-| [PMD](https://huggingface.co/datasets/facebook/pmd)         | Image-Text Pairs                         | 29.920B                      | TODO                      | 3      | 2.82%                                   |                                |
+| [LAION](https://huggingface.co/datasets/laion/laion2B-en)       | Image-Text Pairs                         | 29.920B                      | TODO                      | 1      | 17.18%
+| [PMD](https://huggingface.co/datasets/facebook/pmd)         | Image-Text Pairs                         | 1.636B                      | TODO                      | 3      | 2.82%                                   |                                |
 **OBELISC** is an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images. An interactive visualization of the dataset content is available [here](TODO).
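
The change swaps the token counts on the LAION and PMD rows. As a coarse sanity check on that swap, the sketch below tests whether the source with more tokens is also the one with the larger percentage in the last column. It rests on two assumptions, since the table header row is not part of this hunk: the third column is read as the number of tokens in the source, and the last column as each source's share of training tokens. The function name `ordering_consistent` is hypothetical, and the check only compares orderings; it does not reproduce the percentages.

```python
# Values copied from the removed (-) and added (+) rows of the diff.
# Each entry is (tokens in source, in billions; last-column percentage).
# Assumption: the column meanings are inferred, the header row is not in the hunk.
before = {"LAION": (1.636, 17.18), "PMD": (29.920, 2.82)}
after = {"LAION": (29.920, 17.18), "PMD": (1.636, 2.82)}


def ordering_consistent(rows):
    """Return True when the source with more tokens also has the larger share."""
    laion_tokens, laion_share = rows["LAION"]
    pmd_tokens, pmd_share = rows["PMD"]
    return (laion_tokens > pmd_tokens) == (laion_share > pmd_share)


print("before the fix:", ordering_consistent(before))
print("after the fix:", ordering_consistent(after))
```

Under these assumptions the ordering is inconsistent before the commit and consistent after it, which matches the commit message "fix source num tokens".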