HugoLaurencon committed
Commit a8977b9 · 1 Parent(s): fa62bff

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -115,7 +115,7 @@ The model is trained on the following data mixture of openly accessible English
  | Data Source | Type of Data | Number of Tokens in Source | Number of Images in Source | Epochs | Effective Proportion in Number of Tokens |
  |-------------|-----------------------------------------|---------------------------|---------------------------|--------|-----------------------------------------|
- | [OBELISC](https://huggingface.co/datasets/HuggingFaceM4/OBELISC) | Unstructured Multimodal Web Documents | 114.906B | TODO | 1 | 73.85% |
+ | [OBELISC](https://huggingface.co/datasets/HuggingFaceM4/OBELISC) | Unstructured Multimodal Web Documents | 114.906B | 353M | 1 | 73.85% |
  | [Wikipedia](https://huggingface.co/datasets/wikipedia) | Unstructured Multimodal Web Documents | 3.192B | TODO | 3 | 6.15% |
  | [LAION](https://huggingface.co/datasets/laion/laion2B-en) | Image-Text Pairs | 29.920B | TODO | 1 | 17.18%
  | [PMD](https://huggingface.co/datasets/facebook/pmd) | Image-Text Pairs | 1.636B | TODO | 3 | 2.82% | |