Update README.md
README.md
CHANGED
@@ -17,3 +17,5 @@ The creation of 🍷 FineWeb involved careful processing and filtering of large
 All code and artefacts needed for reproduction are public and built on top of open source libraries, such as the 🤗 libraries [`datatrove`](https://github.com/huggingface/datatrove/), [`nanotron`](https://github.com/huggingface/nanotron/) or [`lighteval`](https://github.com/huggingface/lighteval/).
 
 Version 1 of the 🍷 FineWeb dataset is available [here](https://huggingface.co/datasets/HuggingFaceFW/fineweb). Our ablation models can be found [here](https://huggingface.co/collections/HuggingFaceFW/ablation-models-662457b0d213e8c14fe47f32).
+
+Version 2 of the 🥂 FineWeb dataset (multilingual extension to +1800 languages/script) is available [here](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2).