Update README.md
README.md CHANGED
@@ -8,6 +8,8 @@ pinned: false
 ---
 
 # 🤗 HuggingFace 🍷 FineWeb datasets
+_Read our [technical report](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1)!_
+
 This organization hosts the 🍷 FineWeb datasets, a collection of text datasets sourced from the web ([CommonCrawl](https://commoncrawl.org/)), released under a permissive license ([ODC-By](https://opendatacommons.org/licenses/by/1-0/)).
 
 The creation of 🍷 FineWeb involved careful processing and filtering of large amounts of web data with the aim of lowering the barriers to entry to anyone intending to pretrain high-performance large language models.