Hugging Face
sinagph's Collections
Pretrain Basic
Updated Oct 7, 2023
Skylion007/openwebtext • Viewer • Updated Dec 26, 2025 • 8.01M • 70.5k • 517
legacy-datasets/c4 • Updated Mar 5, 2024 • 12.8k • 242
legacy-datasets/wikipedia • Updated Mar 11, 2024 • 103k • 638
tiiuae/falcon-refinedweb • Viewer • Updated Jun 20, 2023 • 968M • 16.9k • 923
bookcorpus/bookcorpus • Updated May 3, 2024 • 21.1k • 354
EleutherAI/the_pile_deduplicated • Viewer • Updated Dec 2, 2022 • 134M • 27.2k • 113