appvoid's Collections
cool datasets
Updated 1 day ago
Some interesting datasets to use for language modeling.
appvoid/raw-corpus
Viewer • Updated Feb 23, 2025 • 1.6M • 8
pszemraj/simple_wikipedia
Viewer • Updated Dec 29, 2025 • 238k • 232 • 8
common-pile/youtube
Viewer • Updated Jun 6, 2025 • 1.13M • 422 • 10
srinivasbilla/self-instruct-base
Viewer • Updated Jan 24, 2023 • 82.6k • 45 • 5
agentlans/high-quality-english-sentences
Viewer • Updated Oct 1, 2024 • 1.71M • 950 • 31
agentlans/note-taking-v2
Viewer • Updated Sep 22, 2025 • 17.6k • 80
PleIAs/SYNTH
Viewer • Updated Nov 11, 2025 • 68M • 66.1k • 250