appvoid's Collections

cool datasets

Updated 1 day ago

Some interesting datasets to use for language modeling.

  • appvoid/raw-corpus

    Updated Feb 23, 2025 • 1.6M • 8

  • pszemraj/simple_wikipedia

    Updated Dec 29, 2025 • 238k • 232 • 8

  • common-pile/youtube

    Updated Jun 6, 2025 • 1.13M • 422 • 10

  • srinivasbilla/self-instruct-base

    Updated Jan 24, 2023 • 82.6k • 45 • 5

  • agentlans/high-quality-english-sentences

    Updated Oct 1, 2024 • 1.71M • 950 • 31

  • agentlans/note-taking-v2

    Updated Sep 22, 2025 • 17.6k • 80

  • PleIAs/SYNTH

    Updated Nov 11, 2025 • 68M • 66.1k • 250