Guilherme Penedo
guipenedo
708 followers · 6 following
gui_penedo
guipenedo
AI & ML interests
None yet
Articles
FineWeb2-C: Help Build Better Language Models in Your Language
2 days ago
•
10
Organizations
guipenedo's activity
New activity in HuggingFaceFW/fineweb
7 days ago
Simple exact deduplication removes 2/3 of data.
4
#49 opened 5 months ago by
egor-pakhomov
Torrent?
3
#4 opened 8 months ago by
emilss
Any plan to train models on a larger subset of the dataset?
1
#8 opened 8 months ago by
mrfakename
Are copyrighted works included in this dataset?
4
#9 opened 8 months ago by
umm-maybe
Reprocessing for a new language
14
#12 opened 8 months ago by
pere
Training configs for data ablation study
2
#14 opened 8 months ago by
jimmyhbx
tiny-fineweb
3
#19 opened 8 months ago by
3thn
Unsafe files
1
#25 opened 8 months ago by
alielfilali01
"Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20" using fineweb by Karpathy
#28 opened 7 months ago by
clem
Regarding the newly updated indexes (written as deduplication issues)
5
#29 opened 7 months ago by
kimcando
Dedup
1
#32 opened 7 months ago by
shawnkx
Language subset
3
#33 opened 7 months ago by
talmor
How to compute the aggregate score?
1
#35 opened 7 months ago by
mornmirror
Why do you apply "All filters except the (very destructive) terminal_punct"?
3
#36 opened 7 months ago by
bpwl0121
Reproducibility of the work for other languages
3
#38 opened 6 months ago by
camillop
Fineweb train configuration
3
#39 opened 6 months ago by
nezhazheng
Casting Issue?
4
#40 opened 6 months ago by
FelixLabelle
Any plans to release warc content after the language filtering steps?
2
#41 opened 6 months ago by
Splend1dchan
Is there an official test set for benchmarking objectively?
2
#42 opened 6 months ago by
SophieOstmeier
Fineweb Download Size Discrepancy
1
#43 opened 6 months ago by
msmmpts