Hugging Face
Egor Pakhomov (egor-pakhomov)
1 follower · 1 following
AI & ML interests: None yet
Recent Activity
Updated a dataset about 5 hours ago: Salesforce/fineweb_deduplicated
Liked a dataset 4 months ago: Salesforce/fineweb_deduplicated
New activity 6 months ago in airtrain-ai/fineweb-edu-fortified: Deduped version of fineweb on HuggingFace yields "This dataset has 218 files that have been marked as unsafe."
egor-pakhomov's activity
Updated a dataset about 5 hours ago
Salesforce/fineweb_deduplicated
Updated about 5 hours ago • 6.43B • 146 • 31
Liked a dataset 4 months ago
Salesforce/fineweb_deduplicated
New activity in airtrain-ai/fineweb-edu-fortified (6 months ago)
Deduped version of fineweb on HuggingFace yields "This dataset has 218 files that have been marked as unsafe." · 1
#103 opened 6 months ago by egor-pakhomov
New activity in HuggingFaceFW/fineweb (6 months ago)
Exact copy of this dataset on HuggingFace yields "This dataset has 218 files that have been marked as unsafe." · 1
#50 opened 6 months ago by egor-pakhomov
Simple exact deduplication removes 2/3 of data. · 4
#49 opened 6 months ago by egor-pakhomov
New activity in mlfoundations/dclm-baseline-1.0 (7 months ago)
Can't read with Spark · 4
#1 opened 7 months ago by egor-pakhomov