Egor Pakhomov
egor-pakhomov
AI & ML interests
None yet
Recent Activity
updated a dataset about 2 hours ago: Salesforce/fineweb_deduplicated
liked a dataset 4 months ago: Salesforce/fineweb_deduplicated
updated a dataset 5 months ago: Salesforce/fineweb_deduplicated
Organizations
egor-pakhomov's activity
Exact copy of this dataset on HuggingFace yields "This dataset has 218 files that have been marked as unsafe."
1
#50 opened 6 months ago by egor-pakhomov
Simple exact deduplication removes 2/3 of data.
4
#49 opened 6 months ago by egor-pakhomov
Can't read with Spark
4
#1 opened 7 months ago by egor-pakhomov