Lists of URLs from various training datasets
Nick Hagar
nhagar
AI & ML interests
digital media, collective attention, computational social science
Recent Activity
Updated a dataset 25 minutes ago: nhagar/zyda-2_urls_fwe3
Published a dataset about 7 hours ago: nhagar/zyda-2_urls_fwe3
Updated a dataset about 9 hours ago: nhagar/dclm-dedup_urls
models
None public yet
datasets (200)
nhagar/zyda-2_urls_fwe3 • 677M
nhagar/dclm-dedup_urls • 615M
nhagar/zyda-2_urls_dclm_crossdeduped • 21.6M • 6
nhagar/zyda_urls • 1.17B • 190
nhagar/dclm-baseline-1.0-parquet_urls • 195M • 187
nhagar/falcon-refinedweb_urls • 968M • 607
nhagar/CC-MAIN-2015-06_nyt_urls • 756k • 42
nhagar/CC-MAIN-2018-05_nyt_urls • 517k • 38
nhagar/CC-MAIN-2015-48_nyt_urls • 778k • 42
nhagar/CC-MAIN-2015-14_nyt_urls • 531k • 40