Spaces: CONDA-Workshop/Data-Contamination-Database
14 contributors
History: 17 commits
Latest commit: ad06fdc (verified) by vishaal27, 8 months ago: Add data from "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus"
File                      Size      Updated        Last commit message
.gitattributes            1.52 kB   10 months ago  initial commit
.gitignore                12 Bytes  10 months ago  Style + gitignore
README.md                 352 Bytes 10 months ago  Initital commit
app.py                    6.23 kB   9 months ago   Increase tab font size
contamination_report.csv  34.5 kB   8 months ago   Add data from "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus"
dataset.py                9.64 kB   9 months ago   Add PR links to previous commits
markdown.py               9.83 kB   9 months ago   update urls
requirements.txt          73 Bytes  10 months ago  Initital commit
utils.py                  6.11 kB   9 months ago   Get token from environment