AmeyaPrabhu
committed on
Update contamination_report.csv
Added GSM-8k and MATH training contamination information.
Added other datasets with smaller contamination numbers.
Added the information that BigBench was determined to be contaminated badly enough that the GPT-4 report authors could not evaluate on it.
contamination_report.csv CHANGED +6 -2
@@ -466,5 +466,9 @@ RadNLI;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8
 
 openai_humaneval;;GPT-4;model;;;25.0;data-based;https://arxiv.org/abs/2303.08774;11
 ucinlp/drop;;GPT-4;model;;21.0;;data-based;https://arxiv.org/abs/2303.08774;11
-
-
+bigbench;;GPT-4;model;;;100.0;data-based;https://arxiv.org/abs/2303.08774;11
+gsm8k;;GPT-4;model;100.0;;1.0;data-based;https://arxiv.org/abs/2303.08774;11
+EleutherAI/hendrycks_math;;GPT-4;model;100.0;;;data-based;https://arxiv.org/abs/2303.08774;11
+cais/mmlu;;GPT-4;model;;;0.6;data-based;https://arxiv.org/abs/2303.08774;11
+ibragim-bad/arc_challenge;;GPT-4;model;;;3.4;data-based;https://arxiv.org/abs/2303.08774;11
+winogrande;;GPT-4;model;;;0.9;data-based;https://arxiv.org/abs/2303.08774;11
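The added rows are semicolon-delimited with ten fields each, and empty fields where no number was reported. A minimal sketch of reading them in Python follows. The column names are assumptions, since the CSV header does not appear in this diff: the three numeric fields are guessed to be train, dev, and test contamination percentages, which fits the commit message's mention of GSM-8k and MATH training contamination.

```python
import csv
import io

# Hypothetical column names: the real header row is not part of this diff.
COLUMNS = [
    "dataset", "subset", "source", "source_type",
    "train", "dev", "test",
    "approach", "reference", "pr",
]

# The six rows added by this commit, copied from the diff.
ADDED_ROWS = """\
bigbench;;GPT-4;model;;;100.0;data-based;https://arxiv.org/abs/2303.08774;11
gsm8k;;GPT-4;model;100.0;;1.0;data-based;https://arxiv.org/abs/2303.08774;11
EleutherAI/hendrycks_math;;GPT-4;model;100.0;;;data-based;https://arxiv.org/abs/2303.08774;11
cais/mmlu;;GPT-4;model;;;0.6;data-based;https://arxiv.org/abs/2303.08774;11
ibragim-bad/arc_challenge;;GPT-4;model;;;3.4;data-based;https://arxiv.org/abs/2303.08774;11
winogrande;;GPT-4;model;;;0.9;data-based;https://arxiv.org/abs/2303.08774;11
"""


def parse_rows(text):
    """Parse semicolon-delimited rows into dicts; empty numeric fields become None."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        if not fields:  # skip blank lines, such as the two removed by this commit
            continue
        row = dict(zip(COLUMNS, fields))
        for key in ("train", "dev", "test"):
            row[key] = float(row[key]) if row[key] else None
        rows.append(row)
    return rows


rows = parse_rows(ADDED_ROWS)
for row in rows:
    print(row["dataset"], row["train"], row["dev"], row["test"])
```

Mapping empty fields to None rather than 0.0 keeps "not reported" distinct from "measured as zero", which matters for rows such as RadNLI that record explicit 0.0 values.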