Syed Hasan committed
Commit da126e2 (1 parent: 07f39fb)
Update README.md

README.md CHANGED
@@ -61,7 +61,31 @@ Average: 75.9% without mmlu
 |truthfulqa_mc| 1|mc1 |62.79|± | 1.69|
 | | |mc2 |77.90|± | 1.37|

+### BigBench Reasoning Test
+| Task | Version | Metric | Value | | Stderr|
+|------------------------------------------------|---------|-----------------------|--------------|---|-------|
+| bigbench_causal_judgement | 0| multiple_choice_grade | 0.6000 | _ | 0.0356 |
+| bigbench_date_understanding | 0| multiple_choice_grade | 0.620596205962| _ | 0.0253 |
+| bigbench_disambiguation_qa | 0| multiple_choice_grade | 0.542635658915| _ | 0.0311 |
+| bigbench_geometric_shapes | 0| multiple_choice_grade | 0.239554317549| _ | 0.0226 |
+| ... | | exact_str_match | | | |
+| bigbench_geometric_shapes | 0| exact_str_match | 0.0000 | _ | 0.0000 |
+| bigbench_logical_deduction_five_objects | 0| multiple_choice_grade | 0.3280 | _ | 0.0210 |
+| bigbench_logical_deduction_seven_objects | 0| multiple_choice_grade | 0.238571428571| _ | 0.0161 |
+| bigbench_logical_deduction_three_objects | 0| multiple_choice_grade | 0.593333333333| _ | 0.0284 |
+| bigbench_movie_recommendation | 0| multiple_choice_grade | 0.5800 | _ | 0.0221 |
+| bigbench_navigate | 0| multiple_choice_grade | 0.5600 | _ | 0.0157 |
+| bigbench_reasoning_about_colored_objects | 0| multiple_choice_grade | 0.6920 | _ | 0.0103 |
+| bigbench_ruin_names | 0| multiple_choice_grade | 0.553571428571| _ | 0.0235 |
+| bigbench_salient_translation_error_detection | 0| multiple_choice_grade | 0.414829659319| _ | 0.0156 |
+| bigbench_snarks | 0| multiple_choice_grade | 0.734806629834| _ | 0.0329 |
+| bigbench_sports_understanding | 0| multiple_choice_grade | 0.760649087221| _ | 0.0136 |
+| bigbench_temporal_sequences | 0| multiple_choice_grade | 0.5550 | _ | 0.0157 |
+| bigbench_tracking_shuffled_objects_five_objects| 0| multiple_choice_grade | 0.2328 | _ | 0.0120 |
+| bigbench_tracking_shuffled_objects_seven_objects| 0| multiple_choice_grade | 0.193714285714| _ | 0.0094 |
+| bigbench_tracking_shuffled_objects_three_objects| 0| multiple_choice_grade | 0.593333333333| _ | 0.0284 |

+Average: 49.08%
 ### Training hyperparameters

 The following hyperparameters were used during training: