Syed Hasan committed on
Commit da126e2
1 Parent(s): 07f39fb

Update README.md

Files changed (1)
  1. README.md +24 -0
README.md CHANGED
@@ -61,7 +61,31 @@ Average: 75.9% without mmlu
  |truthfulqa_mc| 1|mc1 |62.79|± | 1.69|
  | | |mc2 |77.90|± | 1.37|

+ ### BigBench Reasoning Test
+ | Task | Version | Metric | Value | | Stderr|
+ |------------------------------------------------|---------|-----------------------|--------------|---|-------|
+ | bigbench_causal_judgement | 0| multiple_choice_grade | 0.6000 | _ | 0.0356 |
+ | bigbench_date_understanding | 0| multiple_choice_grade | 0.620596205962| _ | 0.0253 |
+ | bigbench_disambiguation_qa | 0| multiple_choice_grade | 0.542635658915| _ | 0.0311 |
+ | bigbench_geometric_shapes | 0| multiple_choice_grade | 0.239554317549| _ | 0.0226 |
+ | ... | | exact_str_match | | | |
+ | bigbench_geometric_shapes | 0| exact_str_match | 0.0000 | _ | 0.0000 |
+ | bigbench_logical_deduction_five_objects | 0| multiple_choice_grade | 0.3280 | _ | 0.0210 |
+ | bigbench_logical_deduction_seven_objects | 0| multiple_choice_grade | 0.238571428571| _ | 0.0161 |
+ | bigbench_logical_deduction_three_objects | 0| multiple_choice_grade | 0.593333333333| _ | 0.0284 |
+ | bigbench_movie_recommendation | 0| multiple_choice_grade | 0.5800 | _ | 0.0221 |
+ | bigbench_navigate | 0| multiple_choice_grade | 0.5600 | _ | 0.0157 |
+ | bigbench_reasoning_about_colored_objects | 0| multiple_choice_grade | 0.6920 | _ | 0.0103 |
+ | bigbench_ruin_names | 0| multiple_choice_grade | 0.553571428571| _ | 0.0235 |
+ | bigbench_salient_translation_error_detection | 0| multiple_choice_grade | 0.414829659319| _ | 0.0156 |
+ | bigbench_snarks | 0| multiple_choice_grade | 0.734806629834| _ | 0.0329 |
+ | bigbench_sports_understanding | 0| multiple_choice_grade | 0.760649087221| _ | 0.0136 |
+ | bigbench_temporal_sequences | 0| multiple_choice_grade | 0.5550 | _ | 0.0157 |
+ | bigbench_tracking_shuffled_objects_five_objects| 0| multiple_choice_grade | 0.2328 | _ | 0.0120 |
+ | bigbench_tracking_shuffled_objects_seven_objects| 0| multiple_choice_grade | 0.193714285714| _ | 0.0094 |
+ | bigbench_tracking_shuffled_objects_three_objects| 0| multiple_choice_grade | 0.593333333333| _ | 0.0284 |

+ Average: 49.08%
  ### Training hyperparameters

  The following hyperparameters were used during training: