rocket-3B / README.md

Task Performance Metrics

The following table reports results for each evaluation task. The 'Metric' column names the measure, either accuracy (acc) or normalized accuracy (acc_norm); the 'Value' column gives the score for that metric, and 'Stderr' gives the standard error of that score.

| Task          | Version | Metric   | Value  | Stderr   |
|---------------|---------|----------|--------|----------|
| arc_challenge | 0       | acc      | 0.4334 | ± 0.0145 |
|               |         | acc_norm | 0.4394 | ± 0.0145 |
| arc_easy      | 0       | acc      | 0.6974 | ± 0.0094 |
|               |         | acc_norm | 0.6170 | ± 0.0100 |
| boolq         | 1       | acc      | 0.8171 | ± 0.0068 |
| hellaswag     | 0       | acc      | 0.5770 | ± 0.0049 |
|               |         | acc_norm | 0.7391 | ± 0.0044 |
| openbookqa    | 0       | acc      | 0.2800 | ± 0.0201 |
|               |         | acc_norm | 0.3760 | ± 0.0217 |
| piqa          | 0       | acc      | 0.7797 | ± 0.0097 |
|               |         | acc_norm | 0.7622 | ± 0.0099 |
| toxigen       | 0       | acc      | 0.4777 | ± 0.0163 |
|               |         | acc_norm | 0.4340 | ± 0.0162 |
| winogrande    | 0       | acc      | 0.6322 | ± 0.0136 |
| gsm8k         | 0       | acc      | 0.0144 | ± 0.0033 |
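
As a reading aid, a reported value and its standard error can be turned into an approximate 95% confidence interval. The sketch below assumes a normal approximation (z = 1.96), which is a common convention rather than anything stated in this README; the helper name `confidence_interval` is hypothetical, and the two sample rows are copied from the results above.

```python
# Approximate 95% confidence interval from a reported value and its standard error.
# Assumption: a normal approximation with z = 1.96. This is an illustration only,
# not part of the evaluation that produced the numbers.

def confidence_interval(value, stderr, z=1.96):
    """Return the (low, high) bounds of value ± z * stderr."""
    margin = z * stderr
    return value - margin, value + margin

# Two rows copied from the results above: (value, stderr).
results = {
    "arc_challenge/acc": (0.4334, 0.0145),
    "gsm8k/acc": (0.0144, 0.0033),
}

for name, (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{name}: {value:.4f} (approx. 95% CI {low:.4f} to {high:.4f})")
```

Intervals that overlap heavily, such as acc and acc_norm on arc_challenge, suggest the difference between the two scores may not be meaningful.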