Task Performance Metrics
The following table reports performance metrics for several tasks: accuracy (`acc`) and normalized accuracy (`acc_norm`). The 'Value' column gives the score for the metric named in the 'Metric' column, and 'Stderr' gives the standard error of that score.
| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 0.4334 | ± 0.0145 |
| | | acc_norm | 0.4394 | ± 0.0145 |
| arc_easy | 0 | acc | 0.6974 | ± 0.0094 |
| | | acc_norm | 0.6170 | ± 0.0100 |
| boolq | 1 | acc | 0.8171 | ± 0.0068 |
| hellaswag | 0 | acc | 0.5770 | ± 0.0049 |
| | | acc_norm | 0.7391 | ± 0.0044 |
| openbookqa | 0 | acc | 0.2800 | ± 0.0201 |
| | | acc_norm | 0.3760 | ± 0.0217 |
| piqa | 0 | acc | 0.7797 | ± 0.0097 |
| | | acc_norm | 0.7622 | ± 0.0099 |
| winogrande | 0 | acc | 0.6322 | ± 0.0136 |
Average: 0.6261 (mean over the seven tasks, using acc_norm where reported and acc otherwise)
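
The reported average is consistent with taking, for each task, `acc_norm` where the table reports it and `acc` otherwise, then averaging over the seven tasks. The sketch below checks that arithmetic. The selection rule is inferred from the numbers rather than stated in the source, and the names `results`, `headline_score`, and `average_score` are illustrative only.

```python
# Scores copied from the table above. Tasks without an acc_norm row
# (boolq, winogrande) carry only "acc".
results = {
    "arc_challenge": {"acc": 0.4334, "acc_norm": 0.4394},
    "arc_easy": {"acc": 0.6974, "acc_norm": 0.6170},
    "boolq": {"acc": 0.8171},
    "hellaswag": {"acc": 0.5770, "acc_norm": 0.7391},
    "openbookqa": {"acc": 0.2800, "acc_norm": 0.3760},
    "piqa": {"acc": 0.7797, "acc_norm": 0.7622},
    "winogrande": {"acc": 0.6322},
}


def headline_score(metrics):
    """Assumed rule: prefer acc_norm, fall back to acc when it is absent."""
    return metrics.get("acc_norm", metrics["acc"])


def average_score(results):
    """Unweighted mean of the per-task headline scores."""
    scores = [headline_score(m) for m in results.values()]
    return sum(scores) / len(scores)


average = average_score(results)
print(f"Average: {average:.4f}")  # prints "Average: 0.6261"
```

Averaging `acc` alone over the same seven tasks gives a lower figure (about 0.6024), so the choice of metric per task matters when comparing this average against other reports.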