## Task Performance Metrics

The table below reports accuracy (`acc`) and, where available, normalized accuracy (`acc_norm`) for each task. 'Value' is the score for the metric named in that row, expressed as a fraction between 0 and 1, and 'Stderr' is the standard error of that score. Rows with an empty 'Task' cell continue the task listed above them.

| **Task**       | **Version** | **Metric** | **Value** | **Stderr** |
|----------------|-------------|------------|-----------|------------|
| arc_challenge  | 0           | acc        | 0.4334    | ± 0.0145   |
|                |             | acc_norm   | 0.4394    | ± 0.0145   |
| arc_easy       | 0           | acc        | 0.6974    | ± 0.0094   |
|                |             | acc_norm   | 0.6170    | ± 0.0100   |
| boolq          | 1           | acc        | 0.8171    | ± 0.0068   |
| hellaswag      | 0           | acc        | 0.5770    | ± 0.0049   |
|                |             | acc_norm   | 0.7391    | ± 0.0044   |
| openbookqa     | 0           | acc        | 0.2800    | ± 0.0201   |
|                |             | acc_norm   | 0.3760    | ± 0.0217   |
| piqa           | 0           | acc        | 0.7797    | ± 0.0097   |
|                |             | acc_norm   | 0.7622    | ± 0.0099   |
| toxigen        | 0           | acc        | 0.4777    | ± 0.0163   |
|                |             | acc_norm   | 0.4340    | ± 0.0162   |
| winogrande     | 0           | acc        | 0.6322    | ± 0.0136   |
| gsm8k          | 0           | acc        | 0.0144    | ± 0.0033   |

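
One way to read the 'Stderr' column is to turn it into an approximate confidence interval around each score. The sketch below does this for three rows copied from the table, assuming a normal approximation (score ± 1.96 × standard error for a 95% interval). The helper name `confidence_interval` and the `ROWS` list are illustrative and not part of any evaluation tool. The approximation is rough for scores close to 0 or 1, such as `gsm8k`.

```python
# Rows copied from the table above: (task, metric, value, stderr).
ROWS = [
    ("arc_challenge", "acc", 0.4334, 0.0145),
    ("boolq", "acc", 0.8171, 0.0068),
    ("gsm8k", "acc", 0.0144, 0.0033),
]

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.96


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) for a normal-approximation interval.

    The bounds are clipped to [0, 1] because accuracy cannot leave that range.
    This is an assumed reading of the Stderr column, not a documented formula.
    """
    half_width = z * stderr
    low = max(0.0, value - half_width)
    high = min(1.0, value + half_width)
    return low, high


if __name__ == "__main__":
    for task, metric, value, stderr in ROWS:
        low, high = confidence_interval(value, stderr)
        print(f"{task:<14} {metric:<8} {value:.4f}  95% CI [{low:.4f}, {high:.4f}]")
```

Under this reading, two scores whose intervals overlap heavily (for example `acc` and `acc_norm` on `arc_challenge`) should not be treated as meaningfully different on the basis of this table alone.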