|---|---|---|
| Math | 5.86 | 5.14 |
| Grammar | 4.71 | 1.29 |
| Understanding | 4.00 | 4.43 |
| Reasoning | 5.14 | 6.71 |
| Coding | 7.43 | 7.57 |
| Writing | 8.43 | 8.00 |

| Category | Score |
|---|---|
| Single turn | 5.93 |
| Multi turn | 5.52 |
| Overall | 5.73 |

| Tasks  |Version| Filter         |n-shot| Metric                |   |Value |   |Stderr|
|--------|------:|----------------|-----:|-----------------------|---|-----:|---|------|
|gsm8k   |      3|flexible-extract|     5|exact_match            |↑  |0.7013|±  |0.0126|
|        |       |strict-match    |     5|exact_match            |↑  |0.2418|±  |0.0118|
|gsm8k-ko|      1|flexible-extract|     5|exact_match            |↑  |0.4466|±  |0.0137|
|        |       |strict-match    |     5|exact_match            |↑  |0.4420|±  |0.0137|
|ifeval  |      4|none            |     0|inst_level_loose_acc   |↑  |0.8549|±  |   N/A|
|        |       |none            |     0|inst_level_strict_acc  |↑  |0.8225|±  |   N/A|
|        |       |none            |     0|prompt_level_loose_acc |↑  |0.7874|±  |0.0176|
|        |       |none            |     0|prompt_level_strict_acc|↑  |0.7468|±  |0.0187|