AIJapanese/Moriyasu_Qwen2_JP_7B

@@ -24,7 +24,7 @@ We used the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluatio
 |---|---|---|---|---|---|---|---|---|---|
 |   |3-shot|3-shot|0-shot|2-shot|1-shot|1-shot|0-shot|5-shot|   |
 |   |Acc.|Balanced Acc.|Balanced Acc.|Char-F1|Char-F1|ROUGE-2|Acc.|Acc.|   |
-| Moriyasu_Qwen2_JP_7B (OURS) | **0.9491** | **0.9111** | 0.9550 | 0.8748 | 0.8924 | 0.1966 | **0.8238** | 0.5560 | **0.7699** |
+| Moriyasu_Qwen2_JP_7B (ours) | **0.9491** | **0.9111** | 0.9550 | 0.8748 | 0.8924 | 0.1966 | **0.8238** | 0.5560 | **0.7699** |
 | Qwen2-7B-Instruct | 0.9080 | 0.7807 | 0.9329 | 0.9290 | 0.8334 | 0.1905 | 0.7216 | **0.6120** | 0.7385 |
 | SakanaAI/EvoLLM-JP-v1-7B | 0.8919 | 0.6602 | 0.9555 | 0.9210 | 0.8641 | **0.2331** | 0.8165 | 0.4760 | 0.7273 |
 | Llama-3-ELYZA-JP-8B |0.9240 | 0.6485 | **0.9567** | 0.9204 | 0.8743 | 0.2135 | 0.7821 | 0.4920 | 0.7264 |
@@ -42,7 +42,7 @@ The results of other models are taken from the report
 |---|---|---|---|---|---|---|---|---|---|---|---|
 |   |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot|   |
 |   |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1|   |
-| Moriyasu_Qwen2_JP_7B (OURS)| **0.9321** | 0.4823 | **0.6046** | **0.9201** | 0.1382 | 0.5560 | 0.2636 | 0.1892 | 0.5273 | 0.2976 | 0.4911 |
+| Moriyasu_Qwen2_JP_7B (ours)| **0.9321** | 0.4823 | **0.6046** | **0.9201** | 0.1382 | 0.5560 | 0.2636 | 0.1892 | 0.5273 | 0.2976 | 0.4911 |
 | RakutenAI-7B-chat | 0.9035 | 0.2600 | 0.4619 | 0.8647 | 0.1339 | 0.2120 | 0.2667 | 0.1966 | 0.4504 | 0.2299 | 0.3980 |
 | Qwen2-7B-Instruct | 0.8856 | 0.3902 | 0.3859 | 0.8967 | 0.1277 | 0.5720 | 0.2041 | 0.1909 | 0.5713 | **0.5683** | 0.4793 |
 | Qwen2.5-7B-Instruct | 0.9151 | 0.4293 | 0.3910 | 0.8908 | 0.1676 | **0.6240** | 0.2108 | 0.1916 | **0.6252** | 0.5305 | 0.4976 |
@@ -64,7 +64,7 @@ Due to limited computational resources, we conducted evaluations on only a selec
 |Model|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMTAvg|
 |---|---|---|---|---|---|---|---|---|---|
-| Moriyasu_Qwen2_JP_7B (OURS)       | **0.515** | 0.710 | **0.845** | **0.685** | **0.585** | **0.815** | **0.710** | **0.765** | **0.704** |
+| Moriyasu_Qwen2_JP_7B (ours)       | **0.515** | 0.710 | **0.845** | **0.685** | **0.585** | **0.815** | **0.710** | **0.765** | **0.704** |
 | Llama-3-ELYZA-JP-8B               | 0.365 | **0.72** | 0.730 | 0.400 | 0.555 | 0.670 | 0.580 | 0.785 | 0.601 |
 | Llama 3.1 Swallow 8B Instruct v0.1| 0.480 | 0.680 | 0.705 | 0.475 | 0.425 | 0.710 | 0.620 | 0.645 | 0.592 |
@@ -74,7 +74,7 @@ For this benchmark, we use  [Elyza task 100](https://huggingface.co/datasets/ely
 |Model|Score|
 |---|---|
-| Moriyasu_Qwen2_JP_7B (OURS)       | 3.37 |
+| Moriyasu_Qwen2_JP_7B (ours)       | 3.37 |
 | Llama-3-ELYZA-JP-8B               | **3.66** |
 | Llama 3.1 Swallow 8B Instruct v0.1| 3.32 |