AIJapanese committed on
Commit 00022e8 · verified · 1 Parent(s): 3f8b407

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -24,7 +24,7 @@ We used the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluatio
  |---|---|---|---|---|---|---|---|---|---|
  | |3-shot|3-shot|0-shot|2-shot|1-shot|1-shot|0-shot|5-shot| |
  | |Acc.|Balanced Acc.|Balanced Acc.|Char-F1|Char-F1|ROUGE-2|Acc.|Acc.| |
- | Moriyasu_Qwen2_JP_7B (OURS) | **0.9491** | **0.9111** | 0.9550 | 0.8748 | 0.8924 | 0.1966 | **0.8238** | 0.5560 | **0.7699** |
+ | Moriyasu_Qwen2_JP_7B (ours) | **0.9491** | **0.9111** | 0.9550 | 0.8748 | 0.8924 | 0.1966 | **0.8238** | 0.5560 | **0.7699** |
  | Qwen2-7B-Instruct | 0.9080 | 0.7807 | 0.9329 | 0.9290 | 0.8334 | 0.1905 | 0.7216 | **0.6120** | 0.7385 |
  | SakanaAI/EvoLLM-JP-v1-7B | 0.8919 | 0.6602 | 0.9555 | 0.9210 | 0.8641 | **0.2331** | 0.8165 | 0.4760 | 0.7273 |
  | Llama-3-ELYZA-JP-8B |0.9240 | 0.6485 | **0.9567** | 0.9204 | 0.8743 | 0.2135 | 0.7821 | 0.4920 | 0.7264 |
@@ -42,7 +42,7 @@ The results of other models are taken from the report
  |---|---|---|---|---|---|---|---|---|---|---|---|
  | |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot| |
  | |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1| |
- | Moriyasu_Qwen2_JP_7B (OURS)| **0.9321** | 0.4823 | **0.6046** | **0.9201** | 0.1382 | 0.5560 | 0.2636 | 0.1892 | 0.5273 | 0.2976 | 0.4911 |
+ | Moriyasu_Qwen2_JP_7B (ours)| **0.9321** | 0.4823 | **0.6046** | **0.9201** | 0.1382 | 0.5560 | 0.2636 | 0.1892 | 0.5273 | 0.2976 | 0.4911 |
  | RakutenAI-7B-chat | 0.9035 | 0.2600 | 0.4619 | 0.8647 | 0.1339 | 0.2120 | 0.2667 | 0.1966 | 0.4504 | 0.2299 | 0.3980 |
  | Qwen2-7B-Instruct | 0.8856 | 0.3902 | 0.3859 | 0.8967 | 0.1277 | 0.5720 | 0.2041 | 0.1909 | 0.5713 | **0.5683** | 0.4793 |
  | Qwen2.5-7B-Instruct | 0.9151 | 0.4293 | 0.3910 | 0.8908 | 0.1676 | **0.6240** | 0.2108 | 0.1916 | **0.6252** | 0.5305 | 0.4976 |
@@ -64,7 +64,7 @@ Due to limited computational resources, we conducted evaluations on only a selec
 
  |Model|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMTAvg|
  |---|---|---|---|---|---|---|---|---|---|
- | Moriyasu_Qwen2_JP_7B (OURS) | **0.515** | 0.710 | **0.845** | **0.685** | **0.585** | **0.815** | **0.710** | **0.765** | **0.704** |
+ | Moriyasu_Qwen2_JP_7B (ours) | **0.515** | 0.710 | **0.845** | **0.685** | **0.585** | **0.815** | **0.710** | **0.765** | **0.704** |
  | Llama-3-ELYZA-JP-8B | 0.365 | **0.72** | 0.730 | 0.400 | 0.555 | 0.670 | 0.580 | 0.785 | 0.601 |
  | Llama 3.1 Swallow 8B Instruct v0.1| 0.480 | 0.680 | 0.705 | 0.475 | 0.425 | 0.710 | 0.620 | 0.645 | 0.592 |
 
@@ -74,7 +74,7 @@ For this benchmark, we use [Elyza task 100](https://huggingface.co/datasets/ely
 
  |Model|Score|
  |---|---|
- | Moriyasu_Qwen2_JP_7B (OURS) | 3.37 |
+ | Moriyasu_Qwen2_JP_7B (ours) | 3.37 |
  | Llama-3-ELYZA-JP-8B | **3.66** |
  | Llama 3.1 Swallow 8B Instruct v0.1| 3.32 |
 