Commit 1781c0a by AIJapanese (verified) · 1 Parent(s): 6dc4fa5

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -65,4 +65,14 @@ The results of other models are taken from the report
  | Llama 3.1 Swallow 8B Instruct v0.2| 0.9294 | 0.5601 | 0.5988 | 0.9148 | 0.1372 | 0.5280 | **0.2878** | 0.2270 | 0.5504 | 0.4079 | **0.5141** |
  | Moriyasu_Qwen2_JP_7B (OURS)| **0.9321** | 0.4823 | **0.6046** | **0.9201** | 0.1382 | 0.5560 | 0.2636 | 0.1892 | 0.5273 | 0.2976 | 0.4911 |
 
+ ### Japanese MTBench

+ For this evaluation, we use [FastChat](https://github.com/Stability-AI/FastChat/tree/jp-stable), with **gpt-4o-2024-08-06** serving as the judge and generating the reference answers.
+
+ Due to limited computational resources, we evaluated only a small selection of models.
+
+ |Model|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMTAvg|
+ |---|---|---|---|---|---|---|---|---|---|
+ | Moriyasu_Qwen2_JP_7B (OURS) | **5.15** | 7.10 | **8.45** | **6.85** | **5.85** | **8.15** | **7.10** | 7.65 | **7.04** |
+ | Llama-3-ELYZA-JP-8B | 3.65 | **7.20** | 7.30 | 4.00 | 5.55 | 6.70 | 5.80 | **7.85** | 6.01 |
+ | Llama 3.1 Swallow 8B Instruct v0.1| 4.80 | 6.80 | 7.05 | 4.75 | 4.25 | 7.10 | 6.20 | 6.45 | 5.92 |
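
The README does not say how `JMTAvg` is derived. The sketch below assumes it is the unweighted mean of the eight category scores, which matches the reported values to within rounding. It recomputes the average for each row of the table above and also reports which model scores highest in each category. The names `RESULTS`, `jmt_average`, and `best_per_category` are illustrative and not part of FastChat.

```python
from statistics import mean

CATEGORIES = ["coding", "extraction", "humanities", "math",
              "reasoning", "roleplay", "stem", "writing"]

# Per-category scores and the reported JMTAvg, copied from the table above.
RESULTS = {
    "Moriyasu_Qwen2_JP_7B (OURS)": (
        [5.15, 7.10, 8.45, 6.85, 5.85, 8.15, 7.10, 7.65], 7.04),
    "Llama-3-ELYZA-JP-8B": (
        [3.65, 7.20, 7.30, 4.00, 5.55, 6.70, 5.80, 7.85], 6.01),
    "Llama 3.1 Swallow 8B Instruct v0.1": (
        [4.80, 6.80, 7.05, 4.75, 4.25, 7.10, 6.20, 6.45], 5.92),
}


def jmt_average(scores):
    """Assumed definition: unweighted mean over the eight categories."""
    if len(scores) != len(CATEGORIES):
        raise ValueError("expected exactly one score per category")
    return mean(scores)


def best_per_category(results):
    """Map each category name to the model with the highest score in it."""
    best = {}
    for i, category in enumerate(CATEGORIES):
        best[category] = max(results, key=lambda model: results[model][0][i])
    return best


# Print the recomputed mean next to the value reported in the table.
for model, (scores, reported) in RESULTS.items():
    print(f"{model}: mean={jmt_average(scores):.4f}, reported={reported:.2f}")

# Print the top-scoring model for every category.
for category, model in best_per_category(RESULTS).items():
    print(f"{category}: {model}")
```

Comparing the recomputed mean against the reported column with a small tolerance (rather than exact equality) avoids false mismatches caused by rounding to two decimals.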