The results of other models are taken from the report.

| Llama 3.1 Swallow 8B Instruct v0.2| 0.9294 | 0.5601 | 0.5988 | 0.9148 | 0.1372 | 0.5280 | **0.2878** | 0.2270 | 0.5504 | 0.4079 | **0.5141** |
| Moriyasu_Qwen2_JP_7B (OURS)| **0.9321** | 0.4823 | **0.6046** | **0.9201** | 0.1382 | 0.5560 | 0.2636 | 0.1892 | 0.5273 | 0.2976 | 0.4911 |

### Japanese MTBench

For this evaluation, we use [FastChat](https://github.com/Stability-AI/FastChat/tree/jp-stable) with **gpt-4o-2024-08-06** as the judge model and for generating the reference answers.
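
Below is a minimal sketch of how such a run can be driven with FastChat's `llm_judge` scripts (`gen_model_answer.py`, `gen_judgment.py`, `show_result.py`). It is illustrative only, not the exact commands behind the numbers below: the flag names follow the upstream `llm_judge` tooling, while the bench name `japanese_mt_bench`, the model path, and the model ID are placeholders/assumptions for the jp-stable fork.

```python
# Assumed workflow sketch, not the exact commands used for these results.
# Run from fastchat/llm_judge in the jp-stable checkout; OPENAI_API_KEY must
# be set for the judge step.
import subprocess

MODEL_PATH = "Moriyasu/Moriyasu_Qwen2_JP_7B"   # placeholder HF path (assumption)
MODEL_ID = "moriyasu-qwen2-jp-7b"              # answer-file name used by the scripts
BENCH = "japanese_mt_bench"                    # bench name in the jp-stable fork (assumption)
JUDGE = "gpt-4o-2024-08-06"

def run(args):
    """Run one llm_judge script and fail loudly if it errors."""
    print(">>", " ".join(args))
    subprocess.run(args, check=True)

# 1) Generate the model's answers to the Japanese MT-Bench questions.
run(["python", "gen_model_answer.py",
     "--bench-name", BENCH,
     "--model-path", MODEL_PATH,
     "--model-id", MODEL_ID])

# 2) Have the judge model score the answers (single-answer grading).
run(["python", "gen_judgment.py",
     "--bench-name", BENCH,
     "--judge-model", JUDGE,
     "--model-list", MODEL_ID,
     "--parallel", "2"])

# 3) Print per-category scores and the overall average (JMTAvg).
run(["python", "show_result.py",
     "--bench-name", BENCH,
     "--judge-model", JUDGE,
     "--model-list", MODEL_ID])
```
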
Due to limited computational resources, we evaluated only a select set of models.

|Model|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMTAvg|
|---|---|---|---|---|---|---|---|---|---|
| Moriyasu_Qwen2_JP_7B (OURS) | **5.15** | 7.10 | **8.45** | **6.85** | **5.85** | **8.15** | **7.10** | 7.65 | **7.04** |
| Llama-3-ELYZA-JP-8B | 3.65 | **7.20** | 7.30 | 4.00 | 5.55 | 6.70 | 5.80 | **7.85** | 6.01 |
| Llama 3.1 Swallow 8B Instruct v0.1| 4.80 | 6.80 | 7.05 | 4.75 | 4.25 | 7.10 | 6.20 | 6.45 | 5.92 |
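
As a quick arithmetic check, the reported JMTAvg values are consistent with the unweighted mean of the eight category scores (reading JMTAvg as a simple mean is an assumption based on standard MT-Bench reporting; the small gaps come from the per-category scores being rounded to two decimals).

```python
# Sanity check (assumption): JMTAvg is read as the unweighted mean of the
# eight per-category scores, as in standard MT-Bench reporting. A tolerance
# of ~0.01 absorbs two-decimal rounding of the published category scores.
rows = {
    # model: ([coding, extraction, humanities, math, reasoning, roleplay, stem, writing], reported JMTAvg)
    "Moriyasu_Qwen2_JP_7B (OURS)":        ([5.15, 7.10, 8.45, 6.85, 5.85, 8.15, 7.10, 7.65], 7.04),
    "Llama-3-ELYZA-JP-8B":                ([3.65, 7.20, 7.30, 4.00, 5.55, 6.70, 5.80, 7.85], 6.01),
    "Llama 3.1 Swallow 8B Instruct v0.1": ([4.80, 6.80, 7.05, 4.75, 4.25, 7.10, 6.20, 6.45], 5.92),
}

for model, (scores, reported) in rows.items():
    mean = sum(scores) / len(scores)
    assert abs(mean - reported) < 0.011, (model, mean, reported)
    print(f"{model}: mean of categories = {mean:.3f}, reported JMTAvg = {reported:.2f}")
```
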