tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3

@@ -63,19 +63,18 @@ The website [https://swallow-llm.github.io/](https://swallow-llm.github.io/) pro
 |Model|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMTAvg|
 |---|---|---|---|---|---|---|---|---|---|
-| RakutenAI-7B-chat | 0.2475 | 0.3522 | 0.4692 | 0.2140 | 0.3926 | 0.4427 | 0.3977 | 0.4434 | 0.3699 |
-| Qwen2-7B-Instruct | 0.4635 | 0.6909 | 0.6857 | **0.5970** | 0.5042 | 0.6667 | 0.5353 | 0.6808 | 0.6030 |
-| Qwen2.5-7B-Instruct | **0.5111** | 0.7489 | 0.6913 | 0.5742 | 0.4851 | 0.6810 | 0.5350 | 0.6810 | 0.6134 |
-| Tanuki-8B-dpo-v1.0 | 0.3019 | 0.4772 | 0.5658 | 0.4129 | 0.3590 | 0.5120 | 0.4770 | 0.6159 | 0.4652 |
-| Llama 3 8B Instruct | 0.3744 | 0.6876 | 0.6225 | 0.2070 | 0.5032 | 0.5248 | 0.5326 | 0.4884 | 0.4926 |
-| Llama 3.1 8B Instruct | 0.3234 | 0.7362 | 0.4973 | 0.4787 | 0.3210 | 0.4670 | 0.4656 | 0.4314 | 0.4651 |
-| Llama 3 Youko 8B Instruct | 0.2950 | 0.7332 | 0.7125 | 0.2533 | 0.4987 | 0.6514 | 0.5438 | 0.7091 | 0.5496 |
-| Llama-3-ELYZA-JP-8B | 0.2908 | 0.6421 | 0.6406 | 0.3088 | **0.5500** | 0.6740 | 0.5251 | 0.6744 | 0.5382 |
-| Llama 3 heron brain 8B v0.3 | 0.2929 | 0.5635 | 0.6241 | 0.2135 | 0.4582 | 0.5354 | 0.5273 | 0.5099 | 0.4656 |
-| Llama 3 Swallow 8B Instruct | 0.3547 | 0.6508 | 0.5371 | 0.2718 | 0.4007 | 0.5493 | 0.4752 | 0.5730 | 0.4766 |
-| Llama 3.1 Swallow 8B Instruct v0.1| 0.3132 | **0.7734** | 0.6645 | 0.3880 | 0.5230 | 0.5711 | 0.4953 | 0.5330 | 0.5327 |
-| Llama 3.1 Swallow 8B Instruct v0.2| 0.4307 | 0.7089 | 0.6937 | 0.3881 | 0.5140 | 0.6277 | 0.5253 | 0.5787 | 0.5584 |
-| Llama 3.1 Swallow 8B Instruct v0.3 | 0.4849 | 0.6845 | **0.8180** | 0.4817 | 0.5240 | **0.7370** | **0.6473** | **0.7615** | **0.6424** |
 ### Japanese tasks

 |Model|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMTAvg|
 |---|---|---|---|---|---|---|---|---|---|
+| Llama 3 Youko 70B Instruct | 0.6632 | 0.8387 | 0.8108 | 0.4655 | 0.7013 | 0.7778 | 0.7544 | 0.7662 | 0.7222 |
+| Llama-3.1-70B-Japanese-Instruct-2407 | 0.6267 | 0.7525 | 0.7938 | 0.5750 | 0.5590 | 0.7725 | 0.7240 | 0.7180 | 0.6902 |
+| Llama 3 heron brain 70B v0.3 | 0.3762 | 0.7892 | 0.7274 | 0.5589 | 0.5070 | 0.6662 | 0.6880 | 0.6996 | 0.6266 |
+| Llama 3 Swallow 70B Instruct | 0.5969 | 0.8410 | 0.7120 | 0.4481 | 0.4884 | 0.7117 | 0.6510 | 0.6900 | 0.6424 |
+| Llama 3.1 Swallow 70B Instruct | 0.5252 | 0.7846 | 0.7086 | 0.5063 | 0.6979 | 0.6888 | 0.6402 | 0.6653 | 0.6521 |
+| Llama 3.3 Swallow 70B Instruct | 0.5193 | 0.7750 | 0.7213 | 0.5228 | 0.6721 | 0.7407 | 0.6386 | 0.7043 | 0.6618 |
+| Llama 3.1 Swallow 70B Instruct v0.1 | 0.5676 | 0.7859 | 0.7490 | 0.5437 | 0.6383 | 0.6870 | 0.6121 | 0.6540 | 0.6547 |
+| Llama 3.1 Swallow 70B Instruct v0.3 | 0.6063 | 0.8052 | 0.8410 | 0.5591 | 0.6280 | 0.7774 | 0.6920 | 0.7832 | 0.7115 |
+| Qwen2-72B-Instruct | 0.5699 | 0.7858 | 0.8222 | 0.5096 | 0.7032 | 0.7963 | 0.7728 | 0.8223 | 0.7228 |
+| Qwen2.5-72B-Instruct | 0.7060 | 0.7866 | 0.8122 | 0.6968 | 0.6536 | 0.8301 | 0.8060 | 0.7841 | 0.7594 |
+| GPT-3.5 (gpt-3.5-turbo-0125) | 0.6851 | 0.7641 | 0.7414 | 0.5522 | 0.5128 | 0.7104 | 0.6266 | 0.7361 | 0.6661 |
+| GPT-4o (gpt-4o-2024-05-13) | 0.7296 | 0.8540 | 0.8646 | 0.6641 | 0.6661 | 0.8274 | 0.8184 | 0.8085 | 0.7791 |
 ### Japanese tasks