Commit fdb58fd (verified) by kazukifujii · 1 parent: 8d22ee2

Update README.md

Files changed (1): README.md (+10 −1)
README.md CHANGED

@@ -100,7 +100,16 @@ The website [https://swallow-llm.github.io/](https://swallow-llm.github.io/) pro
 |---|---|---|---|---|---|---|---|---|---|---|
 | |4-shot|4-shot|4-shot|4-shot|4-shot|5-shot|4-shot|3-shot|0-shot| |
 | |Acc|EM acc|Acc|EM acc|Acc|Acc|EM acc|CoT EM Acc|pass@1| |
-| Llama 3 Youko 70B Instruct | | | | | | | | | | |
+| Llama 3 Youko 70B Instruct | 0.4500| 0.7973| 0.6863| 0.3914| 0.9153| 0.8055| 0.8923| 0.7814| 0.6598| 0.7088|
+| Llama-3.1-70B-Japanese-Instruct-2407| 0.4220| 0.8104| 0.6481| 0.3744| 0.9170| 0.8071| 0.8893| 0.8228| 0.7463| 0.7153|
+| Llama 3 heron brain 70B v0.3| 0.4460| 0.8107| 0.6682| 0.4085| 0.9174| 0.7898| 0.8772| 0.7586| 0.6713| 0.7053|
+| Llama 3 70B Instruct |0.4400| 0.7999| 0.6552| 0.4024| 0.9127| 0.7992| 0.9052| 0.8326| 0.7555| 0.7225|
+| Llama 3.1 70B Instruct |0.4300| 0.8212| 0.6621| 0.3921| 0.9157| 0.8213| 0.8764| 0.8390| 0.7915| 0.7277|
+| Llama 3.3 70B Instruct |0.4260| 0.8172| 0.6674| 0.3933| 0.9174| 0.8240| 0.8901| 0.8529| 0.8341| **0.7358**|
+| Llama 3.1 Swallow 70B Instruct v0.1 |0.4520| 0.8148| 0.6834| 0.4012| 0.9157| 0.7855| 0.8886| 0.8486| 0.5823| 0.7080|
+| **Llama 3.1 Swallow 70B Instruct v0.3** |0.4540| 0.8245| 0.6915| 0.4082| 0.9187| 0.7770| 0.8726| 0.8148| 0.6378| 0.7110|
+| Qwen2-72B-Instruct |0.4360| 0.7588| 0.6857| 0.3913| 0.9110| 0.8391| 0.8499| 0.2436| 0.6939| 0.6455|
+| Qwen2.5-72B-Instruct |0.4540| 0.6764| 0.7064| 0.3550| 0.8895| 0.8478| 0.9113| 0.4027| 0.6165| 0.6511|
 
 
 ## Evaluation Benchmarks