This repository provides large language models developed by [TokyoTech-LLM](https://tokyotech-llm.github.io/).
Read our [blog post](https://zenn.dev/tokyotech_lm/articles/d6cb3a8fdfc907) or our [paper](https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/A8-5.pdf).

## Model Details

## Base Model Performance

### Japanese tasks

|Model|Size|JCommonsenseQA|JEMHopQA|NIILC|JSQuAD|XL-Sum|MGSM|WMT20-en-ja|WMT20-ja-en|
|---|---|---|---|---|---|---|---|---|---|
| Llama 2 | 70B | 0.8686 | 0.4656 | 0.5256 | 0.9080 | 0.2361 | 0.3560 | 0.2643 | **0.2398** |
| Swallow | 70B | 0.9348 | **0.6290** | 0.6960 | 0.9176 | 0.2266 | **0.4840** | **0.3043** | 0.2298 |
| Swallow-NVE | 70B | **0.9410** | 0.5759 | **0.7024** | **0.9254** | **0.2758** | 0.4720 | 0.3042 | 0.2322 |

### English tasks

|Model|Size|OpenBookQA|TriviaQA|HellaSwag|SQuAD2.0|XWINO|GSM8K|
|---|---|---|---|---|---|---|---|
| Swallow | 70B | 0.4220 | 0.7756 | 0.6458 | 0.3745 | 0.9204 | 0.4867 |
| Swallow-NVE | 70B | 0.4240 | 0.7817 | 0.6439 | 0.3451 | 0.9256 | 0.4943 |

## Evaluation Benchmarks

### Japanese evaluation benchmarks

We used llm-jp-eval (v1.0.0) and the JP Language Model Evaluation Harness (commit #9b42d41). The details are as follows:

- Multiple-choice question answering (JCommonsenseQA [Kurihara+, 2022])
- Open-ended question answering (JEMHopQA [Ishii+, 2023])
- Open-ended question answering (NIILC [Sekine, 2003])
- Machine reading comprehension (JSQuAD [Kurihara+, 2022])
- Automatic summarization (XL-Sum [Hasan+, 2021])
- Machine translation (WMT2020 ja-en [Barrault+, 2020])
- Machine translation (WMT2020 en-ja [Barrault+, 2020])
- Mathematical reasoning (MGSM [Shi+, 2023])

### English evaluation benchmarks

We used the Language Model Evaluation Harness (v0.3.0). The details are as follows:

- Multiple-choice question answering (OpenBookQA [Mihaylov+, 2018])
- Open-ended question answering (TriviaQA [Joshi+, 2017])
- Machine reading comprehension (SQuAD 2.0 [Rajpurkar+, 2018])
- Commonsense reasoning (XWINO [Tikhonov & Ryabinin, 2021])
- Natural language inference (HellaSwag [Zellers+, 2019])
- Mathematical reasoning (GSM8k [Cobbe+, 2021])
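
For orientation, a minimal sketch of driving such an evaluation through the harness's Python API is shown below. Everything model-specific in it is an assumption rather than our evaluation setup: the repository ID `tokyotech-llm/Swallow-7b-hf`, the task identifiers, and the few-shot counts should all be checked against the task registry of the harness version you install.

```python
# Hypothetical sketch of scoring a model with EleutherAI's lm-evaluation-harness
# (v0.3.0-era Python API). Task names and few-shot counts are assumptions;
# consult the harness's task registry for the exact identifiers.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",  # Hugging Face causal-LM backend
    model_args="pretrained=tokyotech-llm/Swallow-7b-hf",  # assumed repo ID
    tasks=["openbookqa", "triviaqa", "hellaswag", "gsm8k"],
    num_fewshot=0,
    batch_size=2,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```

The JP Language Model Evaluation Harness used for the Japanese tasks is a fork with a similar interface, so the same pattern should carry over.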
## Usage

First install additional dependencies in [requirements.txt](./requirements.txt) with `pip install -r requirements.txt`.
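
After that, a minimal inference sketch with Hugging Face transformers looks like the following; the repository ID and the generation settings are illustrative assumptions, not recommendations from this README.

```python
# Minimal inference sketch with Hugging Face transformers.
# "tokyotech-llm/Swallow-7b-hf" is an assumed repository ID; substitute the
# checkpoint you actually want to run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tokyotech-llm/Swallow-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # assumed dtype; use float16 on older GPUs
    device_map="auto",           # requires the accelerate package
)

prompt = "東京工業大学の主なキャンパスは、"  # sample Japanese completion prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sampling parameters are left explicit here so they are easy to adjust; greedy decoding (`do_sample=False`) is the simpler choice for deterministic output.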