---
license: apache-2.0
language:
- ja
base_model:
- Qwen/Qwen2-7B
pipeline_tag: text-generation
library_name: transformers
---

# Moriyasu_Qwen2_JP_7B
## Model Description

Moriyasu_Qwen2_JP_7B is a large language model trained by Moriyasu. Based on Qwen/Qwen2-7B, it has been enhanced for Japanese usage through additional pre-training and instruction tuning.
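A minimal loading sketch with transformers is shown below. The repository id `Moriyasu/Moriyasu_Qwen2_JP_7B` and the use of a Qwen2-style chat template are assumptions for illustration, not confirmed details of this release.

```python
# Minimal usage sketch. The repo id below is an assumption; adjust it to
# the actual Hugging Face repository path. The chat template is assumed
# to follow the Qwen2 convention.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Moriyasu/Moriyasu_Qwen2_JP_7B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "What is the capital of Japan?"
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```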
## Training Datasets

### Pre-training dataset
The model is continually pre-trained on Japanese data from the Qwen2-7B base model while maintaining the model's English ability (80% Japanese, 20% English). We use about 120 billion tokens sampled from Japanese and English Wikipedia articles, Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, RefinedWeb, Japanese websites, book data, mathematics, code, and other sources.
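To make the 80/20 ratio concrete, the sketch below interleaves two streaming corpora with those sampling probabilities using the `datasets` library; the Wikipedia corpora here are stand-ins, not the actual training mixture. At a 120-billion-token budget, this ratio works out to roughly 96B Japanese and 24B English tokens.

```python
# Illustrative 80% Japanese / 20% English sampling mixture.
# The two Wikipedia corpora are placeholders, not the real training data.
from datasets import load_dataset, interleave_datasets

ja = load_dataset("wikimedia/wikipedia", "20231101.ja", split="train", streaming=True)
en = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)

mixture = interleave_datasets(
    [ja, en],
    probabilities=[0.8, 0.2],  # 80% Japanese, 20% English
    seed=42,
)

print(next(iter(mixture))["title"])
```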
### Instruction Tuning
We built about 1 million instruction examples using various methods, including synthetically generated data, translated data, and data manually annotated by humans.
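For illustration, a single record in this kind of corpus might look like the following; the schema and the `source` field are assumptions, as the card does not specify the exact data format.

```python
# Hypothetical instruction-tuning record in the common chat-messages
# schema; the actual format used for tuning is not specified in this card.
example = {
    "messages": [
        # "Please translate the following sentence into English: Good morning."
        {"role": "user", "content": "次の文を英語に翻訳してください：おはようございます。"},
        {"role": "assistant", "content": "Good morning."},
    ],
    "source": "translated",  # e.g. generated / translated / human-annotated
}
```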
## Model Performance

### JGLUE tasks

We used the lm-evaluation-harness repository to evaluate across 8 tasks; the results are as follows (a sketch of invoking the harness programmatically appears after the table):
| Model | JCommonsenseQA | JNLI | JMARC | JSQuAD | JAQKET-V2 | XL-SUM | XWINOGRAD | MGSM | JA AVG |
|---|---|---|---|---|---|---|---|---|---|
| | 3-shot | 3-shot | 0-shot | 2-shot | 1-shot | 1-shot | 0-shot | 5-shot | |
| | Acc. | Balanced Acc. | Balanced Acc. | Char-F1 | Char-F1 | ROUGE-2 | Acc. | Acc. | |
| Moriyasu_Qwen2_JP_7B (OURS) | 94.91 | 91.11 | 95.50 | 87.48 | 89.24 | 19.66 | 82.38 | 55.60 | 76.99 |
| Qwen2-7B-Instruct | 90.80 | 78.07 | 93.29 | 92.90 | 83.34 | 19.05 | 72.16 | 61.20 | 73.85 |
| SakanaAI/EvoLLM-JP-v1-7B | 89.19 | 66.02 | 95.55 | 92.10 | 86.41 | 23.31 | 81.65 | 47.60 | 72.73 |
| Llama-3-ELYZA-JP-8B | 92.40 | 64.85 | 95.67 | 92.04 | 87.43 | 21.35 | 78.21 | 49.20 | 72.64 |
| Llama-3-Swallow-8B-Instruct-v0.1 | 92.49 | 62.12 | 94.27 | 93.73 | 90.83 | 19.61 | 74.04 | 50.00 | 72.14 |
| Tanuki-8B-dpo-v1.0 | 79.18 | 43.05 | 92.26 | 82.29 | 77.99 | 11.68 | 70.39 | 43.60 | 62.56 |
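As referenced above, the snippet below shows how an evaluation like this can be launched through the harness's Python API (lm-evaluation-harness v0.4+). Japanese task ids and few-shot settings vary across harness versions and forks, so the task list here is a placeholder rather than our exact configuration.

```python
# Sketch of a harness run via the lm-evaluation-harness v0.4+ Python API.
# Task names below are placeholders; JGLUE task ids depend on the
# harness version/fork actually used.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Moriyasu/Moriyasu_Qwen2_JP_7B",  # assumed repo id
    tasks=["jcommonsenseqa", "jnli", "jsquad"],  # placeholder task ids
    num_fewshot=3,
    batch_size=8,
)
print(results["results"])
```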
### Japanese tasks

For this evaluation, we used the swallow-evaluation repository. Results for the other models are taken from the Llama-3.1-Swallow-8B-Instruct-v0.2 report.
| Model | JCom. | JEMHopQA | NIILC | JSQuAD | XL-Sum | MGSM | WMT20-en-ja | WMT20-ja-en | JMMLU | JHumanEval | Ja Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 4-shot | 4-shot | 4-shot | 4-shot | 1-shot | 4-shot | 4-shot | 4-shot | 5-shot | 0-shot | |
| | EM acc | Char-F1 | Char-F1 | Char-F1 | ROUGE-2 | EM acc | BLEU | BLEU | EM acc | pass@1 | |
| RakutenAI-7B-chat | 0.9035 | 0.2600 | 0.4619 | 0.8647 | 0.1339 | 0.2120 | 0.2667 | 0.1966 | 0.4504 | 0.2299 | 0.3980 |
| Qwen2-7B-Instruct | 0.8856 | 0.3902 | 0.3859 | 0.8967 | 0.1277 | 0.5720 | 0.2041 | 0.1909 | 0.5713 | 0.5683 | 0.4793 |
| Qwen2.5-7B-Instruct | 0.9151 | 0.4293 | 0.3910 | 0.8908 | 0.1676 | 0.6240 | 0.2108 | 0.1916 | 0.6252 | 0.5305 | 0.4976 |
| Tanuki-8B-dpo-v1.0 | 0.2770 | 0.2937 | 0.3710 | 0.6669 | 0.1016 | 0.4280 | 0.2385 | 0.1820 | 0.3078 | 0.2555 | 0.3122 |
| Llama 3 8B Instruct | 0.8785 | 0.3812 | 0.3936 | 0.8955 | 0.1273 | 0.4160 | 0.2143 | 0.2035 | 0.4719 | 0.2872 | 0.4269 |
| Llama 3.1 8B Instruct | 0.8829 | 0.4272 | 0.4112 | 0.8856 | 0.1481 | 0.5280 | 0.2174 | 0.1990 | 0.5086 | 0.4976 | 0.4706 |
| Llama 3 Youko 8B Instruct | 0.9196 | 0.4850 | 0.5178 | 0.9001 | 0.2085 | 0.4680 | 0.2559 | 0.1906 | 0.4691 | 0.2695 | 0.4684 |
| Llama-3-ELYZA-JP-8B | 0.9017 | 0.5124 | 0.5016 | 0.9113 | 0.1677 | 0.4600 | 0.2509 | 0.1846 | 0.4829 | 0.3811 | 0.4754 |
| Llama 3 heron brain 8B v0.3 | 0.9231 | 0.4933 | 0.5694 | 0.9056 | 0.2178 | 0.4560 | 0.2771 | 0.2168 | 0.4993 | 0.3177 | 0.4876 |
| Llama 3 Swallow 8B Instruct | 0.9178 | 0.4963 | 0.5168 | 0.9088 | 0.1296 | 0.4880 | 0.2522 | 0.2254 | 0.4835 | 0.3927 | 0.4811 |
| Llama 3.1 Swallow 8B Instruct v0.1 | 0.9240 | 0.5874 | 0.5736 | 0.9170 | 0.1380 | 0.5080 | 0.2820 | 0.2282 | 0.5301 | 0.3665 | 0.5055 |
| Llama 3.1 Swallow 8B Instruct v0.2 | 0.9294 | 0.5601 | 0.5988 | 0.9148 | 0.1372 | 0.5280 | 0.2878 | 0.2270 | 0.5504 | 0.4079 | 0.5141 |
| Moriyasu_Qwen2_JP_7B (OURS) | 0.9321 | 0.4823 | 0.6046 | 0.9201 | 0.1382 | 0.5560 | 0.2636 | 0.1892 | 0.5273 | 0.2976 | 0.4911 |
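The Ja Avg column is the unweighted mean of the ten per-task scores; as a sanity check, the snippet below reproduces the value for our row.

```python
# Ja Avg = unweighted mean of the ten per-task scores.
# Reproducing the Moriyasu_Qwen2_JP_7B row from the table above:
scores = [0.9321, 0.4823, 0.6046, 0.9201, 0.1382,
          0.5560, 0.2636, 0.1892, 0.5273, 0.2976]
print(round(sum(scores) / len(scores), 4))  # 0.4911
```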