license: apache-2.0
language:
- ja
base_model:
- Qwen/Qwen2-7B
pipeline_tag: text-generation
library_name: transformers
Moriyasu_Qwen2_JP_7B
Model Description
Moriyasu_Qwen2_JP_7B is a is a large language model trained by Moriyasu. Based on Qwen/Qwen2-7B, it has been enhanced for Japanese usage through additional pre-training and instruction tuning.
Training Datasets
Pre-training dataset
The model is continually pre-trained on Japanese data from the Qwen2-7b model while maintaining the model's English ability (80% Japanese, 20% English). We use about 120 billion tokens sampled from, Japanese and English Wikipedia articles, Japanese CC-100 Japanese C4, Japanese OSCAR ,The Pile, Webfined, Japanese websites, book data, mathematics and code,...
Instruction Tuning
We generated about 1 million Instruction data from various methods such as generated data, translated data, and data manually tagged by humans.
Model Performance
JGLUE tasks
We used the lm-evaluation-harness repo to evaluate across 8 tasks, and the results are as follows:
|Model|JCommonsenseQA|JNLI|JMARC|JSQuAD|JAQKET-V2|XL-SUM|XWINOGRAD|MGSM|JA AVG|
|---|---|---|---|---|---|---|---|---|---|---|---|
| |3-shot|3-shot|0-shot|2-shot|1-shot|1-shot|0-shot|5-shot| |
| |Acc.|Balanced Acc.|Balanced Acc.|Char-F1|Char-F1|ROUGE-2|Acc.|Acc.| |
| Moriyasu_Qwen2_JP_7B (OURS) | 0.9035 | 0.2600 | 0.4619 | 0.8647 | 0.1339 | 0.2120 | 0.2667 | 0.1966 | 0.5504 |
| Qwen2-7B-Instruct | 0.8856 | 0.3902 | 0.3859 | 0.8967 | 0.1277 | 0.5720 | 0.2041 | 0.1909 | 0.5504 |
| Qwen2.5-7B-Instruct | 0.9151 | 0.4293 | 0.3910 | 0.8908 | 0.1676 | 0.6240 | 0.2108 | 0.1916 | 0.5504 |
| Tanuki-8B-dpo-v1.0 | 0.2770 | 0.2937 | 0.3710 | 0.6669 | 0.1016 | 0.4280 | 0.2385 | 0.1820 | 0.5504 |
| Llama 3 8B Instruct | 0.8785 | 0.3812 | 0.3936 | 0.8955 | 0.1273 | 0.4160 | 0.2143 | 0.2035 | 0.5504 |
| Llama 3.1 8B Instruct | 0.8829 | 0.4272 | 0.4112 | 0.8856 | 0.1481 | 0.5280 | 0.2174 | 0.1990 | 0.5504 |
| Llama 3 Youko 8B Instruct | 0.9196 | 0.4850 | 0.5178 | 0.9001 | 0.2085 | 0.4680 | 0.2559 | 0.1906 | 0.5504 |
| Llama-3-ELYZA-JP-8B | 0.9017 | 0.5124 | 0.5016 | 0.9113 | 0.1677 | 0.4600 | 0.2509 | 0.1846 |0.5504 |
| Llama 3 heron brain 8B v0.3 | 0.9231 | 0.4933 | 0.5694 | 0.9056 | 0.2178 | 0.4560 | 0.2771 | 0.2168 | 0.5504 |
| Llama 3 Swallow 8B Instruct | 0.9178 | 0.4963 | 0.5168 | 0.9088 | 0.1296 | 0.4880 | 0.2522 | 0.2254 | 0.5504 |
| Llama 3.1 Swallow 8B Instruct v0.1| 0.9240 | 0.5874 | 0.5736 | 0.9170 | 0.1380 | 0.5080 | 0.2820 | 0.2282 | 0.5504 |
| Llama 3.1 Swallow 8B Instruct v0.2| 0.9294 | 0.5601 | 0.5988 | 0.9148 | 0.1372 | 0.5280 | 0.2878 | 0.2270 | 0.5504 |