---
license: apache-2.0
language:
  - ja
base_model:
  - Qwen/Qwen2-7B
pipeline_tag: text-generation
library_name: transformers
---

# Moriyasu_Qwen2_JP_7B

## Model Description

Moriyasu_Qwen2_JP_7B is a large language model trained by Moriyasu. Based on Qwen/Qwen2-7B, it has been enhanced for Japanese usage through additional pre-training and instruction tuning.
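The snippet below is a minimal inference sketch with Transformers. The repository id `AIJapanese/Moriyasu_Qwen2_JP_7B` and the presence of a chat template on the tokenizer are assumptions (a chat template is standard for Qwen2-based instruction-tuned models); adjust them to the actual repository.

```python
# Minimal inference sketch, assuming the repo id below and a Qwen2-style
# chat template on the tokenizer; adjust to the actual repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIJapanese/Moriyasu_Qwen2_JP_7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~15 GB of weights for a 7B model in bf16
    device_map="auto",
)

messages = [{"role": "user", "content": "日本の四季について簡単に説明してください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```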

## Training Datasets

### Pre-training dataset

The model was continually pre-trained from Qwen2-7B on a mixture of roughly 80% Japanese and 20% English data, improving its Japanese ability while maintaining its English ability. We used about 120 billion tokens sampled from Japanese and English Wikipedia articles, Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, RefinedWeb, Japanese websites, book data, and mathematics and code corpora, among other sources.
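As a rough illustration of the 80/20 language mixture (this is not the authors' pipeline; the source names and per-source weights below are assumptions based on the corpus list above), sampling could look like:

```python
# Illustrative sketch only (not the authors' pipeline): drawing pre-training
# sources so that roughly 80% of samples are Japanese and 20% English.
import random

# Hypothetical source-to-language map based on the corpora listed above.
SOURCES = {
    "ja_wikipedia": "ja", "ja_cc100": "ja", "ja_c4": "ja", "ja_oscar": "ja",
    "ja_web": "ja", "ja_books": "ja",
    "en_wikipedia": "en", "the_pile": "en", "refinedweb": "en",
}
LANG_WEIGHTS = {"ja": 0.8, "en": 0.2}

def pick_source(rng: random.Random) -> str:
    """Pick a language by the 80/20 weights, then a source uniformly within it."""
    lang = rng.choices(list(LANG_WEIGHTS), weights=list(LANG_WEIGHTS.values()))[0]
    return rng.choice([s for s, l in SOURCES.items() if l == lang])

rng = random.Random(0)
draws = [pick_source(rng) for _ in range(10_000)]
ja_share = sum(SOURCES[s] == "ja" for s in draws) / len(draws)
print(f"Japanese share of draws: {ja_share:.2%}")  # ~80%
```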

### Instruction Tuning

We built about 1 million instruction examples through a combination of methods, including synthetic data generation, translation of existing datasets, and manual human annotation.
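For illustration only, since the card does not publish a schema, an instruction example in the chat-message format commonly used with Qwen2-style templates might look like this (field names are hypothetical):

```python
# Hypothetical instruction-tuning record; the actual schema is not published.
example = {
    "messages": [
        {"role": "user",
         "content": "次の英文を日本語に翻訳してください: The weather is nice today."},
        {"role": "assistant",
         "content": "今日は天気が良いです。"},
    ],
    "source": "translated",  # e.g. "generated", "translated", or "human"
}
```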

## Model Performance

### JGLUE tasks

We used the lm-evaluation-harness repository to evaluate the model across eight tasks; the results are shown below, followed by a reproduction sketch.

| Model | JCommonsenseQA<br>3-shot<br>Acc. | JNLI<br>3-shot<br>Balanced Acc. | JMARC<br>0-shot<br>Balanced Acc. | JSQuAD<br>2-shot<br>Char-F1 | JAQKET-V2<br>1-shot<br>Char-F1 | XL-SUM<br>1-shot<br>ROUGE-2 | XWINOGRAD<br>0-shot<br>Acc. | MGSM JA<br>5-shot<br>Acc. | AVG |
|---|---|---|---|---|---|---|---|---|---|
| Moriyasu_Qwen2_JP_7B (OURS) | 0.9035 | 0.2600 | 0.4619 | 0.8647 | 0.1339 | 0.2120 | 0.2667 | 0.1966 | 0.5504 |
| Qwen2-7B-Instruct | 0.8856 | 0.3902 | 0.3859 | 0.8967 | 0.1277 | 0.5720 | 0.2041 | 0.1909 | 0.5504 |
| Qwen2.5-7B-Instruct | 0.9151 | 0.4293 | 0.3910 | 0.8908 | 0.1676 | 0.6240 | 0.2108 | 0.1916 | 0.5504 |
| Tanuki-8B-dpo-v1.0 | 0.2770 | 0.2937 | 0.3710 | 0.6669 | 0.1016 | 0.4280 | 0.2385 | 0.1820 | 0.5504 |
| Llama 3 8B Instruct | 0.8785 | 0.3812 | 0.3936 | 0.8955 | 0.1273 | 0.4160 | 0.2143 | 0.2035 | 0.5504 |
| Llama 3.1 8B Instruct | 0.8829 | 0.4272 | 0.4112 | 0.8856 | 0.1481 | 0.5280 | 0.2174 | 0.1990 | 0.5504 |
| Llama 3 Youko 8B Instruct | 0.9196 | 0.4850 | 0.5178 | 0.9001 | 0.2085 | 0.4680 | 0.2559 | 0.1906 | 0.5504 |
| Llama-3-ELYZA-JP-8B | 0.9017 | 0.5124 | 0.5016 | 0.9113 | 0.1677 | 0.4600 | 0.2509 | 0.1846 | 0.5504 |
| Llama 3 heron brain 8B v0.3 | 0.9231 | 0.4933 | 0.5694 | 0.9056 | 0.2178 | 0.4560 | 0.2771 | 0.2168 | 0.5504 |
| Llama 3 Swallow 8B Instruct | 0.9178 | 0.4963 | 0.5168 | 0.9088 | 0.1296 | 0.4880 | 0.2522 | 0.2254 | 0.5504 |
| Llama 3.1 Swallow 8B Instruct v0.1 | 0.9240 | 0.5874 | 0.5736 | 0.9170 | 0.1380 | 0.5080 | 0.2820 | 0.2282 | 0.5504 |
| Llama 3.1 Swallow 8B Instruct v0.2 | 0.9294 | 0.5601 | 0.5988 | 0.9148 | 0.1372 | 0.5280 | 0.2878 | 0.2270 | 0.5504 |
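The following is a hedged sketch of one evaluation run using the EleutherAI lm-evaluation-harness Python API (v0.4+). The repo id and task name are illustrative assumptions, the Japanese suites may require a dedicated fork of the harness, and each benchmark above uses its own shot count, so per-task runs are needed.

```python
# Hedged sketch of a single evaluation run with lm-evaluation-harness (v0.4+).
# Task names are illustrative; the Japanese suites may live in a fork of the
# harness, and each benchmark in the table uses its own shot count.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AIJapanese/Moriyasu_Qwen2_JP_7B,dtype=bfloat16",  # assumed repo id
    tasks=["jcommonsenseqa"],  # run each task separately with its own shot count
    num_fewshot=3,             # 3-shot for JCommonsenseQA per the table above
    batch_size=8,
)
print(results["results"])
```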

### Japanese tasks

| Model | JCom.<br>4-shot<br>EM acc | JEMHopQA<br>4-shot<br>Char-F1 | NIILC<br>4-shot<br>Char-F1 | JSQuAD<br>4-shot<br>Char-F1 | XL-Sum<br>1-shot<br>ROUGE-2 | MGSM<br>4-shot<br>EM acc | WMT20-en-ja<br>4-shot<br>BLEU | WMT20-ja-en<br>4-shot<br>BLEU | JMMLU<br>5-shot<br>EM acc | JHumanEval<br>0-shot<br>pass@1 | Ja Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RakutenAI-7B-chat | 0.9035 | 0.2600 | 0.4619 | 0.8647 | 0.1339 | 0.2120 | 0.2667 | 0.1966 | 0.4504 | 0.2299 | 0.3980 |
| Qwen2-7B-Instruct | 0.8856 | 0.3902 | 0.3859 | 0.8967 | 0.1277 | 0.5720 | 0.2041 | 0.1909 | 0.5713 | 0.5683 | 0.4793 |
| Qwen2.5-7B-Instruct | 0.9151 | 0.4293 | 0.3910 | 0.8908 | 0.1676 | 0.6240 | 0.2108 | 0.1916 | 0.6252 | 0.5305 | 0.4976 |
| Tanuki-8B-dpo-v1.0 | 0.2770 | 0.2937 | 0.3710 | 0.6669 | 0.1016 | 0.4280 | 0.2385 | 0.1820 | 0.3078 | 0.2555 | 0.3122 |
| Llama 3 8B Instruct | 0.8785 | 0.3812 | 0.3936 | 0.8955 | 0.1273 | 0.4160 | 0.2143 | 0.2035 | 0.4719 | 0.2872 | 0.4269 |
| Llama 3.1 8B Instruct | 0.8829 | 0.4272 | 0.4112 | 0.8856 | 0.1481 | 0.5280 | 0.2174 | 0.1990 | 0.5086 | 0.4976 | 0.4706 |
| Llama 3 Youko 8B Instruct | 0.9196 | 0.4850 | 0.5178 | 0.9001 | 0.2085 | 0.4680 | 0.2559 | 0.1906 | 0.4691 | 0.2695 | 0.4684 |
| Llama-3-ELYZA-JP-8B | 0.9017 | 0.5124 | 0.5016 | 0.9113 | 0.1677 | 0.4600 | 0.2509 | 0.1846 | 0.4829 | 0.3811 | 0.4754 |
| Llama 3 heron brain 8B v0.3 | 0.9231 | 0.4933 | 0.5694 | 0.9056 | 0.2178 | 0.4560 | 0.2771 | 0.2168 | 0.4993 | 0.3177 | 0.4876 |
| Llama 3 Swallow 8B Instruct | 0.9178 | 0.4963 | 0.5168 | 0.9088 | 0.1296 | 0.4880 | 0.2522 | 0.2254 | 0.4835 | 0.3927 | 0.4811 |
| Llama 3.1 Swallow 8B Instruct v0.1 | 0.9240 | 0.5874 | 0.5736 | 0.9170 | 0.1380 | 0.5080 | 0.2820 | 0.2282 | 0.5301 | 0.3665 | 0.5055 |
| Llama 3.1 Swallow 8B Instruct v0.2 | 0.9294 | 0.5601 | 0.5988 | 0.9148 | 0.1372 | 0.5280 | 0.2878 | 0.2270 | 0.5504 | 0.4079 | 0.5141 |