---
license: apache-2.0
language:
  - ja
base_model:
  - Qwen/Qwen2-7B
pipeline_tag: text-generation
library_name: transformers
---

# Moriyasu_Qwen2_JP_7B

## Model Description

Moriyasu_Qwen2_JP_7B is a large language model trained by Moriyasu. Based on Qwen/Qwen2-7B, it has been enhanced for Japanese usage through additional pre-training and instruction tuning.
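
The model can be loaded with the transformers library like any other causal language model. The snippet below is a minimal usage sketch; the repository id, prompt, and generation settings are illustrative assumptions rather than recommended defaults.

```python
# Minimal usage sketch; the repo id, prompt, and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIJapanese/Moriyasu_Qwen2_JP_7B"  # assumed Hub id; adjust to the actual repository

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bfloat16 support
    device_map="auto",
)

# The model is instruction-tuned, so a chat-style prompt is assumed here.
messages = [{"role": "user", "content": "日本の四季について簡単に説明してください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```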

## Training Datasets

### Pre-training dataset

The model is continually pre-trained on Japanese data starting from the Qwen2-7B model while maintaining its English ability (80% Japanese, 20% English). We use about 120 billion tokens sampled from sources including Japanese and English Wikipedia articles, Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, Webfined, Japanese websites, book data, mathematics, and code.
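
The exact sampling pipeline is not described here; the sketch below only illustrates how an 80% Japanese / 20% English mixture could be assembled with the datasets library, using placeholder corpora.

```python
# Illustrative 80/20 language mixture; the corpora and settings are placeholders,
# not the actual pre-training pipeline.
from datasets import load_dataset, interleave_datasets

ja_corpus = load_dataset("wikimedia/wikipedia", "20231101.ja", split="train", streaming=True)
en_corpus = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)

# Draw each document from the Japanese stream with probability 0.8, English with 0.2.
mixed = interleave_datasets(
    [ja_corpus, en_corpus],
    probabilities=[0.8, 0.2],
    seed=42,
    stopping_strategy="all_exhausted",
)

for example in mixed.take(3):
    print(example["text"][:80])
```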

### Instruction Tuning

We created about 1 million instruction examples using a mix of methods, including synthetically generated data, translated data, and data manually annotated by humans.
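
The record schema for the instruction data is not specified; the sketch below assumes a simple instruction/response format and shows how such a record could be rendered into a training string with a chat template.

```python
# Hypothetical instruction-tuning record; the schema and tokenizer are assumptions.
from transformers import AutoTokenizer

record = {
    "instruction": "次の英文を日本語に翻訳してください。",
    "input": "The weather is nice today.",
    "output": "今日は天気が良いです。",
}

# A stand-in tokenizer that ships with a chat template; in practice the fine-tuned
# model's own tokenizer would be used.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

messages = [
    {"role": "user", "content": f"{record['instruction']}\n{record['input']}"},
    {"role": "assistant", "content": record["output"]},
]

# Render the conversation into the text the model would be fine-tuned on.
training_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(training_text)
```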

## Model Performance

### JGLUE tasks

We used the lm-evaluation-harness repository to evaluate across 8 tasks; the results are as follows (an illustrative evaluation sketch appears after the table):

| Model | JCommonsenseQA<br>(3-shot, Acc.) | JNLI<br>(3-shot, Balanced Acc.) | JMARC<br>(0-shot, Balanced Acc.) | JSQuAD<br>(2-shot, Char-F1) | JAQKET-V2<br>(1-shot, Char-F1) | XL-SUM<br>(1-shot, ROUGE-2) | XWINOGRAD<br>(0-shot, Acc.) | MGSM JA<br>(5-shot, Acc.) | AVG |
|---|---|---|---|---|---|---|---|---|---|
| Moriyasu_Qwen2_JP_7B (OURS) | 94.91 | 91.11 | 95.50 | 87.48 | 89.24 | 19.66 | 82.38 | 55.60 | 76.99 |
| Qwen2-7B-Instruct | 90.80 | 78.07 | 93.29 | 92.90 | 83.34 | 19.05 | 72.16 | 61.20 | 73.85 |
| SakanaAI/EvoLLM-JP-v1-7B | 89.19 | 66.02 | 95.55 | 92.10 | 86.41 | 23.31 | 81.65 | 47.60 | 72.73 |
| Llama-3-ELYZA-JP-8B | 92.40 | 64.85 | 95.67 | 92.04 | 87.43 | 21.35 | 78.21 | 49.20 | 72.64 |
| Llama-3-Swallow-8B-Instruct-v0.1 | 92.49 | 62.12 | 94.27 | 93.73 | 90.83 | 19.61 | 74.04 | 50.00 | 72.14 |
| Tanuki-8B-dpo-v1.0 | 79.18 | 43.05 | 92.26 | 82.29 | 77.99 | 11.68 | 70.39 | 43.60 | 62.56 |
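
As a rough illustration of the evaluation setup, the sketch below calls the lm-evaluation-harness Python API for one of the tasks. The repository id, task name, and settings are assumptions; the actual runs may have used a different fork, task versions, and per-task shot counts.

```python
# Illustrative evaluation sketch with the lm-evaluation-harness Python API.
# The repo id, task name, and settings are assumptions, not the exact configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AIJapanese/Moriyasu_Qwen2_JP_7B,dtype=bfloat16",
    tasks=["xwinograd_jp"],  # assumed task name for the 0-shot XWINOGRAD entry above
    num_fewshot=0,
    batch_size=8,
)

print(results["results"])
```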