---
license: apache-2.0
language:
- ja
base_model:
- Qwen/Qwen2-7B
pipeline_tag: text-generation
library_name: transformers
---

# Moriyasu_Qwen2_JP_7B

### Model Description

**Moriyasu_Qwen2_JP_7B** is a large language model trained by Moriyasu. Based on [Qwen/Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B), it has been enhanced for Japanese usage through additional pre-training and instruction tuning.
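
A minimal usage sketch with 🤗 Transformers, assuming the model ships a Qwen2-style chat template; the repository id below is a placeholder and should be replaced with the actual one:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; substitute the actual Hugging Face repo for this model.
model_id = "Moriyasu/Moriyasu_Qwen2_JP_7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a bf16-capable GPU; use torch.float16 otherwise
    device_map="auto",
)

# Qwen2-style chat formatting via the tokenizer's chat template.
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]  # "Where is the capital of Japan?"
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```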

# Training Datasets

### Pre-training dataset

The model was continually pre-trained on Japanese data starting from Qwen2-7B while maintaining the model's English ability (80% Japanese, 20% English). We used about 120 billion tokens sampled from Japanese and English Wikipedia articles, Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, Webfined, Japanese websites, book data, mathematics, and code, among other sources.
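
At that ratio, the ~120B-token budget works out to roughly 96B Japanese and 24B English tokens. A minimal sketch of weighted corpus sampling with 🤗 Datasets; the dataset ids below are stand-ins for illustration, not the actual training corpora:

```python
from datasets import interleave_datasets, load_dataset

# Stand-in corpora; the real training mix is much broader (see the list above).
ja_wiki = load_dataset("wikimedia/wikipedia", "20231101.ja", split="train", streaming=True)
en_wiki = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)

# Sample documents at the stated 80% Japanese / 20% English ratio.
mixed = interleave_datasets([ja_wiki, en_wiki], probabilities=[0.8, 0.2], seed=42)

for doc in mixed.take(3):
    print(doc["title"])
```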

### Instruction Tuning
We built about 1 million instruction examples through a mix of methods: synthetically generated data, translated data, and data manually annotated by humans.
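
The card does not publish the data schema; purely as an illustration, a single record tagged with its sourcing method might look like this (all field names are hypothetical):

```python
# Hypothetical record layout; the actual schema is not specified in this card.
example = {
    "source": "human_annotated",  # one of: "generated", "translated", "human_annotated"
    "messages": [
        {"role": "user", "content": "次の文を英語に翻訳してください: 猫が好きです。"},
        {"role": "assistant", "content": "I like cats."},
    ],
}
```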

# Model Performance

### JGLUE tasks
We used the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable) repository (jp-stable branch) to evaluate across 8 tasks; the results are as follows:


|Model|JCommonsenseQA (3-shot, Acc.)|JNLI (3-shot, Balanced Acc.)|JMARC (0-shot, Balanced Acc.)|JSQuAD (2-shot, Char-F1)|JAQKET-V2 (1-shot, Char-F1)|XL-SUM (1-shot, ROUGE-2)|XWINOGRAD (0-shot, Acc.)|MGSM (5-shot, Acc.)|JA AVG|
|---|---|---|---|---|---|---|---|---|---|
| Moriyasu_Qwen2_JP_7B (OURS) | **94.91** | **91.11** | 95.50 | 87.48 | 89.24 | 19.66 | **82.38** | 55.60 | **76.99** | 
| Qwen2-7B-Instruct | 90.80 | 78.07 | 93.29 | 92.90 | 83.34 | 19.05 | 72.16 | **61.20** | 73.85 | 
| SakanaAI/EvoLLM-JP-v1-7B | 89.19 | 66.02 | 95.55 | 92.10 | 86.41 | **23.31** | 81.65 | 47.60 | 72.73 |
| Llama-3-ELYZA-JP-8B |92.40 | 64.85 | **95.67** | 92.04 | 87.43 | 21.35 | 78.21 | 49.20 | 72.64 |
| Llama-3-Swallow-8B-Instruct-v0.1 | 92.49 | 62.12 | 94.27 | **93.73** | **90.83** | 19.61 | 74.04 | 50.00 | 72.14 | 
| Tanuki-8B-dpo-v1.0| 79.18 | 43.05 | 92.26 | 82.29 | 77.99 | 11.68 | 70.39 | 43.60 | 62.56 |
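
JA AVG is the unweighted mean of the eight task scores (it reproduces for the rows above); a quick check of the first row:

```python
from fractions import Fraction

# Unweighted mean of the eight task scores for Moriyasu_Qwen2_JP_7B.
scores = ["94.91", "91.11", "95.50", "87.48", "89.24", "19.66", "82.38", "55.60"]
avg = sum(Fraction(s) for s in scores) / len(scores)
print(float(avg))  # 76.985 -> reported as 76.99 in the table
```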