---
license: apache-2.0
language:
- ja
pipeline_tag: text-generation
library_name: transformers
---

# Moriyasu_Qwen2_JP_7B

### Model Description

**Moriyasu_Qwen2_JP_7B** is a large language model trained by Moriyasu. Based on [Qwen/Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B), it has been enhanced for Japanese usage through additional pre-training and instruction tuning.

# Model Performance

### JGLUE tasks

We used the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable) repo (`jp-stable` branch) to evaluate the models on 8 tasks. The results are as follows:

|Model|JCommonsenseQA|JNLI|JMARC|JSQuAD|JAQKET-V2|XL-SUM|XWINOGRAD|MGSM|JA AVG|
|---|---|---|---|---|---|---|---|---|---|
| |3-shot|3-shot|0-shot|2-shot|1-shot|1-shot|0-shot|5-shot| |
| |Acc.|Balanced Acc.|Balanced Acc.|Char-F1|Char-F1|ROUGE-2|Acc.|Acc.| |
| Moriyasu_Qwen2_JP_7B (ours) | **0.9491** | **0.9111** | 0.9550 | 0.8748 | 0.8924 | 0.1966 | **0.8238** | 0.5560 | **0.7699** |
| Qwen2-7B-Instruct | 0.9080 | 0.7807 | 0.9329 | 0.9290 | 0.8334 | 0.1905 | 0.7216 | **0.6120** | 0.7385 |
| SakanaAI/EvoLLM-JP-v1-7B | 0.8919 | 0.6602 | 0.9555 | 0.9210 | 0.8641 | **0.2331** | 0.8165 | 0.4760 | 0.7273 |
| Llama-3-ELYZA-JP-8B | 0.9240 | 0.6485 | **0.9567** | 0.9204 | 0.8743 | 0.2135 | 0.7821 | 0.4920 | 0.7264 |
| Llama-3-Swallow-8B-Instruct-v0.1 | 0.9249 | 0.6212 | 0.9427 | **0.9373** | **0.9083** | 0.1961 | 0.7404 | 0.5000 | 0.7214 |
| Tanuki-8B-dpo-v1.0 | 0.7918 | 0.4305 | 0.9226 | 0.8229 | 0.7799 | 0.1168 | 0.7039 | 0.4360 | 0.6256 |
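
JA AVG appears to be the plain, unweighted mean of the eight per-task scores, even though the tasks use different metrics (accuracy, Char-F1, ROUGE-2). Here is a quick check for our row, assuming that definition. The scores are copied from the table; the averaging rule is our assumption, not something stated by the harness.

```python
# Per-task scores for Moriyasu_Qwen2_JP_7B, copied from the JGLUE table.
scores = {
    "JCommonsenseQA": 0.9491,
    "JNLI": 0.9111,
    "JMARC": 0.9550,
    "JSQuAD": 0.8748,
    "JAQKET-V2": 0.8924,
    "XL-SUM": 0.1966,
    "XWINOGRAD": 0.8238,
    "MGSM": 0.5560,
}

# Assumption: JA AVG is the unweighted arithmetic mean of the eight task scores,
# with all metrics treated as values on the same 0-1 scale.
ja_avg = sum(scores.values()) / len(scores)
print(f"JA AVG = {ja_avg:.4f}")
```

XL-SUM ROUGE-2 values are small and close together for all models, so differences on that task move the average very little. The per-task columns are more informative than the average alone.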

### Japanese tasks

For this evaluation, we used the [swallow-evaluation](https://github.com/swallow-llm/swallow-evaluation) repo to evaluate our model. The results for the other models are taken from those reported for [Llama-3.1-Swallow-8B-Instruct-v0.2](https://huggingface.co/tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.2).

|Model|JCom.|JEMHopQA|NIILC|JSQuAD|XL-Sum|MGSM|WMT20-en-ja|WMT20-ja-en|JMMLU|JHumanEval|Ja Avg|
|---|---|---|---|---|---|---|---|---|---|---|---|
| |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot| |
| |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1| |
| Moriyasu_Qwen2_JP_7B (ours) | **0.9321** | 0.4823 | **0.6046** | **0.9201** | 0.1382 | 0.5560 | 0.2636 | 0.1892 | 0.5273 | 0.2976 | 0.4911 |
| RakutenAI-7B-chat | 0.9035 | 0.2600 | 0.4619 | 0.8647 | 0.1339 | 0.2120 | 0.2667 | 0.1966 | 0.4504 | 0.2299 | 0.3980 |
| Qwen2-7B-Instruct | 0.8856 | 0.3902 | 0.3859 | 0.8967 | 0.1277 | 0.5720 | 0.2041 | 0.1909 | 0.5713 | **0.5683** | 0.4793 |
| Qwen2.5-7B-Instruct | 0.9151 | 0.4293 | 0.3910 | 0.8908 | 0.1676 | **0.6240** | 0.2108 | 0.1916 | **0.6252** | 0.5305 | 0.4976 |
| Tanuki-8B-dpo-v1.0 | 0.2770 | 0.2937 | 0.3710 | 0.6669 | 0.1016 | 0.4280 | 0.2385 | 0.1820 | 0.3078 | 0.2555 | 0.3122 |
| Llama 3 8B Instruct | 0.8785 | 0.3812 | 0.3936 | 0.8955 | 0.1273 | 0.4160 | 0.2143 | 0.2035 | 0.4719 | 0.2872 | 0.4269 |
| Llama 3.1 8B Instruct | 0.8829 | 0.4272 | 0.4112 | 0.8856 | 0.1481 | 0.5280 | 0.2174 | 0.1990 | 0.5086 | 0.4976 | 0.4706 |
| Llama 3 Youko 8B Instruct | 0.9196 | 0.4850 | 0.5178 | 0.9001 | 0.2085 | 0.4680 | 0.2559 | 0.1906 | 0.4691 | 0.2695 | 0.4684 |
| Llama-3-ELYZA-JP-8B | 0.9017 | 0.5124 | 0.5016 | 0.9113 | 0.1677 | 0.4600 | 0.2509 | 0.1846 | 0.4829 | 0.3811 | 0.4754 |
| Llama 3 heron brain 8B v0.3 | 0.9231 | 0.4933 | 0.5694 | 0.9056 | **0.2178** | 0.4560 | 0.2771 | 0.2168 | 0.4993 | 0.3177 | 0.4876 |
| Llama 3 Swallow 8B Instruct | 0.9178 | 0.4963 | 0.5168 | 0.9088 | 0.1296 | 0.4880 | 0.2522 | 0.2254 | 0.4835 | 0.3927 | 0.4811 |
| Llama 3.1 Swallow 8B Instruct v0.1 | 0.9240 | **0.5874** | 0.5736 | 0.9170 | 0.1380 | 0.5080 | 0.2820 | **0.2282** | 0.5301 | 0.3665 | 0.5055 |
| Llama 3.1 Swallow 8B Instruct v0.2 | 0.9294 | 0.5601 | 0.5988 | 0.9148 | 0.1372 | 0.5280 | **0.2878** | 0.2270 | 0.5504 | 0.4079 | **0.5141** |

### Japanese MTBench

For this evaluation, we use [FastChat](https://github.com/Stability-AI/FastChat/tree/jp-stable) with **gpt-4o-2024-08-06** as the judge and as the source of the reference answers.

Because of limited computational resources, we evaluated only a small number of models.

|Model|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMTAvg|
|---|---|---|---|---|---|---|---|---|---|
| Moriyasu_Qwen2_JP_7B (ours) | **0.515** | 0.710 | **0.845** | **0.685** | **0.585** | **0.815** | **0.710** | 0.765 | **0.704** |
| Llama-3-ELYZA-JP-8B | 0.365 | **0.720** | 0.730 | 0.400 | 0.555 | 0.670 | 0.580 | **0.785** | 0.601 |
| Llama 3.1 Swallow 8B Instruct v0.1 | 0.480 | 0.680 | 0.705 | 0.475 | 0.425 | 0.710 | 0.620 | 0.645 | 0.592 |

### ELYZA-tasks-100

For this benchmark, we use the [ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100) dataset. Responses are scored by GPT-4o using ELYZA's evaluation prompt, which is published in [this blog post](https://zenn.dev/elyza/articles/7ece3e73ff35f4).

|Model|Score|
|---|---|
| Moriyasu_Qwen2_JP_7B (ours) | 3.37 |
| Llama-3-ELYZA-JP-8B | **3.66** |
| Llama 3.1 Swallow 8B Instruct v0.1 | 3.32 |

### Nejumi Leaderboard 3

We plan to contact the Nejumi team soon to have the model evaluated on this benchmark.

# Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = 'AIJapanese/Moriyasu_Qwen2_JP_7B'

# Load the weights in bfloat16; device_map="auto" (requires accelerate) places them on available devices.
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(path)

# System prompt: "You are a sincere and capable Japanese assistant. Always strive to give the most helpful answer possible."
system_prompt = "あなたは誠実で優秀な日本人アシスタントです。常に可能な限り最も役立つ回答を提供するように努めてください。"
# User prompt: "What is the highest mountain in Japan?"
prompt = "日本で一番高い山は何ですか？"

conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,          # passes input_ids together with attention_mask
    max_new_tokens=2048,
    do_sample=True,          # needed for temperature (and top_p/top_k) to take effect
    temperature=0.2,
    # top_p=0.95,
    # top_k=40,
)

# generate() returns the prompt followed by the completion; keep only the new tokens.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
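
For multi-turn chat, you can append the decoded `response` to `conversation` as an `assistant` message, followed by the next `user` message. The whole list is then rendered again with `tokenizer.apply_chat_template` and passed to `model.generate` as in the example. Below is a minimal sketch of that bookkeeping. It uses a hypothetical helper, `add_turn`, which is not part of this repository or of transformers, and a hard-coded reply so that it runs without loading the model.

```python
# Hypothetical helper (not part of this repository or of transformers): it only
# edits the list of {"role", "content"} dicts passed to apply_chat_template.
def add_turn(conversation, assistant_reply, next_user_message):
    """Return a new conversation with the model's reply and the next user turn appended."""
    return conversation + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


conversation = [
    # "You are a sincere and capable Japanese assistant."
    {"role": "system", "content": "あなたは誠実で優秀な日本人アシスタントです。"},
    # "What is the highest mountain in Japan?"
    {"role": "user", "content": "日本で一番高い山は何ですか？"},
]

# Illustrative stand-in for `response` from the generation example ("It is Mt. Fuji.").
reply = "富士山です。"
# Follow-up question: "How many meters tall is it?"
conversation = add_turn(conversation, reply, "その高さは何メートルですか？")

for message in conversation:
    print(f"{message['role']}: {message['content']}")
```

The helper returns a new list instead of mutating `conversation` in place. This keeps earlier histories reusable, for example when regenerating an answer from the same turns.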

# Training Datasets

### Pre-training dataset

Starting from Qwen2-7B, the model is continually pre-trained on a mixture of about 80% Japanese and 20% English data. This strengthens its Japanese ability while preserving its English ability. We use about 120 billion tokens sampled from Japanese and English Wikipedia articles, Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, Webfined, Japanese websites, book data, mathematics, and code, among other sources.
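
Taking the stated figures at face value, and assuming the 80/20 ratio refers to token counts within the roughly 120B-token sample (the card does not say this explicitly), the split is approximately:

$$
\begin{aligned}
N_{\text{ja}} &\approx 0.8 \times 120\,\text{B} = 96\,\text{B tokens},\\
N_{\text{en}} &\approx 0.2 \times 120\,\text{B} = 24\,\text{B tokens}.
\end{aligned}
$$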

### Instruction Tuning

We built about 1 million instruction-tuning examples using several methods, including synthetically generated data, translated data, and data manually annotated by humans.

# Contact

If you have any questions, please contact me at: [email protected]