	---
	license: apache-2.0
	language:
	- ja
	pipeline_tag: text-generation
	library_name: transformers
	---

	# Moriyasu_Qwen2_JP_7B

	### Model Description

	Moriyasu_Qwen2_JP_7B is a large language model trained by Moriyasu. Based on [Qwen/Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B), it has been enhanced for Japanese usage through additional pre-training and instruction tuning.

	# Model Performance

	### JGLUE tasks
	We used the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable) repo (`jp-stable` branch) to evaluate the model on 8 tasks. The results are as follows:


	|Model|JCommonsenseQA|JNLI|JMARC|JSQuAD|JAQKET-V2|XL-SUM|XWINOGRAD|MGSM|JA AVG|
	|---|---|---|---|---|---|---|---|---|---|
	| |3-shot|3-shot|0-shot|2-shot|1-shot|1-shot|0-shot|5-shot| |
	| |Acc.|Balanced Acc.|Balanced Acc.|Char-F1|Char-F1|ROUGE-2|Acc.|Acc.| |
	| Moriyasu_Qwen2_JP_7B (ours) | 0.9491 | 0.9111 | 0.9550 | 0.8748 | 0.8924 | 0.1966 | 0.8238 | 0.5560 | 0.7699 |
	| Qwen2-7B-Instruct | 0.9080 | 0.7807 | 0.9329 | 0.9290 | 0.8334 | 0.1905 | 0.7216 | 0.6120 | 0.7385 |
	| SakanaAI/EvoLLM-JP-v1-7B | 0.8919 | 0.6602 | 0.9555 | 0.9210 | 0.8641 | 0.2331 | 0.8165 | 0.4760 | 0.7273 |
	| Llama-3-ELYZA-JP-8B | 0.9240 | 0.6485 | 0.9567 | 0.9204 | 0.8743 | 0.2135 | 0.7821 | 0.4920 | 0.7264 |
	| Llama-3-Swallow-8B-Instruct-v0.1 | 0.9249 | 0.6212 | 0.9427 | 0.9373 | 0.9083 | 0.1961 | 0.7404 | 0.5000 | 0.7214 |
	| Tanuki-8B-dpo-v1.0 | 0.7918 | 0.4305 | 0.9226 | 0.8229 | 0.7799 | 0.1168 | 0.7039 | 0.4360 | 0.6256 |
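
The JA AVG column appears to be the plain unweighted mean of the eight task scores. The sketch below checks this for three rows, using values copied from the table. The function name `unweighted_average` is only illustrative and is not part of the evaluation harness.

```python
from statistics import mean

# Task scores copied from the table above, in column order:
# JCommonsenseQA, JNLI, JMARC, JSQuAD, JAQKET-V2, XL-SUM, XWINOGRAD, MGSM
scores = {
    "Moriyasu_Qwen2_JP_7B": [0.9491, 0.9111, 0.9550, 0.8748, 0.8924, 0.1966, 0.8238, 0.5560],
    "Qwen2-7B-Instruct": [0.9080, 0.7807, 0.9329, 0.9290, 0.8334, 0.1905, 0.7216, 0.6120],
    "Llama-3-ELYZA-JP-8B": [0.9240, 0.6485, 0.9567, 0.9204, 0.8743, 0.2135, 0.7821, 0.4920],
}

# JA AVG values as reported in the last column of the table.
reported_avg = {
    "Moriyasu_Qwen2_JP_7B": 0.7699,
    "Qwen2-7B-Instruct": 0.7385,
    "Llama-3-ELYZA-JP-8B": 0.7264,
}


def unweighted_average(task_scores):
    """Return the arithmetic mean of the per-task scores (illustrative helper)."""
    return mean(task_scores)


for model_name, task_scores in scores.items():
    computed = unweighted_average(task_scores)
    print(f"{model_name}: computed {computed:.4f}, reported {reported_avg[model_name]:.4f}")
```

Because the average mixes accuracy, Char-F1, and ROUGE-2, the low-scale ROUGE-2 column pulls every model's average down by a similar amount.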


	### Japanese tasks

	For this evaluation, we used the [swallow-evaluation](https://github.com/swallow-llm/swallow-evaluation) repo to evaluate our model.
	The results for the other models are taken from the
	[Llama-3.1-Swallow-8B-Instruct-v0.2](https://huggingface.co/tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.2) report.

	|Model|JCom.|JEMHopQA|NIILC|JSQuAD|XL-Sum|MGSM|WMT20-en-ja|WMT20-ja-en|JMMLU|JHumanEval|Ja Avg|
	|---|---|---|---|---|---|---|---|---|---|---|---|
	| |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot| |
	| |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1| |
	| Moriyasu_Qwen2_JP_7B (ours) | 0.9321 | 0.4823 | 0.6046 | 0.9201 | 0.1382 | 0.5560 | 0.2636 | 0.1892 | 0.5273 | 0.2976 | 0.4911 |
	| RakutenAI-7B-chat | 0.9035 | 0.2600 | 0.4619 | 0.8647 | 0.1339 | 0.2120 | 0.2667 | 0.1966 | 0.4504 | 0.2299 | 0.3980 |
	| Qwen2-7B-Instruct | 0.8856 | 0.3902 | 0.3859 | 0.8967 | 0.1277 | 0.5720 | 0.2041 | 0.1909 | 0.5713 | 0.5683 | 0.4793 |
	| Qwen2.5-7B-Instruct | 0.9151 | 0.4293 | 0.3910 | 0.8908 | 0.1676 | 0.6240 | 0.2108 | 0.1916 | 0.6252 | 0.5305 | 0.4976 |
	| Tanuki-8B-dpo-v1.0 | 0.2770 | 0.2937 | 0.3710 | 0.6669 | 0.1016 | 0.4280 | 0.2385 | 0.1820 | 0.3078 | 0.2555 | 0.3122 |
	| Llama 3 8B Instruct | 0.8785 | 0.3812 | 0.3936 | 0.8955 | 0.1273 | 0.4160 | 0.2143 | 0.2035 | 0.4719 | 0.2872 | 0.4269 |
	| Llama 3.1 8B Instruct | 0.8829 | 0.4272 | 0.4112 | 0.8856 | 0.1481 | 0.5280 | 0.2174 | 0.1990 | 0.5086 | 0.4976 | 0.4706 |
	| Llama 3 Youko 8B Instruct | 0.9196 | 0.4850 | 0.5178 | 0.9001 | 0.2085 | 0.4680 | 0.2559 | 0.1906 | 0.4691 | 0.2695 | 0.4684 |
	| Llama-3-ELYZA-JP-8B | 0.9017 | 0.5124 | 0.5016 | 0.9113 | 0.1677 | 0.4600 | 0.2509 | 0.1846 | 0.4829 | 0.3811 | 0.4754 |
	| Llama 3 heron brain 8B v0.3 | 0.9231 | 0.4933 | 0.5694 | 0.9056 | 0.2178 | 0.4560 | 0.2771 | 0.2168 | 0.4993 | 0.3177 | 0.4876 |
	| Llama 3 Swallow 8B Instruct | 0.9178 | 0.4963 | 0.5168 | 0.9088 | 0.1296 | 0.4880 | 0.2522 | 0.2254 | 0.4835 | 0.3927 | 0.4811 |
	| Llama 3.1 Swallow 8B Instruct v0.1 | 0.9240 | 0.5874 | 0.5736 | 0.9170 | 0.1380 | 0.5080 | 0.2820 | 0.2282 | 0.5301 | 0.3665 | 0.5055 |
	| Llama 3.1 Swallow 8B Instruct v0.2 | 0.9294 | 0.5601 | 0.5988 | 0.9148 | 0.1372 | 0.5280 | 0.2878 | 0.2270 | 0.5504 | 0.4079 | 0.5141 |

	### Japanese MTBench

	For this evaluation, we used [FastChat](https://github.com/Stability-AI/FastChat/tree/jp-stable), with gpt-4o-2024-08-06 both judging the answers and providing the reference answers.

	Due to limited computational resources, we evaluated only a small number of models.

	|Model|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMTAvg|
	|---|---|---|---|---|---|---|---|---|---|
	| Moriyasu_Qwen2_JP_7B (ours) | 0.515 | 0.710 | 0.845 | 0.685 | 0.585 | 0.815 | 0.710 | 0.765 | 0.704 |
	| Llama-3-ELYZA-JP-8B | 0.365 | 0.720 | 0.730 | 0.400 | 0.555 | 0.670 | 0.580 | 0.785 | 0.601 |
	| Llama 3.1 Swallow 8B Instruct v0.1 | 0.480 | 0.680 | 0.705 | 0.475 | 0.425 | 0.710 | 0.620 | 0.645 | 0.592 |

	### ELYZA-tasks-100

	For this benchmark, we used the [ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100) dataset and ELYZA's GPT-4o scoring prompt, which is taken from [this blog post](https://zenn.dev/elyza/articles/7ece3e73ff35f4).

	|Model|Score|
	|---|---|
	| Moriyasu_Qwen2_JP_7B (ours) | 3.37 |
	| Llama-3-ELYZA-JP-8B | 3.66 |
	| Llama 3.1 Swallow 8B Instruct v0.1 | 3.32 |

	### Nejumi leaderboard 3
	We plan to contact the Nejumi team soon to have the model evaluated on this benchmark.


	# Usage

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	path = "AIJapanese/Moriyasu_Qwen2_JP_7B"

	# Load the model in bfloat16 and spread it across the available devices.
	model = AutoModelForCausalLM.from_pretrained(
	    path,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	    use_cache=True,
	)
	tokenizer = AutoTokenizer.from_pretrained(path)

	# System prompt: "You are a sincere and excellent Japanese assistant.
	# Always strive to provide the most helpful answer possible."
	system_prompt = "あなたは誠実で優秀な日本人アシスタントです。常に可能な限り最も役立つ回答を提供するように努めてください。"
	# User prompt: "What is the highest mountain in Japan?"
	prompt = "日本で一番高い山は何ですか"
	conversation = [
	    {"role": "system", "content": system_prompt},
	    {"role": "user", "content": prompt},
	]

	# Render the conversation with the model's chat template.
	text = tokenizer.apply_chat_template(
	    conversation,
	    tokenize=False,
	    add_generation_prompt=True,
	)

	model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
	generated_ids = model.generate(
	    **model_inputs,  # passes input_ids together with attention_mask
	    max_new_tokens=2048,
	    do_sample=True,  # temperature only takes effect when sampling is enabled
	    temperature=0.2,
	    # top_p=0.95,
	    # top_k=40,
	)

	# Keep only the newly generated tokens by dropping the prompt tokens.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]
	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
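
For multi-turn use, the `conversation` list from the snippet above can carry earlier exchanges as well. The following is a minimal sketch, assuming the tokenizer's chat template accepts the usual `system` / `user` / `assistant` roles. The helper `build_conversation` is a hypothetical name introduced here and is not part of transformers.

```python
def build_conversation(system_prompt, previous_turns, next_user_message):
    """Build a chat-template message list (hypothetical helper).

    previous_turns is a list of (user_text, assistant_text) pairs from
    earlier exchanges; next_user_message is the new question to answer.
    """
    conversation = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in previous_turns:
        conversation.append({"role": "user", "content": user_text})
        conversation.append({"role": "assistant", "content": assistant_text})
    conversation.append({"role": "user", "content": next_user_message})
    return conversation


# Example history: "What is the highest mountain in Japan?" / "It is Mt. Fuji."
# Follow-up question: "How many meters high is it?"
conversation = build_conversation(
    system_prompt="あなたは誠実で優秀な日本人アシスタントです。",
    previous_turns=[("日本で一番高い山は何ですか", "富士山です。")],
    next_user_message="その標高は何メートルですか",
)
for message in conversation:
    print(message["role"], ":", message["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)` exactly as in the single-turn example.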

	# Training Datasets

	### Pre-training dataset

	Starting from the Qwen2-7B model, we continued pre-training on a mix of 80% Japanese and 20% English data so that the model gains Japanese ability while retaining its English ability. We used about 120 billion tokens sampled from Japanese and English Wikipedia articles, Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, Webfined, Japanese websites, book data, mathematics, code, and other sources.
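
To make the mix concrete, the sketch below splits the token budget by language. It assumes the 80/20 ratio applies to token counts, which this card does not state explicitly, and it treats "about 120 billion" as exactly 120 billion. The names used are illustrative only.

```python
# Assumption: the 80% / 20% ratio is measured in tokens, not documents.
TOTAL_TOKENS = 120_000_000_000  # "about 120 billion tokens"
LANGUAGE_MIX_PERCENT = {"ja": 80, "en": 20}


def split_token_budget(total_tokens, mix_percent):
    """Split a token budget by integer percentages (illustrative helper)."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("percentages must sum to 100")
    return {lang: total_tokens * pct // 100 for lang, pct in mix_percent.items()}


budget = split_token_budget(TOTAL_TOKENS, LANGUAGE_MIX_PERCENT)
for lang, tokens in budget.items():
    print(f"{lang}: about {tokens // 1_000_000_000}B tokens")
```

Under these assumptions the budget comes to roughly 96 billion Japanese tokens and 24 billion English tokens.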

	### Instruction Tuning
	We built about 1 million instruction examples using several methods, including synthetic generation, translation, and manual annotation by humans.

	# Contact:
	If you have any questions, please contact me at: [email protected]