---
license: apache-2.0
language:
- ja
base_model:
- Qwen/Qwen2-7B
pipeline_tag: text-generation
library_name: transformers
---
# Moriyasu_Qwen2_JP_7B
### Model Description
**Moriyasu_Qwen2_JP_7B** is a large language model trained by Moriyasu. Based on [Qwen/Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B), it has been enhanced for Japanese usage through additional pre-training and instruction tuning.
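The model can be loaded with `transformers` for text generation. Below is a minimal inference sketch; the repo id `AIJapanese/Moriyasu_Qwen2_JP_7B` and the generation settings are illustrative assumptions, not values confirmed by the authors:

```python
# Minimal inference sketch. The repo id and generation settings below are
# assumptions for illustration; adjust them to your deployment.
def generate(prompt: str, model_name: str = "AIJapanese/Moriyasu_Qwen2_JP_7B") -> str:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Qwen2-based chat models carry their chat template in the tokenizer.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```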
# Training Datasets
### Pre-training dataset
The model is continually pre-trained on Japanese data starting from the Qwen2-7B base model while maintaining its English ability (80% Japanese, 20% English). We use about 120 billion tokens sampled from Japanese and English Wikipedia articles, Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, Webfined, Japanese websites, book data, mathematics, and code.
### Instruction Tuning
We built about 1 million instruction examples through a mix of methods: synthetic generation, translation, and manual annotation by humans.
# Model Performance
### JGLUE tasks
We used the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable) (jp-stable branch) to evaluate across eight tasks; the results are as follows:
|Model|JCommonsenseQA|JNLI|JMARC|JSQuAD|JAQKET-V2|XL-SUM|XWINOGRAD|MGSM|JA AVG|
|---|---|---|---|---|---|---|---|---|---|
| |3-shot|3-shot|0-shot|2-shot|1-shot|1-shot|0-shot|5-shot| |
| |Acc.|Balanced Acc.|Balanced Acc.|Char-F1|Char-F1|ROUGE-2|Acc.|Acc.| |
| Moriyasu_Qwen2_JP_7B (OURS) | 0.9035 | 0.2600 | 0.4619 | 0.8647 | 0.1339 | 0.2120 | 0.2667 | 0.1966 | 0.5504 |
| Qwen2-7B-Instruct | 0.8856 | 0.3902 | 0.3859 | 0.8967 | 0.1277 | 0.5720 | 0.2041 | 0.1909 | 0.5504 |
| Qwen2.5-7B-Instruct | 0.9151 | 0.4293 | 0.3910 | 0.8908 | 0.1676 | **0.6240** | 0.2108 | 0.1916 | 0.5504 |
| Tanuki-8B-dpo-v1.0 | 0.2770 | 0.2937 | 0.3710 | 0.6669 | 0.1016 | 0.4280 | 0.2385 | 0.1820 | 0.5504 |
| Llama 3 8B Instruct | 0.8785 | 0.3812 | 0.3936 | 0.8955 | 0.1273 | 0.4160 | 0.2143 | 0.2035 | 0.5504 |
| Llama 3.1 8B Instruct | 0.8829 | 0.4272 | 0.4112 | 0.8856 | 0.1481 | 0.5280 | 0.2174 | 0.1990 | 0.5504 |
| Llama 3 Youko 8B Instruct | 0.9196 | 0.4850 | 0.5178 | 0.9001 | 0.2085 | 0.4680 | 0.2559 | 0.1906 | 0.5504 |
| Llama-3-ELYZA-JP-8B | 0.9017 | 0.5124 | 0.5016 | 0.9113 | 0.1677 | 0.4600 | 0.2509 | 0.1846 |0.5504 |
| Llama 3 heron brain 8B v0.3 | 0.9231 | 0.4933 | 0.5694 | 0.9056 | **0.2178** | 0.4560 | 0.2771 | 0.2168 | 0.5504 |
| Llama 3 Swallow 8B Instruct | 0.9178 | 0.4963 | 0.5168 | 0.9088 | 0.1296 | 0.4880 | 0.2522 | 0.2254 | 0.5504 |
| Llama 3.1 Swallow 8B Instruct v0.1| 0.9240 | **0.5874** | 0.5736 | **0.9170** | 0.1380 | 0.5080 | 0.2820 | **0.2282** | 0.5504 |
| Llama 3.1 Swallow 8B Instruct v0.2| **0.9294** | 0.5601 | **0.5988** | 0.9148 | 0.1372 | 0.5280 | **0.2878** | 0.2270 | 0.5504 |
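The JA AVG column can be recomputed from the per-task scores. A minimal sketch for the Moriyasu row, assuming the average is an unweighted mean over the eight tasks (the harness may weight tasks differently):

```python
# Recompute JA AVG as the plain mean of the eight task scores from the
# table above (assumption: an unweighted macro-average).
scores = {
    "JCommonsenseQA": 0.9035,
    "JNLI": 0.2600,
    "JMARC": 0.4619,
    "JSQuAD": 0.8647,
    "JAQKET-V2": 0.1339,
    "XL-SUM": 0.2120,
    "XWINOGRAD": 0.2667,
    "MGSM": 0.1966,
}
ja_avg = sum(scores.values()) / len(scores)
print(round(ja_avg, 4))
```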