japanese-gpt-neox-small
This repository provides a small-sized Japanese GPT-NeoX model. The model was trained using code based on EleutherAI/gpt-neox.
Update log
- 2023/03/20 Updated the model weights and config files so that the model can be loaded via Hugging Face's official GPT-NeoX implementation.
How to use the model
from transformers import AutoTokenizer, AutoModelForCausalLM

# The tokenizer is SentencePiece-based, so the slow tokenizer must be used (use_fast=False).
tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-neox-small", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt-neox-small")
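Text can then be generated with the standard generate API, as in the sketch below; the Japanese prompt and the sampling settings are arbitrary illustrations, not recommended values.

prompt = "西田幾多郎は、"  # any Japanese prompt works; this one is only an example
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.9,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))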
Model architecture
A 12-layer, 768-hidden-size transformer-based language model.
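These sizes can be checked against the loaded model's configuration; the quick sanity check below assumes the model object from the loading example above.

print(model.config.num_hidden_layers)    # 12 transformer layers
print(model.config.hidden_size)          # 768-dimensional hidden states
print(model.config.num_attention_heads)  # attention heads per layer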
Training
The model was trained on Japanese CC-100, Japanese C4, and Japanese Wikipedia to optimize a traditional language modelling objective.
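In other words, the model is trained for next-token prediction. A quick way to see this objective in action is to compute the causal language-modelling loss on a sentence, as sketched below; it reuses the tokenizer and model loaded above, and the sentence itself is only a placeholder.

text = "これはテストの文です。"  # placeholder sentence
inputs = tokenizer(text, return_tensors="pt")
# Passing the inputs as labels makes the model return the next-token
# prediction (cross-entropy) loss, i.e. the language modelling objective.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())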
Tokenization
The model uses a sentencepiece-based tokenizer.
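As a small illustration (with an arbitrary example string), the tokenizer splits text into SentencePiece subword pieces and maps them to token ids; remember that it must be loaded with use_fast=False, as shown above.

text = "こんにちは、世界。"  # arbitrary example string
print(tokenizer.tokenize(text))                         # SentencePiece subword pieces
ids = tokenizer.encode(text)
print(ids)                                              # corresponding token ids
print(tokenizer.decode(ids, skip_special_tokens=True))  # round-trips back to the text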
A toy prefix-tuning weight file
Along with the pretrained model, we also release a prefix-tuning weight file named smileface_suffix.task0.weight for demonstration. The toy prefix-tuning weights were trained to encourage the model to end every generated sentence with a smiling face emoji 😃. Find the training/inference code for prefix-tuning at our GitHub repo prefix-tuning-gpt.
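The exact format of smileface_suffix.task0.weight is defined in the prefix-tuning-gpt repo, so the sketch below does not load it. It only illustrates, under the assumption that a soft prefix is injected as per-layer key/value states, how such a prefix can be passed to the model through past_key_values; the random tensors, the prefix length of 10, and the prompt are all placeholders.

import torch

config = model.config
prefix_len = 10  # hypothetical prefix length
head_dim = config.hidden_size // config.num_attention_heads

# Prefix-tuning learns one (key, value) pair per layer while the base model stays
# frozen; the random tensors below are placeholders that only show the expected shapes.
past_key_values = tuple(
    (
        torch.randn(1, config.num_attention_heads, prefix_len, head_dim) * 0.02,
        torch.randn(1, config.num_attention_heads, prefix_len, head_dim) * 0.02,
    )
    for _ in range(config.num_hidden_layers)
)

inputs = tokenizer("こんにちは。", return_tensors="pt")
# The attention mask must cover the virtual prefix tokens plus the real input tokens.
attention_mask = torch.ones(1, prefix_len + inputs["input_ids"].shape[1], dtype=torch.long)

with torch.no_grad():
    outputs = model(
        input_ids=inputs["input_ids"],
        attention_mask=attention_mask,
        past_key_values=past_key_values,
    )
# The soft prefix shifts the next-token distribution; inspect the top prediction.
next_token_id = outputs.logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))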
Here are a few samples generated without and with the toy prefix weights, respectively.
3 samples without the prefix weights:
- ใใใฃใจใใใฏ็ตถๅฏพ้้ใฃใฆใชใใญใ ใใใใซใฏ5ใๅฝ่ชใซ4ใคใฎๅคๅฝ่ชใฎๆๅณใชใใฆใใใใชใใ ใงใใใจใใใใใใฎ็ฐกๅใช่ฑๆใใฉใใชๆๅณใๆใคใฎใ็ฅใใใใใญ!ใ
- 25ๅ้ ใซๅ ฌๅใซ็ใใฆใใใณใใซๅบงใฃใฆๅพ ใฃใฆใใใจใใพใใใฆใSๅ ็ใใ้ฃ็ตกใๅ ฅใใพใใใ ็ขบใใๅๅพใฎ็คผๆใฎๆใซ่ชๅใฎๆใฃใฆใใใๅผๅฝใ้ฃในใ่จๆถใ้ฎฎๆใซๆฎใฃใฆใใพใใ ๅพใงใคใณใฟใผใใใใงๆค็ดขใใใใSๅ ็ใฎใใญใฐใซ้ฃใณใพใใใ ไปๆฅใฎๆฉใใฏใใฏ็ผใใในใไฝใฃใฆใฟใพใใ! * ไธใฎๅ็ใฏๆจๆฅใฎๆ็ผใใงใใ
- CTใงๆญฏๅฝขใใงใใฆใใใฎๅพใใใซใใฎๆญฏๅฝขใๅใณๅใใใใใซใชใใฎใฏใไฝใๅๅ ใ ใใ? ่ซๆญฏใซใชใฃใๅๅ ใใๅฃ่ญใใช? ใใใจใๆญฏๅจ็ ใใช? ๆญฏ็ณใใจใใใพใงใใใใใใกใใฃใจใใใใใใ ๅญไพใฎ่ซๆญฏใฃใฆใใชใใชใๆฒปใใชใใงใใใญใ่ฆชๅ ๅผใงไฝๅบฆใใ ๅญไพใฎๆญฏๆ นใฏใ่ฆชใฎใใฎใซใชใใพใใ ใใใฆ่ชๅใฎใใฎใ ใฃใใใ็ฅใใชใ้ใซๆใใใใใ็ใใฆใใใใใใพใใ ๅคงไบบใซใชใฃใฆ่ฆชใใใฟใๅ ดๅใฏใ็ฝใๆญฏใซๅคใใฃใฆใใฆใ้ๅฑใฎใใใผใงใๆชใใชใใ่ฆชใใใฎใใๆญฏใฎๅฟ้ ใฏใชใใงใใใญใ
3 samples with the prefix weights:
- โปๆตทๅคใใฉใณใๅใฎๅ ดๅใฏใ่ฟๅใป่ฟ้็ญใฏใๅใ่ดใใใญใพใใฎใงไบใใไบๆฟ้กใใพใใ โป ๅๅ็บ้ๅพใใๅฎขๆงใธๅๅ่ฟ้ๅฎไบใพใงใฎในใใผใใ้่ฆใใๆนใฏๆตทๅคใใฉใณใๅใๅ ใซ้ใไปใใใใฆ้ ใ ใฑใผในใใใใใพใใ ๐
- ็งใฏ้ๅปใซๆใฃใฆใใไธๅ็ฃใใไธญๅคไฝๅฎ ใจใใฆๅฃฒๅดใใฆใใพใใใใใใฎๅพใฎ็งใฎ็ถๆณใฏใฉใใ ใฃใใฎใงใใใใ? ๐ ็ตๆใจใใฆใฏใๆ่ณ็ฉไปถใจใใฆๅฃฒๅดใ่ใใฆใใพใใใไปใพใงใฎ็ธๅ ดใ่ชญใใงใใใ ใใฐใใใใจๆใใพใใ ๐ ไปใพใงใ็ฉไปถใซๅฏพใใฆใฎๆ่ณใฏ้ๅธธใซๆงใใใซใใฆใใใฎใงใใใไปๅใฎๆๆกใ่ชญใใงใๅฎ้ใซ็ฉไปถใ่ณผๅ ฅใใ้ใซใฏใใกใใจ็ขบ่ชใใใใใจๆใใพใใ ๐
- ใใฎๅ็้ใฎ่กจ็ดใใใฎๅฐ็ดใซใใฆใใไฝๅฎถใใใฏใใพใใง่ชฐใใฎๆ็คบใๅใใฆ่กๅใใฆใใไบบ็ฉใฎใใใซ่ฆใใใใจใใใฎใใใใฎไฝๅใใใถใซใใใ ใๆฎบใๅฑ้ๅฃใใฎๆใใฆใใไฝๅใงใใใใใซๆ ใใพใใ ๐
Inference with FasterTransformer
Since version 5.1, NVIDIA FasterTransformer supports both GPT-NeoX inference and a variety of soft prompts (including prefix-tuning). The pretrained model and prefix weights released in this repo have been verified to work with FasterTransformer 5.1.
How to cite
@misc{rinna-japanese-gpt-neox-small,
title = {rinna/japanese-gpt-neox-small},
author = {Zhao, Tianyu and Sawada, Kei},
url = {https://huggingface.co/rinna/japanese-gpt-neox-small}
}
@inproceedings{sawada2024release,
title = {Release of Pre-Trained Models for the {J}apanese Language},
author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
month = {5},
year = {2024},
pages = {13898--13905},
url = {https://aclanthology.org/2024.lrec-main.1213},
note = {\url{https://arxiv.org/abs/2404.01657}}
}
License
The MIT license