japanese-gpt-neox-small


This repository provides a small-sized Japanese GPT-NeoX model. The model was trained using code based on EleutherAI/gpt-neox.

Update log

  • 2023/03/20 Updated the model weight and config files so that the model can be loaded via Hugging Face's official GPT-NeoX implementation.

How to use the model

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-neox-small", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt-neox-small")
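
Once loaded, text can be generated in the usual way. The prompt and decoding settings below are illustrative, not part of the original card:

# Hypothetical Japanese prompt; replace with any text.
prompt = "西田幾多郎は、"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; the decoding hyperparameters are arbitrary.
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.9,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))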

Model architecture

A 12-layer, 768-hidden-size transformer-based language model.
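
These values can be read directly from the loaded config. A quick check, assuming the model has been loaded as shown above:

# Inspect the architecture reported by the Hugging Face config.
config = model.config
print(config.num_hidden_layers)    # 12 transformer layers
print(config.hidden_size)          # 768 hidden size
print(config.num_attention_heads)  # attention heads per layer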

Training

The model was trained on Japanese CC-100, Japanese C4, and Japanese Wikipedia to optimize a traditional language modelling objective.
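
That is, the model was optimized for next-token prediction with a cross-entropy loss. A minimal sketch of evaluating this objective through the Hugging Face API, assuming the model and tokenizer from above (the example sentence is arbitrary):

import torch

# Causal LM objective: predict each token from the preceding context.
# Passing labels=input_ids makes transformers shift the targets internally.
text = "日本の首都は東京です。"  # illustrative sentence
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model(input_ids, labels=input_ids)

print(f"loss: {outputs.loss.item():.3f}")
print(f"perplexity: {torch.exp(outputs.loss).item():.1f}")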

Tokenization

The model uses a sentencepiece-based tokenizer.
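
A small illustration of the tokenizer, assuming it has been loaded as shown above (the input sentence is arbitrary; the resulting pieces depend on the sentencepiece model):

# SentencePiece operates on raw text, so no whitespace pre-tokenization is needed.
text = "こんにちは、世界。"  # illustrative input
tokens = tokenizer.tokenize(text)
ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)                 # subword pieces
print(ids)                    # corresponding vocabulary ids
print(tokenizer.decode(ids))  # decodes back to (approximately) the original text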

A toy prefix-tuning weight file

Along with the pretrained model, we also release a prefix-tuning weight file named smileface_suffix.task0.weight for demonstration. The toy prefix-tuning weights here are trained to encourage the model to end every generated sentence with a smiling face emoji 😃. Find the training/inference code for prefix-tuning at our GitHub repo prefix-tuning-gpt.
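
Conceptually, prefix-tuning keeps the language model frozen and prepends a small set of trained key/value vectors ("virtual tokens") to every attention layer. The sketch below only illustrates that idea with randomly initialized prefixes passed through the generic past_key_values interface; the actual format of smileface_suffix.task0.weight and the code that loads it are defined in the prefix-tuning-gpt repo, not here.

import torch

# Illustrative only: random prefixes stand in for trained ones.
prefix_len = 10
cfg = model.config
head_dim = cfg.hidden_size // cfg.num_attention_heads

# One (key, value) pair per layer, shaped (batch, heads, prefix_len, head_dim).
# Depending on the transformers version, this legacy tuple format may need to be
# wrapped via transformers.DynamicCache.from_legacy_cache.
prefix = tuple(
    (
        torch.randn(1, cfg.num_attention_heads, prefix_len, head_dim),
        torch.randn(1, cfg.num_attention_heads, prefix_len, head_dim),
    )
    for _ in range(cfg.num_hidden_layers)
)

input_ids = tokenizer("おはようございます", return_tensors="pt").input_ids
attention_mask = torch.ones(1, prefix_len + input_ids.shape[1], dtype=torch.long)

with torch.no_grad():
    out = model(input_ids, attention_mask=attention_mask, past_key_values=prefix)
print(out.logits.shape)  # (1, input length, vocab size)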

Here are a few samples generated without and with the toy prefix weights, respectively.

3 samples without the prefix weights

  1. 「きっとそれは絶対間違ってないね。 わたしには5か国語に4つの外国語の意味なんてわからない。 でも、とりあえずこの簡単な英文がどんな意味を持つのか知りたいよね!」
  2. 25分頃に公園に着いて、ベンチに座って待っていると、またしてもS先生から連絡が入りました。 確か、午後の礼拝の時に自分の持ってきたお弁当を食べた記憶が鮮明に残っています。 後でインターネットで検索したら、S先生のブログに飛びました。 今日の晩ごはんは焼きナスを作ってみました! * 上の写真は昨日の朝焼けです。
  3. CTで歯形ができて、その後さらにその歯形が再び噛めるようになるのは、何が原因だろう? 虫歯になった原因も、口臭かな? それとも歯周病かな? 歯石がとれるまで、、、もうちょっとかかりそう。 子供の虫歯って、なかなか治らないですよね。親兄弟で何度か。 子供の歯根は、親のものになります。 そして自分のものだったり、知らない間に抜いたりし、生えてきたりもします。 大人になって親からみた場合は、白い歯に変わってきて、金属のようーでも悪くなく、親からのむし歯の心配はないですよね。

3 samples with the prefix weights

  1. ※海外ブランド品の場合は、返品・返金等はお受け致しかねますので予めご了承願います。 ※ 商品発送後、お客様へ商品返送完了までのスピードを重視する方は海外ブランド品を先に送り付けさせて頂くケースがございます。 😃
  2. 私は過去に持っていた不動産を、中古住宅として売却していましたが、その後の私の状況はどうだったのでしょうか? 😃 結果としては、投資物件として売却を考えていますが、今までの相場も読んでいただけばわかると思います。 😃 今まで、物件に対しての投資は非常に控えめにしてきたのですが、今回の提案を読んで、実際に物件を購入する際にはきちんと確認をしようと思います。 😃
  3. この写真集の表紙をこの小紙にしている作家さんは、まるで誰かの指示を受けて行動している人物のように見える、というのが、この作品をやぶにらんだ「殺し屋集団」の描いている作品であるように思います。 😃

Inference with FasterTransformer

Since version 5.1, NVIDIA FasterTransformer supports GPT-NeoX inference as well as a variety of soft prompts (including prefix-tuning). The pretrained model and prefix weights released in this repo have been verified to work with FasterTransformer 5.1.

How to cite

@misc{rinna-japanese-gpt-neox-small,
    title = {rinna/japanese-gpt-neox-small},
    author = {Zhao, Tianyu and Sawada, Kei},
    url = {https://huggingface.co/rinna/japanese-gpt-neox-small}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

License

The MIT license
