Update README.md
README.md
CHANGED
@@ -50,7 +50,7 @@ output = model(**encoded_input)
 
 ### Preprocessing
 
-The texts are normalized using [neologdn](https://github.com/ikegami-yukino/neologdn), segmented into words using Juman
+The texts are normalized using [neologdn](https://github.com/ikegami-yukino/neologdn), segmented into words using [Juman++](https://github.com/ku-nlp/jumanpp), and tokenized by [BPE](https://huggingface.co/docs/tokenizers/api/models#tokenizers.models.BPE). Juman++ 2.0.0-rc3 was used for pretraining.
 
 The model was trained on 8 NVIDIA A100 GPUs.
 