Update README.md
README.md
CHANGED
@@ -50,7 +50,7 @@ output = model(**encoded_input)
 
 ### Preprocessing
 
-The texts are normalized using [neologdn](https://github.com/ikegami-yukino/neologdn), segmented into words using Juman
+The texts are normalized using [neologdn](https://github.com/ikegami-yukino/neologdn), segmented into words using [Juman++](https://github.com/ku-nlp/jumanpp), and tokenized by [BPE](https://huggingface.co/docs/tokenizers/api/models#tokenizers.models.BPE). Juman++ 2.0.0-rc3 was used for pretraining.
 
 The model was trained on 8 NVIDIA A100 GPUs.
 