Update README.md
README.md CHANGED
@@ -33,14 +33,13 @@ You can use this model directly with a pipeline for text generation.
 {'generated_text': '昨日私は京都ではありませんが、自分の住んでる事について色々と'},
 {'generated_text': '昨日私は京都では地図を見ることしかしない、京福電車のホームで'},
 {'generated_text': '昨日私は京都でこみちに住み始めた時からある不思議な現象で、そ'}]
-...
 ```
 
 You can also use this model to get the features of a given text.
 
 ## Vocabulary
 
-A character-level vocabulary of size 6K is used. To be precise, rare characters may be split into bytes because byte-level byte-pair encoding (BPE) is used. The BPE tokenizer was trained on a small subset of the training data. Since the data were converted into a one-character-per-line format, merge operations never
+A character-level vocabulary of size 6K is used. To be precise, rare characters may be split into bytes because byte-level byte-pair encoding (BPE) is used. The BPE tokenizer was trained on a small subset of the training data. Since the data were converted into a one-character-per-line format, merge operations never go beyond character boundaries.
 
 ## Training data
 
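All three samples in the hunk share the prefix 昨日私は京都で ("Yesterday I ... in Kyoto"), so that was presumably the prompt. A minimal sketch of the pipeline call referenced at the top of the hunk follows; the model ID is a placeholder, since the diff does not show it, and the generation parameters are illustrative:

```python
from transformers import pipeline

# Placeholder model ID: the diff does not show the repository name.
generator = pipeline("text-generation", model="<model-id>")

# Prompt inferred from the shared prefix of the three samples above.
outputs = generator("昨日私は京都で", max_length=32, num_return_sequences=3)
for out in outputs:
    print(out["generated_text"])
```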
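The added paragraph also describes the tokenizer setup concretely enough to sketch. The following is an illustration under stated assumptions, not the authors' actual training script: the file names are hypothetical, the exact vocabulary size of 6000 is read off "size 6K", and the `ByteLevelBPETokenizer` from the Hugging Face tokenizers library stands in for whatever trainer was actually used:

```python
from tokenizers import ByteLevelBPETokenizer

# Assumed file names, for illustration only.
corpus_subset = "train_subset.txt"             # small subset of the training data
one_char_file = "train_one_char_per_line.txt"  # same text, one character per line

# Convert the subset to one character per line. The trainer treats each line
# as a separate unit, so learned merges can only join the UTF-8 bytes of a
# single character: common characters become single tokens, rare ones fall
# back to their constituent bytes.
with open(corpus_subset, encoding="utf-8") as src, \
     open(one_char_file, "w", encoding="utf-8") as dst:
    for line in src:
        for ch in line.strip():
            dst.write(ch + "\n")

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(files=[one_char_file], vocab_size=6000)  # "size 6K" per the card
tokenizer.save_model(".")  # writes vocab.json and merges.txt
```

Because no training line contains more than one character, this setup yields exactly the property the completed sentence states: merge operations never go beyond character boundaries.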