Update README.md
README.md CHANGED
@@ -33,14 +33,13 @@ You can use this model directly with a pipeline for text generation.
 {'generated_text': '昨日私は京都ではありませんが、自分の住んでる事について色々と'},
 {'generated_text': '昨日私は京都では地図を見ることしかしない、京福電車のホームで'},
 {'generated_text': '昨日私は京都でこみちに住み始めた時からある不思議な現象で、そ'}]
-...
 ```
 
 You can also use this model to get the features of a given text.
 
 ## Vocabulary
 
-A character-level vocabulary of size 6K is used. To be precise, rare characters may be split into bytes because byte-level byte-pair encoding (BPE) is used. The BPE tokenizer was trained on a small subset of the training data. Since the data were converted into a one-character-per-line format, merge operations never
+A character-level vocabulary of size 6K is used. To be precise, rare characters may be split into bytes because byte-level byte-pair encoding (BPE) is used. The BPE tokenizer was trained on a small subset of the training data. Since the data were converted into a one-character-per-line format, merge operations never go beyond character boundaries.
 
 ## Training data
 
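All three samples in the hunk share the prefix 昨日私は京都で ("Yesterday I ... in Kyoto"), so that was presumably the prompt. A minimal sketch of the pipeline call referenced at the top of the hunk follows; the model ID is a placeholder, since the diff does not show it, and the generation parameters are illustrative:

```python
from transformers import pipeline

# Placeholder model ID: the diff does not show the repository name.
generator = pipeline("text-generation", model="<model-id>")

# Prompt inferred from the shared prefix of the three samples above.
outputs = generator("昨日私は京都で", max_length=32, num_return_sequences=3)
for out in outputs:
    print(out["generated_text"])
```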
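The added paragraph also describes the tokenizer setup concretely enough to sketch. The following is an illustration under stated assumptions, not the authors' actual training script: the file names are hypothetical, the exact vocabulary size of 6000 is read off "size 6K", and the `ByteLevelBPETokenizer` from the Hugging Face tokenizers library stands in for whatever trainer was actually used:

```python
from tokenizers import ByteLevelBPETokenizer

# Assumed file names, for illustration only.
corpus_subset = "train_subset.txt"             # small subset of the training data
one_char_file = "train_one_char_per_line.txt"  # same text, one character per line

# Convert the subset to one character per line. The trainer treats each line
# as a separate unit, so learned merges can only join the UTF-8 bytes of a
# single character: common characters become single tokens, rare ones fall
# back to their constituent bytes.
with open(corpus_subset, encoding="utf-8") as src, \
     open(one_char_file, "w", encoding="utf-8") as dst:
    for line in src:
        for ch in line.strip():
            dst.write(ch + "\n")

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(files=[one_char_file], vocab_size=6000)  # "size 6K" per the card
tokenizer.save_model(".")  # writes vocab.json and merges.txt
```

Because no training line contains more than one character, this setup yields exactly the property the completed sentence states: merge operations never go beyond character boundaries.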