Update README.md
README.md
CHANGED
@@ -13,9 +13,9 @@ datasets:
 
 t5-v1_1-small pretrained with mlm task on
 
-
+• kbd (custom latin script) 835K lines: a pile of scraped text from news sites, books etc.
 
-
+• ru 3M lines: wiki corpus from OPUS
 
 
 tokenizer: sentencepiece unigram, 8K, shared vocabulary