language:
- kbd
- ru
- multilingual
license: unknown
tags:
- circassian
- kabardian
datasets:
- anzorq/kbd_lat-835k_ru-3M
t5-v1_1-small pretrained with an MLM (masked language modeling) task on:
- kbd (custom Latin script), 835K lines: text scraped from news sites, books, etc.
- ru, 3M lines: wiki corpus from OPUS

Tokenizer: SentencePiece unigram, 8K shared vocabulary.
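
The card does not say how the MLM objective was set up. For T5 models it usually means span corruption: masked spans in the input are replaced by sentinel tokens such as `<extra_id_0>`, and the target lists each sentinel followed by the tokens it hides. The sketch below illustrates that idea under this assumption. The function name `span_corrupt` and the simplified masking (independent positions, with adjacent ones merged into a single span) are illustrative and are not taken from the actual training code.

```python
import random


def span_corrupt(tokens, noise_density=0.15, seed=0):
    """Illustrative T5-style span corruption (assumed, not the card's code).

    Returns (inputs, targets): masked spans in `inputs` are replaced by
    sentinels, and `targets` holds each sentinel followed by its hidden tokens.
    """
    n = len(tokens)
    if n < 2:
        # Nothing sensible to mask in an empty or single-token sequence.
        return list(tokens), []

    rng = random.Random(seed)
    # Mask roughly noise_density of the tokens, keeping at least one visible.
    num_noise = min(n - 1, max(1, round(n * noise_density)))
    noise_idx = set(rng.sample(range(n), num_noise))

    inputs, targets = [], []
    sentinel = 0
    prev_noise = False
    for i, tok in enumerate(tokens):
        if i in noise_idx:
            if not prev_noise:
                # A new masked span starts: emit one sentinel on both sides.
                marker = f"<extra_id_{sentinel}>"
                inputs.append(marker)
                targets.append(marker)
                sentinel += 1
            targets.append(tok)
            prev_noise = True
        else:
            inputs.append(tok)
            prev_noise = False
    return inputs, targets


if __name__ == "__main__":
    # Placeholder tokens; in practice these would be SentencePiece pieces.
    example = "t1 t2 t3 t4 t5 t6 t7 t8 t9 t10".split()
    model_input, model_target = span_corrupt(example, noise_density=0.3, seed=1)
    print(" ".join(model_input))
    print(" ".join(model_target))
```

The original T5 recipe samples span lengths with a mean of about 3 tokens rather than masking independent positions, so treat this as a simplified illustration of the input/target format only.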