---
language:
- kbd
- ru
- multilingual
license: unknown
tags:
- circassian
- kabardian
datasets:
- anzorq/kbd_lat-835k_ru-3M
---
t5-v1_1-small pretrained with an MLM task on:
- kbd (custom Latin script), 835K lines: a pile of scraped text from news sites, books, etc.
- ru, 3M lines: wiki corpus from OPUS

tokenizer: sentencepiece unigram, 8K, shared vocabulary
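
For readers unfamiliar with how T5-style MLM pretraining formats its examples, here is a minimal sketch of span corruption with `<extra_id_N>` sentinel tokens. It is illustrative only: the card does not state the masking settings used for this model, the function name `apply_span_corruption` is hypothetical, and whitespace splitting stands in for the sentencepiece tokenizer.

```python
def apply_span_corruption(tokens, spans):
    """Build a T5-style (inputs, targets) pair from masked spans.

    tokens: list of token strings (whitespace tokens here, as a stand-in
            for sentencepiece pieces; this is an assumption for the sketch).
    spans:  sorted, non-overlapping (start, end) index pairs, end exclusive.
    """
    inputs, targets = [], []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{sentinel_id}>"
        # Keep the unmasked tokens, then replace the span with one sentinel.
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        # The target repeats the sentinel followed by the masked tokens.
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    # A final sentinel closes the target sequence.
    targets.append(f"<extra_id_{len(spans)}>")
    return inputs, targets


tokens = "Thank you for inviting me to your party last week".split()
inputs, targets = apply_span_corruption(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))   # Thank you <extra_id_0> me to your party <extra_id_1> week
print(" ".join(targets))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

In actual pretraining the spans are sampled at random over tokenized kbd and ru lines rather than passed in by hand.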