---
datasets:
- oscar
- hieronymusa/MaCoCu-dataset-250k
language:
- cs
- hr
- pl
- sl
- sk
---
# Slavic T5 Base
The aim of this model is to achieve the best possible results for Slavic languages written in Latin script.
It is suitable for tasks such as:
- summarization,
- extractive question answering,
- machine translation between Slavic languages in Latin script.
The model is trained on selected parts of the OSCAR and MaCoCu corpora.
It supports these languages: Czech, Croatian, Polish, Slovak, and Slovenian.
The vocabulary has 120 000 tokens and is cased (it contains capital letters).
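
A minimal loading sketch follows. It assumes the checkpoint can be loaded with the `AutoTokenizer` and `AutoModelForSeq2SeqLM` classes from the `transformers` library, which is the usual route for T5-style models but is not confirmed by this card. `MODEL_ID` is a hypothetical placeholder, and the helper names `language_name`, `load_model`, and `encode` are introduced only for illustration.

```python
# Language codes for the five languages named in this card.
SUPPORTED_LANGUAGES = {
    "cs": "Czech",
    "hr": "Croatian",
    "pl": "Polish",
    "sk": "Slovak",
    "sl": "Slovenian",
}

# Hypothetical placeholder: replace with this model's actual Hub repository ID.
MODEL_ID = "your-namespace/slavic-t5-base"


def language_name(code: str) -> str:
    """Return the language name for a supported code, or raise ValueError."""
    key = code.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return SUPPORTED_LANGUAGES[key]


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model (assumes compatibility with the Auto classes)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def encode(tokenizer, text: str, lang: str):
    """Check the language code, then tokenize the text without lowercasing it."""
    language_name(lang)  # raises ValueError for unsupported languages
    # The vocabulary is cased, so the text is passed through unchanged.
    return tokenizer(text, return_tensors="pt")
```

With a real repository ID in place, `tokenizer, model = load_model()` followed by `encode(tokenizer, "Praha je hlavní město.", "cs")` would produce input tensors for the model. For the tasks listed above, the model would most likely need fine-tuning first.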