metadata
datasets:
- oscar
- hieronymusa/MaCoCu-dataset-250k
language:
- cs
- hr
- pl
- sl
- sk
Slavic T5 Base
The aim of this model is to achieve the best possible results for Slavic languages written in Latin script.
It is suitable for tasks such as:
- summarization,
- extractive question answering,
- machine translation between Slavic languages written in Latin script.
The model was trained on selected parts of the OSCAR and MaCoCu corpora.
It supports these languages: Czech, Croatian, Polish, Slovak, and Slovenian.
The vocabulary has 120 000 tokens and is cased (it contains capital letters).
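The vocabulary properties above can be checked once the checkpoint is loaded. The following is a minimal sketch, assuming the model is published in Hugging Face `transformers` format and loads with the generic seq2seq classes. `MODEL_ID` is a placeholder, since this card does not state the repository name, and `load`, `preserves_case`, and `demo` are hypothetical helper names introduced only for illustration.

```python
# Placeholder: replace with the actual repository name or a local path.
MODEL_ID = "your-namespace/slavic-t5-base"


def load(model_id=MODEL_ID):
    """Load tokenizer and model (assumes a transformers-compatible checkpoint)."""
    # Imported lazily so preserves_case() can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def preserves_case(tokenize, text):
    """Return True if `text` and its lowercase form tokenize differently."""
    return tokenize(text) != tokenize(text.lower())


def demo():
    """Print the vocabulary size and whether tokenization is case-sensitive."""
    tokenizer, _model = load()
    # len(tokenizer) counts the base vocabulary plus any added special tokens.
    print("vocabulary size:", len(tokenizer))
    print("cased:", preserves_case(tokenizer.tokenize, "Praha je hlavní město."))
```

Calling `demo()` after setting `MODEL_ID` should report a vocabulary size close to the 120 000 tokens stated above and confirm that capitalized input is not lowercased.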