---
datasets:
- oscar
- hieronymusa/MaCoCu-dataset-250k
language:
- cs
- hr
- pl
- sl
- sk
---


# Slavic T5 Base

The aim of this model is to achieve the best possible results for Slavic languages written in Latin script.

It is suitable for tasks such as:

- summarization,
- extractive question answering,
- machine translation between Slavic languages written in Latin script.

The model is trained on selected parts of the OSCAR corpus and the MaCoCu corpus.

It supports these languages: Czech, Croatian, Polish, Slovak, and Slovenian.

The vocabulary has 120 000 tokens and is cased (it contains capital letters).
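
The following is a minimal loading sketch, not an official usage recipe. It assumes the checkpoint is published on the Hugging Face Hub and loads with the standard `transformers` seq2seq classes. `MODEL_ID` is a hypothetical placeholder (this card does not state the repository id), and `load_model`, `generate`, and `demo` are illustrative helper names.

```python
# Hypothetical repository id: replace it with the actual Hub id of this model.
MODEL_ID = "your-namespace/slavic-t5-base"

# Languages named in this model card, keyed by ISO 639-1 code.
SUPPORTED_LANGUAGES = {
    "cs": "Czech",
    "hr": "Croatian",
    "pl": "Polish",
    "sk": "Slovak",
    "sl": "Slovenian",
}


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model (needs `transformers`, `sentencepiece`, `torch`)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def generate(tokenizer, model, text: str, max_new_tokens: int = 64) -> str:
    """Run one generation pass and return the decoded text."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def demo() -> None:
    """Example call; it only works once MODEL_ID points at a real checkpoint."""
    tokenizer, model = load_model()
    # The card states a vocabulary of 120 000 tokens; this prints the loaded size.
    print(len(tokenizer))
    print(generate(tokenizer, model, "Praha je hlavní město České republiky."))
```

If the checkpoint is a pre-trained base model without task-specific fine-tuning, the raw output of `generate` may not be useful until the model is fine-tuned on the target task (summarization, question answering, or translation).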