---
license: openrail
datasets:
- WelfCrozzo/kupalinka
language:
- be
- en
- ru
metrics:
- bleu
library_name: transformers
tags:
- translation
widget:
- text: "<extra_id_1>да зорак праз цяжкасці"
example_title: "be -> ru"
- text: "<extra_id_2>да зорак праз цяжкасці"
example_title: "be -> en"
- text: "<extra_id_3>к звездам через трудности"
example_title: "ru -> be"
- text: "<extra_id_5>к звездам через трудности"
example_title: "ru -> en"
- text: "<extra_id_6>to the stars through difficulties."
example_title: "en -> be"
- text: "<extra_id_7>to the stars through difficulties."
example_title: "en -> ru"
---
# T5 for the Belarusian language
![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)
This model is based on T5-small with a sequence length of 128 tokens. It was trained from scratch on an RTX 3090 (24 GB) GPU.
# Supported tasks:
- translation BE to RU: `<extra_id_1>`
- translation BE to EN: `<extra_id_2>`
- translation RU to BE: `<extra_id_3>`
- translation RU to EN: `<extra_id_5>`
- translation EN to BE: `<extra_id_6>`
- translation EN to RU: `<extra_id_7>`
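The translation direction is selected entirely by the sentinel token prepended to the input. A minimal sketch of that mapping (the `TASK_PREFIXES` dict and `build_input` helper are illustrative names, not part of the released model):

```python
# Hypothetical helper: map a (source, target) language pair to the
# sentinel token the model was trained with (see the list above).
TASK_PREFIXES = {
    ("be", "ru"): "<extra_id_1>",
    ("be", "en"): "<extra_id_2>",
    ("ru", "be"): "<extra_id_3>",
    ("ru", "en"): "<extra_id_5>",
    ("en", "be"): "<extra_id_6>",
    ("en", "ru"): "<extra_id_7>",
}

def build_input(text: str, src: str, tgt: str) -> str:
    """Prepend the task sentinel for the requested direction."""
    return TASK_PREFIXES[(src, tgt)] + text

print(build_input("да зорак праз цяжкасці", "be", "ru"))
# -> "<extra_id_1>да зорак праз цяжкасці"
```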
# Metrics:
- [eval/BLEU](https://api.wandb.ai/links/miklgr500/31mq4s36)
- [eval/loss](https://api.wandb.ai/links/miklgr500/rvi2p69n)
- [train/loss](https://api.wandb.ai/links/miklgr500/z9alu3n5)
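The links above point to the wandb training curves. If you want to score the model offline, a BLEU computation with `sacrebleu` could look like the sketch below; the example sentences and the choice of `sacrebleu` are assumptions, not the exact evaluation script behind those runs.

```python
# Hypothetical sanity check: score model outputs against references with sacrebleu.
import sacrebleu

hypotheses = ["к звездам через трудности"]    # model translations, one per source sentence
references = [["к звездам через трудности"]]  # one reference set, aligned with hypotheses
print(sacrebleu.corpus_bleu(hypotheses, references).score)  # 100.0 for an exact match
```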
# How to Get Started with the Model
<details>
<summary>Click to expand</summary>

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("WelfCrozzo/T5-L128-belarusian")
model = T5ForConditionalGeneration.from_pretrained("WelfCrozzo/T5-L128-belarusian")

# Prepend the task sentinel for the desired direction (<extra_id_1> = BE -> RU).
x = tokenizer.encode('<extra_id_1>да зорак праз цяжкасці', return_tensors='pt')
result = model.generate(x, return_dict_in_generate=True, output_scores=True, max_length=128)
print(tokenizer.decode(result["sequences"][0], skip_special_tokens=True))
```
</details>
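Building on the snippet above, a convenience wrapper for all six directions might look like this, reusing `tokenizer` and `model` from the getting-started code and the hypothetical `build_input` helper sketched earlier; the `translate` function is illustrative, not part of the released model:

```python
def translate(text: str, src: str, tgt: str, max_length: int = 128) -> str:
    """Translate between any supported language pair by prepending
    the matching task sentinel (see the supported tasks list)."""
    x = tokenizer.encode(build_input(text, src, tgt), return_tensors='pt')
    out = model.generate(x, max_length=max_length)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(translate('да зорак праз цяжкасці', 'be', 'en'))
# expected to resemble: "to the stars through difficulties"
```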
# References
- [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://jmlr.org/papers/volume21/20-074/20-074.pdf) |