|
---
language:
- es
- qu
tags:
- quechua
- translation
- spanish
license: apache-2.0
metrics:
- bleu
- sacrebleu
widget:
- text: "Dios ama a los hombres"
- text: "A pesar de todo, soy feliz"
- text: "¿Qué harán allí?"
- text: "Debes aprender a respetar"
---
|
|
|
# Spanish to Quechua translator |
|
|
|
This model is a fine-tuned version of [t5-small](https://huggingface.co/t5-small) for Spanish-to-Quechua translation.
|
|
|
## Model description |
|
|
|
t5-small-finetuned-spanish-to-quechua was trained for 46 epochs on 102,747 sentences; 12,844 sentences were used for validation and 12,843 sentences for the test set.
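
For orientation, a fine-tuning setup consistent with these numbers might look like the sketch below. It is illustrative only and not the exact script used to train this model; apart from the 46 epochs, the hyperparameters (batch size, learning rate) are assumptions.

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Start from the same base checkpoint this model was fine-tuned from.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-small-finetuned-spanish-to-quechua",
    num_train_epochs=46,               # reported in this card
    per_device_train_batch_size=16,    # assumption
    learning_rate=2e-5,                # assumption
    evaluation_strategy="epoch",
    predict_with_generate=True,
)

# `train_dataset` and `eval_dataset` would be the tokenized 102,747
# training pairs and 12,844 validation pairs described above.
# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
#     tokenizer=tokenizer,
# )
# trainer.train()
```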
|
|
|
## Intended uses & limitations |
|
|
|
A large part of the dataset was extracted from biblical texts, so the model performs better on sentences whose style and vocabulary are close to those texts.
|
|
|
### How to use |
|
|
|
You can load the model and tokenizer as follows:
|
|
|
```python
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

>>> model_name = 'hackathon-pln-es/t5-small-finetuned-spanish-to-quechua'
>>> model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
|
|
|
To translate a sentence, you can do the following:
|
|
|
```python
>>> sentence = "Entonces dijo"
>>> inputs = tokenizer(sentence, return_tensors="pt")
>>> # Beam search over 4 beams, stopping early once all beams have finished
>>> output = model.generate(inputs["input_ids"], max_length=40, num_beams=4, early_stopping=True)
>>> # skip_special_tokens=True removes the <pad> and </s> markers from the decoded text
>>> translated = tokenizer.decode(output[0], skip_special_tokens=True)
>>> print('Original sentence: {} \nTranslated sentence: {}'.format(sentence, translated))
```
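
Alternatively, the same checkpoint can be wrapped in a `pipeline`. The generic `text2text-generation` task is used here as an assumption, since it works with any T5-style seq2seq model:

```python
>>> from transformers import pipeline

>>> translator = pipeline("text2text-generation", model=model_name)
>>> for sentence in ["Dios ama a los hombres", "¿Qué harán allí?"]:
...     print(translator(sentence, max_length=40, num_beams=4)[0]["generated_text"])
```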
|
|
|
### Limitations and bias |
|
|
|
At the moment, this model can only translate to Ayacucho Quechua.
|
|
|
## Training data |
|
|
|
To train this model, we used the [Spanish to Quechua dataset](https://huggingface.co/datasets/hackathon-pln-es/spanish-to-quechua).
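
If you want to inspect or reuse the data, it can be loaded with the `datasets` library. The split names in the comments below (train/validation/test) are an assumption based on the sentence counts reported above:

```python
>>> from datasets import load_dataset

>>> dataset = load_dataset("hackathon-pln-es/spanish-to-quechua")
>>> print(dataset)              # expected splits: train, validation and test
>>> print(dataset["train"][0])  # a single Spanish-Quechua sentence pair
```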
|
|
|
## Evaluation results |
|
|
|
We obtained the following metrics during the training process: |
|
|
|
- eval_bleu = 2.9691 |
|
- eval_loss = 1.2064628601074219 |
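
To compute a comparable BLEU score on your own translations, one option is sacreBLEU through the `evaluate` library. This is a sketch; `predictions` and `references` are placeholders for your model outputs and reference Quechua sentences:

```python
>>> import evaluate

>>> sacrebleu = evaluate.load("sacrebleu")
>>> predictions = ["..."]    # translations produced by the model
>>> references = [["..."]]   # one list of reference translations per prediction
>>> print(sacrebleu.compute(predictions=predictions, references=references)["score"])
```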
|
|
|
## Team |
|
|
|
- [Sara Benel](https://huggingface.co/sbenel) |
|
- [Jose Vílchez](https://huggingface.co/JCarlos) |
|
|