metadata
language:
- es
- qu
tags:
- quechua
- translation
- spanish
license: apache-2.0
metrics:
- bleu
- sacrebleu
widget:
- text: Dios ama a los hombres
- text: A pesar de todo, soy feliz
- text: ¿Qué harán allí?
- text: Debes aprender a respetar
t5-small-finetuned-spanish-to-quechua
This model is a fine-tuned version of t5-small.
Model description
t5-small-finetuned-spanish-to-quechua was trained for 46 epochs on 102,747 sentences. Validation used 12,844 sentences, and 12,843 sentences were held out for testing.
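The sentence counts above work out to roughly an 80/10/10 split. The card does not state the ratio explicitly, so the short sketch below only recomputes it from the reported numbers; the variable names are illustrative.

```python
# Sentence counts as reported in the model description.
train, validation, test = 102_747, 12_844, 12_843
total = train + validation + test

# Print each subset's share of the full dataset.
for name, count in [("train", train), ("validation", validation), ("test", test)]:
    print(f"{name}: {count} sentences ({count / total:.1%})")
# train: 102747 sentences (80.0%)
# validation: 12844 sentences (10.0%)
# test: 12843 sentences (10.0%)
```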
Intended uses & limitations
A large part of the dataset was extracted from biblical texts, so the model performs better on sentences that resemble that material.
How to use
You can import this model as follows:
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
>>> model_name = 'hackathon-pln-es/t5-small-finetuned-spanish-to-quechua'
>>> model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
To translate you can do:
>>> sentence = "Entonces dijo"
>>> inputs = tokenizer(sentence, return_tensors="pt")
>>> output = model.generate(inputs["input_ids"], max_length=40, num_beams=4, early_stopping=True)
>>> print('Original sentence: {} \nTranslated sentence: {}'.format(sentence, tokenizer.decode(output[0], skip_special_tokens=True)))
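To translate several sentences at once, the same calls can be wrapped in a small helper. This is a minimal sketch, not part of the original card: the function name `translate_batch` is hypothetical, and it assumes `model` and `tokenizer` have already been loaded as shown above. The generation settings are copied from the single-sentence example.

```python
def translate_batch(sentences, model, tokenizer, max_length=40, num_beams=4):
    """Translate a list of Spanish sentences with an already-loaded model and tokenizer.

    Hypothetical helper for illustration; it is not part of the model repository.
    """
    # Pad to a common length so the sentences can be stacked into one batch.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True)
    outputs = model.generate(
        inputs["input_ids"],
        # The attention mask tells the model to ignore the padding positions.
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        num_beams=num_beams,
        early_stopping=True,
    )
    # skip_special_tokens drops markers such as <pad> and </s> from the decoded text.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `translate_batch(["Dios ama a los hombres", "¿Qué harán allí?"], model, tokenizer)` returns one translated string per input sentence.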
Limitations and bias
Currently, this model can only translate into Ayacucho Quechua.
Training data
To train this model, we used the Spanish to Quechua dataset.
Evaluation results
We obtained the following metrics during the training process:
- eval_bleu = 2.9691
- eval_loss = 1.2064628601074219