metadata
language:
  - es
  - qu
  - multilingual
license: apache-2.0
tags:
  - quechua
  - translation
  - spanish
metrics:
  - bleu
  - sacrebleu
widget:
  - text: Dios ama a los hombres
  - text: A pesar de todo, soy feliz
  - text: ¿Qué harán allí?
  - text: Debes aprender a respetar

Spanish to Quechua translator

This model is a fine-tuned version of t5-small.

Model description

t5-small-finetuned-spanish-to-quechua was trained for 46 epochs on 102 747 sentences. Validation used 12 844 sentences, and a further 12 843 sentences were held out for testing.
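
As a quick sanity check on the figures above, the short sketch below computes the share of each split. The dictionary keys are illustrative labels, not names taken from the dataset itself.

```python
# Sentence counts as reported in the model description
splits = {"train": 102747, "validation": 12844, "test": 12843}

total = sum(splits.values())
print(f"total: {total} sentences")

# Share of the full corpus that falls into each split
for name, count in splits.items():
    print(f"{name}: {count} sentences ({count / total:.1%})")
```

The counts add up to 128 434 sentences and correspond to roughly an 80/10/10 split.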

Intended uses & limitations

A large part of the dataset was extracted from biblical texts, so the model performs better on sentences that resemble that kind of text.

How to use

You can load the model and its tokenizer as follows:

>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
>>> model_name = 'hackathon-pln-es/t5-small-finetuned-spanish-to-quechua'
>>> model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)

To translate a sentence:

>>> sentence = "Entonces dijo"
>>> inputs = tokenizer(sentence, return_tensors="pt")
>>> output = model.generate(inputs["input_ids"], max_length=40, num_beams=4, early_stopping=True)
>>> # skip_special_tokens=True drops markers such as <pad> and </s> from the decoded text
>>> print('Original sentence: {} \nTranslated sentence: {}'.format(sentence, tokenizer.decode(output[0], skip_special_tokens=True)))
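
To translate several sentences at once, the same calls can be wrapped in a small helper. This is a minimal sketch, not part of the original model card: the function name `translate_batch` and its `batch_size` parameter are hypothetical, and it expects the `model` and `tokenizer` objects loaded above to be passed in.

```python
def translate_batch(sentences, model, tokenizer, batch_size=8,
                    max_length=40, num_beams=4):
    """Translate a list of Spanish sentences, batch_size at a time.

    `model` and `tokenizer` are the objects returned by
    AutoModelForSeq2SeqLM.from_pretrained and AutoTokenizer.from_pretrained.
    """
    translations = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        # padding=True pads shorter sentences so the batch forms one tensor
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        outputs = model.generate(
            inputs["input_ids"],
            # the attention mask tells the model to ignore padding positions
            attention_mask=inputs["attention_mask"],
            max_length=max_length,
            num_beams=num_beams,
            early_stopping=True,
        )
        # skip_special_tokens=True drops markers such as <pad> and </s>
        translations.extend(
            tokenizer.batch_decode(outputs, skip_special_tokens=True)
        )
    return translations
```

For example, `translate_batch(["Dios ama a los hombres", "Debes aprender a respetar"], model, tokenizer)` returns a list with one translated string per input sentence, in the same order.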

Limitations and bias

Currently, this model can only translate into Ayacucho Quechua.

Training data

To train this model, we used the Spanish to Quechua dataset.

Evaluation results

We obtained the following metrics during the training process:

  • eval_bleu = 2.9691
  • eval_loss = 1.2064628601074219
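
The loss value can be easier to interpret as a perplexity. The sketch below assumes that `eval_loss` is the mean per-token cross-entropy in natural-log units, which is how seq2seq models in Transformers usually report it. The model card does not state this explicitly, so treat the result as indicative only.

```python
import math

# Value reported above; assumed to be mean per-token cross-entropy in nats
eval_loss = 1.2064628601074219

# Perplexity is the exponential of the cross-entropy
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.2f}")
```

Under that assumption the perplexity is about 3.34.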

Team members