Model Card for mt-fr-en-tatoeba

This is a fine-tuned version of Helsinki-NLP/opus-mt-fr-en, trained on the Tatoeba dataset for French-to-English translation.

Model Details

  • Base Model: Helsinki-NLP/opus-mt-fr-en
  • Dataset Used: opus_tatoeba (French-English)
  • Fine-tuning Epochs: 3
  • Optimizer: AdamW (learning rate: 2e-5)
  • Evaluation Metric: BLEU Score
  • Pretrained BLEU Score: 57.5 (on Tatoeba)
  • Fine-Tuned BLEU Score: 64.43 (on a held-out Tatoeba test set, a random 10% subset of the dataset)
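
The 10% random test subset mentioned above could be drawn along the following lines. This is a minimal sketch, not the author's actual split: the helper name `split_pairs`, the seed value, and the toy sentence pairs are assumptions made for illustration.

```python
import random

def split_pairs(pairs, test_fraction=0.1, seed=42):
    """Shuffle sentence pairs reproducibly and hold out a test fraction."""
    shuffled = list(pairs)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed (assumed) makes the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

# Toy data standing in for French-English pairs (hypothetical).
pairs = [(f"phrase {i}", f"sentence {i}") for i in range(100)]
train, test = split_pairs(pairs)
print(len(train), len(test))

# Difference between the two BLEU scores reported in this card, in BLEU points.
print(round(64.43 - 57.5, 2))
```

Fixing the seed matters here: a different random subset would give a somewhat different BLEU score, so the split should be reproducible if the numbers are to be compared.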

Model Description

  • Developed by: Mahdi Ihdeme
  • Model type: Sequence-to-sequence model for French-to-English translation
  • Language(s) (NLP): English, French

Usage

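from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model_name = "mihdeme/mt-fr-en-tatoeba"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def translate(sentence):
    """Translate a single French sentence into English."""
    # Tokenize the input; truncation guards against over-long sentences.
    inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs)
    # Decode the first (and only) generated sequence, dropping special tokens.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translate("Bonjour, comment ça va ?"))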
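
For more than a handful of sentences, translating in batches is usually faster than calling `translate` once per sentence. The sketch below assumes `model` and `tokenizer` are loaded as in the snippet above; the helper name `translate_batch` and the default `batch_size` and `max_new_tokens` values are illustrative choices, not settings taken from this card.

```python
def translate_batch(sentences, model, tokenizer, batch_size=16, max_new_tokens=128):
    """Translate a list of French sentences in fixed-size batches."""
    translations = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        # Pad to the longest sentence in the batch so it forms one tensor.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # batch_decode returns one string per generated sequence, in input order.
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations
```

A call would look like `translate_batch(["Bonjour.", "Merci beaucoup."], model, tokenizer)`, which returns a list of English strings in the same order as the inputs.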

Training Configuration

  • Batch Size: 16
  • Max Sequence Length: 512
  • Hardware Used: Google Colab GPU (Tesla T4)
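
The hyperparameters listed in this card can be collected into keyword arguments for Hugging Face's `Seq2SeqTrainingArguments`. This is a sketch only: the card does not say which training API was used, the `output_dir` value is a placeholder, and `predict_with_generate` is an assumed setting.

```python
# Values taken from this card; keys are Seq2SeqTrainingArguments parameter names.
training_kwargs = {
    "output_dir": "mt-fr-en-tatoeba",      # placeholder path (assumption)
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "predict_with_generate": True,         # assumption: generate text at eval time so BLEU can be computed
}

# The maximum sequence length applies at tokenization time,
# not in the training arguments.
tokenizer_kwargs = {"max_length": 512, "truncation": True}

# With transformers installed, these could be used as:
#   from transformers import Seq2SeqTrainingArguments
#   args = Seq2SeqTrainingArguments(**training_kwargs)
print(training_kwargs)
print(tokenizer_kwargs)
```

The Hugging Face `Trainer` uses an AdamW optimizer by default, which would match the optimizer named under Model Details if that API was the one used.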

License

Apache 2.0

Acknowledgments

Trained using Hugging Face Transformers. Original dataset from Tatoeba.
