Model Card for mt-fr-en-tatoeba
This is a fine-tuned version of Helsinki-NLP/opus-mt-fr-en, trained on the Tatoeba dataset for French-to-English translation.
Model Details
- Base Model: Helsinki-NLP/opus-mt-fr-en
- Dataset Used: opus_tatoeba (French-English)
- Fine-tuning Epochs: 3
- Optimizer: AdamW (learning rate: 2e-5)
- Evaluation Metric: BLEU Score
- Pretrained BLEU Score: 57.5 (on Tatoeba)
- Fine-Tuned BLEU Score: 64.43 (on the Tatoeba test set, a random 10% subset of Tatoeba)
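The card describes the test set only as a random 10% subset of Tatoeba. A held-out split of that kind could be produced roughly as sketched below. The function name `split_pairs`, the seed, and the toy sentence pairs are assumptions made for illustration; the actual split procedure used for this model is not documented here.

```python
import random


def split_pairs(pairs, test_fraction=0.1, seed=42):
    """Shuffle (source, target) pairs and hold out a fraction as a test set.

    Hypothetical helper: the seed and the rounding rule are assumptions,
    not details taken from the model card.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(pairs)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data standing in for French-English sentence pairs.
pairs = [(f"phrase {i}", f"sentence {i}") for i in range(100)]
train_pairs, test_pairs = split_pairs(pairs)
print(len(train_pairs), len(test_pairs))
```

Fixing the seed matters when comparing the pretrained and fine-tuned BLEU scores, since both should be measured on the same held-out sentences.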
Model Description
- Developed by: Mahdi Ihdeme
- Model type: Language model for French-to-English translation
- Language(s) (NLP): English, French
Usage
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model_name = "mihdeme/mt-fr-en-tatoeba"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def translate(sentence):
    # Tokenize the French input, generate, and decode the English output.
    inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translate("Bonjour, comment ça va ?"))
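The translate function above handles one sentence per call. For longer lists of sentences, a batched variant along the following lines may be more efficient. This is a minimal sketch: `translate_batch` and its `batch_size` and `max_length` defaults are assumptions chosen to mirror the training configuration, not part of the published model.

```python
def translate_batch(sentences, model, tokenizer, batch_size=16, max_length=512):
    """Translate a list of sentences in fixed-size chunks.

    Hypothetical helper: `model` and `tokenizer` are expected to be the
    objects loaded with from_pretrained in the usage example.
    """
    translations = []
    for start in range(0, len(sentences), batch_size):
        chunk = sentences[start:start + batch_size]
        # Pad to the longest sentence in the chunk and truncate overlong input.
        inputs = tokenizer(
            chunk,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        )
        outputs = model.generate(**inputs)
        # batch_decode turns every generated sequence back into text.
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations
```

With the model and tokenizer loaded as in the usage example, a call such as `translate_batch(["Bonjour.", "Merci beaucoup."], model, tokenizer)` returns one English string per input sentence, in the same order.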
Training Configuration
- Batch Size: 16
- Max Sequence Length: 512
- Hardware Used: Google Colab GPU (Tesla T4)
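The hyperparameters listed in this card could be passed to the Hugging Face Trainer API roughly as sketched below. The original training script is not published here, so the output directory, the `predict_with_generate` flag, and the helper names are assumptions. The step count also assumes a single GPU with no gradient accumulation.

```python
import math

# Values stated in this model card.
CARD_HYPERPARAMETERS = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
}
# Applied at tokenization time, e.g. tokenizer(..., truncation=True, max_length=512).
MAX_SEQUENCE_LENGTH = 512


def optimizer_steps(num_examples, batch_size=16, epochs=3):
    """Total optimizer updates, assuming one GPU and no gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


def build_training_args(output_dir="mt-fr-en-tatoeba-finetune"):
    """Build Seq2SeqTrainingArguments from the card's values.

    The output_dir default is a hypothetical name. Transformers is imported
    lazily so the rest of this sketch runs without it installed.
    """
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,
        predict_with_generate=True,  # assumption: generate during evaluation for BLEU
        **CARD_HYPERPARAMETERS,
    )


# Example with a hypothetical training set of 1,000 sentence pairs.
print(optimizer_steps(1000))
```

The Trainer uses AdamW by default, so the optimizer listed in Model Details needs no extra argument in this sketch.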
License
Apache 2.0
Acknowledgments
Trained using Hugging Face Transformers. Original dataset from Tatoeba.