metadata
library_name: transformers
license: apache-2.0
base_model: Helsinki-NLP/opus-mt-en-fr
tags:
  - translation
  - generated_from_trainer
datasets:
  - kde4
metrics:
  - bleu
model-index:
  - name: marian-finetuned-kde4-en-to-fr
    results:
      - task:
          name: Sequence-to-sequence Language Modeling
          type: text2text-generation
        dataset:
          name: kde4
          type: kde4
          config: en-fr
          split: train
          args: en-fr
        metrics:
          - name: Bleu
            type: bleu
            value: 50.54449537679619

Marian Fine-Tuned KDE4 (English-to-French)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-fr using the KDE4 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9620
  • BLEU: 50.5445
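The BLEU figure can be read as corpus-level BLEU: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty, then scaled to 0-100. The sketch below is a simplified illustration that assumes whitespace tokenization, one reference per sentence, and no smoothing. The card does not say which implementation produced 50.5445. A library such as sacreBLEU applies its own tokenization, so this function will not reproduce that number exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale (whitespace tokens, one reference each).

    No smoothing: if any n-gram order has zero matches, the score is 0.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: a hypothesis n-gram is credited at most as often as it occurs in the reference
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: hypotheses shorter than the references are penalized
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

A perfect match scores 100. A hypothesis that is a correct but truncated prefix of its reference keeps full precision and loses points only through the brevity penalty.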

Model Description

This English-to-French translation model has been fine-tuned specifically on the KDE4 dataset. The base model, Helsinki-NLP/opus-mt-en-fr, is an OPUS-MT checkpoint from the MarianMT family: Transformer encoder-decoder translation models trained with the Marian NMT framework, which is designed for efficient training and inference.


Intended Uses & Limitations

Intended Uses

  • Translating English text into French.
  • Software localization, especially user-interface strings and messages similar to those in the KDE4 corpus.

Limitations

  • Performance may decline on texts outside the KDE4 domain.
  • May struggle with idiomatic expressions, technical terminology from domains other than software, and ambiguous input.

Training & Evaluation Data

The model was fine-tuned on the en-fr configuration of the KDE4 dataset, a parallel corpus of English and French strings drawn from KDE software localization files. Because the evaluation metrics are computed on data from this same domain, they may overstate the model's quality on general-purpose text.


Training Procedure

Hyperparameters

  • Learning Rate: 2e-05
  • Train Batch Size: 32
  • Eval Batch Size: 64
  • Seed: 42
  • Optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
  • LR Scheduler: Linear
  • Epochs: 1
  • Mixed Precision Training: Native AMP
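The optimizer and scheduler entries above correspond to the update rules sketched below: a linear learning-rate decay from 2e-05 and an AdamW step using the listed betas and epsilon. This is a minimal pure-Python illustration for a single scalar parameter, not the PyTorch code used in training. It assumes no warmup and a weight decay of 0.0, since the card lists neither, and `total_steps` is a hypothetical argument because the card does not state the total number of optimization steps. In a Hugging Face `Seq2SeqTrainingArguments` setup, these values would typically map to `learning_rate`, `per_device_train_batch_size`, `per_device_eval_batch_size`, `seed`, `lr_scheduler_type="linear"`, `num_train_epochs`, and `fp16=True` for native AMP.

```python
import math

# Values from the hyperparameter list
BASE_LR = 2e-5
BETA1, BETA2 = 0.9, 0.999
EPS = 1e-8


def linear_lr(step, total_steps, base_lr=BASE_LR, warmup_steps=0):
    """Linear warmup (optional), then linear decay to 0 at total_steps.

    warmup_steps=0 is an assumption; the card does not list a warmup.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def adamw_step(param, grad, m, v, t, lr, weight_decay=0.0):
    """One AdamW update for a scalar parameter; t is the 1-based step count.

    weight_decay=0.0 is an assumption; the card does not list it.
    """
    # Exponential moving averages of the gradient and squared gradient
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad * grad
    # Bias correction for the zero-initialized averages
    m_hat = m / (1 - BETA1 ** t)
    v_hat = v / (1 - BETA2 ** t)
    # Decoupled weight decay acts on the parameter directly, not through the gradient
    param = param - lr * (m_hat / (math.sqrt(v_hat) + EPS) + weight_decay * param)
    return param, m, v
```

With `warmup_steps=0`, `linear_lr` starts at 2e-05 and reaches 0 at `total_steps`, the same shape as the schedule from `transformers.get_linear_schedule_with_warmup` with zero warmup steps.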

Results

  • Loss: 0.9620
  • BLEU: 50.5445

Training Loss Progression

| Step | Training loss |
|-----:|--------------:|
| 500  | 1.2253 |
| 1000 | 1.2165 |
| 1500 | 1.1913 |
| 2000 | 1.1404 |
| 2500 | 1.1178 |
| 3000 | 1.0900 |
| 3500 | 1.0594 |
| 4000 | 1.0512 |
| 4500 | 1.0633 |
| 5000 | 1.0405 |
| 5500 | 1.0316 |
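The logged training losses can be summarized with a few lines of arithmetic. The snippet below copies the values from the table and reports the overall drop between the first and last logged steps, plus any step where the loss rose relative to the previous log. These are training losses only; the evaluation loss of 0.9620 is reported separately.

```python
# Logged training losses from the table (step -> loss)
LOSSES = {
    500: 1.2253, 1000: 1.2165, 1500: 1.1913, 2000: 1.1404,
    2500: 1.1178, 3000: 1.0900, 3500: 1.0594, 4000: 1.0512,
    4500: 1.0633, 5000: 1.0405, 5500: 1.0316,
}


def summarize(losses):
    """Return (absolute drop, relative drop, steps where the loss went up)."""
    steps = sorted(losses)
    first, last = losses[steps[0]], losses[steps[-1]]
    increases = [s for prev, s in zip(steps, steps[1:]) if losses[s] > losses[prev]]
    return first - last, (first - last) / first, increases


drop, rel_drop, ups = summarize(LOSSES)
print(f"drop={drop:.4f} ({rel_drop:.1%}), increases at steps {ups}")
# prints: drop=0.1937 (15.8%), increases at steps [4500]
```

The loss falls by roughly 16% over the logged steps, with a single small uptick at step 4500.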

Framework Versions

  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu121
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Example Usage

from transformers import pipeline

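# Load the fine-tuned checkpoint from the Hugging Face Hub
model_checkpoint = "ParitKansal/marian-finetuned-kde4-en-to-fr"
translator = pipeline("translation", model=model_checkpoint)

# Translate a KDE-style UI string; the pipeline returns a list with one
# dict per input, holding the French output under "translation_text"
translation = translator("Default to expanded threads")
print(translation[0]["translation_text"])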

This script demonstrates how to use the model for English-to-French translation tasks.