mT5-LatinSummarizerModel: Fine-Tuned Model for Latin NLP

  • GitHub Repository
  • Hugging Face Model
  • Hugging Face Dataset

Overview

This repository contains the trained checkpoints and tokenizer files for the mT5-LatinSummarizerModel, which was fine-tuned to improve Latin summarization and translation. It is designed to:

  • Translate between English and Latin.
  • Summarize Latin texts effectively.
  • Leverage extractive and abstractive summarization techniques.
  • Utilize curriculum learning for improved training.

Installation & Usage

To download and set up the models (mT5-small and Mistral-7B-Instruct), run:

bash install_large_models.sh
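Once the checkpoints are in place, the fine-tuned model can be loaded with the standard transformers API. The snippet below is a minimal sketch, assuming that the Hugging Face repository `LatinNLP/LatinSummarizerModel` (see the citation below) hosts merged mT5 weights and that a plain `summarize:` task prefix is used; adjust the model id, local path, and prefix to match the actual checkpoints.

```python
# Minimal inference sketch. Assumptions: merged mT5 weights are available under
# "LatinNLP/LatinSummarizerModel" (or in a local checkpoint directory such as
# final_pipeline/with_stanza), and a "summarize:" task prefix is expected.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "LatinNLP/LatinSummarizerModel"  # or e.g. "final_pipeline/with_stanza"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

latin_text = "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae..."
inputs = tokenizer("summarize: " + latin_text, return_tensors="pt",
                   truncation=True, max_length=512)

summary_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```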

Project Structure

.
├── final_pipeline (Trained for 30 light epochs with optimizations, then fine-tuned for 100 epochs on the small high-quality summaries dataset)
│   ├── no_stanza
│   └── with_stanza
├── initial_pipeline (Trained for 6 epochs without optimizations)
│   └── mt5-small-en-la-translation-epoch5
├── install_large_models.sh
└── README.md

Training Methodology

We fine-tuned mT5-small in three phases:

  1. Initial Training Pipeline (6 epochs): Used the full dataset without optimizations.
  2. Final Training Pipeline (30 light epochs): Used 10% of training data per epoch for efficiency.
  3. Fine-Tuning (100 epochs): Focused on the 4750 high-quality summaries for final optimization.
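The "light epochs" of phase 2 can be thought of as resampling a fraction of the corpus on each pass. The function below is an illustrative sketch only; names such as `full_dataset` and `train_one_epoch` are hypothetical and not part of this repository.

```python
# Illustrative sketch of a "light epoch": each pass trains on a fresh random
# 10% sample of the full training set rather than the whole corpus.
import random

def make_light_epoch(full_dataset, fraction=0.10, seed=None):
    """Return a random subset covering `fraction` of the training examples."""
    rng = random.Random(seed)
    k = max(1, int(len(full_dataset) * fraction))
    indices = rng.sample(range(len(full_dataset)), k)
    return [full_dataset[i] for i in indices]

# Hypothetical training loop over 30 light epochs:
# for epoch in range(30):
#     epoch_data = make_light_epoch(train_examples, fraction=0.10, seed=epoch)
#     train_one_epoch(model, epoch_data)
```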

Training Configurations:

  • Hardware: 16GB VRAM GPU (lab machines via SSH).
  • Batch Size: Adaptive due to GPU memory constraints.
  • Gradient Accumulation: Enabled for larger effective batch sizes.
  • LoRA-based fine-tuning: LoRA Rank 8, Scaling Factor 32.
  • Dynamic Sequence Length Adjustment: Increased progressively.
  • Learning Rate: 5 × 10^-4 with warm-up steps.
  • Checkpointing: Frequent saves to guard against power outages.
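In peft/transformers terms, the configuration above roughly corresponds to the sketch below. Only the values stated in the list (rank 8, scaling factor 32, learning rate 5 × 10^-4, gradient accumulation, frequent checkpointing) come from this model card; the target modules, warm-up steps, batch size, and save frequency are assumptions for illustration.

```python
# Sketch of a LoRA fine-tuning setup matching the listed hyperparameters.
# Target modules, warm-up steps, batch size, and save frequency are assumed.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                         # LoRA rank
    lora_alpha=32,               # scaling factor
    target_modules=["q", "v"],   # assumed attention projections for mT5
    lora_dropout=0.05,           # assumed
)
model = get_peft_model(base_model, lora_config)

training_args = Seq2SeqTrainingArguments(
    output_dir="checkpoints",
    learning_rate=5e-4,
    warmup_steps=500,                  # warm-up (exact value not documented)
    per_device_train_batch_size=4,     # adapted to the 16GB VRAM budget
    gradient_accumulation_steps=8,     # larger effective batch size
    save_steps=500,                    # frequent checkpointing
    save_total_limit=3,
)
```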

Evaluation & Results

We evaluated the model using ROUGE, BERTScore, and BLEU/chrF scores.

| Metric       | Before Fine-Tuning | After Fine-Tuning |
|--------------|--------------------|-------------------|
| ROUGE-1      | 0.1675             | 0.2541            |
| ROUGE-2      | 0.0427             | 0.0773            |
| ROUGE-L      | 0.1459             | 0.2139            |
| BERTScore-F1 | 0.6573             | 0.7140            |
  • chrF Score (en→la): 33.60 (with Stanza tags) vs 18.03 BLEU (without Stanza).
  • Summarization Density: Maintained at ~6%.
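For reference, these metrics can be computed with the Hugging Face `evaluate` library. The snippet below is a generic sketch with placeholder predictions and references, not the exact evaluation script behind the numbers above.

```python
# Generic metric sketch: ROUGE and BERTScore for summaries, chrF for en->la
# translations. Predictions and references are placeholders.
import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
chrf = evaluate.load("chrf")

predictions = ["Summarium breve textus Latini ..."]   # model outputs
references = ["Summarium aureum textus Latini ..."]   # gold summaries/translations

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references,
                        model_type="bert-base-multilingual-cased"))
print(chrf.compute(predictions=predictions, references=[[r] for r in references]))
```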

Observations:

  • Pre-training on extractive summaries was crucial.
  • The model still exhibits some excessive extraction, indicating room for further improvement.

License

This model is released under CC-BY-4.0.

Citation

@misc{LatinSummarizerModel,
  author = {Axel Delaval and Elsa Lubek},
  title = {Latin-English Summarization Model (mT5)},
  year = {2025},
  url = {https://huggingface.co/LatinNLP/LatinSummarizerModel}
}