---
tags:
  - model
  - checkpoints
  - translation
  - latin
  - english
  - mt5
  - mistral
  - multilingual
  - NLP
language:
  - en
  - la
license: cc-by-4.0
models:
  - mistralai/Mistral-7B-Instruct-v0.3
  - google/mt5-small
model_type: mt5-small
training_epochs: >-
  6 (initial pipeline), 30 (final pipeline with optimizations), 100 (fine-tuning
  on 4750 summaries)
task_categories:
  - translation
  - summarization
  - multilingual-nlp
task_ids:
  - en-la-translation
  - la-en-translation
  - text-generation
pretty_name: mT5-LatinSummarizerModel
storage:
  - git-lfs
  - huggingface-models
size_categories:
  - 5GB<n<10GB
---

mT5-LatinSummarizerModel: Fine-Tuned Model for Latin NLP

GitHub Repository
Hugging Face Model
Hugging Face Dataset

Overview

This repository contains the trained checkpoints and tokenizer files for the mT5-LatinSummarizerModel, which was fine-tuned to improve Latin summarization and translation. It is designed to:

  • Translate between English and Latin.
  • Summarize Latin texts effectively.
  • Leverage extractive and abstractive summarization techniques.
  • Utilize curriculum learning for improved training.

Installation & Usage

To download and set up the base models (mT5-small and Mistral-7B-Instruct-v0.3), run:

bash install_large_models.sh
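
Once the checkpoints are available locally, a minimal usage sketch is shown below. It assumes the checkpoints load with the standard Transformers seq2seq classes (i.e. LoRA weights already merged; if only adapters were saved, peft.PeftModel would be needed instead). The checkpoint path and the "summarize:" prompt prefix are placeholders, not documented interfaces.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed local checkpoint directory, following the project structure below;
# adjust to the checkpoint you actually want to load.
checkpoint = "final_pipeline/with_stanza"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical prompt; the exact task prefix used during training is not
# documented in this README.
text = "summarize: Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))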

Project Structure

.
├── final_pipeline (trained for 30 light epochs with optimizations, then fine-tuned for 100 epochs on the small high-quality summaries dataset)
│   ├── no_stanza
│   └── with_stanza
├── initial_pipeline (trained for 6 epochs without optimizations)
│   └── mt5-small-en-la-translation-epoch5
├── install_large_models.sh
└── README.md

Training Methodology

We fine-tuned mT5-small in three phases:

  1. Initial Training Pipeline (6 epochs): Used the full dataset without optimizations.
  2. Final Training Pipeline (30 light epochs): Used 10% of the training data per epoch for efficiency (see the sketch after this list).
  3. Fine-Tuning (100 epochs): Focused on the 4750 high-quality summaries for final optimization.
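
The "light epoch" idea can be illustrated as below. This is a hypothetical sketch, not the project's actual training loop: each epoch draws a fresh random 10% slice of the full training set, and the exact sampling scheme used in the final pipeline is not documented here.

import random
from torch.utils.data import Subset, DataLoader

def light_epoch_loader(dataset, fraction=0.10, batch_size=4):
    # Sample a random subset of the training data for one "light" epoch.
    k = max(1, int(len(dataset) * fraction))
    indices = random.sample(range(len(dataset)), k)
    return DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=True)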

Training Configurations:

  • Hardware: 16GB VRAM GPU (lab machines via SSH).
  • Batch Size: Adaptive due to GPU memory constraints.
  • Gradient Accumulation: Enabled for larger effective batch sizes.
  • LoRA-based fine-tuning: rank 8, scaling factor 32 (a configuration sketch follows this list).
  • Dynamic Sequence Length Adjustment: maximum sequence length increased progressively during training.
  • Learning Rate: 5 × 10^-4 with warm-up steps.
  • Checkpointing: Frequent saves to mitigate power outages.
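
A minimal sketch of the LoRA setup described above, assuming the PEFT and Transformers libraries. Only the rank, scaling factor, learning rate, and epoch count come from this README; the target modules, dropout, warm-up steps, batch size, accumulation steps, and save frequency are assumptions for illustration.

from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # LoRA rank (from this README)
    lora_alpha=32,              # scaling factor (from this README)
    lora_dropout=0.05,          # assumed value, not stated in the README
    target_modules=["q", "v"],  # assumed attention projections for mT5
)
model = get_peft_model(base, lora_config)

training_args = Seq2SeqTrainingArguments(
    output_dir="final_pipeline/no_stanza",  # path follows the repo layout
    learning_rate=5e-4,                     # from this README
    warmup_steps=500,                       # warm-up mentioned; count assumed
    gradient_accumulation_steps=8,          # accumulation enabled; value assumed
    per_device_train_batch_size=4,          # adaptive in practice; value assumed
    save_steps=500,                         # frequent checkpointing; value assumed
    num_train_epochs=30,
)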

Evaluation & Results

We evaluated the model using ROUGE, BERTScore, and BLEU/chrF scores.

Metric         Before Fine-Tuning   After Fine-Tuning
ROUGE-1        0.1675               0.2541
ROUGE-2        0.0427               0.0773
ROUGE-L        0.1459               0.2139
BERTScore-F1   0.6573               0.7140
  • chrF Score (en→la): 33.60 (with Stanza tags) vs. 18.03 BLEU (without Stanza).
  • Summarization Density: Maintained at ~6%.
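
The scores above could be recomputed with the Hugging Face `evaluate` library, as in the hedged sketch below. `predictions` and `references` stand in for lists of generated and gold summaries or translations; the BERTScore backbone model is an assumption, since the one used for the reported numbers is not documented here.

import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
chrf = evaluate.load("chrf")

predictions = ["..."]  # model outputs (placeholder)
references = ["..."]   # gold summaries / translations (placeholder)

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references,
                        model_type="bert-base-multilingual-cased"))  # assumed backbone
print(chrf.compute(predictions=predictions,
                   references=[[r] for r in references]))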

Observations:

  • Pre-training on extractive summaries was crucial.
  • The model still produces overly extractive summaries at times, indicating room for further improvement.

License

This model is released under CC-BY-4.0.

Citation

@misc{LatinSummarizerModel,
  author = {Axel Delaval and Elsa Lubek},
  title = {Latin-English Summarization Model (mT5)},
  year = {2025},
  url = {https://huggingface.co/LatinNLP/LatinSummarizerModel}
}