# mT5-LatinSummarizerModel: Fine-Tuned Model for Latin NLP

## Overview
This repository contains the trained checkpoints and tokenizer files for the **mT5-LatinSummarizerModel**, which was fine-tuned to improve Latin summarization and translation. It is designed to:
- Translate between English and Latin.
- Summarize Latin texts effectively.
- Leverage extractive and abstractive summarization techniques.
- Utilize curriculum learning for improved training.
## Installation & Usage
To download and set up the models (mT5-small and Mistral-7B-Instruct), you can directly run:
```bash
bash install_large_models.sh
```
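Once the checkpoints are available locally, they can be loaded with the `transformers` library. The snippet below is a minimal sketch, not an official usage example: the checkpoint path is a hypothetical choice (see the Project Structure section for the actual layout), and the `summarize:` prefix is an assumption, since the exact prompt format used during fine-tuning is not documented here.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical checkpoint path; pick the directory that matches your use case
# (e.g. final_pipeline/no_stanza or initial_pipeline/...).
checkpoint = "final_pipeline/with_stanza"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Assumed task prefix; the prefix used during fine-tuning may differ.
text = "summarize: Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If a checkpoint is stored as LoRA adapter weights rather than merged weights, it would instead be loaded by wrapping `google/mt5-small` with `peft.PeftModel.from_pretrained`.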
## Project Structure

```
.
├── final_pipeline          (trained for 30 light epochs with optimizations, then fine-tuned for 100 epochs on the small HQ summaries dataset)
│   ├── no_stanza
│   └── with_stanza
├── initial_pipeline        (trained for 6 epochs without optimizations)
│   └── mt5-small-en-la-translation-epoch5
├── install_large_models.sh
└── README.md
```
## Training Methodology
We fine-tuned mT5-small in three phases:
- Initial Training Pipeline (6 epochs): Used the full dataset without optimizations.
- Final Training Pipeline (30 light epochs): Used 10% of training data per epoch for efficiency.
- Fine-Tuning (100 epochs): Focused on the 4750 high-quality summaries for final optimization.
### Training Configurations
- Hardware: 16GB VRAM GPU (lab machines via SSH).
- Batch Size: Adaptive due to GPU memory constraints.
- Gradient Accumulation: Enabled for larger effective batch sizes.
- LoRA-based fine-tuning: LoRA Rank 8, Scaling Factor 32.
- Dynamic Sequence Length Adjustment: Increased progressively.
- Learning Rate: 5 × 10⁻⁴ with warm-up steps.
- Checkpointing: Frequent saves to mitigate power outages.
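As a rough illustration, the settings above map onto a PEFT/LoRA setup like the sketch below. This is a hypothetical reconstruction, not the actual training script: only the LoRA rank (8), scaling factor (32), learning rate (5 × 10⁻⁴ with warm-up), gradient accumulation, and frequent checkpointing come from the list above; the target modules, dropout, batch size, and step counts are illustrative assumptions.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, Seq2SeqTrainingArguments
from peft import LoraConfig, get_peft_model

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")

# LoRA settings from the configuration above: rank 8, scaling factor (alpha) 32.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q", "v"],  # assumption: mT5 attention query/value projections
    lora_dropout=0.05,          # assumption: dropout not stated above
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# Training arguments mirroring the list above; batch size and step counts are
# illustrative, since the actual values were adaptive.
training_args = Seq2SeqTrainingArguments(
    output_dir="checkpoints",
    learning_rate=5e-4,
    warmup_steps=500,                 # warm-up (exact count not stated)
    per_device_train_batch_size=4,    # adaptive on a 16 GB VRAM GPU
    gradient_accumulation_steps=8,    # larger effective batch size
    num_train_epochs=30,
    save_steps=500,                   # frequent checkpointing against power outages
    save_total_limit=3,
)
```

Dynamic sequence-length adjustment would sit on top of this, for example by re-tokenizing the training data with a progressively larger `max_length` as epochs advance.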
## Evaluation & Results
We evaluated the model using ROUGE, BERTScore, and BLEU/chrF scores.
| Metric | Before Fine-Tuning | After Fine-Tuning |
|---|---|---|
| ROUGE-1 | 0.1675 | 0.2541 |
| ROUGE-2 | 0.0427 | 0.0773 |
| ROUGE-L | 0.1459 | 0.2139 |
| BERTScore-F1 | 0.6573 | 0.7140 |
- chrF Score (en→la): 33.60 (with Stanza tags) vs. 18.03 BLEU (without Stanza).
- Summarization Density: Maintained at ~6%.
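For reference, the snippet below is a minimal sketch of how such metrics can be computed with the Hugging Face `evaluate` library; the predictions and references are placeholders and the BERTScore backbone for Latin is an assumption, so this is not the exact evaluation script behind the numbers above.

```python
import evaluate

# Placeholder outputs; in practice predictions come from model.generate(...).
predictions = ["Caesar Galliam in tres partes divisit."]
references = ["Gallia est omnis divisa in partes tres."]

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
chrf = evaluate.load("chrf")

print(rouge.compute(predictions=predictions, references=references))
# Assumption: multilingual BERT as the BERTScore backbone for Latin.
print(bertscore.compute(predictions=predictions, references=references,
                        model_type="bert-base-multilingual-cased"))
# chrF (sacrebleu) expects one list of references per prediction.
print(chrf.compute(predictions=predictions,
                   references=[[r] for r in references]))
```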
### Observations
- Pre-training on extractive summaries was crucial.
- The model retained some excessive extraction, indicating room for further improvement.
## License
This model is released under CC-BY-4.0.
## Citation

```bibtex
@misc{LatinSummarizerModel,
  author = {Axel Delaval and Elsa Lubek},
  title  = {Latin-English Summarization Model (mT5)},
  year   = {2025},
  url    = {https://huggingface.co/LatinNLP/LatinSummarizerModel}
}
```