---
tags:
  - model
  - checkpoints
  - translation
  - latin
  - english
  - mt5
  - mistral
  - multilingual
  - NLP
language:
  - en
  - la
license: cc-by-4.0
models:
  - mistralai/Mistral-7B-Instruct-v0.3
  - google/mt5-small
model_type: mt5-small
training_epochs: >-
  6 (initial pipeline), 30 (final pipeline with optimizations), 100 (fine-tuning
  on 4750 summaries)
task_categories:
  - translation
  - summarization
  - multilingual-nlp
task_ids:
  - en-la-translation
  - la-en-translation
  - text-generation
pretty_name: mT5-LatinSummarizerModel
storage:
  - git-lfs
  - huggingface-models
size_categories:
  - 5GB<n<10GB
---

mT5-LatinSummarizerModel: Fine-Tuned Model for Latin NLP

GitHub Repository
Hugging Face Model
Hugging Face Dataset

Overview

This repository contains the trained checkpoints and tokenizer files for the mT5-LatinSummarizerModel, which was fine-tuned to improve Latin summarization and translation. It is designed to:

  • Translate between English and Latin.
  • Summarize Latin texts effectively.
  • Leverage extractive and abstractive summarization techniques.
  • Utilize curriculum learning for improved training.

Installation & Usage

To download and set up the base models (mT5-small and Mistral-7B-Instruct-v0.3), run:

bash install_large_models.sh
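
Once the checkpoints are available locally, a minimal usage sketch is shown below. It assumes the checkpoints load with the standard Transformers seq2seq classes (i.e. LoRA weights already merged; if only adapters were saved, peft.PeftModel would be needed instead). The checkpoint path and the "summarize:" prompt prefix are placeholders, not documented interfaces.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed local checkpoint directory, following the project structure below;
# adjust to the checkpoint you actually want to load.
checkpoint = "final_pipeline/with_stanza"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical prompt; the exact task prefix used during training is not
# documented in this README.
text = "summarize: Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))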

Project Structure

.
├── final_pipeline (trained for 30 light epochs with optimizations, then fine-tuned for 100 epochs on the small high-quality summaries dataset)
│   ├── no_stanza
│   └── with_stanza
├── initial_pipeline (trained for 6 epochs without optimizations)
│   └── mt5-small-en-la-translation-epoch5
├── install_large_models.sh
└── README.md

Training Methodology

We fine-tuned mT5-small in three phases:

  1. Initial Training Pipeline (6 epochs): Used the full dataset without optimizations.
  2. Final Training Pipeline (30 light epochs): Used 10% of the training data per epoch for efficiency (see the sketch after this list).
  3. Fine-Tuning (100 epochs): Focused on the 4750 high-quality summaries for final optimization.
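
The "light epoch" idea can be illustrated as below. This is a hypothetical sketch, not the project's actual training loop: each epoch draws a fresh random 10% slice of the full training set, and the exact sampling scheme used in the final pipeline is not documented here.

import random
from torch.utils.data import Subset, DataLoader

def light_epoch_loader(dataset, fraction=0.10, batch_size=4):
    # Sample a random subset of the training data for one "light" epoch.
    k = max(1, int(len(dataset) * fraction))
    indices = random.sample(range(len(dataset)), k)
    return DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=True)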

Training Configurations:

  • Hardware: 16GB VRAM GPU (lab machines via SSH).
  • Batch Size: Adaptive due to GPU memory constraints.
  • Gradient Accumulation: Enabled for larger effective batch sizes.
  • LoRA-based fine-tuning: rank 8, scaling factor 32 (a configuration sketch follows this list).
  • Dynamic Sequence Length Adjustment: maximum sequence length increased progressively during training.
  • Learning Rate: 5 × 10^-4 with warm-up steps.
  • Checkpointing: Frequent saves to mitigate power outages.
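
A minimal sketch of the LoRA setup described above, assuming the PEFT and Transformers libraries. Only the rank, scaling factor, learning rate, and epoch count come from this README; the target modules, dropout, warm-up steps, batch size, accumulation steps, and save frequency are assumptions for illustration.

from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # LoRA rank (from this README)
    lora_alpha=32,              # scaling factor (from this README)
    lora_dropout=0.05,          # assumed value, not stated in the README
    target_modules=["q", "v"],  # assumed attention projections for mT5
)
model = get_peft_model(base, lora_config)

training_args = Seq2SeqTrainingArguments(
    output_dir="final_pipeline/no_stanza",  # path follows the repo layout
    learning_rate=5e-4,                     # from this README
    warmup_steps=500,                       # warm-up mentioned; count assumed
    gradient_accumulation_steps=8,          # accumulation enabled; value assumed
    per_device_train_batch_size=4,          # adaptive in practice; value assumed
    save_steps=500,                         # frequent checkpointing; value assumed
    num_train_epochs=30,
)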

Evaluation & Results

We evaluated the model using ROUGE, BERTScore, and BLEU/chrF scores.

Metric         Before Fine-Tuning   After Fine-Tuning
ROUGE-1        0.1675               0.2541
ROUGE-2        0.0427               0.0773
ROUGE-L        0.1459               0.2139
BERTScore-F1   0.6573               0.7140
  • chrF Score (en→la): 33.60 (with Stanza tags) vs. 18.03 BLEU (without Stanza).
  • Summarization Density: Maintained at ~6%.
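
The scores above could be recomputed with the Hugging Face `evaluate` library, as in the hedged sketch below. `predictions` and `references` stand in for lists of generated and gold summaries or translations; the BERTScore backbone model is an assumption, since the one used for the reported numbers is not documented here.

import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
chrf = evaluate.load("chrf")

predictions = ["..."]  # model outputs (placeholder)
references = ["..."]   # gold summaries / translations (placeholder)

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references,
                        model_type="bert-base-multilingual-cased"))  # assumed backbone
print(chrf.compute(predictions=predictions,
                   references=[[r] for r in references]))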

Observations:

  • Pre-training on extractive summaries was crucial.
  • The model still produces overly extractive summaries at times, indicating room for further improvement.

License

This model is released under CC-BY-4.0.

Citation

@misc{LatinSummarizerModel,
  author = {Axel Delaval and Elsa Lubek},
  title = {Latin-English Summarization Model (mT5)},
  year = {2025},
  url = {https://huggingface.co/LatinNLP/LatinSummarizerModel}
}