---
library_name: transformers
license: apache-2.0
base_model: google/mt5-small
tags:
  - summarization
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: mt5-small-mt5-finetuned-final
    results: []
---

# mt5-small-mt5-finetuned-final

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unknown dataset. It achieves the following results on the evaluation set (a minimal inference sketch follows the metrics):

- Loss: 1.1778
- Rouge1: 0.2833
- Rouge2: 0.1521
- RougeL: 0.2758
- RougeLsum: 0.2768
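
Since the usage sections below are not yet filled in, here is a minimal inference sketch. The repo id `ak2603/mt5-small-mt5-finetuned-final` is an assumption inferred from the model name above; substitute the actual checkpoint path if it differs.

```python
from transformers import pipeline

# Assumed repo id; replace with the actual checkpoint path if different.
summarizer = pipeline("summarization", model="ak2603/mt5-small-mt5-finetuned-final")

text = "Document to summarize goes here."
result = summarizer(text, max_length=64, min_length=8, do_sample=False)
print(result[0]["summary_text"])
```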

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `Seq2SeqTrainingArguments` sketch follows the list):

- learning_rate: 0.0056
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 20
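
As referenced above, a sketch of the corresponding `Seq2SeqTrainingArguments`. Only the listed hyperparameters come from this card; `output_dir`, `eval_strategy`, and `predict_with_generate` are assumptions (per-epoch evaluation and generation-based ROUGE are implied by the results table below).

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-mt5-finetuned-final",  # assumed
    learning_rate=0.0056,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    optim="adamw_torch",          # AdamW, betas=(0.9, 0.999), epsilon=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=20,
    eval_strategy="epoch",        # assumed: the table reports per-epoch eval
    predict_with_generate=True,   # assumed: needed to compute ROUGE in eval
)
```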

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
| 5.9607        | 1.0   | 100  | 4.8449          | 0.1763 | 0.0684 | 0.1763 | 0.1761    |
| 4.9088        | 2.0   | 200  | 3.9878          | 0.3076 | 0.1348 | 0.2803 | 0.2815    |
| 2.9924        | 3.0   | 300  | 2.2397          | 0.2790 | 0.1378 | 0.2575 | 0.2592    |
| 2.2734        | 4.0   | 400  | 1.9866          | 0.2987 | 0.1629 | 0.2868 | 0.2872    |
| 1.9431        | 5.0   | 500  | 1.7408          | 0.2251 | 0.1380 | 0.2231 | 0.2237    |
| 2.317         | 6.0   | 600  | 1.9235          | 0.2421 | 0.0922 | 0.2276 | 0.2282    |
| 1.8526        | 7.0   | 700  | 1.6342          | 0.3120 | 0.1636 | 0.2943 | 0.2944    |
| 1.7029        | 8.0   | 800  | 1.6244          | 0.2469 | 0.1361 | 0.2421 | 0.2427    |
| 1.6725        | 9.0   | 900  | 1.5803          | 0.2637 | 0.1362 | 0.2551 | 0.2560    |
| 1.5852        | 10.0  | 1000 | 1.5617          | 0.2963 | 0.1634 | 0.2907 | 0.2917    |
| 1.4625        | 11.0  | 1100 | 1.4049          | 0.2750 | 0.1383 | 0.2570 | 0.2576    |
| 1.3895        | 12.0  | 1200 | 1.4234          | 0.2969 | 0.1646 | 0.2917 | 0.2927    |
| 1.3584        | 13.0  | 1300 | 1.3807          | 0.3370 | 0.1601 | 0.3088 | 0.3099    |
| 1.2759        | 14.0  | 1400 | 1.3524          | 0.2890 | 0.1307 | 0.2654 | 0.2663    |
| 1.222         | 15.0  | 1500 | 1.3110          | 0.2718 | 0.1339 | 0.2566 | 0.2597    |
| 1.1515        | 16.0  | 1600 | 1.2297          | 0.3314 | 0.1626 | 0.3033 | 0.3038    |
| 1.0888        | 17.0  | 1700 | 1.1897          | 0.3028 | 0.1358 | 0.2769 | 0.2792    |
| 1.039         | 18.0  | 1800 | 1.1970          | 0.2833 | 0.1521 | 0.2758 | 0.2768    |
| 0.9907        | 19.0  | 1900 | 1.1790          | 0.2833 | 0.1521 | 0.2758 | 0.2768    |
| 0.9563        | 20.0  | 2000 | 1.1778          | 0.2833 | 0.1521 | 0.2758 | 0.2768    |
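
The ROUGE columns above can be reproduced with the `evaluate` library; this is a generic sketch, not the exact evaluation code used for this run, and the prediction/reference strings are placeholders.

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholders; in practice these are decoded model outputs and the
# reference summaries from the evaluation set.
predictions = ["a generated summary"]
references = ["a reference summary"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum (F-measures in [0, 1])
```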

### Framework versions

- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0