mt5-small-mt5-finetuned-final

This model is a fine-tuned version of google/mt5-small; the fine-tuning dataset is not specified on this card. It achieves the following results on the evaluation set (a usage sketch follows the metrics):

  • Loss: 1.1778
  • ROUGE-1: 0.2833
  • ROUGE-2: 0.1521
  • ROUGE-L: 0.2758
  • ROUGE-Lsum: 0.2768
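
The intended task is not stated on this card. Assuming a summarization-style sequence-to-sequence use (typical for mT5 fine-tunes that report ROUGE), a minimal inference sketch looks like this; the input text is a placeholder:

```python
# Minimal usage sketch; assumes a summarization-style seq2seq task, which
# this card does not confirm. The input text is a placeholder.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "ak2603/mt5-small-mt5-finetuned-final"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Replace this with an input document."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```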

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a code sketch mapping them onto Seq2SeqTrainingArguments follows the list):

  • learning_rate: 0.0056
  • train_batch_size: 12
  • eval_batch_size: 12
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 20
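
For reference, here is a hedged reconstruction of how these settings map onto Seq2SeqTrainingArguments. The actual training script, dataset, and output path are not documented on this card, so the output directory below is a placeholder and the evaluation settings are inferred from the per-epoch results table:

```python
# Hedged reconstruction from the hyperparameter list above; not the author's
# actual script. Dataset and preprocessing are unspecified on this card.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-mt5-finetuned-final",  # placeholder path
    learning_rate=5.6e-3,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    num_train_epochs=20,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="epoch",       # assumption: results table shows one eval per epoch
    predict_with_generate=True,  # assumption: needed to compute ROUGE at eval time
)
```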

Training results

| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:----------:|
| 5.9607        | 1.0   | 100  | 4.8449          | 0.1763  | 0.0684  | 0.1763  | 0.1761     |
| 4.9088        | 2.0   | 200  | 3.9878          | 0.3076  | 0.1348  | 0.2803  | 0.2815     |
| 2.9924        | 3.0   | 300  | 2.2397          | 0.2790  | 0.1378  | 0.2575  | 0.2592     |
| 2.2734        | 4.0   | 400  | 1.9866          | 0.2987  | 0.1629  | 0.2868  | 0.2872     |
| 1.9431        | 5.0   | 500  | 1.7408          | 0.2251  | 0.1380  | 0.2231  | 0.2237     |
| 2.3170        | 6.0   | 600  | 1.9235          | 0.2421  | 0.0922  | 0.2276  | 0.2282     |
| 1.8526        | 7.0   | 700  | 1.6342          | 0.3120  | 0.1636  | 0.2943  | 0.2944     |
| 1.7029        | 8.0   | 800  | 1.6244          | 0.2469  | 0.1361  | 0.2421  | 0.2427     |
| 1.6725        | 9.0   | 900  | 1.5803          | 0.2637  | 0.1362  | 0.2551  | 0.2560     |
| 1.5852        | 10.0  | 1000 | 1.5617          | 0.2963  | 0.1634  | 0.2907  | 0.2917     |
| 1.4625        | 11.0  | 1100 | 1.4049          | 0.2750  | 0.1383  | 0.2570  | 0.2576     |
| 1.3895        | 12.0  | 1200 | 1.4234          | 0.2969  | 0.1646  | 0.2917  | 0.2927     |
| 1.3584        | 13.0  | 1300 | 1.3807          | 0.3370  | 0.1601  | 0.3088  | 0.3099     |
| 1.2759        | 14.0  | 1400 | 1.3524          | 0.2890  | 0.1307  | 0.2654  | 0.2663     |
| 1.2220        | 15.0  | 1500 | 1.3110          | 0.2718  | 0.1339  | 0.2566  | 0.2597     |
| 1.1515        | 16.0  | 1600 | 1.2297          | 0.3314  | 0.1626  | 0.3033  | 0.3038     |
| 1.0888        | 17.0  | 1700 | 1.1897          | 0.3028  | 0.1358  | 0.2769  | 0.2792     |
| 1.0390        | 18.0  | 1800 | 1.1970          | 0.2833  | 0.1521  | 0.2758  | 0.2768     |
| 0.9907        | 19.0  | 1900 | 1.1790          | 0.2833  | 0.1521  | 0.2758  | 0.2768     |
| 0.9563        | 20.0  | 2000 | 1.1778          | 0.2833  | 0.1521  | 0.2758  | 0.2768     |
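
For context, ROUGE columns like these are commonly computed with the evaluate library's rouge metric, which returns F-measures in [0, 1]; a minimal sketch with placeholder strings:

```python
# Minimal ROUGE sketch using the `evaluate` library; the strings are
# placeholders, not samples from this model's (unspecified) eval set.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["a generated summary"],
    references=["the reference summary"],
)
print(scores)  # dict with rouge1, rouge2, rougeL, rougeLsum F-measures
```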

Framework versions

  • Transformers 4.48.3
  • PyTorch 2.5.1+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0