---
library_name: transformers
license: apache-2.0
base_model: google/mt5-small
tags:
- summarization
- generated_from_trainer
metrics:
- rouge
model-index:
- name: mt5-small-synthetic-data-plus-translated-bs32ep32
  results: []
---

# mt5-small-synthetic-data-plus-translated-bs32ep32

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9041
- Rouge1: 0.6137
- Rouge2: 0.4715
- Rougel: 0.5917
- Rougelsum: 0.5922

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a corresponding `Seq2SeqTrainingArguments` sketch follows the list):
- learning_rate: 5.6e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 32
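For reference, the list above maps onto a `Seq2SeqTrainingArguments` configuration roughly as follows. This is a minimal sketch, assuming training used the standard `Seq2SeqTrainer` API from 🤗 Transformers; the `output_dir`, `eval_strategy`, and `predict_with_generate` values are assumptions inferred from the per-epoch ROUGE results below, not taken from the original training script.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the hyperparameters listed above. output_dir is illustrative,
# not taken from the original training script.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-synthetic-data-plus-translated-bs32ep32",
    learning_rate=5.6e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",         # OptimizerNames.ADAMW_TORCH
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=32,
    eval_strategy="epoch",       # assumption: the card reports per-epoch eval
    predict_with_generate=True,  # assumption: required to compute ROUGE at eval time
)
```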
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
| 19.5396       | 1.0   | 38   | 11.5658         | 0.0077 | 0.0018 | 0.0067 | 0.0068    |
| 13.1079       | 2.0   | 76   | 7.8457          | 0.0077 | 0.0008 | 0.0075 | 0.0070    |
| 9.4043        | 3.0   | 114  | 3.9170          | 0.0234 | 0.0025 | 0.0217 | 0.0210    |
| 6.388         | 4.0   | 152  | 2.7482          | 0.1030 | 0.0252 | 0.0916 | 0.0910    |
| 4.5077        | 5.0   | 190  | 2.0482          | 0.1057 | 0.0414 | 0.0917 | 0.0914    |
| 3.3242        | 6.0   | 228  | 1.6075          | 0.1778 | 0.0897 | 0.1525 | 0.1532    |
| 2.7           | 7.0   | 266  | 1.3881          | 0.3601 | 0.2141 | 0.3479 | 0.3487    |
| 2.3089        | 8.0   | 304  | 1.2989          | 0.4295 | 0.2607 | 0.4095 | 0.4091    |
| 2.1141        | 9.0   | 342  | 1.2346          | 0.4337 | 0.2603 | 0.4147 | 0.4146    |
| 1.9442        | 10.0  | 380  | 1.1888          | 0.4926 | 0.3337 | 0.4642 | 0.4644    |
| 1.8082        | 11.0  | 418  | 1.1418          | 0.5101 | 0.3560 | 0.4920 | 0.4929    |
| 1.7142        | 12.0  | 456  | 1.1052          | 0.5341 | 0.3809 | 0.5154 | 0.5155    |
| 1.6345        | 13.0  | 494  | 1.0775          | 0.5605 | 0.4071 | 0.5394 | 0.5394    |
| 1.5983        | 14.0  | 532  | 1.0539          | 0.5790 | 0.4262 | 0.5585 | 0.5580    |
| 1.5376        | 15.0  | 570  | 1.0322          | 0.5713 | 0.4206 | 0.5531 | 0.5532    |
| 1.5059        | 16.0  | 608  | 1.0137          | 0.5807 | 0.4302 | 0.5605 | 0.5605    |
| 1.4434        | 17.0  | 646  | 0.9970          | 0.6069 | 0.4656 | 0.5874 | 0.5871    |
| 1.442         | 18.0  | 684  | 0.9826          | 0.6104 | 0.4671 | 0.5869 | 0.5874    |
| 1.4059        | 19.0  | 722  | 0.9688          | 0.6102 | 0.4666 | 0.5886 | 0.5883    |
| 1.3618        | 20.0  | 760  | 0.9636          | 0.6127 | 0.4683 | 0.5901 | 0.5906    |
| 1.3341        | 21.0  | 798  | 0.9517          | 0.6065 | 0.4632 | 0.5852 | 0.5860    |
| 1.3019        | 22.0  | 836  | 0.9397          | 0.6092 | 0.4669 | 0.5886 | 0.5882    |
| 1.3114        | 23.0  | 874  | 0.9343          | 0.6091 | 0.4663 | 0.5869 | 0.5871    |
| 1.2906        | 24.0  | 912  | 0.9272          | 0.6137 | 0.4702 | 0.5913 | 0.5912    |
| 1.255         | 25.0  | 950  | 0.9201          | 0.6153 | 0.4723 | 0.5934 | 0.5934    |
| 1.261         | 26.0  | 988  | 0.9186          | 0.6174 | 0.4749 | 0.5966 | 0.5965    |
| 1.2363        | 27.0  | 1026 | 0.9124          | 0.6155 | 0.4738 | 0.5948 | 0.5952    |
| 1.2993        | 28.0  | 1064 | 0.9078          | 0.6153 | 0.4738 | 0.5937 | 0.5939    |
| 1.2653        | 29.0  | 1102 | 0.9054          | 0.6126 | 0.4712 | 0.5919 | 0.5922    |
| 1.2287        | 30.0  | 1140 | 0.9046          | 0.6099 | 0.4676 | 0.5894 | 0.5901    |
| 1.2279        | 31.0  | 1178 | 0.9041          | 0.6137 | 0.4715 | 0.5917 | 0.5922    |
| 1.2348        | 32.0  | 1216 | 0.9041          | 0.6137 | 0.4715 | 0.5917 | 0.5922    |

### Framework versions

- Transformers 4.47.1
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
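## How to use

The card does not include a usage snippet, so here is a minimal inference sketch, assuming the checkpoint behaves as a standard seq2seq summarization model. The repo id is a placeholder for wherever this checkpoint is hosted, and the input text and generation settings are illustrative, not taken from the original setup.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder repo id: substitute the actual hub path of this checkpoint.
model_id = "mt5-small-synthetic-data-plus-translated-bs32ep32"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Replace this with the text you want to summarize."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Generation settings are illustrative; tune them for your data.
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```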