mt5-small-synthetic-data-plus-translated

This model is a fine-tuned version of google/mt5-small on an unspecified dataset. It achieves the following results on the evaluation set (a quick inference sketch follows the list):

  • Loss: 0.5891
  • Rouge1: 0.6390
  • Rouge2: 0.5109
  • RougeL: 0.6157
  • RougeLsum: 0.6175
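
As a quick smoke test, the checkpoint loads with the standard transformers Seq2Seq classes. Below is a minimal sketch; the input text and generation settings are illustrative assumptions, not values from the training setup.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned checkpoint from the Hub
model_id = "ak2603/mt5-small-synthetic-data-plus-translated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative input; replace with text matching the task this model was tuned for
inputs = tokenizer("Your input document goes here.", return_tensors="pt", truncation=True)

# Greedy decoding with a modest length cap (generation settings are assumptions)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```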

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5.6e-05
  • train_batch_size: 12
  • eval_batch_size: 12
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 40
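
For reproducibility, here is a minimal sketch of how these hyperparameters map onto Seq2SeqTrainingArguments. Only the values listed above come from this card; the output directory, evaluation strategy, and generation flag are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Listed hyperparameters come from the card; lines marked "assumed" are
# illustrative guesses, not taken from the training setup.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-synthetic-data-plus-translated",  # assumed directory name
    learning_rate=5.6e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    optim="adamw_torch",          # AdamW (torch) with the listed betas/epsilon defaults
    lr_scheduler_type="linear",
    num_train_epochs=40,
    eval_strategy="epoch",        # assumed: the results table reports metrics once per epoch
    predict_with_generate=True,   # assumed: required to compute ROUGE during evaluation
)
```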

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
|---------------|-------|------|-----------------|--------|--------|--------|-----------|
| 14.4747       | 1.0   | 100  | 4.4435          | 0.0225 | 0.0054 | 0.0205 | 0.0215    |
| 5.9023        | 2.0   | 200  | 1.9711          | 0.1865 | 0.0791 | 0.1562 | 0.1567    |
| 3.0374        | 3.0   | 300  | 1.3288          | 0.3668 | 0.2195 | 0.3565 | 0.3567    |
| 2.1905        | 4.0   | 400  | 1.1478          | 0.4430 | 0.2741 | 0.4186 | 0.4205    |
| 1.8996        | 5.0   | 500  | 1.0408          | 0.4754 | 0.3275 | 0.4564 | 0.4574    |
| 1.6959        | 6.0   | 600  | 0.9541          | 0.5463 | 0.3972 | 0.5258 | 0.5273    |
| 1.5593        | 7.0   | 700  | 0.8942          | 0.5594 | 0.4138 | 0.5406 | 0.5426    |
| 1.4334        | 8.0   | 800  | 0.8482          | 0.6064 | 0.4683 | 0.5855 | 0.5866    |
| 1.3929        | 9.0   | 900  | 0.8106          | 0.6130 | 0.4714 | 0.5895 | 0.5911    |
| 1.2918        | 10.0  | 1000 | 0.7851          | 0.6156 | 0.4770 | 0.5929 | 0.5935    |
| 1.2362        | 11.0  | 1100 | 0.7576          | 0.6270 | 0.4894 | 0.6054 | 0.6060    |
| 1.1781        | 12.0  | 1200 | 0.7402          | 0.6257 | 0.4867 | 0.6031 | 0.6042    |
| 1.1476        | 13.0  | 1300 | 0.7212          | 0.6221 | 0.4894 | 0.6018 | 0.6029    |
| 1.1052        | 14.0  | 1400 | 0.7064          | 0.6214 | 0.4873 | 0.5983 | 0.5995    |
| 1.0667        | 15.0  | 1500 | 0.6938          | 0.6300 | 0.4972 | 0.6073 | 0.6079    |
| 1.0421        | 16.0  | 1600 | 0.6855          | 0.6265 | 0.4952 | 0.6026 | 0.6036    |
| 1.0169        | 17.0  | 1700 | 0.6748          | 0.6244 | 0.4911 | 0.6021 | 0.6029    |
| 1.0036        | 18.0  | 1800 | 0.6599          | 0.6342 | 0.5087 | 0.6130 | 0.6142    |
| 0.9828        | 19.0  | 1900 | 0.6510          | 0.6349 | 0.5090 | 0.6136 | 0.6147    |
| 0.9589        | 20.0  | 2000 | 0.6471          | 0.6370 | 0.5074 | 0.6124 | 0.6135    |
| 0.9267        | 21.0  | 2100 | 0.6400          | 0.6345 | 0.5081 | 0.6117 | 0.6127    |
| 0.9361        | 22.0  | 2200 | 0.6318          | 0.6336 | 0.5066 | 0.6126 | 0.6140    |
| 0.8992        | 23.0  | 2300 | 0.6291          | 0.6346 | 0.5066 | 0.6122 | 0.6125    |
| 0.9029        | 24.0  | 2400 | 0.6224          | 0.6367 | 0.5103 | 0.6152 | 0.6166    |
| 0.8815        | 25.0  | 2500 | 0.6159          | 0.6374 | 0.5078 | 0.6141 | 0.6157    |
| 0.8914        | 26.0  | 2600 | 0.6133          | 0.6356 | 0.5109 | 0.6120 | 0.6138    |
| 0.8548        | 27.0  | 2700 | 0.6091          | 0.6371 | 0.5089 | 0.6125 | 0.6145    |
| 0.8683        | 28.0  | 2800 | 0.6047          | 0.6387 | 0.5131 | 0.6149 | 0.6169    |
| 0.8483        | 29.0  | 2900 | 0.6020          | 0.6368 | 0.5096 | 0.6121 | 0.6133    |
| 0.8409        | 30.0  | 3000 | 0.5996          | 0.6405 | 0.5118 | 0.6139 | 0.6159    |
| 0.8407        | 31.0  | 3100 | 0.5997          | 0.6398 | 0.5123 | 0.6159 | 0.6177    |
| 0.8338        | 32.0  | 3200 | 0.5970          | 0.6385 | 0.5096 | 0.6144 | 0.6164    |
| 0.801         | 33.0  | 3300 | 0.5947          | 0.6361 | 0.5078 | 0.6122 | 0.6141    |
| 0.833         | 34.0  | 3400 | 0.5941          | 0.6386 | 0.5111 | 0.6154 | 0.6172    |
| 0.7751        | 35.0  | 3500 | 0.5921          | 0.6368 | 0.5065 | 0.6129 | 0.6148    |
| 0.8281        | 36.0  | 3600 | 0.5906          | 0.6409 | 0.5125 | 0.6183 | 0.6199    |
| 0.7803        | 37.0  | 3700 | 0.5898          | 0.6377 | 0.5097 | 0.6143 | 0.6162    |
| 0.8139        | 38.0  | 3800 | 0.5896          | 0.6398 | 0.5116 | 0.6166 | 0.6185    |
| 0.7922        | 39.0  | 3900 | 0.5894          | 0.6388 | 0.5109 | 0.6156 | 0.6174    |
| 0.8269        | 40.0  | 4000 | 0.5891          | 0.6390 | 0.5109 | 0.6157 | 0.6175    |
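
The ROUGE columns above follow the naming of the standard evaluate package, so they were presumably computed with its rouge metric. Here is a minimal sketch of that call, on hypothetical predictions and references chosen purely for illustration:

```python
import evaluate

# Hypothetical strings for illustration only; not from the actual evaluation set
predictions = ["the cat sat on the mat"]
references = ["the cat lay on the mat"]

rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=predictions, references=references)
# scores contains rouge1, rouge2, rougeL, and rougeLsum as floats in [0, 1],
# matching the metric columns in the table above
print(scores)
```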

Framework versions

  • Transformers 4.47.1
  • Pytorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Model size

  • 300M params (F32, Safetensors)

Model tree for ak2603/mt5-small-synthetic-data-plus-translated

  • Base model: google/mt5-small