metadata

license: apache-2.0
base_model: google/mt5-base
tags:
  - generated_from_trainer
metrics:
  - rouge
  - sacrebleu
model-index:
  - name: mT5-TextSimp-LT-BatchSize4-lr5e-5
    results: []

mT5-TextSimp-LT-BatchSize4-lr5e-5

This model is a fine-tuned version of google/mt5-base. The training dataset was not recorded in the card metadata. On the evaluation set, the final logged checkpoint (step 3200, epoch 7.66) achieves the following results:

Loss: 0.1611
Rouge1: 0.46
Rouge2: 0.2767
RougeL: 0.4464
Sacrebleu: 23.2936
Gen Len: 39.0358

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 8
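
The linear schedule with 500 warmup steps listed above can be sketched in plain Python. This is a minimal illustration, not the Trainer's own code: the function name `linear_lr` is hypothetical, and the total of 3344 optimizer steps is an assumption inferred from the Epoch and Step columns of the training results (about 418 steps per epoch over 8 epochs, assuming no gradient accumulation).

```python
def linear_lr(step: int,
              base_lr: float = 5e-5,      # learning_rate from the card
              warmup_steps: int = 500,    # lr_scheduler_warmup_steps from the card
              total_steps: int = 3344) -> float:  # ASSUMPTION: 418 steps/epoch * 8 epochs
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at a few of the evaluation steps logged in the card.
    for s in (0, 200, 500, 1400, 3200):
        print(s, linear_lr(s))
```

Under these assumptions the warmup lasts a little over one epoch, the rate peaks at 5e-05 at step 500, and it then decays linearly toward zero at the end of training.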

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	RougeL	Sacrebleu	Gen Len
36.6446	0.48	200	31.2765	0.0004	0.0	0.0004	0.0003	512.0
11.5223	0.96	400	6.7786	0.0031	0.0	0.0031	0.0045	89.2816
2.2686	1.44	600	0.6729	0.0054	0.0	0.0053	0.0196	39.0501
0.7009	1.91	800	0.6529	0.0029	0.0	0.0027	0.0424	41.401
0.6213	2.39	1000	0.5630	0.0058	0.0002	0.0056	0.0201	39.0334
0.6435	2.87	1200	0.4697	0.0688	0.0084	0.0608	0.1156	39.0453
0.4154	3.35	1400	10.4655	0.2098	0.1219	0.2011	0.671	350.0334
0.6289	3.83	1600	1.9257	0.3176	0.1945	0.3072	3.6031	138.7494
3.5542	4.31	1800	0.8459	0.373	0.2029	0.3615	16.8305	59.8568
8.1736	4.78	2000	7.2350	0.3147	0.1815	0.3033	7.3572	289.1432
2.3987	5.26	2200	0.8361	0.3616	0.1903	0.3501	16.2229	61.0668
0.9853	5.74	2400	0.4219	0.3635	0.2004	0.3515	15.2744	46.494
0.3575	6.22	2600	0.3516	0.3796	0.2121	0.3687	13.6464	46.1623
0.4497	6.7	2800	0.2597	0.4392	0.2698	0.4263	18.9423	42.2697
0.2582	7.18	3000	0.1583	0.4442	0.2579	0.431	21.5533	38.1671
0.2629	7.66	3200	0.1611	0.46	0.2767	0.4464	23.2936	39.0358
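
One way to read the table is to pick a checkpoint programmatically. The sketch below copies the Step, Validation Loss, Sacrebleu, and Gen Len columns from the table and looks for the lowest validation loss, the highest Sacrebleu score, and evaluations where the loss more than doubled from the previous one. The `EvalRow` type and the doubling threshold are illustrative choices, not part of the original training run.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    sacrebleu: float
    gen_len: float


# Values copied from the training results table in this card.
ROWS = [
    EvalRow(200, 31.2765, 0.0003, 512.0),
    EvalRow(400, 6.7786, 0.0045, 89.2816),
    EvalRow(600, 0.6729, 0.0196, 39.0501),
    EvalRow(800, 0.6529, 0.0424, 41.401),
    EvalRow(1000, 0.5630, 0.0201, 39.0334),
    EvalRow(1200, 0.4697, 0.1156, 39.0453),
    EvalRow(1400, 10.4655, 0.671, 350.0334),
    EvalRow(1600, 1.9257, 3.6031, 138.7494),
    EvalRow(1800, 0.8459, 16.8305, 59.8568),
    EvalRow(2000, 7.2350, 7.3572, 289.1432),
    EvalRow(2200, 0.8361, 16.2229, 61.0668),
    EvalRow(2400, 0.4219, 15.2744, 46.494),
    EvalRow(2600, 0.3516, 13.6464, 46.1623),
    EvalRow(2800, 0.2597, 18.9423, 42.2697),
    EvalRow(3000, 0.1583, 21.5533, 38.1671),
    EvalRow(3200, 0.1611, 23.2936, 39.0358),
]

# Checkpoint with the lowest validation loss.
best_loss = min(ROWS, key=lambda r: r.val_loss)

# Checkpoint with the highest Sacrebleu score.
best_bleu = max(ROWS, key=lambda r: r.sacrebleu)

# Evaluations whose loss more than doubled relative to the previous evaluation
# (illustrative threshold for spotting instability).
spikes = [cur for prev, cur in zip(ROWS, ROWS[1:])
          if cur.val_loss > 2 * prev.val_loss]

if __name__ == "__main__":
    print("lowest validation loss at step", best_loss.step)
    print("highest Sacrebleu at step", best_bleu.step)
    print("loss spikes at steps", [r.step for r in spikes])
```

On these values the lowest validation loss falls at step 3000 and the highest Sacrebleu at step 3200, which is the checkpoint reported at the top of the card. The loss spikes occur at steps 1400 and 2000, where the generated length also jumps well above its final value of about 39.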

Framework versions

Transformers 4.33.0
Pytorch 2.1.2+cu121
Datasets 2.14.4
Tokenizers 0.13.3