XSum_t5-small_800_adafactor

This model is a fine-tuned version of the local checkpoint /content/XSum_t5-small_800_adafactor/checkpoint-11000, trained on the XSum dataset. It achieves the following results on the evaluation set:

Loss: 2.1714
Rouge1: 33.022
Rouge2: 11.9979
Rougel: 26.7476
Rougelsum: 26.7402
Gen Len: 18.7543
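
The ROUGE figures above are on a 0 to 100 scale. As a rough illustration of what Rouge1 measures, the sketch below computes a unigram-overlap F1 between a reference summary and a generated one. It is a simplification: the function name `rouge1_f1` and the lowercase whitespace tokenization are assumptions made for illustration, and it does not reproduce the tokenization or stemming of whichever metric implementation produced the reported scores.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Multiply by 100 to compare with the scale used in this card.
score = 100 * rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
```

Here five of the six unigrams match in both directions, so precision, recall, and F1 all equal 5/6.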

Model description

More information needed

Intended uses & limitations

More information needed
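
For summarization use, a minimal loading sketch with the Transformers library might look like the following. Several details are assumptions rather than facts from this card: the hub ID `oMateos2020/XSum_t5-small_800_adafactor` is taken from the model's hub page, the `summarize: ` prefix follows the usual T5 convention and is not confirmed here, and the helper names, the 512-token input limit, the beam count, and the output length are illustrative values.

```python
MODEL_ID = "oMateos2020/XSum_t5-small_800_adafactor"  # hub ID, assumed from the model page
PREFIX = "summarize: "  # common T5 task prefix; assumption, not stated in this card


def build_input(article: str) -> str:
    """Prepend the assumed task prefix to a whitespace-trimmed article."""
    return PREFIX + article.strip()


def summarize(article: str, model_id: str = MODEL_ID,
              max_length: int = 32, num_beams: int = 4) -> str:
    """Generate one summary. Downloads the model on first use."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate long articles; 512 tokens is an illustrative limit.
    inputs = tokenizer(build_input(article), return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_length=max_length,
                                num_beams=num_beams)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (requires network access and PyTorch):
# print(summarize("Full text of a news article goes here."))
```

XSum targets are single-sentence summaries, so short generation lengths are usually sufficient; tune `max_length` and `num_beams` to your data.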

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 25
eval_batch_size: 25
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 2
mixed_precision_training: Native AMP
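
The list above names a linear scheduler but gives neither the warmup length nor the total step count. The sketch below shows how the learning rate would decay under a plain linear schedule. It assumes zero warmup, a single device with no gradient accumulation, and 204,045 XSum training examples; none of these are stated in this card, and the function names are hypothetical.

```python
import math

BASE_LR = 1e-4            # learning_rate from the list above
TRAIN_BATCH_SIZE = 25     # train_batch_size from the list above
NUM_EPOCHS = 2            # num_epochs from the list above
ASSUMED_TRAIN_EXAMPLES = 204_045  # assumption: XSum train split size
ASSUMED_WARMUP_STEPS = 0          # assumption: warmup is not listed


def total_training_steps(num_examples: int = ASSUMED_TRAIN_EXAMPLES,
                         batch_size: int = TRAIN_BATCH_SIZE,
                         num_epochs: int = NUM_EPOCHS) -> int:
    """Optimizer steps, assuming one device and no gradient accumulation."""
    return math.ceil(num_examples / batch_size) * num_epochs


def linear_lr(step: int, total_steps: int, base_lr: float = BASE_LR,
              warmup_steps: int = ASSUMED_WARMUP_STEPS) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


steps = total_training_steps()
lr_at_last_logged_step = linear_lr(3200, steps)
```

Under these assumptions an epoch is 8,162 steps, which puts step 3200 at about epoch 0.39, in line with the last row of the training results.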

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
2.3404	0.01	100	2.2058	32.4826	11.5807	26.2716	26.2611	18.7842
2.3194	0.02	200	2.2028	32.6393	11.661	26.372	26.3643	18.788
2.3247	0.04	300	2.1999	32.6792	11.6985	26.3876	26.3786	18.7354
2.3276	0.05	400	2.1979	32.6668	11.7272	26.3964	26.3907	18.7957
2.317	0.06	500	2.1957	32.8267	11.8165	26.5075	26.4997	18.7543
2.3214	0.07	600	2.1942	32.8319	11.8064	26.5428	26.5448	18.7693
2.3014	0.09	700	2.1931	32.7136	11.7334	26.4958	26.486	18.7759
2.3294	0.1	800	2.1902	32.6818	11.7684	26.4314	26.4242	18.785
2.299	0.11	900	2.1914	32.672	11.7606	26.4475	26.4367	18.7853
2.3009	0.12	1000	2.1900	32.7816	11.7958	26.5167	26.5099	18.7685
2.2913	0.13	1100	2.1885	32.6438	11.7398	26.4077	26.4051	18.7742
2.293	0.15	1200	2.1854	32.8228	11.841	26.548	26.5415	18.7899
2.2857	0.16	1300	2.1853	32.7118	11.7439	26.4989	26.4941	18.7998
2.2921	0.17	1400	2.1832	32.6705	11.7333	26.4076	26.4082	18.8017
2.3074	0.18	1500	2.1827	32.7543	11.7787	26.4904	26.4923	18.7827
2.3044	0.2	1600	2.1806	32.8573	11.8672	26.5655	26.5619	18.8097
2.2922	0.21	1700	2.1819	32.8394	11.8158	26.5523	26.5467	18.7891
2.2901	0.22	1800	2.1803	32.7219	11.7493	26.4644	26.4572	18.7882
2.286	0.23	1900	2.1790	32.7474	11.852	26.5078	26.5014	18.7699
2.298	0.25	2000	2.1781	32.8662	11.8878	26.618	26.6174	18.7979
2.2787	0.26	2100	2.1775	32.9621	11.9521	26.6955	26.6914	18.7934
2.2823	0.27	2200	2.1777	33.0633	12.0622	26.7715	26.7597	18.7954
2.2889	0.28	2300	2.1742	32.9637	12.0154	26.6771	26.6721	18.7844
2.2847	0.29	2400	2.1774	32.7435	11.8869	26.5334	26.5306	18.756
2.2923	0.31	2500	2.1754	32.8437	11.8977	26.59	26.587	18.7964
2.2877	0.32	2600	2.1740	32.9137	11.9267	26.618	26.6046	18.7678
2.2976	0.33	2700	2.1728	32.9372	11.9048	26.6412	26.6345	18.7838
2.2935	0.34	2800	2.1719	32.7338	11.7836	26.5667	26.5629	18.7659
2.2622	0.36	2900	2.1718	32.9847	11.978	26.7093	26.7008	18.7627
2.2749	0.37	3000	2.1710	32.9835	11.9809	26.7034	26.6946	18.8016
2.2615	0.38	3100	2.1721	32.9343	11.9317	26.6752	26.6695	18.7689
2.2825	0.39	3200	2.1714	33.022	11.9979	26.7476	26.7402	18.7543
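
To pick a checkpoint from a log like the one above, the tab-separated rows can be parsed and ranked by any column. The sketch below does this for a five-row excerpt copied from the table; the helper names `parse_log` and `best_row` are hypothetical.

```python
import csv
import io

HEADER = ("Training Loss\tEpoch\tStep\tValidation Loss\t"
          "Rouge1\tRouge2\tRougel\tRougelsum\tGen Len")
ROWS = [  # excerpt copied from the table above
    "2.2823\t0.27\t2200\t2.1777\t33.0633\t12.0622\t26.7715\t26.7597\t18.7954",
    "2.2889\t0.28\t2300\t2.1742\t32.9637\t12.0154\t26.6771\t26.6721\t18.7844",
    "2.2749\t0.37\t3000\t2.1710\t32.9835\t11.9809\t26.7034\t26.6946\t18.8016",
    "2.2615\t0.38\t3100\t2.1721\t32.9343\t11.9317\t26.6752\t26.6695\t18.7689",
    "2.2825\t0.39\t3200\t2.1714\t33.022\t11.9979\t26.7476\t26.7402\t18.7543",
]
LOG_TEXT = "\n".join([HEADER] + ROWS)


def parse_log(text: str) -> list:
    """Parse a tab-separated log into a list of {column: float} dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{name: float(value) for name, value in row.items()}
            for row in reader]


def best_row(rows: list, column: str, higher_is_better: bool) -> dict:
    """Return the row with the best value in the given column."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[column])


rows = parse_log(LOG_TEXT)
lowest_loss = best_row(rows, "Validation Loss", higher_is_better=False)
highest_rouge1 = best_row(rows, "Rouge1", higher_is_better=True)
```

On this excerpt the lowest validation loss falls at step 3000 and the highest Rouge1 at step 2200, while the headline results at the top of this card are those of the final logged step, 3200.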

Framework versions

Transformers 4.20.1
Pytorch 1.12.0+cu113
Datasets 2.3.2
Tokenizers 0.12.1
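
When trying to reproduce these results, it can help to compare the local environment against the versions listed above. The sketch below is one way to do that with the standard library. The distribution names (`torch` for Pytorch, and so on) are assumptions about how the packages are installed, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed above, keyed by assumed pip distribution name.
PINNED = {
    "transformers": "4.20.1",
    "torch": "1.12.0+cu113",
    "datasets": "2.3.2",
    "tokenizers": "0.12.1",
}


def version_mismatches(pinned: dict = PINNED) -> dict:
    """Map each package whose installed version differs to (wanted, installed).

    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for package, wanted in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches
```

An empty result means the environment matches the listed versions exactly; newer versions may still work but are not covered by this card.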

Model repository: oMateos2020/XSum_t5-small_800_adafactor
