fine_tuned_xsum_callback10

This model is a fine-tuned version of Qwen/Qwen2-1.5B on an unknown dataset. Final evaluation results are not listed separately; the training log at the end of this card records validation loss and accuracy every 100 steps.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 3
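
The linear scheduler and the logged step/epoch pairs are enough to estimate the learning rate at any point in training. The following is a minimal sketch, not the training code itself. It assumes zero warmup steps (the card lists none), that train_batch_size 8 is the effective batch size per optimizer step (single device, no gradient accumulation), and that the schedule was planned for the full 3 epochs even though the log ends before the first epoch completes. The function names are hypothetical.

```python
def estimate_steps_per_epoch(step: int, epoch: float) -> int:
    """Estimate optimizer steps per epoch from one (step, epoch) log entry."""
    return round(step / epoch)


def linear_lr(step: int, total_steps: int, base_lr: float = 2e-05) -> float:
    """Learning rate under linear decay to zero, assuming zero warmup steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


# Last row of the training log: step 3300 was reached at epoch 0.9535.
# The latest row is used because rounding in the epoch column matters least there.
steps_per_epoch = estimate_steps_per_epoch(3300, 0.9535)
total_steps = steps_per_epoch * 3  # num_epochs: 3

# Approximate training-set size under the batch-size assumption stated above.
approx_train_examples = steps_per_epoch * 8

print("steps per epoch:", steps_per_epoch)
print("planned total steps:", total_steps)
print("approx. training examples:", approx_train_examples)
print("learning rate at step 3300:", linear_lr(3300, total_steps))
```

Under these assumptions the estimate comes to about 3,461 steps per epoch, so the learning rate at the last logged step would still be roughly two thirds of its initial value.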

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.8057	0.0289	100	0.6011	0.7324
0.6239	0.0578	200	0.5307	0.8254
0.4184	0.0867	300	0.3708	0.8417
0.4352	0.1156	400	0.2976	0.8862
0.3868	0.1445	500	0.2695	0.8950
0.3264	0.1734	600	0.7274	0.8739
0.4039	0.2023	700	0.3018	0.9314
0.3415	0.2311	800	0.2797	0.9171
0.3379	0.2600	900	0.1677	0.9360
0.2547	0.2889	1000	0.1600	0.9506
0.3377	0.3178	1100	0.5096	0.9025
0.2786	0.3467	1200	0.1569	0.9496
0.229	0.3756	1300	0.3807	0.9395
0.1867	0.4045	1400	0.2366	0.9564
0.1862	0.4334	1500	0.1283	0.9587
0.2238	0.4623	1600	0.3889	0.9356
0.1845	0.4912	1700	0.1452	0.9610
0.2051	0.5201	1800	0.2200	0.9558
0.2094	0.5490	1900	0.1520	0.9646
0.2217	0.5779	2000	0.3833	0.9265
0.2763	0.6068	2100	0.1593	0.9594
0.2033	0.6357	2200	0.1518	0.9626
0.2259	0.6645	2300	0.1149	0.9626
0.1501	0.6934	2400	0.1935	0.9597
0.1642	0.7223	2500	0.4075	0.9269
0.2433	0.7512	2600	0.1535	0.9642
0.1941	0.7801	2700	0.3230	0.9623
0.1185	0.8090	2800	0.3787	0.9691
0.1735	0.8379	2900	0.3400	0.9626
0.1453	0.8668	3000	0.5315	0.9529
0.164	0.8957	3100	0.2728	0.9678
0.2602	0.9246	3200	0.1789	0.9616
0.1642	0.9535	3300	0.1252	0.9675
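
Because validation loss fluctuates considerably between evaluations, it can help to scan the log programmatically when picking a checkpoint. The sketch below parses rows in the table's column order and selects the best row by each metric. The `EvalRow` type and `parse_log` helper are hypothetical names introduced for illustration, and the embedded data is a five-row excerpt of the table, not the full log.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    accuracy: float


def parse_log(text: str) -> list[EvalRow]:
    """Parse whitespace-separated rows in the table's column order."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip the header, blank lines, and malformed rows
        train_loss, epoch, step, val_loss, accuracy = fields
        rows.append(
            EvalRow(float(train_loss), float(epoch), int(step),
                    float(val_loss), float(accuracy))
        )
    return rows


# Excerpt of five rows copied from the table above (not the full log).
LOG_EXCERPT = """
Training Loss Epoch Step Validation Loss Accuracy
0.2547 0.2889 1000 0.1600 0.9506
0.2259 0.6645 2300 0.1149 0.9626
0.1185 0.8090 2800 0.3787 0.9691
0.1453 0.8668 3000 0.5315 0.9529
0.1642 0.9535 3300 0.1252 0.9675
"""

rows = parse_log(LOG_EXCERPT)
best_by_loss = min(rows, key=lambda r: r.val_loss)
best_by_acc = max(rows, key=lambda r: r.accuracy)

print("lowest validation loss:", best_by_loss.step, best_by_loss.val_loss)
print("highest accuracy:", best_by_acc.step, best_by_acc.accuracy)
```

Applied to the full table, the lowest validation loss is 0.1149 at step 2300 and the highest accuracy is 0.9691 at step 2800. The two do not coincide, so the choice of checkpoint depends on which metric matters for the intended use.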