qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-5-1e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 5.9301
  • Logps: -88.3847
  • Logits: -1.2661
  • Objective: 5.9752
  • DPO Loss: 3.0906 (see the note after this list)
  • Regularize: 5.9752
  • Ranking Simple: 0.5134
  • Ranking Idealized: 0.5093
  • Ranking Idealized Expo: 0.5093
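
These metric names are not defined in the card. As a point of reference only, the sketch below reproduces the standard DPO loss (Rafailov et al., 2023). Treating the card's "DPO Loss" as exactly this quantity is an assumption, and the L2EXPO-specific regularizer ("Regularize") is not specified here.

```latex
% Standard DPO loss for a preference pair (x, y_w, y_l), shown for reference only.
% ASSUMPTION: the "DPO Loss" metric above corresponds to this objective;
% \pi_\theta is the trained policy, \pi_{ref} the reference (SFT) model,
% \beta a temperature, y_w / y_l the chosen / rejected responses.
\[
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right].
\]
```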

Model description

More information needed

Intended uses & limitations

More information needed
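
Pending details from the authors, here is a minimal loading sketch. It assumes the checkpoint behaves as a standard Qwen2-style causal LM loadable with transformers; the repo id comes from this card, and the prompt is purely illustrative.

```python
# Minimal inference sketch (ASSUMPTION: standard causal-LM layout on the Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-5-1e6"  # from this card
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float32)

# Illustrative prompt; the base model was SFT'd on news data.
inputs = tokenizer("Summarize today's headlines:", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```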

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 288
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
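
As a reproducibility aid, here is a minimal sketch of how these settings map onto transformers TrainingArguments (version 4.42.0, per the framework list below). The output_dir is hypothetical, and the 6-GPU layout is assumed to come from the launcher (e.g. accelerate launch or torchrun) rather than from these arguments.

```python
# Sketch of the training configuration above (ASSUMPTION: plain TrainingArguments;
# the actual L2EXPO training loop is not documented in this card).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-5-1e6",  # hypothetical path
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # x 12 accumulation steps x 6 GPUs = 288 effective
    per_device_eval_batch_size=4,    # x 6 GPUs = 24 effective
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```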

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | DPO Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|
| 1.7171 | 0.2834 | 50 | 0.9452 | -91.4216 | -1.3980 | 0.9804 | 0.8391 | 0.9804 | 0.5114 | 0.5093 | 0.5093 |
| 4.4116 | 0.5668 | 100 | 2.2889 | -91.3584 | -1.3646 | 2.2847 | 1.3937 | 2.2847 | 0.5145 | 0.5093 | 0.5093 |
| 5.641 | 0.8503 | 150 | 3.6592 | -89.6013 | -1.3612 | 3.6993 | 1.8989 | 3.6993 | 0.5124 | 0.5093 | 0.5093 |
| 5.6662 | 1.1337 | 200 | 4.9017 | -91.8203 | -1.3129 | 5.1434 | 2.5622 | 5.1434 | 0.5134 | 0.5093 | 0.5093 |
| 5.0544 | 1.4171 | 250 | 4.6457 | -89.6596 | -1.2958 | 4.6981 | 2.3884 | 4.6981 | 0.5093 | 0.5093 | 0.5093 |
| 4.799 | 1.7005 | 300 | 5.0697 | -89.6459 | -1.3128 | 5.1481 | 2.5371 | 5.1481 | 0.5114 | 0.5093 | 0.5093 |
| 4.3968 | 1.9839 | 350 | 5.4045 | -88.5459 | -1.2879 | 5.3636 | 2.7971 | 5.3636 | 0.5103 | 0.5093 | 0.5093 |
| 3.8148 | 2.2674 | 400 | 5.7626 | -88.2542 | -1.2680 | 5.8200 | 2.9398 | 5.8200 | 0.5093 | 0.5093 | 0.5093 |
| 3.4169 | 2.5508 | 450 | 5.9539 | -88.0116 | -1.2897 | 6.1065 | 3.1384 | 6.1065 | 0.5145 | 0.5093 | 0.5093 |
| 2.988 | 2.8342 | 500 | 5.9854 | -87.9506 | -1.2856 | 6.0183 | 3.1318 | 6.0183 | 0.5093 | 0.5093 | 0.5093 |
| 2.4859 | 3.1176 | 550 | 6.1946 | -88.5030 | -1.2805 | 6.2029 | 3.1790 | 6.2029 | 0.5103 | 0.5093 | 0.5093 |
| 2.0539 | 3.4010 | 600 | 5.9332 | -88.1616 | -1.2651 | 6.0318 | 3.1111 | 6.0318 | 0.5114 | 0.5093 | 0.5093 |
| 1.664 | 3.6845 | 650 | 5.9239 | -88.6992 | -1.2608 | 5.9851 | 3.0968 | 5.9851 | 0.5114 | 0.5093 | 0.5093 |
| 1.3502 | 3.9679 | 700 | 5.9176 | -88.5236 | -1.2647 | 5.9571 | 3.0895 | 5.9571 | 0.5134 | 0.5093 | 0.5093 |
| 1.0052 | 4.2513 | 750 | 5.9642 | -88.3618 | -1.2630 | 6.0061 | 3.1036 | 6.0061 | 0.5134 | 0.5093 | 0.5093 |
| 0.8548 | 4.5347 | 800 | 5.9238 | -88.3534 | -1.2662 | 5.9711 | 3.0853 | 5.9711 | 0.5134 | 0.5093 | 0.5093 |
| 0.7765 | 4.8181 | 850 | 5.9323 | -88.3874 | -1.2660 | 5.9770 | 3.0916 | 5.9770 | 0.5134 | 0.5093 | 0.5093 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1