# qwen2.5-0.5b-expo-L2EXPO-W0-noES2-0.1
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:
- Loss: 187.9473
- Logps: -88.4161
- Logits: -1.2945
- Objective: 183.8510
- Dpo Loss: 0.6799
- Regularize: 0.4168
- Ranking Simple: 0.5326
- Ranking Idealized: 0.6025
- Ranking Idealized Expo: 0.5233
- Wo Beta: 15.9953
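The reported Dpo Loss tracks the standard sigmoid DPO objective on policy-vs-reference log-probability ratios. As a hedged sketch (the `dpo_loss` helper and the `beta=0.1` default are assumptions, the latter inferred from the `0.1` suffix in the model name, not confirmed by the card):

```python
import math

def dpo_loss(chosen_logratio, rejected_logratio, beta=0.1):
    """Sigmoid DPO loss sketch.

    chosen_logratio / rejected_logratio are log pi(y|x) - log pi_ref(y|x)
    for the preferred and dispreferred completions. beta=0.1 is an
    assumption based on the model-name suffix.
    """
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(margin)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

For a zero margin the loss is `-log 0.5 ≈ 0.693`; the evaluation value of ~0.68 therefore indicates a small positive preference margin on average.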
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
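Pending author documentation, the model can presumably be loaded like any causal LM on the Hub. A minimal sketch, assuming the standard `transformers` Auto classes apply (the `load_model` helper is illustrative, not part of the card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "hZzy/qwen2.5-0.5b-expo-L2EXPO-W0-noES2-0.1"

def load_model(model_id=MODEL_ID):
    """Load tokenizer and model from the Hub (downloads weights on first call)."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```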
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
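The hyperparameters above are internally consistent: the total train batch size is `train_batch_size × num_devices × gradient_accumulation_steps = 4 × 3 × 12 = 144`, and with a warmup ratio of 0.1 over the 1750 optimizer steps shown in the results table, warmup spans the first 175 steps. A sketch of that arithmetic and of a cosine-with-warmup schedule (a simplified stand-in for the HF `cosine` scheduler, not its exact implementation):

```python
import math

LEARNING_RATE = 1e-6
TRAIN_BATCH_SIZE = 4
NUM_DEVICES = 3
GRAD_ACCUM = 12
TOTAL_STEPS = 1750   # final step in the training-results table
WARMUP_RATIO = 0.1

# Effective batch size seen by each optimizer step.
effective_batch = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM  # 144

def cosine_lr(step, peak=LEARNING_RATE, total=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to `peak`, then cosine decay to zero."""
    warmup = int(total * warmup_ratio)
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))
```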
### Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
181.5404 | 0.1417 | 50 | 182.3598 | -90.9189 | -1.4233 | 180.3279 | 0.6890 | 0.4088 | 0.5264 | 0.6025 | 0.5233 | 16.2974 |
156.9096 | 0.2834 | 100 | 181.8641 | -91.4510 | -1.4702 | 180.3150 | 0.6855 | 0.4101 | 0.5316 | 0.6025 | 0.5233 | 16.3731 |
145.838 | 0.4251 | 150 | 180.6479 | -90.7049 | -1.4504 | 178.1705 | 0.6790 | 0.4023 | 0.5383 | 0.6025 | 0.5233 | 16.5880 |
140.9398 | 0.5668 | 200 | 184.4987 | -90.3330 | -1.3895 | 181.5451 | 0.6803 | 0.4101 | 0.5326 | 0.6025 | 0.5233 | 16.1415 |
131.1439 | 0.7085 | 250 | 182.2077 | -91.2246 | -1.4789 | 178.4409 | 0.6797 | 0.4059 | 0.5326 | 0.6025 | 0.5233 | 16.3683 |
118.3192 | 0.8503 | 300 | 183.4459 | -92.5771 | -1.4552 | 180.4714 | 0.6817 | 0.4123 | 0.5326 | 0.6025 | 0.5233 | 16.4041 |
108.5029 | 0.9920 | 350 | 183.9593 | -92.1804 | -1.4317 | 180.1151 | 0.6782 | 0.4095 | 0.5321 | 0.6025 | 0.5233 | 16.3367 |
104.6813 | 1.1337 | 400 | 183.8759 | -89.7261 | -1.3930 | 180.2840 | 0.6801 | 0.4094 | 0.5311 | 0.6025 | 0.5233 | 16.2209 |
90.5585 | 1.2754 | 450 | 184.0673 | -91.2037 | -1.3663 | 180.6296 | 0.6795 | 0.4105 | 0.5357 | 0.6025 | 0.5233 | 16.2889 |
91.2372 | 1.4171 | 500 | 185.7194 | -89.4298 | -1.3281 | 180.8790 | 0.6782 | 0.4110 | 0.5347 | 0.6025 | 0.5233 | 16.0443 |
85.7307 | 1.5588 | 550 | 186.2241 | -91.6866 | -1.3683 | 182.1382 | 0.6799 | 0.4147 | 0.5336 | 0.6025 | 0.5233 | 16.1863 |
79.9458 | 1.7005 | 600 | 186.2137 | -91.0847 | -1.3519 | 181.8686 | 0.6794 | 0.4135 | 0.5373 | 0.6025 | 0.5233 | 16.1060 |
86.7578 | 1.8422 | 650 | 186.7196 | -89.4070 | -1.3403 | 182.4970 | 0.6797 | 0.4141 | 0.5316 | 0.6025 | 0.5233 | 16.0270 |
76.2665 | 1.9839 | 700 | 186.2802 | -89.5857 | -1.3223 | 182.3933 | 0.6800 | 0.4136 | 0.5311 | 0.6025 | 0.5233 | 16.1117 |
65.1575 | 2.1256 | 750 | 188.1571 | -90.2454 | -1.3253 | 184.2076 | 0.6806 | 0.4179 | 0.5321 | 0.6025 | 0.5233 | 15.9179 |
66.0375 | 2.2674 | 800 | 186.7221 | -88.5874 | -1.3137 | 181.9355 | 0.6781 | 0.4137 | 0.5336 | 0.6025 | 0.5233 | 15.9879 |
55.6773 | 2.4091 | 850 | 189.5397 | -88.2689 | -1.3112 | 185.2096 | 0.6809 | 0.4203 | 0.5300 | 0.6025 | 0.5233 | 15.9311 |
54.3682 | 2.5508 | 900 | 188.2381 | -88.3611 | -1.3259 | 184.1678 | 0.6793 | 0.4167 | 0.5311 | 0.6025 | 0.5233 | 15.9680 |
50.3775 | 2.6925 | 950 | 189.5419 | -88.8005 | -1.3091 | 185.0044 | 0.6802 | 0.4183 | 0.5331 | 0.6025 | 0.5233 | 15.9986 |
45.9449 | 2.8342 | 1000 | 187.7079 | -87.8148 | -1.2990 | 183.5676 | 0.6792 | 0.4161 | 0.5300 | 0.6025 | 0.5233 | 15.9960 |
49.0003 | 2.9759 | 1050 | 188.0040 | -88.3004 | -1.2778 | 184.0016 | 0.6792 | 0.4173 | 0.5342 | 0.6025 | 0.5233 | 16.0403 |
40.2428 | 3.1176 | 1100 | 188.7166 | -88.6470 | -1.2988 | 184.3815 | 0.6801 | 0.4181 | 0.5326 | 0.6025 | 0.5233 | 15.9981 |
37.177 | 3.2593 | 1150 | 188.2563 | -87.9239 | -1.2843 | 184.3351 | 0.6804 | 0.4183 | 0.5357 | 0.6025 | 0.5233 | 16.0123 |
34.9809 | 3.4010 | 1200 | 189.1705 | -88.1129 | -1.2900 | 184.8718 | 0.6806 | 0.4193 | 0.5326 | 0.6025 | 0.5233 | 15.9531 |
34.073 | 3.5427 | 1250 | 188.2203 | -88.5617 | -1.2892 | 184.1966 | 0.6802 | 0.4177 | 0.5336 | 0.6025 | 0.5233 | 16.0022 |
28.4565 | 3.6845 | 1300 | 188.3189 | -88.0836 | -1.2942 | 184.1293 | 0.6803 | 0.4178 | 0.5331 | 0.6025 | 0.5233 | 16.0081 |
27.4636 | 3.8262 | 1350 | 188.3022 | -88.4586 | -1.2973 | 184.2191 | 0.6803 | 0.4178 | 0.5321 | 0.6025 | 0.5233 | 15.9996 |
27.3902 | 3.9679 | 1400 | 187.9691 | -88.3135 | -1.2974 | 183.7816 | 0.6798 | 0.4168 | 0.5321 | 0.6025 | 0.5233 | 15.9788 |
21.2906 | 4.1096 | 1450 | 187.8985 | -88.1212 | -1.2976 | 183.6546 | 0.6796 | 0.4164 | 0.5321 | 0.6025 | 0.5233 | 15.9853 |
19.8787 | 4.2513 | 1500 | 188.0825 | -88.3078 | -1.2942 | 183.8684 | 0.6799 | 0.4169 | 0.5321 | 0.6025 | 0.5233 | 15.9839 |
18.4741 | 4.3930 | 1550 | 188.0407 | -88.4855 | -1.2951 | 184.0446 | 0.6802 | 0.4173 | 0.5326 | 0.6025 | 0.5233 | 15.9950 |
20.4794 | 4.5347 | 1600 | 187.9061 | -88.4381 | -1.2950 | 183.8276 | 0.6799 | 0.4168 | 0.5331 | 0.6025 | 0.5233 | 16.0004 |
17.2115 | 4.6764 | 1650 | 187.9504 | -88.4174 | -1.2938 | 183.8566 | 0.6798 | 0.4168 | 0.5326 | 0.6025 | 0.5233 | 15.9942 |
16.5799 | 4.8181 | 1700 | 187.9360 | -88.4220 | -1.2946 | 183.8405 | 0.6799 | 0.4168 | 0.5326 | 0.6025 | 0.5233 | 15.9963 |
16.689 | 4.9598 | 1750 | 187.9473 | -88.4162 | -1.2945 | 183.8510 | 0.6799 | 0.4168 | 0.5326 | 0.6025 | 0.5233 | 15.9953 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1