
qwen2.5-0.5b-expo-L2EXPO-ES-0.1-W0

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 283.0078
  • Logps: -81.0689
  • Logits: -0.5212
  • Objective: 277.3703
  • Dpo Loss: 0.7209
  • Regularize: 0.6310
  • Ranking Simple: 0.5331
  • Ranking Idealized: 0.6030
  • Ranking Idealized Expo: 0.5223
  • Wo Beta: 14.2695

Model description

More information needed

Intended uses & limitations

More information needed
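
As a minimal usage sketch, assuming this checkpoint is published on the Hugging Face Hub under the id hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.1-W0 with a standard causal-LM head (neither is confirmed by this card), it can be loaded like any other Qwen2.5-based model:

```python
# Minimal loading sketch; the repository id below is taken from this card's
# title and is an assumption, not a confirmed Hub id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.1-W0"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a short news headline about renewable energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```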

Training and evaluation data

More information needed
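
The card only names the training dataset. As a sketch, assuming hZzy/train_pairwise_weighted is publicly available on the Hub with a train split, its schema can be inspected with the datasets library:

```python
# Sketch only: loads the dataset referenced in this card and prints its schema.
from datasets import load_dataset

ds = load_dataset("hZzy/train_pairwise_weighted")
print(ds)                    # available splits and row counts
print(ds["train"].features)  # column names/types (assumes a "train" split exists)
```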

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
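
A hedged sketch of how these values map onto a transformers TrainingArguments object; the actual training script and trainer class (likely a DPO-style preference trainer) are not included in this card, so names such as output_dir are illustrative only:

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# The run used 3 GPUs, so the effective train batch size is
# 4 (per device) x 3 (devices) x 12 (grad accumulation) = 144, as reported above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-ES-0.1-W0",  # assumed output name
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```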

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 176.8183 | 0.1417 | 50 | 0.6874 | -1.4745 | -93.3291 | 185.6754 | 183.7365 | 0.6030 | 0.5223 | 0.5212 | 0.4178 | 16.4509 |
| 168.1755 | 0.2834 | 100 | 0.6819 | -1.4487 | -93.7829 | 195.2546 | 190.2426 | 0.6030 | 0.5223 | 0.5342 | 0.4320 | 16.2920 |
| 182.7148 | 0.4251 | 150 | 0.6969 | -1.2730 | -89.7329 | 218.5299 | 213.4884 | 0.6030 | 0.5223 | 0.5336 | 0.4859 | 15.8045 |
| 203.0993 | 0.5668 | 200 | 0.7051 | -1.0447 | -79.7062 | 251.9243 | 242.6406 | 0.6030 | 0.5223 | 0.5326 | 0.5518 | 14.6949 |
| 207.5481 | 0.7085 | 250 | 0.7055 | -1.0362 | -80.0940 | 251.9905 | 244.3510 | 0.6030 | 0.5223 | 0.5305 | 0.5542 | 14.8158 |
| 193.4843 | 0.8503 | 300 | 0.7150 | -0.7137 | -80.4296 | 266.7107 | 258.3957 | 0.6030 | 0.5223 | 0.5290 | 0.5881 | 14.5431 |
| 182.6922 | 0.9920 | 350 | 0.7073 | -0.6448 | -76.3638 | 262.3346 | 254.6360 | 0.6030 | 0.5223 | 0.5357 | 0.5802 | 14.6176 |
| 166.9683 | 1.1337 | 400 | 0.7152 | -0.6392 | -78.3482 | 272.3288 | 264.9111 | 0.6030 | 0.5223 | 0.5274 | 0.6056 | 14.6513 |
| 155.9364 | 1.2754 | 450 | 0.7186 | -0.4207 | -80.5230 | 275.0490 | 268.8637 | 0.6030 | 0.5223 | 0.5321 | 0.6129 | 14.7777 |
| 143.4724 | 1.4171 | 500 | 0.7209 | -0.5141 | -80.5587 | 275.9663 | 270.0383 | 0.6030 | 0.5223 | 0.5269 | 0.6150 | 14.4364 |
| 141.3444 | 1.5588 | 550 | 0.7139 | -0.6338 | -81.1271 | 275.0851 | 269.2189 | 0.6030 | 0.5223 | 0.5378 | 0.6159 | 14.6425 |
| 136.172 | 1.7029 | 600 | 0.7111 | -0.5857 | -79.4221 | 273.6681 | 264.6510 | 0.6030 | 0.5223 | 0.5373 | 0.6012 | 14.5631 |
| 130.7133 | 1.8446 | 650 | 0.7193 | -0.4215 | -80.2130 | 276.3609 | 269.6939 | 0.6030 | 0.5223 | 0.5342 | 0.6141 | 14.5456 |
| 122.624 | 1.9863 | 700 | 0.7178 | -0.5263 | -80.9968 | 278.4690 | 271.4757 | 0.6030 | 0.5223 | 0.5378 | 0.6190 | 14.4664 |
| 108.7022 | 2.1280 | 750 | 0.7207 | -0.4657 | -84.0088 | 282.5668 | 276.0201 | 0.6030 | 0.5223 | 0.5347 | 0.6302 | 14.4517 |
| 104.1923 | 2.2697 | 800 | 0.7166 | -0.4640 | -81.6313 | 278.0555 | 272.7622 | 0.6030 | 0.5223 | 0.5383 | 0.6210 | 14.4307 |
| 99.0867 | 2.4114 | 850 | 0.7209 | -0.5212 | -81.0689 | 283.0078 | 277.3703 | 0.6030 | 0.5223 | 0.5331 | 0.6310 | 14.2695 |
| 91.7475 | 2.5531 | 900 | 0.7200 | -0.5149 | -81.6144 | 279.6676 | 275.1769 | 0.6030 | 0.5223 | 0.5373 | 0.6279 | 14.3570 |
| 87.8681 | 2.6949 | 950 | 0.7191 | -0.4428 | -81.8544 | 281.5718 | 275.7560 | 0.6030 | 0.5223 | 0.5362 | 0.6277 | 14.3509 |
| 81.742 | 2.8366 | 1000 | 0.7197 | -0.4951 | -81.4412 | 279.1324 | 274.5647 | 0.6030 | 0.5223 | 0.5336 | 0.6257 | 14.3551 |
| 76.4372 | 2.9783 | 1050 | 0.7184 | -0.4502 | -82.3960 | 279.1884 | 273.9026 | 0.6030 | 0.5223 | 0.5336 | 0.6249 | 14.3203 |
| 67.4698 | 3.1200 | 1100 | 0.7169 | -0.4190 | -82.9107 | 280.5317 | 274.7932 | 0.6030 | 0.5223 | 0.5326 | 0.6260 | 14.3418 |
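
The Dpo Loss column tracks a pairwise preference loss. The exact L2EXPO objective is not documented in this card, so purely as a hedged point of reference, the standard DPO loss from which such metrics are usually derived is:

```latex
% Standard DPO loss (reference only; the L2EXPO objective used here may differ).
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```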

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1
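
A small sketch for checking that a local environment matches these pins before attempting to reproduce the run:

```python
# Prints installed versions; compare against the pins listed above.
import transformers, torch, datasets, tokenizers

print("transformers", transformers.__version__)  # expected 4.42.0
print("torch", torch.__version__)                # expected 2.3.0+cu121
print("datasets", datasets.__version__)          # expected 3.2.0
print("tokenizers", tokenizers.__version__)      # expected 0.19.1
```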