
qwen2.5-0.5b-expo-L2EXPO-ES-100

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 486.1626
  • Logps: -82.8268
  • Logits: -0.5435
  • Objective: 489.7928
  • Dpo Loss: 245.8756
  • Regularize: 489.7928
  • Ranking Simple: 0.5254
  • Ranking Idealized: 0.5212
  • Ranking Idealized Expo: 0.5212
  • Wo Beta: 14.0464
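
Since the base model (hZzy/qwen2.5-0.5b-sft-news-IFT) is a Qwen2.5-0.5B causal language model, the checkpoint should load with the standard `transformers` auto classes. The snippet below is a minimal inference sketch under that assumption; the prompt and generation settings are illustrative and not part of the original card.

```python
# Minimal sketch, assuming the checkpoint loads as a standard causal LM
# (the base model is a Qwen2.5-0.5B variant).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-100"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; replace with your own input.
inputs = tokenizer("The latest news on renewable energy:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```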

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
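
For reference, the hyperparameters above map roughly onto a `transformers` `TrainingArguments` configuration as sketched below. This is only an illustration of how the listed values combine (4 examples per device × 3 GPUs × 12 accumulation steps = 144 total train batch size); the actual training script, which reports the DPO-style losses above, is not included in this card, and the output directory name is hypothetical.

```python
# Illustrative sketch only: the listed hyperparameters expressed as
# TrainingArguments. Not the actual training script used for this model.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-ES-100",  # hypothetical path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,  # 4 x 3 GPUs x 12 = 144 total train batch size
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```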

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|:-------------:|:-----:|:----:|:--------:|:------:|:-----:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|:-------:|
| 43.2587 | 0.1417 | 50 | 26.4475 | -1.4448 | -90.5292 | 52.6622 | 53.6977 | 0.5212 | 0.5212 | 0.5264 | 53.6977 | 16.1700 |
| 169.8852 | 0.2834 | 100 | 85.7639 | -1.3621 | -85.2787 | 173.9861 | 172.1891 | 0.5212 | 0.5212 | 0.5243 | 172.1891 | 15.4391 |
| 285.0432 | 0.4251 | 150 | 143.0300 | -1.1694 | -83.2181 | 291.4834 | 293.4404 | 0.5212 | 0.5212 | 0.5280 | 293.4404 | 15.2225 |
| 355.4066 | 0.5668 | 200 | 189.8469 | -0.9274 | -84.0320 | 372.7906 | 365.2124 | 0.5212 | 0.5212 | 0.5233 | 365.2124 | 14.8684 |
| 368.9811 | 0.7085 | 250 | 216.4584 | -0.7746 | -81.5050 | 446.6966 | 442.3321 | 0.5212 | 0.5212 | 0.5259 | 442.3321 | 14.4790 |
| 360.5868 | 0.8503 | 300 | 222.8840 | -0.5984 | -82.2011 | 448.9506 | 443.9051 | 0.5212 | 0.5212 | 0.5248 | 443.9051 | 14.3930 |
| 338.3987 | 0.9920 | 350 | 232.9365 | -0.7855 | -84.1638 | 462.1923 | 461.2073 | 0.5212 | 0.5212 | 0.5269 | 461.2073 | 14.2979 |
| 309.1712 | 1.1337 | 400 | 248.0718 | -0.6414 | -82.4934 | 480.5965 | 478.7404 | 0.5212 | 0.5212 | 0.5254 | 478.7404 | 14.3872 |
| 298.1424 | 1.2754 | 450 | 247.8722 | -0.7014 | -82.1465 | 480.3256 | 482.1766 | 0.5212 | 0.5212 | 0.5238 | 482.1766 | 14.3695 |
| 282.4504 | 1.4171 | 500 | 252.2093 | -0.4578 | -83.4101 | 493.7484 | 495.7639 | 0.5212 | 0.5212 | 0.5248 | 495.7639 | 14.1743 |
| 261.1027 | 1.5588 | 550 | 245.8756 | -0.5435 | -82.8268 | 486.1626 | 489.7928 | 0.5212 | 0.5212 | 0.5254 | 489.7928 | 14.0464 |
| 255.9288 | 1.7005 | 600 | 251.2934 | -0.5347 | -82.1768 | 500.3801 | 502.1727 | 0.5212 | 0.5212 | 0.5269 | 502.1727 | 14.2436 |
| 248.6787 | 1.8422 | 650 | 254.5959 | -0.5140 | -81.4923 | 502.3153 | 504.1582 | 0.5212 | 0.5212 | 0.5248 | 504.1582 | 14.3320 |
| 226.4676 | 1.9839 | 700 | 264.1660 | -0.4816 | -83.4216 | 512.6990 | 516.7103 | 0.5212 | 0.5212 | 0.5254 | 516.7103 | 14.0834 |
| 207.1551 | 2.1256 | 750 | 259.2528 | -0.5410 | -83.4589 | 506.4237 | 510.6129 | 0.5212 | 0.5212 | 0.5238 | 510.6129 | 14.1295 |
| 197.3545 | 2.2674 | 800 | 262.3102 | -0.5659 | -84.8747 | 513.3979 | 514.3120 | 0.5212 | 0.5212 | 0.5228 | 514.3120 | 14.0704 |
| 182.3796 | 2.4138 | 850 | 254.1251 | -0.5510 | -82.8624 | 501.8831 | 504.8523 | 0.5212 | 0.5212 | 0.5274 | 504.8523 | 14.1707 |
| 176.042 | 2.5555 | 900 | 263.2800 | -0.5039 | -85.0710 | 518.1983 | 519.5008 | 0.5212 | 0.5212 | 0.5238 | 519.5008 | 14.1123 |
| 164.8281 | 2.6972 | 950 | 262.8074 | -0.5200 | -84.5843 | 512.1844 | 512.7651 | 0.5212 | 0.5212 | 0.5238 | 512.7651 | 14.1643 |
| 150.0401 | 2.8389 | 1000 | 263.6169 | -0.5219 | -83.7343 | 514.7036 | 516.5959 | 0.5212 | 0.5212 | 0.5259 | 516.5959 | 14.1800 |
| 141.0317 | 2.9806 | 1050 | 266.9453 | -0.4953 | -84.2676 | 519.2467 | 521.8153 | 0.5212 | 0.5212 | 0.5264 | 521.8153 | 14.2577 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
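
To reproduce results or reload the checkpoint with matching behaviour, it can help to pin the environment to the versions listed above. The sketch below simply checks the installed versions at runtime; it is illustrative only.

```python
# Illustrative check that the runtime matches the versions listed above.
import datasets
import tokenizers
import torch
import transformers

for module, expected in [
    (transformers, "4.42.0"),
    (torch, "2.3.0+cu121"),
    (datasets, "2.19.1"),
    (tokenizers, "0.19.1"),
]:
    print(f"{module.__name__}: {module.__version__} (expected {expected})")
```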