qwen2.5-0.5b-expo-L2EXPO-ES-1000

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 5280.8613
  • Logps: -85.1920
  • Logits: -0.4645
  • Objective: 5329.0571
  • Dpo Loss: 2703.0312
  • Regularize: 5329.0571
  • Ranking Simple: 0.5264
  • Ranking Idealized: 0.5212
  • Ranking Idealized Expo: 0.5212
  • Wo Beta: 14.0257
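
For a quick check of the checkpoint, it can be loaded with the standard transformers causal-LM classes. This is a minimal sketch and not part of the original card; the prompt and generation settings are illustrative assumptions.

```python
# Minimal sketch: load the fine-tuned checkpoint with standard transformers classes.
# The prompt and generation settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-1000"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a one-sentence news headline about renewable energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```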

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
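
The per-device batch size of 4, 3 GPUs, and 12 gradient-accumulation steps give the listed effective train batch size of 4 × 3 × 12 = 144 (and an effective eval batch size of 4 × 3 = 12). As an illustration only (the exact trainer class used for this run is not specified in the card), the equivalent transformers TrainingArguments would look roughly like this:

```python
# Illustrative sketch of TrainingArguments matching the hyperparameters above.
# The actual training script/trainer for this run is not given in the card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-ES-1000",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # x 3 GPUs x 12 accumulation steps = 144 effective
    per_device_eval_batch_size=4,    # x 3 GPUs = 12 effective
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```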

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 423.0596 | 0.1417 | 50 | 269.6528 | -1.3995 | -90.7221 | 547.3513 | 547.0782 | 0.5212 | 0.5212 | 0.5238 | 547.0782 | 16.2386 |
| 1707.6314 | 0.2834 | 100 | 848.0440 | -1.3188 | -86.9761 | 1694.1810 | 1675.6274 | 0.5212 | 0.5212 | 0.5202 | 1675.6274 | 15.6407 |
| 2824.1562 | 0.4251 | 150 | 1511.3630 | -1.2862 | -82.2178 | 3014.1465 | 2968.7720 | 0.5212 | 0.5212 | 0.5274 | 2968.7720 | 15.0323 |
| 3551.5363 | 0.5668 | 200 | 1970.6587 | -0.7799 | -81.0056 | 3928.5837 | 3925.4968 | 0.5212 | 0.5212 | 0.5248 | 3925.4968 | 14.6132 |
| 3769.9247 | 0.7085 | 250 | 2167.5466 | -0.7280 | -80.5808 | 4317.6050 | 4303.1143 | 0.5212 | 0.5212 | 0.5269 | 4303.1143 | 14.5829 |
| 3591.3281 | 0.8503 | 300 | 2308.4351 | -0.5846 | -82.7713 | 4553.7632 | 4559.6348 | 0.5212 | 0.5212 | 0.5248 | 4559.6348 | 14.5914 |
| 3315.5613 | 0.9920 | 350 | 2326.0144 | -0.7541 | -80.8051 | 4667.9404 | 4670.2617 | 0.5212 | 0.5212 | 0.5331 | 4670.2617 | 14.3052 |
| 3140.2284 | 1.1337 | 400 | 2524.6191 | -0.6474 | -81.5771 | 4876.3184 | 4879.0815 | 0.5212 | 0.5212 | 0.5228 | 4879.0815 | 14.3271 |
| 2984.025 | 1.2754 | 450 | 2466.7131 | -0.7908 | -84.2705 | 4773.4326 | 4785.7534 | 0.5212 | 0.5212 | 0.5248 | 4785.7534 | 14.3213 |
| 2769.3719 | 1.4171 | 500 | 2513.8191 | -0.7098 | -81.4917 | 4863.7148 | 4866.6235 | 0.5212 | 0.5212 | 0.5192 | 4866.6235 | 14.1934 |
| 2620.0086 | 1.5588 | 550 | 2463.1169 | -0.5653 | -81.8307 | 4887.2939 | 4877.4683 | 0.5212 | 0.5212 | 0.5248 | 4877.4683 | 14.1757 |
| 2530.9462 | 1.7005 | 600 | 2522.0715 | -0.4886 | -82.8727 | 4965.4233 | 5013.2871 | 0.5212 | 0.5212 | 0.5233 | 5013.2871 | 14.2573 |
| 2445.0009 | 1.8422 | 650 | 2509.7644 | -0.5173 | -81.8303 | 4964.3994 | 4986.9541 | 0.5212 | 0.5212 | 0.5243 | 4986.9541 | 14.2557 |
| 2287.7192 | 1.9839 | 700 | 2561.1602 | -0.5354 | -83.8738 | 5034.0654 | 5065.8521 | 0.5212 | 0.5212 | 0.5217 | 5065.8521 | 14.0847 |
| 2066.9519 | 2.1256 | 750 | 2654.1794 | -0.4949 | -82.1944 | 5229.8853 | 5264.4932 | 0.5212 | 0.5212 | 0.5254 | 5264.4932 | 14.0981 |
| 1963.7713 | 2.2674 | 800 | 2636.3833 | -0.4790 | -82.2307 | 5180.2388 | 5235.7695 | 0.5212 | 0.5212 | 0.5243 | 5235.7695 | 14.0378 |
| 1854.7628 | 2.4091 | 850 | 2612.3875 | -0.4900 | -82.9664 | 5130.6069 | 5171.9189 | 0.5212 | 0.5212 | 0.5269 | 5171.9189 | 14.1142 |
| 1711.9678 | 2.5508 | 900 | 2703.0312 | -0.4645 | -85.1920 | 5280.8613 | 5329.0571 | 0.5212 | 0.5212 | 0.5264 | 5329.0571 | 14.0257 |
| 1682.3781 | 2.6925 | 950 | 2644.8484 | -0.4320 | -83.9376 | 5177.0815 | 5195.8457 | 0.5212 | 0.5212 | 0.5254 | 5195.8457 | 14.1691 |
| 1508.6941 | 2.8342 | 1000 | 2632.4006 | -0.5014 | -83.4235 | 5124.7144 | 5131.5728 | 0.5212 | 0.5212 | 0.5243 | 5131.5728 | 14.1501 |
| 1432.2169 | 2.9759 | 1050 | 2638.4963 | -0.4687 | -83.8074 | 5215.6191 | 5232.5947 | 0.5212 | 0.5212 | 0.5295 | 5232.5947 | 14.2389 |
| 1247.6562 | 3.1223 | 1100 | 2631.7661 | -0.5461 | -84.1614 | 5184.8696 | 5190.6357 | 0.5212 | 0.5212 | 0.5264 | 5190.6357 | 14.1529 |
| 1136.2859 | 3.2641 | 1150 | 2590.1838 | -0.5632 | -83.8852 | 5110.2056 | 5112.0278 | 0.5212 | 0.5212 | 0.5280 | 5112.0278 | 14.0933 |
| 1042.7762 | 3.4058 | 1200 | 2612.2661 | -0.5505 | -83.9630 | 5146.4077 | 5162.1665 | 0.5212 | 0.5212 | 0.5274 | 5162.1665 | 14.1122 |
| 978.7787 | 3.5475 | 1250 | 2605.3420 | -0.4993 | -83.8987 | 5115.5093 | 5140.9258 | 0.5212 | 0.5212 | 0.5280 | 5140.9258 | 14.1279 |
| 864.8715 | 3.6892 | 1300 | 2621.0728 | -0.5245 | -84.2929 | 5143.4609 | 5173.7549 | 0.5212 | 0.5212 | 0.5259 | 5173.7549 | 14.1584 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
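
A quick way to confirm that a local environment matches these versions is shown below (a small sketch; the package names are the standard PyPI distributions):

```python
# Sketch: verify the local environment against the versions listed above.
import datasets
import tokenizers
import torch
import transformers

print("transformers:", transformers.__version__)  # card lists 4.42.0
print("torch:", torch.__version__)                # card lists 2.3.0+cu121
print("datasets:", datasets.__version__)          # card lists 2.19.1
print("tokenizers:", tokenizers.__version__)      # card lists 0.19.1
```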