
qwen2.5-0.5b-expo-DPO-ES-1000

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 4901.4473
  • Logps: -79.4462
  • Logits: -0.5595
  • Objective: 4906.0884
  • Dpo Loss: 2071.1946
  • Regularize: 4906.0884
  • Ranking Simple: 0.5362
  • Ranking Idealized: 0.5212
  • Ranking Idealized Expo: 0.5212
  • Wo Beta: 14.7030
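
The checkpoint can be loaded with the standard transformers causal-LM API. The snippet below is a minimal usage sketch: the repository id comes from this card, while the prompt and generation settings are illustrative placeholders, not values used in training or evaluation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id from this card.
model_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-1000"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the base model was SFT-tuned on news-style data before preference tuning.
prompt = "Summarize the main point of this article in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt")

# Generation settings are placeholders; adjust to your use case.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```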

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
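
A minimal sketch of how the settings above could be expressed as transformers TrainingArguments is shown below. This is a reconstruction for illustration only: the original training script is not part of this card, and DPO-specific options (such as the beta coefficient or the trl DPOTrainer wiring) are unknown and therefore omitted.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above; not the original script.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-ES-1000",  # assumed output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,   # with 3 GPUs: 4 * 3 * 12 = 144 effective train batch size
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```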

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 171.6127 | 0.1417 | 50 | 308.1877 | -1.4575 | -90.7970 | 309.7356 | 308.1877 | 0.5212 | 0.5212 | 0.5254 | 308.1877 | 7.7047 |
| 582.3149 | 0.2834 | 100 | 708.8133 | -1.3877 | -88.8040 | 733.8202 | 708.8133 | 0.5212 | 0.5212 | 0.5285 | 708.8133 | 7.4775 |
| 1002.3808 | 0.4251 | 150 | 1245.7263 | -1.3138 | -83.6606 | 1283.0697 | 1245.7263 | 0.5212 | 0.5212 | 0.5311 | 1245.7263 | 7.3632 |
| 1199.7266 | 0.5668 | 200 | 1471.0287 | -1.2584 | -79.8249 | 1530.3123 | 1471.0287 | 0.5212 | 0.5212 | 0.5347 | 1471.0287 | 7.2330 |
| 1311.3106 | 0.7085 | 250 | 1842.3601 | -1.1799 | -78.5750 | 1873.1123 | 1842.3601 | 0.5212 | 0.5212 | 0.5347 | 1842.3601 | 7.2046 |
| 1216.5524 | 0.8503 | 300 | 1949.1084 | -1.0463 | -80.6875 | 2001.6104 | 1949.1084 | 0.5212 | 0.5212 | 0.5326 | 1949.1084 | 6.9438 |
| 1157.2415 | 0.9920 | 350 | 1956.4012 | -0.8782 | -79.7493 | 2064.3220 | 1956.4012 | 0.5212 | 0.5212 | 0.5440 | 1956.4012 | 7.0169 |
| 721.9005 | 1.1337 | 400 | 2228.8811 | -0.5703 | -80.2022 | 2276.4189 | 2228.8811 | 0.5212 | 0.5212 | 0.5404 | 2228.8811 | 7.2480 |
| 779.6797 | 1.2754 | 450 | 2016.3281 | -0.7091 | -78.4054 | 2069.4939 | 2016.3281 | 0.5212 | 0.5212 | 0.5367 | 2016.3281 | 6.8242 |
| 788.48 | 1.4171 | 500 | 2044.0745 | -0.6659 | -81.9827 | 2120.1182 | 2044.0745 | 0.5212 | 0.5212 | 0.5342 | 2044.0745 | 6.8667 |
| 684.4246 | 1.5588 | 550 | 2053.8372 | -0.6751 | -81.6376 | 2148.1580 | 2053.8372 | 0.5212 | 0.5212 | 0.5342 | 2053.8372 | 6.7901 |
| 708.5259 | 1.7005 | 600 | 2071.1946 | -0.5595 | -79.4462 | 2179.6001 | 2071.1946 | 0.5212 | 0.5212 | 0.5362 | 2071.1946 | 6.6511 |
| 690.9902 | 1.8422 | 650 | 2158.4885 | -0.5552 | -80.5108 | 2241.3740 | 2158.4885 | 0.5212 | 0.5212 | 0.5414 | 2158.4885 | 6.7740 |
| 617.6108 | 1.9839 | 700 | 2132.2517 | -0.5079 | -80.3825 | 2230.5115 | 2132.2517 | 0.5212 | 0.5212 | 0.5404 | 2132.2517 | 6.7954 |
| 343.0455 | 2.1256 | 750 | 2123.3604 | -0.5398 | -81.3539 | 2199.7175 | 2123.3604 | 0.5212 | 0.5212 | 0.5430 | 2123.3604 | 6.7578 |
| 311.7518 | 2.2674 | 800 | 2038.6656 | -0.5497 | -80.2739 | 2139.7871 | 2038.6656 | 0.5212 | 0.5212 | 0.5378 | 2038.6656 | 6.6768 |
| 315.5968 | 2.4091 | 850 | 2184.7112 | -0.5282 | -83.3843 | 2249.3201 | 2184.7112 | 0.5212 | 0.5212 | 0.5404 | 2184.7112 | 6.8072 |
| 6263.3387 | 2.5555 | 900 | 3261.5381 | -0.1035 | -90.6956 | 6365.9199 | 6410.7383 | 0.5212 | 0.5212 | 0.5248 | 6410.7383 | 14.3919 |
| 4964.9731 | 2.6972 | 950 | 3203.6243 | -0.1541 | -88.8726 | 6126.6899 | 6172.9790 | 0.5212 | 0.5212 | 0.5259 | 6172.9790 | 14.2868 |
| 4278.7487 | 2.8389 | 1000 | 3153.0500 | -0.0946 | -87.8890 | 6092.5342 | 6127.7734 | 0.5212 | 0.5212 | 0.5243 | 6127.7734 | 14.2488 |
| 3830.4475 | 2.9806 | 1050 | 3127.4956 | -0.1365 | -86.9566 | 6040.4917 | 6063.3457 | 0.5212 | 0.5212 | 0.5233 | 6063.3457 | 14.0310 |
| 3079.7456 | 3.1223 | 1100 | 3144.7844 | -0.1249 | -88.1664 | 6014.5977 | 6010.9189 | 0.5212 | 0.5212 | 0.5217 | 6010.9189 | 14.2338 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
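
To reproduce the environment, the interpreter check below compares installed library versions against those listed above. The package names are the standard PyPI ones; other versions may work but are not covered by this card.

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on this card.
expected = {
    "transformers": "4.42.0",
    "torch": "2.3.0",       # card lists 2.3.0+cu121 (CUDA 12.1 build)
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}

installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}

for name, want in expected.items():
    have = installed[name]
    status = "OK" if have.startswith(want) else "MISMATCH"
    print(f"{name}: installed {have}, card lists {want} -> {status}")
```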
