qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-500-5e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

Loss: 2234.1663
Logps: -82.0980
Logits: -0.6597
Objective: 2265.8794
Dpo Loss: 1141.8063
Regularize: 2265.8794
Ranking Simple: 0.5124
Ranking Idealized: 0.5093
Ranking Idealized Expo: 0.5093

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 6
gradient_accumulation_steps: 12
total_train_batch_size: 288
total_eval_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
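
The two total batch sizes in this list follow from the per-device batch size, the device count, and the gradient accumulation steps. A minimal sketch of that arithmetic, using only the values listed above (the function name `effective_batch_size` is illustrative and not part of any library):

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step (training) or per evaluation step."""
    return per_device * num_devices * grad_accum_steps


# train_batch_size=4, num_devices=6, gradient_accumulation_steps=12
train_total = effective_batch_size(4, 6, 12)

# eval_batch_size=4, num_devices=6 (no accumulation during evaluation)
eval_total = effective_batch_size(4, 6)

print(train_total, eval_total)  # prints: 288 24
```

Both results match the reported total_train_batch_size and total_eval_batch_size.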

Training results

Training Loss	Epoch	Step	Validation Loss	Logps	Logits	Objective	Dpo Loss	Regularize	Ranking Simple	Ranking Idealized	Ranking Idealized Expo
737.2396	0.2834	50	421.9188	-92.0238	-1.3125	428.8152	221.7374	428.8152	0.5093	0.5093	0.5093
1559.0492	0.5668	100	1500.8584	-82.7953	-1.0226	1492.8041	733.0991	1492.8041	0.5072	0.5093	0.5093
1544.2886	0.8503	150	1796.8794	-83.3935	-0.8486	1837.2159	936.6276	1837.2159	0.4990	0.5093	0.5093
1387.1779	1.1337	200	1946.6445	-81.0039	-0.8010	1988.5870	1018.1060	1988.5870	0.5010	0.5093	0.5093
1257.5858	1.4171	250	2056.7834	-79.5628	-0.8937	2078.7400	1059.1973	2078.7400	0.5031	0.5093	0.5093
1062.9078	1.7005	300	2170.6946	-79.7273	-0.7209	2202.7805	1115.7678	2202.7805	0.5031	0.5093	0.5093
1015.0369	1.9839	350	2227.1714	-83.5951	-0.6739	2262.3740	1156.4828	2262.3740	0.5124	0.5093	0.5093
849.8354	2.2674	400	2210.6672	-83.3996	-0.6954	2238.0188	1124.7909	2238.0188	0.5155	0.5093	0.5093
749.1392	2.5508	450	2232.3298	-80.8498	-0.6204	2283.4070	1157.1035	2283.4070	0.5134	0.5093	0.5093
663.6063	2.8342	500	2235.3254	-81.1036	-0.6463	2277.7737	1152.4823	2277.7737	0.5083	0.5093	0.5093
547.2687	3.1176	550	2247.6917	-81.3519	-0.6623	2265.5049	1133.8970	2265.5049	0.5145	0.5093	0.5093
451.9043	3.4010	600	2235.0491	-81.8093	-0.6081	2263.8958	1143.4464	2263.8958	0.5114	0.5093	0.5093
383.0005	3.6845	650	2233.3066	-81.6021	-0.6417	2277.5994	1148.9692	2277.5994	0.5124	0.5093	0.5093
316.0834	3.9679	700	2236.5557	-82.0739	-0.6441	2269.0681	1143.5380	2269.0681	0.5134	0.5093	0.5093
230.1662	4.2513	750	2241.1863	-82.1894	-0.6514	2272.5417	1146.1786	2272.5417	0.5124	0.5093	0.5093
198.8015	4.5347	800	2236.1729	-82.0761	-0.6625	2266.9819	1141.6486	2266.9819	0.5134	0.5093	0.5093
189.8097	4.8181	850	2234.3398	-82.0995	-0.6599	2266.1760	1141.9380	2266.1760	0.5124	0.5093	0.5093
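
The Epoch and Step columns can be used to estimate how many optimizer steps make up one epoch, and from that how the learning rate probably evolved over the logged steps. The sketch below is an estimate under stated assumptions, not something the card reports: the total step count is inferred from the table, the warmup length is taken as the ceiling of 10% of that total, and the schedule is assumed to have the usual linear-warmup, half-cosine-decay shape associated with lr_scheduler_type: cosine. The helper name `cosine_lr` is hypothetical.

```python
import math

# (epoch, step) pairs copied from the first and last rows of the table above.
first_epoch, first_step = 0.2834, 50
last_epoch, last_step = 4.8181, 850

# Optimizer steps per epoch, estimated from each row independently.
steps_per_epoch_first = first_step / first_epoch
steps_per_epoch_last = last_step / last_epoch

# ASSUMPTION: total steps is about num_epochs * steps_per_epoch (not stated in the card).
num_epochs = 5
total_steps = round(num_epochs * steps_per_epoch_last)

# ASSUMPTION: warmup length is the ceiling of warmup_ratio * total_steps.
warmup_steps = math.ceil(0.1 * total_steps)


def cosine_lr(step: int, total: int, warmup: int, peak_lr: float = 5e-6) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(f"steps/epoch (first row): {steps_per_epoch_first:.1f}")
print(f"steps/epoch (last row):  {steps_per_epoch_last:.1f}")
print(f"estimated total steps: {total_steps}, warmup steps: {warmup_steps}")

# Estimated learning rate at a few of the logged evaluation steps.
for step in (50, 100, 450, 850):
    print(step, f"{cosine_lr(step, total_steps, warmup_steps):.3e}")
```

Under these assumptions the first logged evaluation (step 50) falls inside the warmup phase, and the last one (step 850) falls near the end of the decay, where the learning rate is close to zero.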

Framework versions

Transformers 4.42.0
Pytorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
