
qwen2.5-0.5b-expo-L2EXPO-W2-ES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0580
  • Logps: -81.6322
  • Logits: -0.3472
  • Objective: 0.0564
  • Dpo Loss: 0.7177
  • Regularize: 0.6547
  • Ranking Simple: 0.5450
  • Ranking Idealized: 0.6030
  • Ranking Idealized Expo: 0.5223
  • Wo Beta: 14.1503
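
For quick experimentation, the checkpoint can be loaded like any other Transformers causal language model. The sketch below is a minimal example, assuming the repository contains a standard Qwen2.5-style causal LM in Transformers format; the prompt is purely illustrative, since the card does not specify an expected prompt format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-W2-ES-0.1"

# Load tokenizer and model from the Hub (assumes a standard causal-LM checkpoint).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt only.
prompt = "Write a short news summary about renewable energy."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```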

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
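
As a rough illustration, the values above map onto a Hugging Face `TrainingArguments` configuration as sketched below. This is only a sketch under assumptions: the actual run used a preference-optimization (EXPO/DPO-style) trainer whose full configuration is not reproduced in this card, and the `output_dir` name is hypothetical.

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters expressed as TrainingArguments.
# Hypothetical output_dir; Adam betas/epsilon written out explicitly.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-W2-ES-0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=12,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
)
# Effective train batch size: 4 per device x 3 GPUs x 12 accumulation steps = 144.
```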

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0387 | 0.1417 | 50 | 0.0387 | -89.7823 | -1.4878 | 0.0386 | 0.6849 | 0.4152 | 0.5259 | 0.6030 | 0.5223 | 16.3818 |
| 0.0376 | 0.2834 | 100 | 0.0400 | -87.3577 | -1.4813 | 0.0392 | 0.6794 | 0.4266 | 0.5326 | 0.6030 | 0.5223 | 16.1307 |
| 0.041 | 0.4251 | 150 | 0.0453 | -80.2072 | -1.3514 | 0.0446 | 0.6965 | 0.4931 | 0.5280 | 0.6030 | 0.5223 | 15.6134 |
| 0.0451 | 0.5668 | 200 | 0.0473 | -78.7022 | -0.9891 | 0.0464 | 0.6948 | 0.5121 | 0.5300 | 0.6030 | 0.5223 | 15.2852 |
| 0.0483 | 0.7085 | 250 | 0.0523 | -73.9930 | -0.9959 | 0.0507 | 0.7054 | 0.5778 | 0.5393 | 0.6030 | 0.5223 | 15.1111 |
| 0.0487 | 0.8503 | 300 | 0.0531 | -79.5956 | -1.0801 | 0.0509 | 0.7126 | 0.5977 | 0.5342 | 0.6030 | 0.5223 | 14.5847 |
| 0.0485 | 0.9920 | 350 | 0.0548 | -76.9095 | -0.8726 | 0.0533 | 0.7110 | 0.6159 | 0.5378 | 0.6030 | 0.5223 | 14.4121 |
| 0.0529 | 1.1337 | 400 | 0.0587 | -78.7635 | -0.4139 | 0.0575 | 0.7255 | 0.6577 | 0.5378 | 0.6030 | 0.5223 | 14.3951 |
| 0.0493 | 1.2754 | 450 | 0.0584 | -78.9623 | -0.4738 | 0.0572 | 0.7243 | 0.6702 | 0.5430 | 0.6030 | 0.5223 | 14.5363 |
| 0.0447 | 1.4171 | 500 | 0.0572 | -78.1551 | -0.4434 | 0.0565 | 0.7180 | 0.6433 | 0.5336 | 0.6030 | 0.5223 | 14.5089 |
| 0.0421 | 1.5588 | 550 | 0.0577 | -78.4112 | -0.3865 | 0.0563 | 0.7126 | 0.6425 | 0.5399 | 0.6030 | 0.5223 | 14.5141 |
| 0.0415 | 1.7005 | 600 | 0.0583 | -80.4593 | -0.2526 | 0.0569 | 0.7205 | 0.6520 | 0.5352 | 0.6030 | 0.5223 | 14.5863 |
| 0.0409 | 1.8422 | 650 | 0.0573 | -78.7705 | -0.3179 | 0.0556 | 0.7195 | 0.6460 | 0.5409 | 0.6030 | 0.5223 | 14.3763 |
| 0.0377 | 1.9839 | 700 | 0.0579 | -79.7789 | -0.4899 | 0.0557 | 0.7221 | 0.6579 | 0.5450 | 0.6030 | 0.5223 | 14.5156 |
| 0.0339 | 2.1256 | 750 | 0.0577 | -80.8265 | -0.4062 | 0.0555 | 0.7193 | 0.6551 | 0.5455 | 0.6030 | 0.5223 | 14.2194 |
| 0.0346 | 2.2674 | 800 | 0.0577 | -81.8186 | -0.2681 | 0.0559 | 0.7190 | 0.6534 | 0.5440 | 0.6030 | 0.5223 | 14.3033 |
| 0.0334 | 2.4091 | 850 | 0.0585 | -83.2126 | -0.2941 | 0.0564 | 0.7213 | 0.6627 | 0.5419 | 0.6030 | 0.5223 | 14.4189 |
| 0.032 | 2.5508 | 900 | 0.0580 | -82.8344 | -0.2672 | 0.0564 | 0.7173 | 0.6562 | 0.5404 | 0.6030 | 0.5223 | 14.2070 |
| 0.029 | 2.6925 | 950 | 0.0580 | -81.6322 | -0.3472 | 0.0564 | 0.7177 | 0.6547 | 0.5450 | 0.6030 | 0.5223 | 14.1503 |
| 0.0242 | 2.8342 | 1000 | 0.0572 | -81.8476 | -0.3613 | 0.0555 | 0.7141 | 0.6463 | 0.5435 | 0.6030 | 0.5223 | 14.2684 |
| 0.0262 | 2.9759 | 1050 | 0.0582 | -82.2240 | -0.3030 | 0.0566 | 0.7193 | 0.6593 | 0.5409 | 0.6030 | 0.5223 | 14.2806 |
| 0.0234 | 3.1176 | 1100 | 0.0584 | -83.6653 | -0.2790 | 0.0568 | 0.7198 | 0.6624 | 0.5404 | 0.6030 | 0.5223 | 14.3429 |
| 0.022 | 3.2593 | 1150 | 0.0581 | -83.5282 | -0.3076 | 0.0563 | 0.7167 | 0.6564 | 0.5440 | 0.6030 | 0.5223 | 14.3960 |
| 0.021 | 3.4010 | 1200 | 0.0574 | -82.2867 | -0.3495 | 0.0557 | 0.7152 | 0.6455 | 0.5393 | 0.6030 | 0.5223 | 14.2067 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1
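
To sanity-check that a local environment matches these versions, a small snippet like the following can be used (assuming all four packages are installed):

```python
import datasets
import tokenizers
import torch
import transformers

# Print installed versions; compare against the versions listed above.
print("transformers", transformers.__version__)  # expected 4.42.0
print("torch", torch.__version__)                # expected 2.3.0+cu121
print("datasets", datasets.__version__)          # expected 3.2.0
print("tokenizers", tokenizers.__version__)      # expected 0.19.1
```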