# qwen2.5-0.5b-expo-DPO-noES-0.1
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:
- Loss: 0.8493
- Logps: -132.8567
- Logits: -1.8165
- Objective: 0.8653
- Dpo Loss: 0.8653
- Regularize: 0.8653
- Ranking Simple: 0.5347
- Wo Beta: 10.9418
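
The Dpo Loss row above presumably corresponds to the standard DPO objective (Rafailov et al., 2023). For reference, a hedged statement of that loss, where \\(\pi_\theta\\) is the policy being trained, \\(\pi_{\mathrm{ref}}\\) the frozen SFT reference, \\(\beta\\) the KL-penalty strength, and \\((x, y_w, y_l)\\) a prompt with chosen and rejected completions (none of these symbols appear in the original card):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
$$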
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
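
For reproducibility, here is a minimal sketch of an equivalent setup using TRL's `DPOTrainer`. This is not the authors' training script: the dataset split, column names, and the `beta` value (possibly the `0.1` suffix in the model name) are assumptions, and `DPOTrainer`'s keyword names vary across TRL versions. Note that the total train batch size of 144 is the product 4 (per device) × 3 (GPUs) × 12 (gradient accumulation steps).

```python
# Hedged sketch of a DPO run mirroring the hyperparameters listed above.
# NOT the original training script; beta and dataset fields are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "hZzy/qwen2.5-0.5b-sft-news-IFT"  # SFT base named on this card
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Dataset named on this card; the "train" split is an assumption.
dataset = load_dataset("hZzy/train_pairwise_weighted", split="train")

config = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-noES-0.1",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,  # 4 per device x 3 GPUs x 12 = 144 total
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumption: the "-0.1" in the model name may denote beta
)

trainer = DPOTrainer(
    model=model,            # with no ref_model given, TRL clones the policy
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,    # newer TRL versions call this processing_class
)
trainer.train()
```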
### Training results
| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.6719 | 0.1417 | 50 | 0.6856 | -89.6776 | -1.4697 | 0.6879 | 0.6879 | 0.6879 | 0.5269 | 7.9221 |
| 0.6459 | 0.2834 | 100 | 0.6765 | -92.9954 | -1.6511 | 0.6793 | 0.6793 | 0.6793 | 0.5347 | 7.8727 |
| 0.5993 | 0.4251 | 150 | 0.6771 | -95.2729 | -1.6963 | 0.6805 | 0.6805 | 0.6805 | 0.5347 | 8.2155 |
| 0.5557 | 0.5668 | 200 | 0.6858 | -115.4680 | -1.8150 | 0.6866 | 0.6866 | 0.6866 | 0.5295 | 7.9607 |
| 0.5428 | 0.7085 | 250 | 0.6745 | -102.5668 | -1.8495 | 0.6741 | 0.6741 | 0.6741 | 0.5367 | 7.9891 |
| 0.4987 | 0.8503 | 300 | 0.7119 | -110.0949 | -1.9277 | 0.7203 | 0.7203 | 0.7203 | 0.5373 | 8.9267 |
| 0.4599 | 0.9920 | 350 | 0.6886 | -104.9833 | -1.8474 | 0.6912 | 0.6912 | 0.6912 | 0.5352 | 8.3749 |
| 0.3498 | 1.1337 | 400 | 0.7463 | -115.0889 | -1.8807 | 0.7518 | 0.7518 | 0.7518 | 0.5518 | 9.5505 |
| 0.3361 | 1.2754 | 450 | 0.7563 | -116.8004 | -1.8356 | 0.7673 | 0.7673 | 0.7673 | 0.5419 | 9.7252 |
| 0.3584 | 1.4171 | 500 | 0.7635 | -117.5167 | -1.8626 | 0.7695 | 0.7695 | 0.7695 | 0.5419 | 9.6319 |
| 0.3343 | 1.5588 | 550 | 0.7698 | -123.3863 | -1.8209 | 0.7814 | 0.7814 | 0.7814 | 0.5352 | 9.8258 |
| 0.3105 | 1.7005 | 600 | 0.7679 | -119.8231 | -1.7866 | 0.7761 | 0.7761 | 0.7761 | 0.5383 | 9.8031 |
| 0.3412 | 1.8422 | 650 | 0.7750 | -122.2944 | -1.8323 | 0.7848 | 0.7848 | 0.7848 | 0.5383 | 9.9494 |
| 0.3156 | 1.9839 | 700 | 0.8013 | -126.3939 | -1.8338 | 0.8139 | 0.8139 | 0.8139 | 0.5378 | 10.3247 |
| 0.2183 | 2.1256 | 750 | 0.8467 | -131.1257 | -1.7999 | 0.8604 | 0.8604 | 0.8604 | 0.5352 | 10.8931 |
| 0.2338 | 2.2674 | 800 | 0.8480 | -132.1160 | -1.8070 | 0.8641 | 0.8641 | 0.8641 | 0.5352 | 10.9810 |
| 0.2015 | 2.4091 | 850 | 0.8572 | -133.3811 | -1.8018 | 0.8720 | 0.8720 | 0.8720 | 0.5378 | 11.0252 |
| 0.2348 | 2.5508 | 900 | 0.8530 | -133.6796 | -1.8114 | 0.8675 | 0.8675 | 0.8675 | 0.5378 | 10.9423 |
| 0.2268 | 2.6925 | 950 | 0.8525 | -133.2829 | -1.8136 | 0.8684 | 0.8684 | 0.8684 | 0.5336 | 10.9785 |
| 0.2198 | 2.8342 | 1000 | 0.8493 | -132.8809 | -1.8167 | 0.8652 | 0.8652 | 0.8652 | 0.5342 | 10.9383 |
| 0.2221 | 2.9759 | 1050 | 0.8493 | -132.8567 | -1.8165 | 0.8653 | 0.8653 | 0.8653 | 0.5347 | 10.9418 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1
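
With the pinned versions above, the checkpoint loads as a standard causal LM. A minimal inference sketch (not part of the original card; the prompt is an arbitrary example):

```python
# Hedged loading sketch, assuming the checkpoint is hosted on the Hub
# under the repo id in this card's title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "hZzy/qwen2.5-0.5b-expo-DPO-noES-0.1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The latest headlines:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```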