metadata

license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise
model-index:
  - name: qwen2.5-0.5b-expo-DPO-ES-1
    results: []

qwen2.5-0.5b-expo-DPO-ES-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set (the values match the step-700 row, epoch 1.9839, of the training results table):

Loss: 2.3243
Logps: -83.2882
Logits: -0.6651
Objective: 2.2471
Dpo Loss: 2.2471
Regularize: 2.2471
Ranking Simple: 0.5378
Ranking Idealized: 0.5295
Ranking Idealized Expo: 0.5212
Wo Beta: 6.6815

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 3
gradient_accumulation_steps: 12
total_train_batch_size: 144
total_eval_batch_size: 12
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
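
The effective batch size and the shape of the learning-rate schedule follow from the values above. The sketch below recomputes total_train_batch_size and evaluates a cosine schedule with linear warmup in plain Python. It is an illustration, not the trainer's own code: the helper name cosine_lr is hypothetical, and the total step count is an estimate inferred from the training log (step 50 is reported at epoch 0.1417), not a value stated on this card.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 5e-6
TRAIN_BATCH_SIZE = 4      # per device
NUM_DEVICES = 3
GRAD_ACCUM_STEPS = 12
WARMUP_RATIO = 0.1
NUM_EPOCHS = 5

# Effective batch size per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS

# ASSUMPTION: steps per epoch estimated from the training log
# (step 50 at epoch 0.1417); the card does not state this number.
steps_per_epoch = 50 / 0.1417
total_steps = round(steps_per_epoch * NUM_EPOCHS)
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)


def cosine_lr(step: int) -> float:
    """Linear warmup to the peak rate, then cosine decay towards zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


print("effective batch size:", total_train_batch_size)
for step in (0, warmup_steps, total_steps):
    print(step, cosine_lr(step))
```

Under these assumptions the peak rate of 5e-06 is reached at the end of warmup and decays to zero at the last scheduled step. The training log stops near epoch 2.7, so a run that ended there would not have reached the tail of the schedule.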

Training results

Training Loss	Epoch	Step	Validation Loss	Logps	Logits	Objective	Dpo Loss	Regularize	Ranking Simple	Ranking Idealized	Ranking Idealized Expo	Wo Beta
0.7017	0.1417	50	0.8470	-93.0243	-1.4582	0.8570	0.8570	0.8570	0.5238	0.5295	0.5212	7.8507
0.8112	0.2834	100	1.0529	-86.6835	-1.4382	1.0273	1.0273	1.0273	0.5285	0.5295	0.5212	7.4982
1.0895	0.4251	150	1.4497	-84.4337	-1.2965	1.4010	1.4010	1.4010	0.5321	0.5295	0.5212	7.2692
1.2363	0.5668	200	1.7035	-77.7201	-1.2956	1.6116	1.6116	1.6116	0.5321	0.5295	0.5212	7.2264
1.3152	0.7085	250	1.9222	-92.7241	-1.2565	1.8319	1.8319	1.8319	0.5311	0.5295	0.5212	7.1856
1.1899	0.8503	300	2.0298	-90.9373	-0.9785	1.9588	1.9588	1.9588	0.5367	0.5295	0.5212	6.9336
1.1443	0.9920	350	2.1654	-82.1414	-1.0214	2.0541	2.0541	2.0541	0.5435	0.5295	0.5212	7.0024
0.725	1.1337	400	2.2884	-84.2526	-0.7535	2.2360	2.2360	2.2360	0.5336	0.5295	0.5212	7.1525
0.7629	1.2754	450	2.1606	-80.4165	-0.8866	2.0671	2.0671	2.0671	0.5321	0.5295	0.5212	6.7949
0.8044	1.4171	500	2.2094	-82.3927	-0.7503	2.0981	2.0981	2.0981	0.5347	0.5295	0.5212	6.8050
0.7105	1.5588	550	2.1697	-84.9780	-0.6734	2.0733	2.0733	2.0733	0.5321	0.5295	0.5212	6.8722
0.6925	1.7005	600	2.1957	-81.5342	-0.7411	2.0558	2.0558	2.0558	0.5357	0.5295	0.5212	6.7186
0.6883	1.8422	650	2.2080	-82.7303	-0.6908	2.1330	2.1330	2.1330	0.5383	0.5295	0.5212	6.8081
0.6486	1.9839	700	2.3243	-83.2882	-0.6651	2.2471	2.2471	2.2471	0.5378	0.5295	0.5212	6.6815
0.3793	2.1256	750	2.2675	-84.2296	-0.7879	2.1825	2.1825	2.1825	0.5409	0.5295	0.5212	6.8794
0.3314	2.2674	800	2.2106	-84.3675	-0.6651	2.1041	2.1041	2.1041	0.5414	0.5295	0.5212	6.7463
0.3301	2.4091	850	2.2964	-84.8913	-0.6177	2.2221	2.2221	2.2221	0.5388	0.5295	0.5212	6.8020
0.3509	2.5508	900	2.2796	-84.3833	-0.6097	2.2099	2.2099	2.2099	0.5393	0.5295	0.5212	6.7934
0.321	2.6925	950	2.3403	-83.2967	-0.7158	2.2649	2.2649	2.2649	0.5331	0.5295	0.5212	6.8864
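
To see which logged step is best under each metric, the table above can be scanned programmatically. The sketch below copies four columns (Step, Validation Loss, Ranking Simple, Wo Beta) into a small Python structure; the EvalRow type and variable names are hypothetical. The card does not say which metric was used for checkpoint selection or whether lower Wo Beta is better, so the comparison is only descriptive.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    ranking_simple: float
    wo_beta: float


# Values copied from the training results table above.
ROWS = [
    EvalRow(50, 0.8470, 0.5238, 7.8507),
    EvalRow(100, 1.0529, 0.5285, 7.4982),
    EvalRow(150, 1.4497, 0.5321, 7.2692),
    EvalRow(200, 1.7035, 0.5321, 7.2264),
    EvalRow(250, 1.9222, 0.5311, 7.1856),
    EvalRow(300, 2.0298, 0.5367, 6.9336),
    EvalRow(350, 2.1654, 0.5435, 7.0024),
    EvalRow(400, 2.2884, 0.5336, 7.1525),
    EvalRow(450, 2.1606, 0.5321, 6.7949),
    EvalRow(500, 2.2094, 0.5347, 6.8050),
    EvalRow(550, 2.1697, 0.5321, 6.8722),
    EvalRow(600, 2.1957, 0.5357, 6.7186),
    EvalRow(650, 2.2080, 0.5383, 6.8081),
    EvalRow(700, 2.3243, 0.5378, 6.6815),
    EvalRow(750, 2.2675, 0.5409, 6.8794),
    EvalRow(800, 2.2106, 0.5414, 6.7463),
    EvalRow(850, 2.2964, 0.5388, 6.8020),
    EvalRow(900, 2.2796, 0.5393, 6.7934),
    EvalRow(950, 2.3403, 0.5331, 6.8864),
]

lowest_val_loss = min(ROWS, key=lambda r: r.val_loss)
highest_ranking = max(ROWS, key=lambda r: r.ranking_simple)
lowest_wo_beta = min(ROWS, key=lambda r: r.wo_beta)

print("lowest validation loss at step", lowest_val_loss.step)
print("highest Ranking Simple at step", highest_ranking.step)
print("lowest Wo Beta at step", lowest_wo_beta.step)
```

The three metrics pick different steps. The step with the lowest Wo Beta is the same step whose values are reported as the evaluation results at the top of this card, which is consistent with, but does not confirm, checkpoint selection on that metric.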

Framework versions

Transformers 4.42.0
Pytorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
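
When reproducing results, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the mapping from the names on this card to PyPI distribution names (for example Pytorch to torch) is an assumption.

```python
from importlib import metadata

# Versions listed above. ASSUMPTION: these PyPI distribution names
# correspond to the frameworks named on the card.
PINNED = {
    "transformers": "4.42.0",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def installed_versions(names):
    """Return {name: version or None} for each distribution name."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


def mismatches(pinned, installed):
    """Return {name: (wanted, found)} for every version that differs."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


for name, (wanted, found) in mismatches(PINNED, installed_versions(PINNED)).items():
    print(f"{name}: card lists {wanted}, found {found}")
```

The comparison is an exact string match, so a PyTorch build with a different local suffix (such as a different CUDA tag) is reported as a mismatch even when the base version agrees.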