qwen2.5-0.5b-expo-DPO-25-2

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-25-1 on the hZzy/train_pairwise_all_new dataset. It achieves the following results on the evaluation set:

Loss: 0.6550
Objective: 0.6503
Ranking Simple: 0.5435
Reward Accuracy: 0.6184
Logp Accuracy: 0.5435
Log Diff Policy: 2.7547
Chosen Logps: -100.4952
Rejected Logps: -103.2499
Chosen Rewards: -0.6340
Rejected Rewards: -0.8270
Logits: -1.3276
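
The figures above are internally consistent: Log Diff Policy matches the difference between Chosen Logps and Rejected Logps, and Ranking Simple coincides with Logp Accuracy. The following minimal sketch checks the first relation from the reported values. The "reward margin" computed at the end (chosen reward minus rejected reward) is an assumption based on how DPO-style runs are commonly summarized; the card does not report it.

```python
# Final evaluation metrics copied from the list above.
metrics = {
    "chosen_logps": -100.4952,
    "rejected_logps": -103.2499,
    "log_diff_policy": 2.7547,
    "chosen_rewards": -0.6340,
    "rejected_rewards": -0.8270,
}

# Gap between the log-probabilities of chosen and rejected responses.
log_diff = metrics["chosen_logps"] - metrics["rejected_logps"]

# Assumed definition (not stated in the card): chosen reward minus rejected reward.
reward_margin = metrics["chosen_rewards"] - metrics["rejected_rewards"]

print(f"log diff policy (recomputed): {log_diff:.4f}")
print(f"reward margin (assumed definition): {reward_margin:.4f}")
```

A positive gap on both quantities means the model, on average, prefers the chosen response, which agrees with a Reward Accuracy above 0.5.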

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 6
gradient_accumulation_steps: 12
total_train_batch_size: 288
total_eval_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
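
The total batch sizes follow from the per-device values, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and adds a plain-Python approximation of a cosine schedule with linear warmup. The function `lr_at` and the value of `total_steps` are illustrative assumptions: the step count is estimated from the last logged row of the training results (step 350 at epoch 0.9889), and the exact scheduler implementation used in the run is not stated in the card.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4              # per device
eval_batch_size = 4               # per device
num_devices = 6
gradient_accumulation_steps = 12
learning_rate = 1e-06
warmup_ratio = 0.1

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Estimated number of optimizer steps in one epoch (assumption, see lead-in).
total_steps = 354

print(total_train_batch_size, total_eval_batch_size)
for step in (0, 35, 177, 354):
    print(step, lr_at(step, total_steps))
```

Under these assumptions the learning rate peaks at 1e-06 after about 35 warmup steps and decays toward zero by the end of the epoch.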

Training results

Training Loss	Epoch	Step	Validation Loss	Objective	Ranking Simple	Reward Accuracy	Logp Accuracy	Log Diff Policy	Chosen Logps	Rejected Logps	Chosen Rewards	Rejected Rewards	Logits
0.6657	0.1413	50	0.6741	0.6713	0.5193	0.5851	0.5193	1.5224	-96.4932	-98.0156	-0.2338	-0.3036	-1.1053
0.6364	0.2826	100	0.6705	0.6646	0.5405	0.5972	0.5405	2.1936	-100.8120	-103.0055	-0.6656	-0.8026	-1.2277
0.6244	0.4238	150	0.6577	0.6551	0.5368	0.6087	0.5368	2.2902	-98.6182	-100.9084	-0.4463	-0.5929	-1.3179
0.5938	0.5651	200	0.6590	0.6558	0.5362	0.6159	0.5362	2.4752	-99.3723	-101.8475	-0.5217	-0.6868	-1.2858
0.5876	0.7064	250	0.6543	0.6504	0.5447	0.6171	0.5447	2.6997	-99.8204	-102.5200	-0.5665	-0.7541	-1.3215
0.5705	0.8477	300	0.6554	0.6504	0.5447	0.6190	0.5447	2.7566	-100.6262	-103.3828	-0.6471	-0.8403	-1.3272
0.5864	0.9889	350	0.6550	0.6503	0.5435	0.6184	0.5435	2.7547	-100.4952	-103.2499	-0.6340	-0.8270	-1.3276
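
The final checkpoint is not the best row on every metric: validation loss is lowest at step 250 (0.6543) and reward accuracy is highest at step 300 (0.6190), while the reported final values come from step 350. A minimal sketch that picks these rows out of the table, using a subset of its columns:

```python
# (step, validation_loss, reward_accuracy) copied from the table above.
rows = [
    (50, 0.6741, 0.5851),
    (100, 0.6705, 0.5972),
    (150, 0.6577, 0.6087),
    (200, 0.6590, 0.6159),
    (250, 0.6543, 0.6171),
    (300, 0.6554, 0.6190),
    (350, 0.6550, 0.6184),
]

# Row with the lowest validation loss.
best_loss_row = min(rows, key=lambda r: r[1])

# Row with the highest reward accuracy.
best_acc_row = max(rows, key=lambda r: r[2])

print(f"lowest validation loss: step {best_loss_row[0]} ({best_loss_row[1]:.4f})")
print(f"highest reward accuracy: step {best_acc_row[0]} ({best_acc_row[2]:.4f})")
```

The differences between the last three evaluations are small (under 0.002 in both loss and accuracy), so they may be within evaluation noise.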

Framework versions

Transformers 4.42.0
PyTorch 2.3.0+cu121
Datasets 3.2.0
Tokenizers 0.19.1
