# qwen2.5-0.5b-expo-DPO-W2-noES6-1
This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the [hZzy/train_pairwise_weighted](https://huggingface.co/datasets/hZzy/train_pairwise_weighted) dataset. It achieves the following results on the evaluation set:
- Loss: 0.1677
- Logps: -77.5515
- Logits: -1.0043
- Objective: 0.1588
- Regularize: 1.7957
- Ranking Simple: 0.5461
- Wo Beta: 6.9275
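As a quick start, the snippet below is a minimal loading and generation sketch, assuming the model is used as a standard `transformers` causal LM. The prompt and generation settings are illustrative, not taken from the card.

```python
# Minimal loading/generation sketch; prompt and max_new_tokens are
# illustrative placeholders, not values from the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-W2-noES6-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a short news headline about renewable energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```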
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
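The card does not state which training library produced these settings. The sketch below shows how they could map onto `trl`'s `DPOTrainer`, an assumption based on the "DPO" in the model name; the `beta` value is a placeholder, since the card does not report it. Note that the per-device batch size, device count, and accumulation steps multiply out to the effective batch size listed above (4 × 3 × 12 = 144).

```python
# Hedged sketch of a trl-based DPO run using the hyperparameters above.
# Assumptions: trl's DPOTrainer was used (not confirmed by the card), and
# beta=0.1 is a placeholder. Model and dataset names are from the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "hZzy/qwen2.5-0.5b-sft-news-IFT"              # SFT base model from the card
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
dataset = load_dataset("hZzy/train_pairwise_weighted")   # pairwise dataset from the card

config = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-W2-noES6-1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,    # x 3 GPUs x 12 accumulation steps = 144 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 match the default optimizer settings.
    beta=0.1,                         # placeholder: the card does not report beta
)

trainer = DPOTrainer(
    model=model,                      # a reference model is created internally if not given
    args=config,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
)
trainer.train()
```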
### Training results
| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Regularize | Ranking Simple | Wo Beta |
|:-------------|:------|:-----|:----------------|:---------|:--------|:----------|:-----------|:---------------|:--------|
| 0.0717 | 0.1417 | 50 | 0.0827 | -89.4542 | -1.3980 | 0.0833 | 0.9434 | 0.5233 | 7.6909 |
| 0.0782 | 0.2834 | 100 | 0.1198 | -90.3233 | -1.3848 | 0.1299 | 1.2948 | 0.5274 | 7.5300 |
| 0.1041 | 0.4251 | 150 | 0.1430 | -80.8115 | -1.4142 | 0.1445 | 1.5652 | 0.5331 | 7.2053 |
| 0.103 | 0.5668 | 200 | 0.1575 | -79.3788 | -1.2549 | 0.1620 | 1.6985 | 0.5383 | 7.0079 |
| 0.1301 | 0.7085 | 250 | 0.1765 | -80.7283 | -1.2947 | 0.1779 | 1.9721 | 0.5373 | 7.3699 |
| 0.0929 | 0.8503 | 300 | 0.1742 | -83.1364 | -1.0915 | 0.1719 | 1.9650 | 0.5399 | 7.3246 |
| 0.092 | 0.9920 | 350 | 0.1930 | -78.9639 | -1.2151 | 0.1810 | 1.9965 | 0.5492 | 6.8384 |
| 0.0713 | 1.1337 | 400 | 0.1963 | -76.5565 | -1.1860 | 0.1929 | 2.1515 | 0.5399 | 7.1718 |
| 0.0243 | 1.2754 | 450 | 0.1856 | -78.4444 | -1.1245 | 0.1782 | 2.0181 | 0.5414 | 7.0177 |
| 0.0514 | 1.4171 | 500 | 0.1857 | -77.6606 | -1.1929 | 0.1755 | 1.9383 | 0.5393 | 6.9356 |
| 0.0577 | 1.5588 | 550 | 0.1760 | -79.1478 | -1.0419 | 0.1699 | 1.8917 | 0.5450 | 6.9556 |
| 0.0391 | 1.7005 | 600 | 0.1791 | -80.1474 | -0.8913 | 0.1668 | 1.9362 | 0.5461 | 6.8670 |
| 0.0392 | 1.8422 | 650 | 0.1726 | -78.0514 | -0.9358 | 0.1615 | 1.8093 | 0.5512 | 6.8786 |
| 0.0385 | 1.9839 | 700 | 0.1687 | -77.0163 | -1.0116 | 0.1563 | 1.8321 | 0.5471 | 6.9309 |
| 0.0198 | 2.1256 | 750 | 0.1707 | -78.2445 | -1.0465 | 0.1584 | 1.8388 | 0.5492 | 6.8687 |
| 0.0072 | 2.2674 | 800 | 0.1708 | -78.1994 | -1.0332 | 0.1614 | 1.8241 | 0.5461 | 6.8566 |
| 0.0128 | 2.4091 | 850 | 0.1695 | -77.6488 | -0.9753 | 0.1603 | 1.8026 | 0.5487 | 6.8586 |
| 0.0105 | 2.5508 | 900 | 0.1680 | -77.8885 | -1.0018 | 0.1587 | 1.8027 | 0.5461 | 6.9417 |
| 0.0111 | 2.6925 | 950 | 0.1676 | -77.6180 | -1.0011 | 0.1585 | 1.8000 | 0.5466 | 6.9417 |
| 0.0122 | 2.8342 | 1000 | 0.1676 | -77.5617 | -1.0044 | 0.1588 | 1.7963 | 0.5461 | 6.9304 |
| 0.0117 | 2.9759 | 1050 | 0.1677 | -77.5515 | -1.0043 | 0.1588 | 1.7957 | 0.5461 | 6.9275 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1
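For reproducibility, the snippet below is a convenience sketch (not from the original card) that compares the installed library versions against the pins listed above.

```python
# Convenience sketch: check installed versions against the card's pins.
expected = {
    "transformers": "4.42.0",
    "torch": "2.3.0+cu121",  # listed as "Pytorch" in the card
    "datasets": "3.2.0",
    "tokenizers": "0.19.1",
}
for name, pinned in expected.items():
    installed = __import__(name).__version__
    marker = "OK" if installed == pinned else "differs"
    print(f"{name}: installed {installed}, card used {pinned} ({marker})")
```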