qwen2.5-0.5b-expo-DPO-W0-noES5-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

Loss: 883.8085
Logps: -80.9858
Logits: -0.7331
Objective: 840.0047
Dpo Loss: 1.9391
Regularize: 1.9391
Ranking Simple: 0.5419
Wo Beta: 6.8860

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 3
gradient_accumulation_steps: 12
total_train_batch_size: 144
total_eval_batch_size: 12
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
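
The two derived totals in the list above follow from the per-device batch size, the device count, and the gradient accumulation steps. A minimal sketch of that arithmetic follows, together with an illustrative warmup-plus-cosine learning-rate curve. The schedule shape (linear warmup to the peak, then cosine decay to zero) is an assumption about what `cosine` with a warmup ratio of 0.1 means here. The function `cosine_lr` and the value `ASSUMED_TOTAL_STEPS` are hypothetical names introduced for illustration. The card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-06
TRAIN_BATCH_SIZE = 4    # per device
EVAL_BATCH_SIZE = 4     # per device
NUM_DEVICES = 3
GRAD_ACCUM_STEPS = 12
WARMUP_RATIO = 0.1

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES


def cosine_lr(step, total_steps, peak_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: a round, hypothetical step count; the real total is not given in the card.
ASSUMED_TOTAL_STEPS = 1000

print(total_train_batch_size)  # 144, matches total_train_batch_size above
print(total_eval_batch_size)   # 12, matches total_eval_batch_size above
print(cosine_lr(50, ASSUMED_TOTAL_STEPS))   # halfway through warmup
print(cosine_lr(100, ASSUMED_TOTAL_STEPS))  # end of warmup, at the peak rate
```

With these numbers each optimizer update sees 144 preference pairs, even though each device only holds 4 at a time.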

Training results

Training Loss	Epoch	Step	Validation Loss	Logps	Logits	Objective	Dpo Loss	Regularize	Ranking Simple	Wo Beta
421.2896	0.1417	50	429.8647	-92.0336	-1.3989	424.8991	0.9812	0.9812	0.5243	7.8078
504.7676	0.2834	100	586.4431	-92.6050	-1.2546	565.0956	1.2757	1.2757	0.5336	7.4997
647.489	0.4251	150	806.8841	-81.5770	-1.2821	788.4058	1.8202	1.8202	0.5367	7.2972
549.7892	0.5668	200	883.9946	-76.2048	-1.1825	832.8786	1.8889	1.8889	0.5316	7.1899
598.0575	0.7085	250	912.8014	-79.1561	-1.0494	878.0984	2.0106	2.0106	0.5316	7.1417
490.4698	0.8503	300	908.3519	-84.3491	-0.8243	883.9852	2.0489	2.0489	0.5373	6.9721
374.0952	0.9920	350	968.3278	-82.5209	-0.7335	906.4931	2.0826	2.0826	0.5342	6.9381
270.3782	1.1337	400	980.7469	-79.6276	-0.6857	943.6777	2.1842	2.1842	0.5316	7.0585
260.6353	1.2754	450	933.4417	-79.3049	-0.8704	893.1430	2.0753	2.0753	0.5357	6.9556
272.6055	1.4171	500	950.2914	-81.4393	-0.8079	901.2256	2.0878	2.0878	0.5269	6.8227
201.6789	1.5588	550	942.4045	-82.8839	-0.8612	899.7697	2.0626	2.0626	0.5362	6.8595
190.6931	1.7005	600	909.0859	-80.5821	-0.7143	874.9576	2.0213	2.0213	0.5362	6.8605
308.8635	1.8422	650	903.3456	-81.4960	-0.7771	858.3967	1.9757	1.9757	0.5342	6.7432
176.7641	1.9839	700	901.6802	-80.9281	-0.6642	855.0222	1.9719	1.9719	0.5399	6.8200
56.904	2.1256	750	887.2380	-81.6086	-0.7334	839.1237	1.9393	1.9393	0.5388	6.8787
63.8462	2.2674	800	877.9467	-81.2591	-0.7491	832.3641	1.9230	1.9230	0.5388	6.8255
60.559	2.4091	850	876.0621	-81.4314	-0.7166	834.9629	1.9304	1.9304	0.5393	6.8969
61.5447	2.5508	900	885.5046	-81.6223	-0.7120	842.5768	1.9455	1.9455	0.5414	6.9054
76.2992	2.6925	950	885.0244	-81.0106	-0.7336	840.8616	1.9409	1.9409	0.5414	6.8752
65.003	2.8342	1000	883.8387	-80.9709	-0.7340	839.7760	1.9386	1.9386	0.5419	6.8808
59.5302	2.9759	1050	883.8083	-80.9858	-0.7331	840.0045	1.9391	1.9391	0.5419	6.8860
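
To make the table easier to scan, the sketch below picks out the logged evaluation steps with the lowest and highest validation loss and the highest Ranking Simple value. The tuples are copied from the Step, Validation Loss, and Ranking Simple columns above. Treating a higher Ranking Simple value as better is an assumption, since the card does not define the metric.

```python
# (step, validation_loss, ranking_simple) copied from the table above.
ROWS = [
    (50, 429.8647, 0.5243),
    (100, 586.4431, 0.5336),
    (150, 806.8841, 0.5367),
    (200, 883.9946, 0.5316),
    (250, 912.8014, 0.5316),
    (300, 908.3519, 0.5373),
    (350, 968.3278, 0.5342),
    (400, 980.7469, 0.5316),
    (450, 933.4417, 0.5357),
    (500, 950.2914, 0.5269),
    (550, 942.4045, 0.5362),
    (600, 909.0859, 0.5362),
    (650, 903.3456, 0.5342),
    (700, 901.6802, 0.5399),
    (750, 887.2380, 0.5388),
    (800, 877.9467, 0.5388),
    (850, 876.0621, 0.5393),
    (900, 885.5046, 0.5414),
    (950, 885.0244, 0.5414),
    (1000, 883.8387, 0.5419),
    (1050, 883.8083, 0.5419),
]

# min/max return the first matching row when values tie.
lowest_loss = min(ROWS, key=lambda row: row[1])
highest_loss = max(ROWS, key=lambda row: row[1])
# ASSUMPTION: higher Ranking Simple is better.
highest_ranking = max(ROWS, key=lambda row: row[2])

print("lowest validation loss:", lowest_loss)
print("highest validation loss:", highest_loss)
print("highest Ranking Simple:", highest_ranking)
```

On these rows the validation loss is lowest at the first evaluation (step 50) and peaks at step 400, while Ranking Simple reaches its highest logged value at steps 1000 and 1050.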

Framework versions

Transformers 4.42.0
Pytorch 2.3.0+cu121
Datasets 3.2.0
Tokenizers 0.19.1
