
qwen2.5-0.5b-expo-L2EXPO-noES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4111
  • Logps: -89.2333
  • Logits: -1.4016
  • Objective: 0.4056
  • Dpo Loss: 0.6787
  • Regularize: 0.4056
  • Ranking Simple: 0.5352
  • Ranking Idealized: 0.6025
  • Ranking Idealized Expo: 0.5233
  • Wo Beta: 16.2455
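
For reference, the Dpo Loss reported above presumably corresponds to the standard DPO objective shown below (an assumption; the exact L2EXPO variant used for this run is not documented in this card), where $\pi_\theta$ is the trained policy, $\pi_{\mathrm{ref}}$ the reference model, and $\beta$ the preference-strength coefficient:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$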

Model description

More information needed

Intended uses & limitations

More information needed
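
Pending more detail from the authors, below is a minimal usage sketch for this 494M-parameter checkpoint, assuming it exposes the standard transformers causal-LM interface; the prompt is illustrative only (chosen because the base model name suggests news-style SFT) and is not taken from the card.

```python
# Minimal loading/generation sketch (assumes standard transformers causal-LM support).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-noES-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical prompt, not from the card.
prompt = "Write a one-sentence news headline about renewable energy."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```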

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
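
As a rough reproduction aid, the sketch below maps these values onto standard transformers TrainingArguments. The effective train batch size of 144 follows from 4 (per device) × 3 (GPUs) × 12 (gradient accumulation steps). The output_dir is a placeholder, and the original run may have used a custom EXPO/L2EXPO trainer rather than the stock Trainer, so treat this as an approximation.

```python
# Sketch only: the card's hyperparameters expressed as transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-noES-0.1",  # placeholder path
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # 4 x 3 GPUs x 12 accumulation steps = 144 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,                  # Adam betas/epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
# Multi-GPU training across 3 devices would be handled by the launcher, e.g. accelerate or torchrun.
```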

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.4024 | 0.1417 | 50 | 0.4110 | -90.5653 | -1.4391 | 0.4097 | 0.6885 | 0.4097 | 0.5264 | 0.6025 | 0.5233 | 16.2893 |
| 0.3435 | 0.2834 | 100 | 0.4067 | -93.4765 | -1.4915 | 0.4041 | 0.6822 | 0.4041 | 0.5305 | 0.6025 | 0.5233 | 16.4079 |
| 0.3184 | 0.4251 | 150 | 0.4066 | -91.6151 | -1.4215 | 0.4020 | 0.6786 | 0.4020 | 0.5342 | 0.6025 | 0.5233 | 16.5344 |
| 0.2935 | 0.5668 | 200 | 0.4100 | -91.5667 | -1.3884 | 0.4060 | 0.6791 | 0.4060 | 0.5336 | 0.6025 | 0.5233 | 16.4082 |
| 0.2854 | 0.7085 | 250 | 0.4143 | -90.3560 | -1.4712 | 0.4087 | 0.6802 | 0.4087 | 0.5342 | 0.6025 | 0.5233 | 16.3706 |
| 0.249 | 0.8503 | 300 | 0.4091 | -89.8744 | -1.4821 | 0.4058 | 0.6798 | 0.4058 | 0.5336 | 0.6025 | 0.5233 | 16.2613 |
| 0.2289 | 0.9920 | 350 | 0.4118 | -89.6124 | -1.4819 | 0.4047 | 0.6786 | 0.4047 | 0.5362 | 0.6025 | 0.5233 | 16.3965 |
| 0.2105 | 1.1337 | 400 | 0.4060 | -88.2954 | -1.3976 | 0.4024 | 0.6778 | 0.4024 | 0.5352 | 0.6025 | 0.5233 | 16.3633 |
| 0.1773 | 1.2754 | 450 | 0.4122 | -89.4120 | -1.3974 | 0.4051 | 0.6770 | 0.4051 | 0.5373 | 0.6025 | 0.5233 | 16.2171 |
| 0.1579 | 1.4171 | 500 | 0.4140 | -89.1284 | -1.3760 | 0.4073 | 0.6801 | 0.4073 | 0.5378 | 0.6025 | 0.5233 | 16.2211 |
| 0.1534 | 1.5588 | 550 | 0.4124 | -87.6963 | -1.3890 | 0.4048 | 0.6781 | 0.4048 | 0.5388 | 0.6025 | 0.5233 | 16.2085 |
| 0.1396 | 1.7005 | 600 | 0.4126 | -88.8736 | -1.4152 | 0.4050 | 0.6781 | 0.4050 | 0.5357 | 0.6025 | 0.5233 | 16.2840 |
| 0.1433 | 1.8422 | 650 | 0.4109 | -89.4824 | -1.3995 | 0.4050 | 0.6781 | 0.4050 | 0.5357 | 0.6025 | 0.5233 | 16.2822 |
| 0.1202 | 1.9839 | 700 | 0.4113 | -89.1037 | -1.3927 | 0.4061 | 0.6790 | 0.4061 | 0.5336 | 0.6025 | 0.5233 | 16.2384 |
| 0.0927 | 2.1256 | 750 | 0.4115 | -89.5013 | -1.4006 | 0.4053 | 0.6785 | 0.4053 | 0.5362 | 0.6025 | 0.5233 | 16.1916 |
| 0.0932 | 2.2674 | 800 | 0.4109 | -88.9918 | -1.4040 | 0.4055 | 0.6784 | 0.4055 | 0.5357 | 0.6025 | 0.5233 | 16.2422 |
| 0.076 | 2.4091 | 850 | 0.4112 | -89.0524 | -1.4000 | 0.4056 | 0.6788 | 0.4056 | 0.5352 | 0.6025 | 0.5233 | 16.2403 |
| 0.0802 | 2.5508 | 900 | 0.4114 | -89.2338 | -1.4061 | 0.4059 | 0.6787 | 0.4059 | 0.5352 | 0.6025 | 0.5233 | 16.2290 |
| 0.0696 | 2.6925 | 950 | 0.4111 | -89.2200 | -1.4037 | 0.4056 | 0.6787 | 0.4056 | 0.5347 | 0.6025 | 0.5233 | 16.2510 |
| 0.0722 | 2.8342 | 1000 | 0.4111 | -89.2367 | -1.4019 | 0.4057 | 0.6787 | 0.4057 | 0.5352 | 0.6025 | 0.5233 | 16.2505 |
| 0.0733 | 2.9759 | 1050 | 0.4111 | -89.2333 | -1.4016 | 0.4056 | 0.6787 | 0.4056 | 0.5352 | 0.6025 | 0.5233 | 16.2455 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1