qwen2.5-0.5b-expo-L1EXPO-ES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set (these match the step-750 row of the training results table below):

  • Loss: 0.5234
  • Logps: -82.5192
  • Logits: -0.4757
  • Objective: 0.5225
  • Dpo Loss: 0.7512
  • Regularize: 0.5225
  • Ranking Simple: 0.5254
  • Ranking Idealized: 0.6030
  • Ranking Idealized Expo: 0.5223
  • Wo Beta: 14.0055

Model description

More information needed

Intended uses & limitations

More information needed
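
As a starting point, the checkpoint can be loaded through the standard transformers causal-LM interface inherited from the Qwen2.5 base model. Below is a minimal inference sketch; the prompt and sampling settings are illustrative assumptions, not documented behavior for this model:

```python
# Minimal inference sketch. Assumes the checkpoint loads via the standard
# transformers causal-LM interface; the prompt and sampling settings are
# illustrative, not documented for this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L1EXPO-ES-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Summarize today's top news story:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```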

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
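
For reference, here is how these settings map onto transformers.TrainingArguments. This is a sketch only: the preference-optimization trainer and loss wiring used for this run are not shown in the card, and the output_dir name is a placeholder.

```python
# Sketch of the listed hyperparameters expressed as TrainingArguments.
# With 3 GPUs, the effective batch size is
# 4 (per device) x 12 (gradient accumulation) x 3 (devices) = 144,
# matching the total_train_batch_size above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L1EXPO-ES-0.1",  # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```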

Training results

| Training Loss | Epoch  | Step | Dpo Loss | Logits  | Logps    | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|---------------|--------|------|----------|---------|----------|-----------------|-----------|-------------------|------------------------|----------------|------------|---------|
| 0.0448        | 0.1417 | 50   | 0.6936   | -1.4299 | -90.3888 | 0.0622          | 0.0621    | 0.6030            | 0.5223                 | 0.5243         | 0.0621     | 16.0768 |
| 0.1716        | 0.2834 | 100  | 0.6982   | -1.3597 | -88.7675 | 0.1556          | 0.1559    | 0.6030            | 0.5223                 | 0.5274         | 0.1559     | 15.9436 |
| 0.2858        | 0.4251 | 150  | 0.7183   | -1.2546 | -79.5067 | 0.2912          | 0.2923    | 0.6030            | 0.5223                 | 0.5228         | 0.2923     | 15.0570 |
| 0.3544        | 0.5668 | 200  | 0.7309   | -0.8432 | -83.8485 | 0.3898          | 0.3890    | 0.6030            | 0.5223                 | 0.5228         | 0.3890     | 14.7122 |
| 0.375         | 0.7085 | 250  | 0.7353   | -0.6734 | -81.2900 | 0.4398          | 0.4375    | 0.6030            | 0.5223                 | 0.5243         | 0.4375     | 14.4729 |
| 0.3592        | 0.8503 | 300  | 0.7348   | -0.5501 | -84.4144 | 0.4422          | 0.4388    | 0.6030            | 0.5223                 | 0.5233         | 0.4388     | 14.4403 |
| 0.3351        | 0.9920 | 350  | 0.7354   | -0.5360 | -82.9375 | 0.4676          | 0.4602    | 0.6030            | 0.5223                 | 0.5342         | 0.4602     | 14.2722 |
| 0.3056        | 1.1337 | 400  | 0.7470   | -0.5686 | -80.5606 | 0.4842          | 0.4804    | 0.6030            | 0.5223                 | 0.5254         | 0.4804     | 14.2812 |
| 0.2932        | 1.2754 | 450  | 0.7439   | -0.5565 | -83.6231 | 0.4805          | 0.4755    | 0.6030            | 0.5223                 | 0.5280         | 0.4755     | 14.4640 |
| 0.2864        | 1.4171 | 500  | 0.7510   | -0.6557 | -82.9178 | 0.4964          | 0.4971    | 0.6030            | 0.5223                 | 0.5274         | 0.4971     | 14.2823 |
| 0.2635        | 1.5588 | 550  | 0.7503   | -0.6184 | -81.1614 | 0.5023          | 0.5043    | 0.6030            | 0.5223                 | 0.5228         | 0.5043     | 14.0632 |
| 0.2561        | 1.7005 | 600  | 0.7487   | -0.5805 | -84.7039 | 0.4980          | 0.4964    | 0.6030            | 0.5223                 | 0.5233         | 0.4964     | 14.3352 |
| 0.2448        | 1.8422 | 650  | 0.7503   | -0.4274 | -83.4629 | 0.5171          | 0.5191    | 0.6030            | 0.5223                 | 0.5233         | 0.5191     | 14.2153 |
| 0.2235        | 1.9839 | 700  | 0.7483   | -0.5057 | -81.7196 | 0.4963          | 0.4949    | 0.6030            | 0.5223                 | 0.5233         | 0.4949     | 14.2026 |
| 0.21          | 2.1256 | 750  | 0.7512   | -0.4757 | -82.5192 | 0.5234          | 0.5225    | 0.6030            | 0.5223                 | 0.5254         | 0.5225     | 14.0055 |
| 0.1988        | 2.2674 | 800  | 0.7496   | -0.5578 | -81.0564 | 0.5140          | 0.5114    | 0.6030            | 0.5223                 | 0.5295         | 0.5114     | 14.1030 |
| 0.1845        | 2.4091 | 850  | 0.7516   | -0.5129 | -82.6326 | 0.5205          | 0.5186    | 0.6030            | 0.5223                 | 0.5311         | 0.5186     | 14.1518 |
| 0.1741        | 2.5508 | 900  | 0.7507   | -0.4790 | -82.9809 | 0.5132          | 0.5118    | 0.6030            | 0.5223                 | 0.5238         | 0.5118     | 14.2459 |
| 0.1659        | 2.6925 | 950  | 0.7500   | -0.4840 | -83.8330 | 0.5189          | 0.5193    | 0.6030            | 0.5223                 | 0.5238         | 0.5193     | 14.3029 |
| 0.1539        | 2.8342 | 1000 | 0.7499   | -0.4671 | -82.8831 | 0.5137          | 0.5127    | 0.6030            | 0.5223                 | 0.5269         | 0.5127     | 14.1925 |
| 0.1445        | 2.9806 | 1050 | 0.7478   | -0.5531 | -83.1677 | 0.5116          | 0.5112    | 0.6030            | 0.5223                 | 0.5248         | 0.5112     | 14.2141 |
| 0.1261        | 3.1223 | 1100 | 0.7515   | -0.5488 | -83.5954 | 0.5157          | 0.5165    | 0.6030            | 0.5223                 | 0.5233         | 0.5165     | 14.1783 |
| 0.1146        | 3.2641 | 1150 | 0.7487   | -0.5372 | -83.4265 | 0.5175          | 0.5161    | 0.6030            | 0.5223                 | 0.5264         | 0.5161     | 14.1956 |
| 0.1076        | 3.4058 | 1200 | 0.7492   | -0.4946 | -83.9912 | 0.5169          | 0.5160    | 0.6030            | 0.5223                 | 0.5274         | 0.5160     | 14.1241 |
| 0.0981        | 3.5475 | 1250 | 0.7500   | -0.5087 | -83.3791 | 0.5175          | 0.5185    | 0.6030            | 0.5223                 | 0.5311         | 0.5185     | 14.2158 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1