qwen2.5-0.5b-expo-L1EXPO-ES-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set (these values match the step 450 row, epoch 1.2754, of the training results table):

Loss: 4.8354
Logps: -80.1753
Logits: -0.6936
Objective: 4.8114
Dpo Loss: 2.5735
Regularize: 4.8114
Ranking Simple: 0.5248
Ranking Idealized: 0.5295
Ranking Idealized Expo: 0.5212
Wo Beta: 13.9356

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 3
gradient_accumulation_steps: 12
total_train_batch_size: 144
total_eval_batch_size: 12
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
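
As a quick sanity check on the list above, the sketch below recomputes the two derived batch sizes from the per-device values and illustrates the shape of a cosine schedule with linear warmup. The `lr_at` helper and the `total_steps` value of 1000 are hypothetical and chosen only for illustration; the card does not state the total number of optimizer steps, and the exact scheduler implementation used in training may differ.

```python
import math

# Values copied from the hyperparameter list.
train_batch_size = 4             # per device
eval_batch_size = 4              # per device
num_devices = 3
gradient_accumulation_steps = 12
peak_lr = 5e-6
warmup_ratio = 0.1

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size)  # 144
print(total_eval_batch_size)   # 12


def lr_at(step: int, total_steps: int) -> float:
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# total_steps is an assumption for illustration only.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps))
```

The recomputed totals agree with the reported total_train_batch_size of 144 and total_eval_batch_size of 12.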

Training results

Training Loss	Epoch	Step	Validation Loss	Logps	Logits	Objective	Dpo Loss	Regularize	Ranking Simple	Ranking Idealized	Ranking Idealized Expo	Wo Beta
0.4306	0.1417	50	0.5493	-90.4264	-1.4289	0.5433	0.7632	0.5433	0.5212	0.5295	0.5212	16.2237
1.748	0.2834	100	1.6975	-88.0491	-1.2535	1.6864	1.1354	1.6864	0.5228	0.5295	0.5212	15.6834
2.8697	0.4251	150	2.9624	-82.4967	-1.2524	2.8923	1.6846	2.8923	0.5243	0.5295	0.5212	15.1970
3.5268	0.5668	200	4.0302	-75.9716	-0.9581	3.9597	2.1590	3.9597	0.5238	0.5295	0.5212	14.5792
3.7241	0.7085	250	4.2694	-81.3047	-0.7680	4.2728	2.3310	4.2728	0.5259	0.5295	0.5212	14.5615
3.6109	0.8503	300	4.4908	-83.9815	-0.6388	4.4573	2.4072	4.4573	0.5264	0.5295	0.5212	14.3464
3.36	0.9920	350	4.6586	-80.7491	-0.5030	4.6212	2.4991	4.6212	0.5212	0.5295	0.5212	14.3467
3.112	1.1337	400	4.7244	-82.4974	-0.5664	4.7293	2.5403	4.7293	0.5186	0.5295	0.5212	14.4038
2.9448	1.2754	450	4.8354	-80.1753	-0.6936	4.8114	2.5735	4.8114	0.5248	0.5295	0.5212	13.9356
2.8517	1.4171	500	5.0044	-80.7676	-0.5973	5.0058	2.6782	5.0058	0.5269	0.5295	0.5212	14.2626
2.632	1.5588	550	4.8777	-80.5219	-0.6149	4.8844	2.5752	4.8844	0.5223	0.5295	0.5212	14.1469
2.5208	1.7005	600	4.9258	-80.1775	-0.5875	4.9621	2.5974	4.9621	0.5243	0.5295	0.5212	14.2669
2.4198	1.8422	650	5.0327	-81.0550	-0.5441	5.0454	2.6345	5.0454	0.5269	0.5295	0.5212	14.2479
2.2699	1.9839	700	4.9659	-79.7376	-0.5594	4.9951	2.6292	4.9951	0.5212	0.5295	0.5212	14.1755
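
The table can be cross-checked against the headline evaluation results with a few lines of Python. The sketch below copies four columns (epoch, step, validation loss, Wo Beta) from the rows above, finds the row whose validation loss equals the reported Loss of 4.8354, and estimates the number of optimizer steps per epoch from the last logged row. The variable names are illustrative, and the card does not state how the reported checkpoint was selected.

```python
# (epoch, step, validation_loss, wo_beta) copied from the training results table.
ROWS = [
    (0.1417, 50, 0.5493, 16.2237),
    (0.2834, 100, 1.6975, 15.6834),
    (0.4251, 150, 2.9624, 15.1970),
    (0.5668, 200, 4.0302, 14.5792),
    (0.7085, 250, 4.2694, 14.5615),
    (0.8503, 300, 4.4908, 14.3464),
    (0.9920, 350, 4.6586, 14.3467),
    (1.1337, 400, 4.7244, 14.4038),
    (1.2754, 450, 4.8354, 13.9356),
    (1.4171, 500, 5.0044, 14.2626),
    (1.5588, 550, 4.8777, 14.1469),
    (1.7005, 600, 4.9258, 14.2669),
    (1.8422, 650, 5.0327, 14.2479),
    (1.9839, 700, 4.9659, 14.1755),
]

REPORTED_LOSS = 4.8354  # headline "Loss" from the evaluation summary

# Row whose validation loss matches the headline number.
matching = [row for row in ROWS if row[2] == REPORTED_LOSS]
print(matching[0][1])  # 450

# Row with the lowest Wo Beta among the logged evaluations.
lowest_wo_beta = min(ROWS, key=lambda row: row[3])
print(lowest_wo_beta[1])  # 450

# Rough steps-per-epoch estimate from the last logged row.
steps_per_epoch = ROWS[-1][1] / ROWS[-1][0]
print(round(steps_per_epoch))  # 353
```

The reported evaluation numbers correspond to step 450, which is also the row with the lowest Wo Beta in the log. The logged rows end at step 700 (epoch 1.9839), short of the configured num_epochs of 5.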

Framework versions

Transformers 4.42.0
Pytorch 2.3.0+cu121
Datasets 3.2.0
Tokenizers 0.19.1
