qwen2.5-0.5b-expo-L2EXPO-ES-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set (the values match the step 400 row, epoch 1.1337, of the training results table):

Loss: 4.8991
Logps: -80.1851
Logits: -0.4846
Objective: 4.9208
Dpo Loss: 2.5732
Regularize: 4.9208
Ranking Simple: 0.5238
Ranking Idealized: 0.5295
Ranking Idealized Expo: 0.5212
Wo Beta: 14.1271

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 3
gradient_accumulation_steps: 12
total_train_batch_size: 144
total_eval_batch_size: 12
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
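
The relationship between the per-device and total batch sizes listed above, and the shape of a cosine schedule with a 0.1 warmup ratio, can be sketched in plain Python. This is an illustrative sketch, not the training code: the function name `cosine_lr`, the rounding of the warmup length, and the value of `assumed_total_steps` are assumptions, since the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-06
train_batch_size = 4              # per device
eval_batch_size = 4               # per device
num_devices = 3
gradient_accumulation_steps = 12
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup to the peak rate, then cosine decay towards zero.

    ASSUMPTION: the warmup length is warmup_ratio * total_steps rounded down;
    the trainer that produced this card may round differently.
    """
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: roughly 353 optimizer steps per epoch (estimated from the
# step/epoch columns of the results table) times num_epochs = 5.
assumed_total_steps = 1765

print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 50, 400, 650):
    print(step, cosine_lr(step, assumed_total_steps))
```

The two products reproduce the total_train_batch_size and total_eval_batch_size entries in the list, which is a quick consistency check on the reported configuration.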

Training results

Training Loss	Epoch	Step	Validation Loss	Logps	Logits	Objective	Dpo Loss	Regularize	Ranking Simple	Ranking Idealized	Ranking Idealized Expo	Wo Beta
0.6418	0.1417	50	0.7369	-89.5788	-1.4384	0.7343	0.7480	0.7343	0.5248	0.5295	0.5212	16.0414
1.7208	0.2834	100	1.7082	-87.8064	-1.3168	1.6950	1.0867	1.6950	0.5228	0.5295	0.5212	15.5148
2.841	0.4251	150	2.9302	-83.1791	-1.1086	2.8768	1.6352	2.8768	0.5300	0.5295	0.5212	15.0680
3.5072	0.5668	200	4.2317	-80.2960	-0.8688	4.2210	2.3120	4.2210	0.5155	0.5295	0.5212	14.5319
3.7707	0.7085	250	4.3648	-80.5389	-0.7639	4.3627	2.2988	4.3627	0.5212	0.5295	0.5212	14.5663
3.5773	0.8503	300	4.3904	-83.8565	-0.5388	4.3972	2.2955	4.3972	0.5238	0.5295	0.5212	14.3098
3.359	0.9920	350	4.6868	-82.1212	-0.5555	4.6293	2.4176	4.6293	0.5264	0.5295	0.5212	14.3177
3.0892	1.1337	400	4.8991	-80.1851	-0.4846	4.9208	2.5732	4.9208	0.5238	0.5295	0.5212	14.1271
3.001	1.2754	450	4.8651	-82.0773	-0.5097	4.8038	2.4966	4.8038	0.5233	0.5295	0.5212	14.2309
2.8358	1.4171	500	4.8734	-81.9592	-0.4937	4.8544	2.5685	4.8544	0.5243	0.5295	0.5212	14.2662
2.6622	1.5588	550	4.8760	-81.5020	-0.5513	4.9098	2.5441	4.9098	0.5243	0.5295	0.5212	14.2522
2.5417	1.7005	600	5.0324	-83.9181	-0.5043	5.0251	2.5863	5.0251	0.5259	0.5295	0.5212	14.2325
2.435	1.8422	650	5.0286	-83.8820	-0.4938	5.0013	2.6194	5.0013	0.5197	0.5295	0.5212	14.2504
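
To make the table easier to inspect, the sketch below copies four of its columns (step, epoch, validation loss, Ranking Simple) into Python and looks up the row that matches the evaluation loss reported at the top of this card. It also estimates the number of optimizer steps per epoch from the step and epoch columns. The variable names are illustrative, and the steps-per-epoch figure is an estimate derived from rounded epoch values, not a number stated by the card.

```python
# (step, epoch, validation_loss, ranking_simple), copied from the table above.
rows = [
    (50, 0.1417, 0.7369, 0.5248),
    (100, 0.2834, 1.7082, 0.5228),
    (150, 0.4251, 2.9302, 0.5300),
    (200, 0.5668, 4.2317, 0.5155),
    (250, 0.7085, 4.3648, 0.5212),
    (300, 0.8503, 4.3904, 0.5238),
    (350, 0.9920, 4.6868, 0.5264),
    (400, 1.1337, 4.8991, 0.5238),
    (450, 1.2754, 4.8651, 0.5233),
    (500, 1.4171, 4.8734, 0.5243),
    (550, 1.5588, 4.8760, 0.5243),
    (600, 1.7005, 5.0324, 0.5259),
    (650, 1.8422, 5.0286, 0.5197),
]

REPORTED_EVAL_LOSS = 4.8991  # "Loss" in the summary at the top of the card

# Row whose validation loss matches the reported evaluation loss.
reported_row = next(r for r in rows if abs(r[2] - REPORTED_EVAL_LOSS) < 1e-9)

# Rows with the lowest validation loss and the highest Ranking Simple score.
lowest_loss_row = min(rows, key=lambda r: r[2])
best_ranking_row = max(rows, key=lambda r: r[3])

# Estimated optimizer steps per epoch (step divided by epoch, averaged).
steps_per_epoch = sum(step / epoch for step, epoch, _, _ in rows) / len(rows)

print("reported checkpoint step:", reported_row[0])
print("lowest validation loss at step:", lowest_loss_row[0])
print("highest Ranking Simple at step:", best_ranking_row[0])
print("estimated steps per epoch:", round(steps_per_epoch, 1))
```

The table is logged every 50 steps and ends at step 650 (epoch 1.8422), so all three lookups are limited to those 13 evaluation points.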

Framework versions

Transformers 4.42.0
Pytorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
