# qwen2.5-0.5b-expo-L2EXPO-ES-0.001
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:
- Loss: 0.3942
- Logps: -573.5075
- Logits: -8.8910
- Objective: 0.3931
- Dpo Loss: 0.6728
- Regularize: 0.3931
- Ranking Simple: 0.6102
- Ranking Idealized: 0.9871
- Ranking Idealized Expo: 0.6320
- Wo Beta: 160.3578
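
The snippet below is a minimal usage sketch, assuming the checkpoint is published under the repository id above and loads as a standard causal language model; the prompt is purely illustrative.

```python
# Minimal usage sketch: load the checkpoint as a standard causal LM.
# The repository id comes from this card; the prompt is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.001"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a short news headline about renewable energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```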
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
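
For context, here is a rough sketch of how these settings map onto transformers `TrainingArguments`. The actual training script and any EXPO/DPO-specific trainer options are not included in this card, so the output directory and the use of this particular class are assumptions for illustration; the effective train batch size of 144 follows from 4 per device × 3 GPUs × 12 gradient-accumulation steps.

```python
# Rough mapping of the listed hyperparameters onto transformers TrainingArguments.
# The real run used a preference-optimization trainer whose extra options are
# not documented in this card; output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-ES-0.001",  # hypothetical
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,  # 4 per device * 3 GPUs * 12 steps = 144 effective
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```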
### Training results
| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.418 | 0.1417 | 50 | 0.6927 | -1.8105 | -107.4392 | 0.4150 | 0.4128 | 0.9871 | 0.6320 | 0.5352 | 0.4128 | 22.0269 |
| 0.416 | 0.2834 | 100 | 0.6896 | -2.0485 | -230.8855 | 0.4087 | 0.4081 | 0.9871 | 0.6320 | 0.5559 | 0.4081 | 52.9494 |
| 0.387 | 0.4251 | 150 | 0.6844 | -3.9840 | -343.5519 | 0.4032 | 0.4021 | 0.9871 | 0.6320 | 0.5766 | 0.4021 | 90.2215 |
| 0.3587 | 0.5668 | 200 | 0.6754 | -6.1681 | -390.3867 | 0.3917 | 0.3893 | 0.9871 | 0.6320 | 0.6004 | 0.3893 | 124.6577 |
| 0.3299 | 0.7085 | 250 | 0.6765 | -7.7444 | -474.0688 | 0.3968 | 0.3968 | 0.9871 | 0.6320 | 0.5958 | 0.3968 | 147.7626 |
| 0.294 | 0.8503 | 300 | 0.6728 | -8.8910 | -573.5075 | 0.3942 | 0.3931 | 0.9871 | 0.6320 | 0.6102 | 0.3931 | 160.3578 |
| 0.2753 | 0.9920 | 350 | 0.6731 | -9.9981 | -593.1101 | 0.3965 | 0.3960 | 0.9871 | 0.6320 | 0.5937 | 0.3960 | 171.5761 |
| 0.2316 | 1.1337 | 400 | 0.6718 | -9.6479 | -564.7661 | 0.3966 | 0.3956 | 0.9871 | 0.6320 | 0.5875 | 0.3956 | 171.6054 |
| 0.2205 | 1.2754 | 450 | 0.6725 | -10.9673 | -599.2516 | 0.3962 | 0.3983 | 0.9871 | 0.6320 | 0.5859 | 0.3983 | 182.4877 |
| 0.2058 | 1.4171 | 500 | 0.6741 | -9.6175 | -589.5045 | 0.4005 | 0.4029 | 0.9871 | 0.6320 | 0.5797 | 0.4029 | 188.1013 |
| 0.2027 | 1.5588 | 550 | 0.6730 | -10.3937 | -622.4691 | 0.3995 | 0.4000 | 0.9871 | 0.6320 | 0.5947 | 0.4000 | 185.8620 |
| 0.1897 | 1.7029 | 600 | 0.6716 | -11.5540 | -755.1119 | 0.4028 | 0.4023 | 0.9871 | 0.6320 | 0.5952 | 0.4023 | 201.2357 |
| 0.1797 | 1.8446 | 650 | 0.6730 | -10.8193 | -673.7770 | 0.3997 | 0.3992 | 0.9871 | 0.6320 | 0.5942 | 0.3992 | 188.3079 |
| 0.1689 | 1.9863 | 700 | 0.6713 | -11.0772 | -653.8336 | 0.3985 | 0.3970 | 0.9871 | 0.6320 | 0.5911 | 0.3970 | 182.3852 |
| 0.1492 | 2.1280 | 750 | 0.6708 | -11.4717 | -624.3672 | 0.3959 | 0.3956 | 0.9871 | 0.6320 | 0.6025 | 0.3956 | 182.7602 |
| 0.143 | 2.2697 | 800 | 0.6701 | -11.2559 | -657.3067 | 0.3955 | 0.3958 | 0.9871 | 0.6320 | 0.6009 | 0.3958 | 190.5371 |
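
The Dpo Loss and Ranking columns above are preference-optimization metrics computed from the log-probabilities the policy and reference models assign to chosen and rejected responses. As a point of reference, the sketch below shows the standard sigmoid DPO loss these columns are presumably derived from; the L2EXPO regularization tracked in the Regularize and Objective columns is not documented in this card, and beta=0.1 is only a placeholder value.

```python
# Minimal sketch of the standard sigmoid DPO loss on which the "Dpo Loss"
# column is presumably based. The L2EXPO regularizer tracked in the
# "Regularize"/"Objective" columns is not reproduced here, and beta=0.1 is
# only a placeholder.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```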
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1