
qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.05-5e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set (the DPO objective behind these metrics is sketched after the list):

  • Loss: 0.4402
  • Logps: -77.2011
  • Logits: -0.8985
  • Objective: 0.4385
  • DPO Loss: 0.6860
  • Regularize: 0.4385
  • Ranking Simple: 0.5320
  • Ranking Idealized: 0.6570
  • Ranking Idealized Expo: 0.5114
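
The DPO Loss values sit just below ln 2 ≈ 0.6931, which is the value of the standard DPO objective at zero reward margin. For reference, and assuming this card's metric follows the usual definition (Rafailov et al., 2023):

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ and $y_l$ are the preferred and rejected responses, $\pi_{\mathrm{ref}}$ is the frozen reference policy, and $\beta$ scales the implicit reward. If the metric follows this definition, a value near ln 2 corresponds to a small average preference margin.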

Model description

More information needed

Intended uses & limitations

More information needed
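
Pending fuller documentation, the checkpoint should load like any Qwen2.5-family causal language model via transformers. A minimal usage sketch; the prompt and generation settings are illustrative assumptions, not documented recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.05-5e6"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Illustrative prompt; the card does not document an intended prompt format.
inputs = tokenizer("The news today:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```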

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 288
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
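
For reference, these settings map onto a transformers TrainingArguments configuration roughly as below. This is a minimal sketch, not the actual training script, and the output directory name is an illustrative assumption. The effective train batch size works out to 4 per device × 6 GPUs × 12 accumulation steps = 288, matching the value above.

```python
from transformers import TrainingArguments

# Sketch reproducing the hyperparameters listed above.
# Effective train batch size: 4 (per device) x 6 (GPUs) x 12 (accumulation) = 288.
args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.05-5e6",  # illustrative name
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```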

Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | DPO Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|
| 0.3566        | 0.2834 | 50   | 0.4133          | -96.4831 | -1.6203 | 0.4237    | 0.6898   | 0.4237     | 0.5165         | 0.6570            | 0.5114                 |
| 0.3027        | 0.5668 | 100  | 0.4142          | -88.4100 | -1.3063 | 0.4151    | 0.6862   | 0.4151     | 0.5217         | 0.6570            | 0.5114                 |
| 0.2706        | 0.8503 | 150  | 0.4262          | -87.3674 | -1.1981 | 0.4277    | 0.6857   | 0.4277     | 0.5279         | 0.6570            | 0.5114                 |
| 0.2256        | 1.1337 | 200  | 0.4347          | -81.8119 | -1.2023 | 0.4344    | 0.6862   | 0.4344     | 0.5248         | 0.6570            | 0.5114                 |
| 0.2005        | 1.4171 | 250  | 0.4292          | -81.8212 | -1.0616 | 0.4289    | 0.6815   | 0.4289     | 0.5227         | 0.6570            | 0.5114                 |
| 0.1870        | 1.7005 | 300  | 0.4369          | -80.0077 | -1.0398 | 0.4362    | 0.6845   | 0.4362     | 0.5258         | 0.6570            | 0.5114                 |
| 0.1664        | 1.9839 | 350  | 0.4382          | -79.6308 | -0.9982 | 0.4359    | 0.6842   | 0.4359     | 0.5289         | 0.6570            | 0.5114                 |
| 0.1368        | 2.2674 | 400  | 0.4408          | -80.2038 | -1.0155 | 0.4378    | 0.6859   | 0.4378     | 0.5320         | 0.6570            | 0.5114                 |
| 0.1220        | 2.5508 | 450  | 0.4415          | -78.4288 | -0.8946 | 0.4404    | 0.6863   | 0.4404     | 0.5258         | 0.6570            | 0.5114                 |
| 0.1063        | 2.8342 | 500  | 0.4411          | -78.1278 | -0.8683 | 0.4384    | 0.6861   | 0.4384     | 0.5300         | 0.6570            | 0.5114                 |
| 0.0878        | 3.1176 | 550  | 0.4406          | -77.6391 | -0.8292 | 0.4378    | 0.6848   | 0.4378     | 0.5331         | 0.6570            | 0.5114                 |
| 0.0719        | 3.4010 | 600  | 0.4396          | -77.4923 | -0.8875 | 0.4373    | 0.6851   | 0.4373     | 0.5310         | 0.6570            | 0.5114                 |
| 0.0618        | 3.6845 | 650  | 0.4395          | -77.1838 | -0.9103 | 0.4386    | 0.6855   | 0.4386     | 0.5269         | 0.6570            | 0.5114                 |
| 0.0551        | 3.9679 | 700  | 0.4402          | -77.7209 | -0.9137 | 0.4388    | 0.6859   | 0.4388     | 0.5289         | 0.6570            | 0.5114                 |
| 0.0388        | 4.2513 | 750  | 0.4404          | -77.0700 | -0.8976 | 0.4386    | 0.6859   | 0.4386     | 0.5310         | 0.6570            | 0.5114                 |
| 0.0382        | 4.5347 | 800  | 0.4402          | -77.2473 | -0.8972 | 0.4384    | 0.6859   | 0.4384     | 0.5320         | 0.6570            | 0.5114                 |
| 0.0320        | 4.8181 | 850  | 0.4402          | -77.2053 | -0.8983 | 0.4385    | 0.6860   | 0.4385     | 0.5320         | 0.6570            | 0.5114                 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
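
Reproducing the reported numbers is most reliable with matching versions. A quick sanity check, assuming a standard Python environment:

```python
# Verify that installed packages match the versions used for training.
from importlib.metadata import version

for pkg, expected in [("transformers", "4.42.0"), ("torch", "2.3.0+cu121"),
                      ("datasets", "2.19.1"), ("tokenizers", "0.19.1")]:
    installed = version(pkg)
    status = "OK" if installed == expected else f"expected {expected}"
    print(f"{pkg}: {installed} ({status})")
```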