qwen2.5-0.5b-expo-L1EXPO-25-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-25-1 on the hZzy/train_pairwise_all_new dataset. On the evaluation set, the final logged checkpoint (step 1050) achieves the following results:

Loss: 0.0631
Objective: 0.0616
Ranking Simple: 0.5109
Reward Accuracy: 0.5000
Logp Accuracy: 0.5109
Log Diff Policy: 0.8111
Chosen Logps: -92.6775
Rejected Logps: -93.4886
Chosen Rewards: 0.1478
Rejected Rewards: 0.1491
Logits: -1.0626
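
The reported numbers are consistent with Log Diff Policy being the gap between the mean chosen and mean rejected log-probabilities. That reading is inferred from the values above, not from a documented definition, so treat the following as a minimal arithmetic check under that assumption:

```python
# Values copied from the evaluation results listed above.
chosen_logps = -92.6775
rejected_logps = -93.4886
reported_log_diff = 0.8111

# Assumption: "Log Diff Policy" is chosen minus rejected log-probability.
log_diff = chosen_logps - rejected_logps

# The same subtraction applied to the reward columns.
chosen_rewards = 0.1478
rejected_rewards = 0.1491
reward_margin = chosen_rewards - rejected_rewards

print(f"log-prob gap: {log_diff:.4f} (reported: {reported_log_diff:.4f})")
print(f"reward margin: {reward_margin:.4f}")
```

Under the same assumption, the reward margin comes out slightly negative, which fits with the reported Reward Accuracy sitting at 0.5.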

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 6
gradient_accumulation_steps: 12
total_train_batch_size: 288
total_eval_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
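
The batch-size totals above follow from the per-device values, and the learning-rate settings describe a warmup followed by cosine decay. The sketch below reproduces the arithmetic and one plausible schedule shape. The function `lr_at`, the linear warmup, the decay to zero, and the total of 1062 steps (extrapolated from step 1050 falling at epoch 2.9668) are assumptions for illustration, not values stated in this card.

```python
import math

# Per-device settings copied from the hyperparameter list.
train_batch_size = 4
eval_batch_size = 4
num_devices = 6
gradient_accumulation_steps = 12

# Effective batch sizes: 4 * 6 * 12 = 288 for training, 4 * 6 = 24 for evaluation.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def lr_at(step, total_steps, peak_lr=1e-06, warmup_ratio=0.1):
    """Hypothetical helper: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed total step count; the card does not state it.
assumed_total_steps = 1062

print(total_train_batch_size, total_eval_batch_size)
for step in (0, 50, 107, 500, 1050):
    print(step, lr_at(step, assumed_total_steps))
```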

Training results

Training Loss	Epoch	Step	Validation Loss	Objective	Ranking Simple	Reward Accuracy	Logp Accuracy	Log Diff Policy	Chosen Logps	Rejected Logps	Chosen Rewards	Rejected Rewards	Logits
0.0296	0.1413	50	0.0419	0.0420	0.5121	0.4970	0.5121	0.8157	-94.2174	-95.0331	-0.0062	-0.0054	-1.1300
0.0535	0.2826	100	0.0373	0.0374	0.5121	0.5157	0.5121	0.8461	-94.2168	-95.0629	-0.0061	-0.0083	-1.1346
0.0894	0.4238	150	0.0618	0.0622	0.5097	0.5181	0.5097	0.8479	-93.3303	-94.1782	0.0825	0.0801	-1.0961
0.0933	0.5651	200	0.0542	0.0542	0.5115	0.5097	0.5115	0.8495	-92.8150	-93.6645	0.1341	0.1315	-1.1223
0.1030	0.7064	250	0.0636	0.0638	0.5145	0.4988	0.5145	0.8236	-92.1619	-92.9855	0.1994	0.1994	-1.1356
0.1048	0.8477	300	0.0686	0.0682	0.5103	0.5030	0.5103	0.8174	-92.5324	-93.3498	0.1623	0.1630	-1.1014
0.0953	0.9889	350	0.0698	0.0692	0.5109	0.4958	0.5109	0.8028	-91.7412	-92.5441	0.2414	0.2435	-1.0740
0.0929	1.1302	400	0.0767	0.0746	0.5103	0.4825	0.5103	0.8116	-91.4983	-92.3098	0.2657	0.2670	-1.0664
0.0919	1.2715	450	0.0755	0.0731	0.5079	0.5091	0.5079	0.8534	-93.0254	-93.8788	0.1130	0.1101	-1.0604
0.0835	1.4128	500	0.0727	0.0709	0.5097	0.4915	0.5097	0.8109	-92.0981	-92.9090	0.2057	0.2070	-1.0825
0.0763	1.5540	550	0.0728	0.0717	0.5103	0.4928	0.5103	0.7904	-93.0918	-93.8822	0.1064	0.1097	-1.0860
0.0716	1.6953	600	0.0714	0.0686	0.5091	0.5012	0.5091	0.8268	-93.5857	-94.4125	0.0570	0.0567	-1.0902
0.0610	1.8366	650	0.0684	0.0672	0.5091	0.5018	0.5091	0.8199	-93.3807	-94.2006	0.0775	0.0779	-1.0824
0.0548	1.9779	700	0.0686	0.0674	0.5109	0.4783	0.5109	0.7991	-92.6224	-93.4215	0.1533	0.1558	-1.0672
0.0458	2.1191	750	0.0668	0.0648	0.5091	0.5079	0.5091	0.8338	-92.9366	-93.7704	0.1219	0.1209	-1.0574
0.0424	2.2604	800	0.0660	0.0646	0.5115	0.4873	0.5115	0.8124	-92.4204	-93.2327	0.1735	0.1747	-1.0651
0.0388	2.4017	850	0.0655	0.0638	0.5109	0.5012	0.5109	0.8093	-92.5458	-93.3551	0.1610	0.1624	-1.0695
0.0360	2.5430	900	0.0643	0.0628	0.5109	0.4934	0.5109	0.8045	-92.7430	-93.5475	0.1413	0.1432	-1.0605
0.0329	2.6842	950	0.0636	0.0622	0.5115	0.4964	0.5115	0.8095	-92.6690	-93.4786	0.1487	0.1501	-1.0624
0.0309	2.8255	1000	0.0629	0.0614	0.5109	0.4982	0.5109	0.8099	-92.6948	-93.5047	0.1461	0.1475	-1.0631
0.0329	2.9668	1050	0.0631	0.0616	0.5109	0.5000	0.5109	0.8111	-92.6775	-93.4886	0.1478	0.1491	-1.0626
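
To compare checkpoints, the validation-loss column can be scanned for its minimum. The sketch below does this with the (step, validation loss) pairs copied from the table. Whether a lower validation loss corresponds to a better model for this objective is not established by the card, and the ranking metrics stay near 0.51 at every logged step, so the result is only a reading of the log.

```python
# (step, validation_loss) pairs copied from the training results table.
eval_log = [
    (50, 0.0419), (100, 0.0373), (150, 0.0618), (200, 0.0542),
    (250, 0.0636), (300, 0.0686), (350, 0.0698), (400, 0.0767),
    (450, 0.0755), (500, 0.0727), (550, 0.0728), (600, 0.0714),
    (650, 0.0684), (700, 0.0686), (750, 0.0668), (800, 0.0660),
    (850, 0.0655), (900, 0.0643), (950, 0.0636), (1000, 0.0629),
    (1050, 0.0631),
]

# Lowest and highest validation loss over the logged steps.
best_step, best_loss = min(eval_log, key=lambda row: row[1])
worst_step, worst_loss = max(eval_log, key=lambda row: row[1])

# The last logged row is the checkpoint reported at the top of the card.
final_step, final_loss = eval_log[-1]

print(f"lowest validation loss:  {best_loss:.4f} at step {best_step}")
print(f"highest validation loss: {worst_loss:.4f} at step {worst_step}")
print(f"final validation loss:   {final_loss:.4f} at step {final_step}")
```

In this log the lowest validation loss occurs early in the first epoch, and the final value is higher than that minimum.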

Framework versions

Transformers 4.42.0
PyTorch 2.3.0+cu121
Datasets 3.2.0
Tokenizers 0.19.1
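
When reproducing results, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper `version_mismatches` and the choice to ignore local build tags such as `+cu121` are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by PyPI distribution name.
# The "+cu121" suffix on PyTorch is a local build tag and is left out here.
CARD_VERSIONS = {
    "transformers": "4.42.0",
    "torch": "2.3.0",
    "datasets": "3.2.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected, lookup=version):
    """Hypothetical helper: map each differing package to (wanted, installed)."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None
        # Compare only the public version, ignoring any "+local" suffix.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches(CARD_VERSIONS).items():
    print(f"{package}: card lists {wanted}, found {installed}")
```

An empty result means the installed packages match the card. A mismatch does not necessarily prevent the model from loading.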
