# qwen2.5-0.5b-expo-L2EXPO-25-2
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-25-1 on the hZzy/train_pairwise_all_new dataset. It achieves the following results on the evaluation set (a short note on the log-probability metrics follows the list):
- Loss: 0.3732
- Objective: 0.3661
- Ranking Simple: 0.5272
- Reward Accuracy: 0.6184
- Logp Accuracy: 0.5272
- Log Diff Policy: 1.4964
- Chosen Logps: -93.9669
- Rejected Logps: -95.4632
- Chosen Rewards: 0.0189
- Rejected Rewards: -0.0484
- Logits: -1.0973
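The chosen and rejected log-probabilities above appear to determine the policy log-difference directly; the sketch below spells out that relationship. It is inferred from the logged numbers and is not documented by the training script, so treat it as an assumption.

```latex
% Assumption: "Log Diff Policy" is the gap between the chosen and rejected
% sequence log-probabilities under the trained policy. The logged values are
% consistent with this reading (1.4963 vs. the reported 1.4964, up to rounding).
\[
\text{Log Diff Policy}
  = \log \pi_\theta(y_{\text{chosen}} \mid x) - \log \pi_\theta(y_{\text{rejected}} \mid x)
  \approx -93.9669 - (-95.4632) = 1.4963
\]
```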
## Model description
More information needed
## Intended uses & limitations
More information needed
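Although no intended uses are documented, the checkpoint should load like any other causal language model in transformers. The sketch below assumes the repository id is hZzy/qwen2.5-0.5b-expo-L2EXPO-25-2 (the owner is inferred from the base model hZzy/qwen2.5-0.5b-sft-25-1); adjust it if the model lives elsewhere.

```python
# Minimal loading sketch. The repository id below is an assumption inferred from
# the base model's owner; replace it with the actual repo id if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-25-2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Briefly explain preference optimization for language models."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```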
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a rough TrainingArguments sketch follows the list):
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
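The list above maps onto a standard transformers TrainingArguments configuration roughly as sketched below. This is illustrative only, assuming a Trainer-style setup; the actual training script, trainer class, output directory, and precision setting are not part of this card.

```python
# Rough TrainingArguments sketch reconstructed from the hyperparameter list above.
# Anything not listed in the card (output_dir, bf16) is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-25-2",  # assumed output path
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # 4 per device x 6 GPUs x 12 accumulation steps = 288 effective
    per_device_eval_batch_size=4,    # 4 per device x 6 GPUs = 24 effective
    gradient_accumulation_steps=12,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption; precision is not stated in the card
)
```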
### Training results
Training Loss | Epoch | Step | Validation Loss | Objective | Ranking Simple | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.3775 | 0.1413 | 50 | 0.3834 | 0.3808 | 0.5163 | 0.5785 | 0.5163 | 1.0022 | -94.6418 | -95.6439 | -0.0486 | -0.0664 | -1.1440 |
0.3567 | 0.2826 | 100 | 0.3814 | 0.3756 | 0.5193 | 0.6141 | 0.5193 | 1.2191 | -95.2743 | -96.4934 | -0.1119 | -0.1514 | -1.1248 |
0.3556 | 0.4238 | 150 | 0.3829 | 0.3778 | 0.5187 | 0.6008 | 0.5187 | 1.3120 | -97.2010 | -98.5130 | -0.3045 | -0.3534 | -1.1497 |
0.3116 | 0.5651 | 200 | 0.3788 | 0.3741 | 0.5236 | 0.6178 | 0.5236 | 1.3794 | -95.2373 | -96.6166 | -0.1082 | -0.1637 | -1.1020 |
0.3111 | 0.7064 | 250 | 0.3802 | 0.3731 | 0.5217 | 0.6081 | 0.5217 | 1.3607 | -95.5764 | -96.9371 | -0.1421 | -0.1958 | -1.1156 |
0.2888 | 0.8477 | 300 | 0.3775 | 0.3719 | 0.5254 | 0.6178 | 0.5254 | 1.4271 | -95.7193 | -97.1464 | -0.1564 | -0.2167 | -1.0972 |
0.2742 | 0.9889 | 350 | 0.3778 | 0.3731 | 0.5278 | 0.6310 | 0.5278 | 1.4149 | -92.7176 | -94.1325 | 0.1438 | 0.0847 | -1.1577 |
0.2295 | 1.1302 | 400 | 0.3764 | 0.3696 | 0.5272 | 0.6171 | 0.5272 | 1.5014 | -94.6456 | -96.1470 | -0.0490 | -0.1168 | -1.1084 |
0.2234 | 1.2715 | 450 | 0.3742 | 0.3703 | 0.5248 | 0.6069 | 0.5248 | 1.4271 | -93.7809 | -95.2079 | 0.0375 | -0.0228 | -1.1391 |
0.2144 | 1.4128 | 500 | 0.3741 | 0.3682 | 0.5248 | 0.6220 | 0.5248 | 1.4393 | -93.1956 | -94.6349 | 0.0960 | 0.0345 | -1.0827 |
0.2186 | 1.5540 | 550 | 0.3751 | 0.3683 | 0.5260 | 0.6220 | 0.5260 | 1.4528 | -92.7123 | -94.1651 | 0.1443 | 0.0814 | -1.1178 |
0.205 | 1.6953 | 600 | 0.3762 | 0.3692 | 0.5266 | 0.6232 | 0.5266 | 1.4922 | -93.6128 | -95.1051 | 0.0543 | -0.0126 | -1.1120 |
0.1908 | 1.8366 | 650 | 0.3754 | 0.3680 | 0.5223 | 0.6159 | 0.5223 | 1.4726 | -93.8479 | -95.3205 | 0.0308 | -0.0341 | -1.1085 |
0.1851 | 1.9779 | 700 | 0.3740 | 0.3671 | 0.5242 | 0.6220 | 0.5242 | 1.4626 | -94.0915 | -95.5541 | 0.0064 | -0.0575 | -1.0983 |
0.1453 | 2.1191 | 750 | 0.3738 | 0.3702 | 0.5242 | 0.6178 | 0.5242 | 1.4582 | -92.8502 | -94.3084 | 0.1305 | 0.0671 | -1.0918 |
0.1490 | 2.2604 | 800 | 0.3734 | 0.3662 | 0.5290 | 0.6250 | 0.5290 | 1.5033 | -94.1187 | -95.6221 | 0.0037 | -0.0643 | -1.0989 |
0.1548 | 2.4017 | 850 | 0.3725 | 0.3662 | 0.5236 | 0.6184 | 0.5236 | 1.4822 | -94.0088 | -95.4911 | 0.0147 | -0.0512 | -1.0865 |
0.1333 | 2.5430 | 900 | 0.3721 | 0.3650 | 0.5260 | 0.6202 | 0.5260 | 1.4965 | -94.1236 | -95.6201 | 0.0032 | -0.0641 | -1.1158 |
0.1414 | 2.6842 | 950 | 0.3729 | 0.3671 | 0.5266 | 0.6214 | 0.5266 | 1.4965 | -94.4185 | -95.9149 | -0.0263 | -0.0935 | -1.0838 |
0.1371 | 2.8255 | 1000 | 0.3739 | 0.3688 | 0.5248 | 0.6147 | 0.5248 | 1.4881 | -93.8768 | -95.3649 | 0.0279 | -0.0385 | -1.0965 |
0.1193 | 2.9668 | 1050 | 0.3736 | 0.3660 | 0.5266 | 0.6153 | 0.5266 | 1.4860 | -93.4251 | -94.9111 | 0.0730 | 0.0068 | -1.0944 |
0.1002 | 3.1081 | 1100 | 0.3729 | 0.3656 | 0.5260 | 0.6178 | 0.5260 | 1.4959 | -93.4099 | -94.9058 | 0.0746 | 0.0074 | -1.0990 |
0.1031 | 3.2494 | 1150 | 0.3733 | 0.3665 | 0.5266 | 0.6208 | 0.5266 | 1.4998 | -94.1445 | -95.6443 | 0.0011 | -0.0665 | -1.0853 |
0.095 | 3.3906 | 1200 | 0.3732 | 0.3659 | 0.5260 | 0.6208 | 0.5260 | 1.4867 | -93.9840 | -95.4707 | 0.0172 | -0.0491 | -1.0953 |
0.1014 | 3.5319 | 1250 | 0.3734 | 0.3665 | 0.5272 | 0.6226 | 0.5272 | 1.4976 | -94.1020 | -95.5996 | 0.0054 | -0.0620 | -1.0973 |
0.0949 | 3.6732 | 1300 | 0.3734 | 0.3664 | 0.5272 | 0.6178 | 0.5272 | 1.4947 | -93.9755 | -95.4702 | 0.0180 | -0.0491 | -1.0977 |
0.096 | 3.8145 | 1350 | 0.3733 | 0.3661 | 0.5272 | 0.6190 | 0.5272 | 1.4969 | -93.9574 | -95.4542 | 0.0198 | -0.0475 | -1.0971 |
0.1032 | 3.9557 | 1400 | 0.3732 | 0.3661 | 0.5272 | 0.6184 | 0.5272 | 1.4964 | -93.9669 | -95.4632 | 0.0189 | -0.0484 | -1.0973 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1