zephyr-7b-dpo-qlora

This model is a DPO fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora, trained with QLoRA adapters on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (the DPO metric definitions are sketched after this list):

  • Loss: 0.4863
  • Rewards/chosen: -2.8122
  • Rewards/rejected: -3.9101
  • Rewards/accuracies: 0.7395
  • Rewards/margins: 1.0979
  • Logps/rejected: -635.6185
  • Logps/chosen: -545.8760
  • Logits/rejected: -1.1318
  • Logits/chosen: -1.2525
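
The reward, margin, and accuracy entries above follow the implicit-reward convention of Direct Preference Optimization (DPO), and the metric names match what trl's DPOTrainer logs. As a brief, non-authoritative sketch of the standard formulation (β is the DPO trade-off coefficient; nothing below is taken from this repository's training code):

```latex
% Implicit reward of completion y for prompt x, relative to the frozen reference model:
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% Rewards/chosen and Rewards/rejected report r_\theta on the preferred and dispreferred
% completions, Rewards/margins is their difference, and Rewards/accuracies is the
% fraction of pairs with a positive margin. The training objective is
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\log \sigma\!\left( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \right)
```

Logps/chosen and Logps/rejected are (roughly) the policy's summed log-probabilities of each completion, and Logits/chosen and Logits/rejected are mean logits over the completion tokens, again following DPOTrainer's logging conventions.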

Model description

More information needed

Intended uses & limitations

More information needed
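
Since this repository contains only a PEFT (QLoRA) adapter, one plausible way to use it is to load the underlying base model in 4-bit and attach the adapter on top. The sketch below is illustrative rather than an officially documented recipe: the 4-bit settings mirror common QLoRA defaults and may differ from those used in training, and it assumes the adapter repo ships the tokenizer and chat template.

```python
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

adapter_id = "objects76/zephyr-7b-dpo-qlora"  # this repository (a PEFT adapter)

# 4-bit NF4 quantization, mirroring common QLoRA settings (assumed, not stated on this card).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# AutoPeftModelForCausalLM reads the adapter config, loads the underlying base model,
# and attaches this adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, quantization_config=bnb_config, device_map="auto"
)
# Assumes the adapter repo includes tokenizer files; otherwise load them from the base model.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain direct preference optimization in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```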

Training and evaluation data

More information needed
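
The card names HuggingFaceH4/ultrafeedback_binarized as the training and evaluation data. As a hedged illustration for inspecting the preference pairs (the split and field names below follow the public dataset's published layout and are not taken from this card):

```python
from datasets import load_dataset

# Preference-formatted UltraFeedback; "train_prefs" is one of the dataset's published splits.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = ds[0]
print(example.keys())            # expect fields such as "prompt", "chosen", "rejected"
print(example["prompt"][:200])   # preview the prompt text
```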

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent-configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
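
The sketch below is a hypothetical transformers TrainingArguments mirroring the hyperparameters listed above; the actual run used a DPO training loop on top of a QLoRA/PEFT setup, and DPO-specific settings (for example β) are not reported on this card. The eval cadence of 100 steps is inferred from the results table below; the output directory and precision are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # assumed output directory
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,       # effective train batch size of 16, as reported
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    optim="adamw_torch",                 # Adam-family optimizer with betas=(0.9, 0.999), eps=1e-8
    evaluation_strategy="steps",
    eval_steps=100,                      # matches the 100-step eval interval in the results table
)
```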

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6821 | 0.03 | 100 | 0.6821 | 0.0498 | 0.0267 | 0.6565 | 0.0231 | -241.9392 | -259.6706 | -1.9557 | -2.0951 |
| 0.6496 | 0.05 | 200 | 0.6487 | -0.0543 | -0.1608 | 0.6810 | 0.1065 | -260.6906 | -270.0797 | -1.9313 | -2.0680 |
| 0.6042 | 0.08 | 300 | 0.6216 | -0.3050 | -0.5140 | 0.6730 | 0.2090 | -296.0115 | -295.1514 | -1.8895 | -2.0229 |
| 0.6218 | 0.1 | 400 | 0.5940 | -0.6189 | -0.9584 | 0.6810 | 0.3395 | -340.4455 | -326.5407 | -1.8155 | -1.9431 |
| 0.5674 | 0.13 | 500 | 0.5780 | -1.5729 | -2.0527 | 0.7040 | 0.4797 | -449.8770 | -421.9457 | -1.6637 | -1.7893 |
| 0.5632 | 0.16 | 600 | 0.5649 | -0.7810 | -1.2808 | 0.7040 | 0.4999 | -372.6913 | -342.7494 | -1.6489 | -1.7786 |
| 0.5331 | 0.18 | 700 | 0.5607 | -1.9088 | -2.6807 | 0.7060 | 0.7719 | -512.6751 | -455.5275 | -1.4691 | -1.5919 |
| 0.4996 | 0.21 | 800 | 0.5433 | -1.4500 | -2.1596 | 0.7070 | 0.7096 | -460.5685 | -409.6544 | -1.5461 | -1.6710 |
| 0.514 | 0.24 | 900 | 0.5440 | -1.2657 | -1.9170 | 0.7190 | 0.6512 | -436.3041 | -391.2230 | -1.5014 | -1.6214 |
| 0.5468 | 0.26 | 1000 | 0.5418 | -1.3702 | -2.0703 | 0.7175 | 0.7001 | -451.6408 | -401.6767 | -1.4449 | -1.5656 |
| 0.569 | 0.29 | 1100 | 0.5299 | -1.1397 | -1.8623 | 0.7210 | 0.7227 | -430.8414 | -378.6177 | -1.4278 | -1.5524 |
| 0.5732 | 0.31 | 1200 | 0.5185 | -1.1057 | -1.8287 | 0.7250 | 0.7231 | -427.4810 | -375.2183 | -1.3596 | -1.4804 |
| 0.5332 | 0.34 | 1300 | 0.5315 | -2.1367 | -3.0509 | 0.7240 | 0.9142 | -549.7025 | -478.3255 | -1.1977 | -1.3072 |
| 0.5431 | 0.37 | 1400 | 0.5211 | -1.2563 | -2.0974 | 0.7260 | 0.8411 | -454.3522 | -390.2846 | -1.3130 | -1.4314 |
| 0.4862 | 0.39 | 1500 | 0.5162 | -1.3677 | -2.2741 | 0.7355 | 0.9063 | -472.0146 | -401.4262 | -1.2795 | -1.4015 |
| 0.5858 | 0.42 | 1600 | 0.5073 | -1.8100 | -2.6996 | 0.7365 | 0.8896 | -514.5671 | -445.6515 | -1.1534 | -1.2718 |
| 0.5147 | 0.44 | 1700 | 0.5000 | -2.2681 | -3.2167 | 0.7340 | 0.9486 | -566.2829 | -491.4621 | -1.1468 | -1.2691 |
| 0.4809 | 0.47 | 1800 | 0.5022 | -2.9278 | -3.9903 | 0.7405 | 1.0625 | -643.6409 | -557.4312 | -1.0617 | -1.1786 |
| 0.46 | 0.5 | 1900 | 0.5003 | -2.4333 | -3.5014 | 0.7355 | 1.0681 | -594.7523 | -507.9823 | -1.1041 | -1.2253 |
| 0.477 | 0.52 | 2000 | 0.4989 | -2.3912 | -3.3897 | 0.7345 | 0.9985 | -583.5771 | -503.7692 | -1.1185 | -1.2392 |
| 0.5068 | 0.55 | 2100 | 0.4939 | -2.4778 | -3.4672 | 0.7430 | 0.9894 | -591.3240 | -512.4297 | -1.1255 | -1.2462 |
| 0.4832 | 0.58 | 2200 | 0.4925 | -2.1250 | -3.0518 | 0.7425 | 0.9268 | -549.7868 | -477.1522 | -1.1670 | -1.2899 |
| 0.4731 | 0.6 | 2300 | 0.4923 | -2.8792 | -4.0084 | 0.7435 | 1.1291 | -645.4448 | -552.5742 | -1.0953 | -1.2155 |
| 0.4782 | 0.63 | 2400 | 0.4923 | -2.8503 | -3.9248 | 0.7420 | 1.0745 | -637.0914 | -549.6804 | -1.0794 | -1.1978 |
| 0.4983 | 0.65 | 2500 | 0.4906 | -2.5713 | -3.6558 | 0.7410 | 1.0845 | -610.1890 | -521.7778 | -1.1292 | -1.2522 |
| 0.4746 | 0.68 | 2600 | 0.4947 | -2.5857 | -3.7233 | 0.7365 | 1.1375 | -616.9340 | -523.2234 | -1.1267 | -1.2491 |
| 0.514 | 0.71 | 2700 | 0.4924 | -2.6975 | -3.8049 | 0.7355 | 1.1074 | -625.0958 | -534.3994 | -1.1248 | -1.2463 |
| 0.4662 | 0.73 | 2800 | 0.4899 | -2.8300 | -3.9668 | 0.7380 | 1.1368 | -641.2913 | -547.6557 | -1.1134 | -1.2345 |
| 0.5111 | 0.76 | 2900 | 0.4873 | -2.9392 | -4.0635 | 0.7405 | 1.1244 | -650.9627 | -558.5706 | -1.1188 | -1.2396 |
| 0.4758 | 0.79 | 3000 | 0.4866 | -2.8621 | -3.9416 | 0.7410 | 1.0795 | -638.7724 | -550.8655 | -1.1318 | -1.2526 |
| 0.4908 | 0.81 | 3100 | 0.4869 | -2.8503 | -3.9411 | 0.7420 | 1.0908 | -638.7193 | -549.6837 | -1.1347 | -1.2555 |
| 0.4641 | 0.84 | 3200 | 0.4866 | -2.8111 | -3.8990 | 0.7405 | 1.0878 | -634.5079 | -545.7666 | -1.1347 | -1.2554 |
| 0.5096 | 0.86 | 3300 | 0.4864 | -2.7992 | -3.8880 | 0.7395 | 1.0887 | -633.4041 | -544.5740 | -1.1379 | -1.2586 |
| 0.455 | 0.89 | 3400 | 0.4866 | -2.8126 | -3.9082 | 0.7395 | 1.0956 | -635.4322 | -545.9153 | -1.1336 | -1.2544 |
| 0.5262 | 0.92 | 3500 | 0.4864 | -2.8110 | -3.9081 | 0.7410 | 1.0971 | -635.4207 | -545.7535 | -1.1342 | -1.2550 |
| 0.466 | 0.94 | 3600 | 0.4866 | -2.8133 | -3.9106 | 0.7400 | 1.0973 | -635.6727 | -545.9836 | -1.1347 | -1.2555 |
| 0.4945 | 0.97 | 3700 | 0.4864 | -2.8101 | -3.9080 | 0.7400 | 1.0979 | -635.4124 | -545.6666 | -1.1321 | -1.2528 |
| 0.5013 | 0.99 | 3800 | 0.4864 | -2.8126 | -3.9101 | 0.7395 | 1.0975 | -635.6184 | -545.9131 | -1.1317 | -1.2524 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2