
zephyr-7b-dpo-uffull-qlora-5e-7

This model is a QLoRA (PEFT) adapter fine-tuned with DPO from alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5924
  • Rewards/chosen: -0.2516
  • Rewards/rejected: -0.6013
  • Rewards/accuracies: 0.7321
  • Rewards/margins: 0.3497
  • Rewards/margins Max: 1.2300
  • Rewards/margins Min: -0.5547
  • Rewards/margins Std: 0.6038
  • Logps/rejected: -322.2831
  • Logps/chosen: -309.6581
  • Logits/rejected: -2.6832
  • Logits/chosen: -2.7155
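
Because this repository contains a PEFT (QLoRA) adapter rather than a full checkpoint, it is loaded on top of the base SFT model. The sketch below is only an illustration, assuming the adapter id just1nseo/zephyr-7b-dpo-uffull-qlora-5e-7 and peft/transformers versions close to those listed under Framework versions; the 4-bit/bfloat16 and chat-template details are common usage assumptions, not taken from this card.

```python
# Minimal loading sketch: attach the QLoRA DPO adapter to the SFT base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"          # base model named in this card
adapter_id = "just1nseo/zephyr-7b-dpo-uffull-qlora-5e-7"    # this adapter repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"  # dtype/device are assumptions
)
model = PeftModel.from_pretrained(base, adapter_id)          # merge-free adapter loading

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```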

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
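
For reference, these values map roughly onto a transformers TrainingArguments object as sketched below. The actual training script is not included in this card, and DPO-specific settings (such as the beta coefficient or the LoRA configuration) are not listed, so they are omitted; the bf16 flag is an assumption. The per-device batch sizes combine with the 4 GPUs (gradient accumulation 1) to give the total batch sizes of 16 and 32 above.

```python
# Sketch only: how the listed hyperparameters map onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-uffull-qlora-5e-7",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 4 GPUs -> total train batch size 16
    per_device_eval_batch_size=8,    # x 4 GPUs -> total eval batch size 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # assumption; precision is not stated in the card
)
```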

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.6929 | 0.03 | 100 | 0.6930 | 0.0001 | -0.0003 | 0.5377 | 0.0004 | 0.0054 | -0.0041 | 0.0032 | -262.1841 | -284.4886 | -2.7819 | -2.8200 |
| 0.6922 | 0.05 | 200 | 0.6923 | 0.0008 | -0.0010 | 0.6627 | 0.0019 | 0.0100 | -0.0058 | 0.0051 | -262.2543 | -284.4120 | -2.7814 | -2.8195 |
| 0.6908 | 0.08 | 300 | 0.6903 | 0.0041 | -0.0025 | 0.7143 | 0.0066 | 0.0281 | -0.0141 | 0.0137 | -262.3995 | -284.0884 | -2.7806 | -2.8185 |
| 0.689 | 0.1 | 400 | 0.6870 | 0.0093 | -0.0046 | 0.7183 | 0.0140 | 0.0586 | -0.0282 | 0.0285 | -262.6125 | -283.5621 | -2.7783 | -2.8162 |
| 0.6813 | 0.13 | 500 | 0.6813 | 0.0235 | -0.0040 | 0.7242 | 0.0275 | 0.1137 | -0.0534 | 0.0551 | -262.5450 | -282.1426 | -2.7758 | -2.8132 |
| 0.6712 | 0.16 | 600 | 0.6742 | 0.0200 | -0.0247 | 0.7262 | 0.0447 | 0.1814 | -0.0859 | 0.0884 | -264.6151 | -282.4901 | -2.7638 | -2.8015 |
| 0.6643 | 0.18 | 700 | 0.6653 | 0.0004 | -0.0668 | 0.7242 | 0.0672 | 0.2707 | -0.1305 | 0.1329 | -268.8295 | -284.4591 | -2.7558 | -2.7925 |
| 0.6421 | 0.21 | 800 | 0.6562 | -0.0231 | -0.1154 | 0.7222 | 0.0923 | 0.3706 | -0.1761 | 0.1820 | -273.6847 | -286.8017 | -2.7519 | -2.7880 |
| 0.648 | 0.24 | 900 | 0.6480 | -0.0748 | -0.1938 | 0.7183 | 0.1190 | 0.4823 | -0.2242 | 0.2359 | -281.5314 | -291.9791 | -2.7477 | -2.7835 |
| 0.6547 | 0.26 | 1000 | 0.6378 | -0.0763 | -0.2278 | 0.7183 | 0.1515 | 0.5995 | -0.2816 | 0.2954 | -284.9341 | -292.1262 | -2.7446 | -2.7798 |
| 0.6408 | 0.29 | 1100 | 0.6317 | -0.0432 | -0.2136 | 0.7262 | 0.1704 | 0.6414 | -0.2953 | 0.3163 | -283.5132 | -288.8173 | -2.7545 | -2.7885 |
| 0.6358 | 0.31 | 1200 | 0.6260 | -0.0529 | -0.2480 | 0.7183 | 0.1952 | 0.7219 | -0.3249 | 0.3520 | -286.9514 | -289.7809 | -2.7585 | -2.7914 |
| 0.6297 | 0.34 | 1300 | 0.6215 | -0.1213 | -0.3378 | 0.7143 | 0.2165 | 0.8114 | -0.3727 | 0.4028 | -295.9312 | -296.6275 | -2.7489 | -2.7816 |
| 0.6165 | 0.37 | 1400 | 0.6213 | -0.2177 | -0.4420 | 0.7103 | 0.2243 | 0.8626 | -0.4022 | 0.4264 | -306.3474 | -306.2648 | -2.7404 | -2.7733 |
| 0.6185 | 0.39 | 1500 | 0.6162 | -0.1021 | -0.3356 | 0.7063 | 0.2335 | 0.8779 | -0.3976 | 0.4349 | -295.7101 | -294.7082 | -2.7425 | -2.7745 |
| 0.6066 | 0.42 | 1600 | 0.6141 | -0.1696 | -0.4256 | 0.7123 | 0.2560 | 0.9394 | -0.4398 | 0.4678 | -304.7078 | -301.4554 | -2.7367 | -2.7689 |
| 0.6048 | 0.44 | 1700 | 0.6123 | -0.1220 | -0.3748 | 0.7123 | 0.2529 | 0.9411 | -0.4235 | 0.4656 | -299.6321 | -296.6920 | -2.7315 | -2.7638 |
| 0.609 | 0.47 | 1800 | 0.6090 | -0.1424 | -0.4122 | 0.7282 | 0.2698 | 0.9829 | -0.4478 | 0.4813 | -303.3703 | -298.7344 | -2.7251 | -2.7574 |
| 0.5909 | 0.5 | 1900 | 0.6062 | -0.2373 | -0.5239 | 0.7183 | 0.2866 | 1.0475 | -0.4860 | 0.5181 | -314.5422 | -308.2264 | -2.7186 | -2.7507 |
| 0.6011 | 0.52 | 2000 | 0.6048 | -0.1288 | -0.4109 | 0.7242 | 0.2821 | 1.0037 | -0.4627 | 0.4932 | -303.2409 | -297.3789 | -2.7100 | -2.7425 |
| 0.6047 | 0.55 | 2100 | 0.6031 | -0.1486 | -0.4420 | 0.7262 | 0.2934 | 1.0559 | -0.4792 | 0.5193 | -306.3505 | -299.3512 | -2.7123 | -2.7448 |
| 0.592 | 0.58 | 2200 | 0.6011 | -0.2623 | -0.5777 | 0.7242 | 0.3154 | 1.1326 | -0.5284 | 0.5638 | -319.9217 | -310.7270 | -2.7100 | -2.7423 |
| 0.6285 | 0.6 | 2300 | 0.6022 | -0.3099 | -0.6207 | 0.7242 | 0.3108 | 1.1254 | -0.5181 | 0.5570 | -324.2166 | -315.4819 | -2.7044 | -2.7370 |
| 0.6258 | 0.63 | 2400 | 0.6005 | -0.1642 | -0.4737 | 0.7302 | 0.3095 | 1.0716 | -0.4957 | 0.5259 | -309.5165 | -300.9170 | -2.6960 | -2.7291 |
| 0.5855 | 0.65 | 2500 | 0.5981 | -0.2145 | -0.5381 | 0.7341 | 0.3237 | 1.1337 | -0.5235 | 0.5568 | -315.9617 | -305.9418 | -2.6924 | -2.7253 |
| 0.6095 | 0.68 | 2600 | 0.5970 | -0.2416 | -0.5724 | 0.7262 | 0.3308 | 1.1753 | -0.5364 | 0.5756 | -319.3885 | -308.6579 | -2.6859 | -2.7187 |
| 0.6013 | 0.71 | 2700 | 0.5961 | -0.2450 | -0.5789 | 0.7262 | 0.3340 | 1.1924 | -0.5460 | 0.5830 | -320.0433 | -308.9903 | -2.6845 | -2.7170 |
| 0.6233 | 0.73 | 2800 | 0.5954 | -0.2426 | -0.5787 | 0.7302 | 0.3361 | 1.2015 | -0.5491 | 0.5882 | -320.0177 | -308.7550 | -2.6852 | -2.7174 |
| 0.6119 | 0.76 | 2900 | 0.5944 | -0.2613 | -0.6032 | 0.7282 | 0.3419 | 1.2206 | -0.5595 | 0.6006 | -322.4701 | -310.6289 | -2.6853 | -2.7176 |
| 0.5644 | 0.79 | 3000 | 0.5938 | -0.2218 | -0.5648 | 0.7282 | 0.3430 | 1.1989 | -0.5312 | 0.5872 | -318.6263 | -306.6716 | -2.6826 | -2.7150 |
| 0.5946 | 0.81 | 3100 | 0.5932 | -0.2763 | -0.6239 | 0.7262 | 0.3476 | 1.2359 | -0.5639 | 0.6094 | -324.5376 | -312.1256 | -2.6762 | -2.7090 |
| 0.5961 | 0.84 | 3200 | 0.5930 | -0.2713 | -0.6200 | 0.7262 | 0.3487 | 1.2365 | -0.5595 | 0.6090 | -324.1454 | -311.6203 | -2.6815 | -2.7140 |
| 0.5841 | 0.86 | 3300 | 0.5927 | -0.2686 | -0.6177 | 0.7302 | 0.3491 | 1.2362 | -0.5602 | 0.6093 | -323.9175 | -311.3521 | -2.6834 | -2.7157 |
| 0.611 | 0.89 | 3400 | 0.5925 | -0.2485 | -0.5979 | 0.7361 | 0.3493 | 1.2281 | -0.5496 | 0.6023 | -321.9356 | -309.3477 | -2.6821 | -2.7145 |
| 0.5458 | 0.92 | 3500 | 0.5925 | -0.2494 | -0.5988 | 0.7341 | 0.3494 | 1.2280 | -0.5516 | 0.6025 | -322.0256 | -309.4359 | -2.6792 | -2.7118 |
| 0.5926 | 0.94 | 3600 | 0.5925 | -0.2520 | -0.6014 | 0.7321 | 0.3494 | 1.2312 | -0.5539 | 0.6042 | -322.2860 | -309.6909 | -2.6837 | -2.7160 |
| 0.6096 | 0.97 | 3700 | 0.5926 | -0.2517 | -0.6015 | 0.7341 | 0.3497 | 1.2313 | -0.5539 | 0.6042 | -322.2966 | -309.6683 | -2.6793 | -2.7119 |
| 0.5865 | 0.99 | 3800 | 0.5925 | -0.2517 | -0.6019 | 0.7341 | 0.3502 | 1.2316 | -0.5546 | 0.6038 | -322.3433 | -309.6684 | -2.6801 | -2.7126 |
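
The reward columns follow the usual DPO bookkeeping (as logged by trl's DPOTrainer): each reward is the beta-scaled difference between the policy and reference log-probabilities of a completion, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs where the chosen completion scores higher. The snippet below illustrates this with toy numbers only; the beta used for this run is not reported in the card, so 0.1 is merely a placeholder.

```python
# Illustration of how the reward columns are typically derived in DPO training.
# Toy log-probs; beta=0.1 is a placeholder, not the value used for this run.
import torch

beta = 0.1

def dpo_rewards(policy_logps, ref_logps):
    """Implicit per-example reward: beta * (log pi_policy - log pi_ref)."""
    return beta * (policy_logps - ref_logps)

chosen_rewards = dpo_rewards(torch.tensor([-120.0, -95.0]), torch.tensor([-118.0, -96.0]))
rejected_rewards = dpo_rewards(torch.tensor([-130.0, -100.0]), torch.tensor([-124.0, -99.0]))

margins = chosen_rewards - rejected_rewards                    # "Rewards/margins"
accuracy = (chosen_rewards > rejected_rewards).float().mean()  # "Rewards/accuracies"
print(margins, accuracy)
```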

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2