
zephyr-7b-dpo-lora

This model is a LoRA adapter for alignment-handbook/zephyr-7b-sft-full, trained with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset (a loading sketch follows the results below). It achieves the following results on the evaluation set:

  • Loss: 0.6776
  • Rewards/chosen: 0.0182
  • Rewards/rejected: -0.0146
  • Rewards/accuracies: 0.6855
  • Rewards/margins: 0.0328
  • Logps/rejected: -262.9002
  • Logps/chosen: -280.9537
  • Logits/rejected: -2.8233
  • Logits/chosen: -2.8504
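
Because this repository contains only PEFT LoRA adapter weights, the adapter has to be loaded on top of the base SFT model. The following is a minimal loading sketch, not an official snippet; it assumes the adapter id jmajkutewicz/zephyr-7b-dpo-lora, float16 weights, and a GPU with enough memory for the 7B base model (adjust device_map and dtype for your setup).

```python
# Sketch: attach the DPO LoRA adapter to the SFT base model and generate a reply.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "jmajkutewicz/zephyr-7b-dpo-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # load the LoRA weights

# Zephyr models expect the chat template, so format prompts through the tokenizer.
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```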

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows this list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
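
For reference, these settings map onto a transformers.TrainingArguments configuration roughly as below. This is a sketch only: the Adam betas and epsilon listed above are the AdamW defaults, the mixed-precision setting is an assumption (the card does not state it), and the alignment-handbook DPO recipe would pass such arguments to TRL's DPOTrainer together with a LoRA peft_config, neither of which is detailed here.

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments
# (argument names from transformers 4.40; the actual recipe config may differ).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-lora",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size: 8 * 2 = 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8 (defaults)
    bf16=True,                       # assumption: precision is not stated in the card
)
```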

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6929 | 0.0262 | 100  | 0.6930 | 0.0001 | -0.0001 | 0.5135 | 0.0002 | -261.4512 | -282.7630 | -2.8381 | -2.8655 |
| 0.693  | 0.0523 | 200  | 0.6928 | 0.0001 | -0.0005 | 0.5470 | 0.0007 | -261.4925 | -282.7611 | -2.8349 | -2.8626 |
| 0.692  | 0.0785 | 300  | 0.6921 | 0.0010 | -0.0011 | 0.6050 | 0.0021 | -261.5461 | -282.6746 | -2.8378 | -2.8650 |
| 0.6913 | 0.1047 | 400  | 0.6910 | 0.0036 | -0.0008 | 0.6395 | 0.0044 | -261.5211 | -282.4127 | -2.8349 | -2.8622 |
| 0.689  | 0.1309 | 500  | 0.6895 | 0.0049 | -0.0024 | 0.6700 | 0.0073 | -261.6805 | -282.2831 | -2.8389 | -2.8656 |
| 0.6875 | 0.1570 | 600  | 0.6880 | 0.0059 | -0.0047 | 0.6690 | 0.0106 | -261.9060 | -282.1841 | -2.8332 | -2.8603 |
| 0.6874 | 0.1832 | 700  | 0.6864 | 0.0084 | -0.0055 | 0.6785 | 0.0138 | -261.9842 | -281.9370 | -2.8342 | -2.8610 |
| 0.682  | 0.2094 | 800  | 0.6850 | 0.0107 | -0.0060 | 0.6800 | 0.0167 | -262.0419 | -281.7033 | -2.8307 | -2.8578 |
| 0.6837 | 0.2355 | 900  | 0.6840 | 0.0136 | -0.0054 | 0.6840 | 0.0190 | -261.9797 | -281.4180 | -2.8304 | -2.8573 |
| 0.6819 | 0.2617 | 1000 | 0.6828 | 0.0161 | -0.0054 | 0.6810 | 0.0215 | -261.9830 | -281.1678 | -2.8269 | -2.8540 |
| 0.6836 | 0.2879 | 1100 | 0.6818 | 0.0179 | -0.0057 | 0.6785 | 0.0236 | -262.0052 | -280.9853 | -2.8258 | -2.8529 |
| 0.685  | 0.3141 | 1200 | 0.6810 | 0.0221 | -0.0032 | 0.6810 | 0.0253 | -261.7610 | -280.5679 | -2.8238 | -2.8510 |
| 0.6785 | 0.3402 | 1300 | 0.6803 | 0.0209 | -0.0061 | 0.6840 | 0.0270 | -262.0453 | -280.6852 | -2.8259 | -2.8529 |
| 0.6828 | 0.3664 | 1400 | 0.6796 | 0.0217 | -0.0066 | 0.6865 | 0.0283 | -262.1007 | -280.6062 | -2.8233 | -2.8505 |
| 0.6795 | 0.3926 | 1500 | 0.6792 | 0.0226 | -0.0068 | 0.6830 | 0.0293 | -262.1143 | -280.5175 | -2.8250 | -2.8520 |
| 0.6801 | 0.4187 | 1600 | 0.6788 | 0.0194 | -0.0107 | 0.6845 | 0.0301 | -262.5066 | -280.8286 | -2.8245 | -2.8516 |
| 0.6839 | 0.4449 | 1700 | 0.6785 | 0.0204 | -0.0104 | 0.6855 | 0.0308 | -262.4770 | -280.7289 | -2.8261 | -2.8530 |
| 0.6793 | 0.4711 | 1800 | 0.6782 | 0.0188 | -0.0126 | 0.6870 | 0.0314 | -262.6961 | -280.8936 | -2.8248 | -2.8519 |
| 0.6766 | 0.4973 | 1900 | 0.6781 | 0.0188 | -0.0129 | 0.6810 | 0.0317 | -262.7311 | -280.8921 | -2.8281 | -2.8548 |
| 0.6762 | 0.5234 | 2000 | 0.6778 | 0.0190 | -0.0133 | 0.6840 | 0.0323 | -262.7651 | -280.8749 | -2.8270 | -2.8538 |
| 0.6796 | 0.5496 | 2100 | 0.6777 | 0.0184 | -0.0141 | 0.6795 | 0.0325 | -262.8513 | -280.9321 | -2.8299 | -2.8564 |
| 0.6736 | 0.5758 | 2200 | 0.6777 | 0.0181 | -0.0145 | 0.6825 | 0.0326 | -262.8893 | -280.9635 | -2.8306 | -2.8571 |
| 0.6779 | 0.6019 | 2300 | 0.6776 | 0.0176 | -0.0152 | 0.6875 | 0.0327 | -262.9558 | -281.0184 | -2.8281 | -2.8548 |
| 0.6782 | 0.6281 | 2400 | 0.6777 | 0.0179 | -0.0148 | 0.6835 | 0.0327 | -262.9155 | -280.9810 | -2.8273 | -2.8540 |
| 0.6753 | 0.6543 | 2500 | 0.6776 | 0.0181 | -0.0147 | 0.6805 | 0.0328 | -262.9074 | -280.9631 | -2.8256 | -2.8525 |
| 0.6776 | 0.6805 | 2600 | 0.6776 | 0.0181 | -0.0148 | 0.6775 | 0.0329 | -262.9167 | -280.9641 | -2.8226 | -2.8498 |
| 0.6774 | 0.7066 | 2700 | 0.6775 | 0.0182 | -0.0149 | 0.6860 | 0.0331 | -262.9263 | -280.9553 | -2.8261 | -2.8530 |
| 0.679  | 0.7328 | 2800 | 0.6774 | 0.0184 | -0.0148 | 0.6850 | 0.0332 | -262.9162 | -280.9359 | -2.8271 | -2.8539 |
| 0.6782 | 0.7590 | 2900 | 0.6775 | 0.0181 | -0.0150 | 0.6845 | 0.0330 | -262.9336 | -280.9681 | -2.8260 | -2.8529 |
| 0.6784 | 0.7851 | 3000 | 0.6774 | 0.0180 | -0.0152 | 0.6890 | 0.0332 | -262.9586 | -280.9731 | -2.8283 | -2.8550 |
| 0.6713 | 0.8113 | 3100 | 0.6775 | 0.0181 | -0.0149 | 0.6825 | 0.0330 | -262.9238 | -280.9596 | -2.8280 | -2.8547 |
| 0.6774 | 0.8375 | 3200 | 0.6774 | 0.0182 | -0.0150 | 0.6830 | 0.0332 | -262.9411 | -280.9583 | -2.8275 | -2.8543 |
| 0.6781 | 0.8636 | 3300 | 0.6775 | 0.0182 | -0.0148 | 0.6810 | 0.0329 | -262.9146 | -280.9559 | -2.8293 | -2.8559 |
| 0.6733 | 0.8898 | 3400 | 0.6775 | 0.0180 | -0.0150 | 0.6825 | 0.0330 | -262.9403 | -280.9770 | -2.8237 | -2.8508 |
| 0.6739 | 0.9160 | 3500 | 0.6775 | 0.0180 | -0.0150 | 0.6850 | 0.0331 | -262.9413 | -280.9686 | -2.8311 | -2.8575 |
| 0.6807 | 0.9422 | 3600 | 0.6775 | 0.0182 | -0.0148 | 0.6855 | 0.0330 | -262.9205 | -280.9524 | -2.8257 | -2.8527 |
| 0.6731 | 0.9683 | 3700 | 0.6775 | 0.0182 | -0.0147 | 0.6835 | 0.0330 | -262.9113 | -280.9514 | -2.8239 | -2.8510 |
| 0.675  | 0.9945 | 3800 | 0.6776 | 0.0182 | -0.0146 | 0.6855 | 0.0328 | -262.9002 | -280.9546 | -2.8233 | -2.8504 |
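
The reward columns follow the usual DPO bookkeeping: with the implicit reward defined as beta * (log pi_theta(y|x) - log pi_ref(y|x)), Rewards/chosen and Rewards/rejected are the mean implicit rewards of the preferred and dispreferred completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one. A small sketch of that computation follows; beta = 0.1 is TRL's default and is an assumption here, since the card does not report it.

```python
# Sketch: how the Rewards/* columns are derived from policy and reference log-probs
# for a batch of (chosen, rejected) pairs. Tensor names are illustrative only.
import torch

beta = 0.1  # assumed DPO beta (TRL default); not reported in this card

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    loss = -torch.nn.functional.logsigmoid(margins).mean()  # DPO loss on this batch
    return {
        "loss": loss.item(),
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }
```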

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.2
  • Pytorch 2.2.0
  • Datasets 2.16.1
  • Tokenizers 0.19.1