zephyr-7b-dpo-full-prometheus-high-margin-3-epochs

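This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full. The training dataset is not recorded (the auto-generated card lists it as None). It achieves the following results on the evaluation set, which match the final logged evaluation (step 350) in the training results table: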

Loss: 0.5829
Rewards/chosen: -2.8437
Rewards/rejected: -4.4971
Rewards/accuracies: 0.7629
Rewards/margins: 1.6534
Logps/rejected: -697.9909
Logps/chosen: -544.3318
Logits/rejected: 4.2569
Logits/chosen: 2.5320
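
As a quick consistency check on the numbers above, the sketch below recomputes the reward margin from the chosen and rejected rewards. It assumes the common preference-tuning logging convention in which Rewards/margins is the mean chosen reward minus the mean rejected reward; the card itself does not state this, and the variable names are illustrative only.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -2.8437
rewards_rejected = -4.4971
reported_margin = 1.6534

# Assumption: margin = mean(chosen reward) - mean(rejected reward).
computed_margin = rewards_chosen - rewards_rejected

# Gap between the summed log-probabilities of chosen and rejected responses.
logps_chosen = -544.3318
logps_rejected = -697.9909
logps_gap = logps_chosen - logps_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"reported margin: {reported_margin:.4f}")
print(f"log-prob gap (chosen - rejected): {logps_gap:.4f}")
```

Under that assumption the recomputed margin agrees with the reported value to four decimal places.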

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 55
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 128
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
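
The batch-size totals and the learning-rate schedule implied by these settings can be sketched as follows. This is a minimal illustration, not the training code: it assumes a linear warmup followed by a half-cosine decay to zero, and the total step count is an estimate derived from the results table (50 steps correspond to about 0.3802 epochs), not a value stated in the card. The function name lr_at is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 2
peak_lr = 5e-07
warmup_ratio = 0.1
num_epochs = 3

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Assumption: total optimizer steps estimated from the results table.
estimated_total_steps = round(num_epochs * 50 / 0.3802)


def lr_at(step, total_steps, peak=peak_lr, ratio=warmup_ratio):
    """Learning rate at a given step for linear warmup plus cosine decay (sketch)."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
for step in (0, 20, 40, 200, estimated_total_steps):
    print(step, f"{lr_at(step, estimated_total_steps):.3e}")
```

The products reproduce the reported total_train_batch_size of 128 and total_eval_batch_size of 64.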

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.5554	0.3802	50	0.6140	-0.1671	-0.4180	0.6897	0.2509	-290.0737	-276.6687	-2.5786	-2.6244
0.3147	0.7605	100	0.5617	-1.1542	-2.1116	0.7328	0.9574	-459.4361	-375.3786	1.5949	0.7404
0.214	1.1407	150	0.5560	-1.2961	-2.4277	0.7457	1.1316	-491.0475	-389.5718	2.0487	0.8077
0.1866	1.5209	200	0.5364	-1.4494	-2.6242	0.7414	1.1748	-510.6940	-404.8973	2.0297	0.7929
0.1899	1.9011	250	0.5391	-1.7883	-3.0786	0.7457	1.2902	-556.1323	-438.7930	2.5714	1.2524
0.1083	2.2814	300	0.5584	-2.1009	-3.5313	0.7629	1.4304	-601.4120	-470.0527	3.6920	2.2903
0.0955	2.6616	350	0.5829	-2.8437	-4.4971	0.7629	1.6534	-697.9909	-544.3318	4.2569	2.5320
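
To make the trend in the table easier to read, the sketch below copies a few columns into Python and reports where validation loss is lowest and where the reward margin is highest. It only summarizes the logged numbers; it is not a statement about which checkpoint was released or should be preferred.

```python
# (step, epoch, training loss, validation loss, reward accuracy, reward margin)
# Values copied from the training results table above.
rows = [
    (50, 0.3802, 0.5554, 0.6140, 0.6897, 0.2509),
    (100, 0.7605, 0.3147, 0.5617, 0.7328, 0.9574),
    (150, 1.1407, 0.2140, 0.5560, 0.7457, 1.1316),
    (200, 1.5209, 0.1866, 0.5364, 0.7414, 1.1748),
    (250, 1.9011, 0.1899, 0.5391, 0.7457, 1.2902),
    (300, 2.2814, 0.1083, 0.5584, 0.7629, 1.4304),
    (350, 2.6616, 0.0955, 0.5829, 0.7629, 1.6534),
]

lowest_val = min(rows, key=lambda r: r[3])   # row with the lowest validation loss
highest_margin = max(rows, key=lambda r: r[5])  # row with the largest reward margin

print(f"lowest validation loss: {lowest_val[3]:.4f} at step {lowest_val[0]}")
print(f"highest reward margin: {highest_margin[5]:.4f} at step {highest_margin[0]}")
```

In the logged data, validation loss bottoms out at step 200 (0.5364) and rises afterwards, while training loss keeps falling and the reward margin keeps growing through step 350.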

Framework versions

Transformers 4.44.0.dev0
PyTorch 2.1.2
Datasets 2.20.0
Tokenizers 0.19.1

Model repository: sfulay/zephyr-7b-dpo-full-prometheus-high-margin-3-epochs
