zephyr-7b-dpo-full-prometheus-high-bleu-3-epochs

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the None dataset. It achieves the following results on the evaluation set:

Loss: 0.7540
Rewards/chosen: -3.8924
Rewards/rejected: -4.8360
Rewards/accuracies: 0.6810
Rewards/margins: 0.9436
Logps/rejected: -731.8801
Logps/chosen: -649.2000
Logits/rejected: 1.8046
Logits/chosen: 1.3830

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 55
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 128
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6701	0.3802	50	0.6608	-0.0605	-0.1235	0.6379	0.0630	-260.6238	-266.0125	-2.5430	-2.5790
0.6369	0.7605	100	0.6256	-0.4573	-0.6699	0.6379	0.2125	-315.2623	-305.6931	-2.2506	-2.3145
0.4762	1.1407	150	0.6095	-1.0277	-1.3947	0.6638	0.3669	-387.7436	-362.7340	-2.0713	-2.1464
0.4416	1.5209	200	0.6303	-1.5256	-2.0301	0.6897	0.5044	-451.2823	-412.5244	-1.8055	-1.8831
0.4058	1.9011	250	0.6470	-2.1413	-2.7297	0.6724	0.5884	-521.2467	-474.0945	-0.8046	-0.9765
0.2288	2.2814	300	0.7265	-3.4237	-4.3014	0.6724	0.8777	-678.4208	-602.3348	1.1516	0.7332
0.21	2.6616	350	0.7540	-3.8924	-4.8360	0.6810	0.9436	-731.8801	-649.2000	1.8046	1.3830

Framework versions

Transformers 4.44.0.dev0
Pytorch 2.1.2
Datasets 2.20.0
Tokenizers 0.19.1

sfulay
/

zephyr-7b-dpo-full-prometheus-high-bleu-3-epochs

zephyr-7b-dpo-full-prometheus-high-bleu-3-epochs

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for sfulay/zephyr-7b-dpo-full-prometheus-high-bleu-3-epochs

Evaluation results