---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full-prometheus-reward-scale-05
  results: []
---
# zephyr-7b-dpo-full-prometheus-reward-scale-05
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified dataset. It achieves the following results on the evaluation set (a sketch of how the reward metrics are derived follows the list):
- Loss: 0.5142
- Rewards/chosen: -1.9575
- Rewards/rejected: -3.2488
- Rewards/accuracies: 0.7198
- Rewards/margins: 1.2913
- Logps/rejected: -573.1553
- Logps/chosen: -455.7142
- Logits/rejected: 4.0394
- Logits/chosen: 3.0591
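For reference, the `Rewards/*` values above are TRL's implicit DPO rewards: beta times the difference between the policy's and the reference model's log-probability for each response, with accuracy being the fraction of pairs where the chosen reward exceeds the rejected one. A minimal sketch of that computation (the run's actual beta is not recorded in this card; 0.1 is TRL's default):

```python
import torch

def implicit_dpo_rewards(policy_chosen_logps, policy_rejected_logps,
                         ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Implicit DPO rewards, as TRL logs them during training/eval.

    Each *_logps argument is a 1-D tensor of summed per-sequence log-probs.
    beta=0.1 is TRL's default; the actual value for this run is unknown.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    accuracy = (chosen_rewards > rejected_rewards).float().mean()
    return chosen_rewards, rejected_rewards, margins, accuracy
```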
## Model description
More information needed
## Intended uses & limitations
More information needed
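Absent guidance from the authors, the checkpoint should behave like other DPO-aligned Zephyr chat models. A minimal usage sketch, assuming a hypothetical repo id (substitute the actual one):

```python
# Usage sketch; the repo id below is a placeholder, not a confirmed location.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/zephyr-7b-dpo-full-prometheus-reward-scale-05"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Zephyr checkpoints ship a chat template; apply it before generating.
messages = [{"role": "user", "content": "Summarize direct preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```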
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reconstruction sketch in TRL follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
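These settings map directly onto TRL's `DPOConfig`, which extends `transformers.TrainingArguments`; per-device batch size 8 × 8 GPUs × 2 accumulation steps gives the total train batch size of 128. Below is a hypothetical reconstruction of the setup, not the authors' actual script: the dataset id is a placeholder, and `beta` is an assumption the card does not record.

```python
# Hypothetical reconstruction of the DPO run; placeholders are marked.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

args = DPOConfig(
    output_dir="zephyr-7b-dpo-full-prometheus-reward-scale-05",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 8 GPUs x 2 accumulation steps = 128 total
    per_device_eval_batch_size=8,    # x 8 GPUs = 64 total
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
    beta=0.1,  # assumption: TRL's default; the run's actual beta is not recorded
)

train_dataset = load_dataset("your-org/your-preference-dataset", split="train")  # placeholder

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # TRL clones the policy as the frozen reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,     # `processing_class=` in newer TRL releases
)
trainer.train()
```

With `ref_model=None` and no PEFT config, TRL creates a frozen copy of the policy to serve as the DPO reference model.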
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6718        | 0.1143 | 50   | 0.6623          | -0.0203        | -0.1341          | 0.6595             | 0.1139          | -261.6897      | -261.9885    | -2.5701         | -2.6137       |
| 0.5837        | 0.2286 | 100  | 0.5855          | -0.7924        | -1.5143          | 0.6595             | 0.7219          | -399.7070      | -339.2018    | -0.2076         | -0.5772       |
| 0.5481        | 0.3429 | 150  | 0.5498          | -1.3440        | -2.2915          | 0.6940             | 0.9475          | -477.4253      | -394.3634    | 2.4336          | 1.6485        |
| 0.5289        | 0.4571 | 200  | 0.5409          | -1.6426        | -2.7545          | 0.6983             | 1.1119          | -523.7253      | -424.2230    | 3.5667          | 2.6372        |
| 0.54          | 0.5714 | 250  | 0.5280          | -1.5391        | -2.7058          | 0.7026             | 1.1667          | -518.8563      | -413.8667    | 2.6437          | 1.4166        |
| 0.5147        | 0.6857 | 300  | 0.5204          | -1.8487        | -3.0990          | 0.7112             | 1.2504          | -558.1808      | -444.8300    | 3.7771          | 2.6969        |
| 0.5033        | 0.8    | 350  | 0.5168          | -1.8756        | -3.1461          | 0.7414             | 1.2705          | -562.8871      | -447.5259    | 3.8637          | 2.7948        |
| 0.52          | 0.9143 | 400  | 0.5142          | -1.9575        | -3.2488          | 0.7198             | 1.2913          | -573.1553      | -455.7142    | 4.0394          | 3.0591        |
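The loss tracked above is the standard sigmoid-variant DPO objective: the negative log-sigmoid of beta times the gap between the policy's and the reference model's chosen-vs-rejected log-probability ratios. A minimal sketch, using the same per-sequence log-probs as the reward computation earlier:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()
```

Consistent with this objective, the validation loss falls steadily from 0.6623 to 0.5142 while the reward margin grows from 0.1139 to 1.2913 over the single epoch.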
### Framework versions
- Transformers 4.44.0.dev0
- PyTorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1