zephyr-7b-dpo-lora-r16-20k

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5302
  • Rewards/chosen: -0.7891
  • Rewards/rejected: -1.4667
  • Rewards/accuracies: 0.7183
  • Rewards/margins: 0.6776
  • Logps/rejected: -394.6997
  • Logps/chosen: -362.1445
  • Logits/rejected: -2.5080
  • Logits/chosen: -2.5508

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6899 0.08 100 0.6897 0.0098 0.0028 0.6667 0.0070 -247.7543 -282.2605 -2.8468 -2.8890
0.6532 0.16 200 0.6569 -0.0128 -0.0950 0.6885 0.0822 -257.5306 -284.5143 -2.8386 -2.8782
0.6372 0.24 300 0.6181 -0.2381 -0.4406 0.6825 0.2026 -292.0921 -307.0444 -2.8033 -2.8402
0.5699 0.32 400 0.6034 -0.2658 -0.5383 0.6964 0.2725 -301.8563 -309.8138 -2.7952 -2.8319
0.5622 0.4 500 0.5688 -0.5565 -0.9794 0.7143 0.4229 -345.9727 -338.8872 -2.6913 -2.7320
0.5826 0.48 600 0.5457 -0.5456 -1.1188 0.7242 0.5732 -359.9116 -337.7992 -2.6523 -2.6907
0.5313 0.56 700 0.5387 -0.7142 -1.3304 0.7242 0.6162 -381.0734 -354.6571 -2.6173 -2.6586
0.5332 0.64 800 0.5386 -0.7256 -1.3351 0.7183 0.6096 -381.5442 -355.7965 -2.5760 -2.6167
0.5334 0.72 900 0.5368 -0.7061 -1.3229 0.7163 0.6168 -380.3204 -353.8529 -2.5574 -2.5999
0.5837 0.8 1000 0.5302 -0.7953 -1.4787 0.7163 0.6834 -395.8991 -362.7657 -2.5273 -2.5706
0.5144 0.88 1100 0.5327 -0.7410 -1.4021 0.7123 0.6611 -388.2353 -357.3381 -2.5162 -2.5586
0.5196 0.96 1200 0.5301 -0.7870 -1.4645 0.7202 0.6775 -394.4780 -361.9388 -2.5045 -2.5477

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1
Downloads last month
5
Inference API
Unable to determine this model’s pipeline type. Check the docs .

Model tree for LaoRay/zephyr-7b-dpo-lora-r16-20k

Adapter
(136)
this model

Dataset used to train LaoRay/zephyr-7b-dpo-lora-r16-20k