
zephyr-7b-dpo-lora

This model is a LoRA adapter for alignment-handbook/zephyr-7b-sft-full, trained with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset (a loading sketch follows the results below). It achieves the following results on the evaluation set:

  • Loss: 0.6776
  • Rewards/chosen: 0.0182
  • Rewards/rejected: -0.0146
  • Rewards/accuracies: 0.6855
  • Rewards/margins: 0.0328
  • Logps/rejected: -262.9002
  • Logps/chosen: -280.9537
  • Logits/rejected: -2.8233
  • Logits/chosen: -2.8504
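
Because this repository contains only PEFT LoRA adapter weights, the adapter has to be loaded on top of the base SFT model. The following is a minimal loading sketch, not an official snippet; it assumes the adapter id jmajkutewicz/zephyr-7b-dpo-lora, float16 weights, and a GPU with enough memory for the 7B base model (adjust device_map and dtype for your setup).

```python
# Sketch: attach the DPO LoRA adapter to the SFT base model and generate a reply.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "jmajkutewicz/zephyr-7b-dpo-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # load the LoRA weights

# Zephyr models expect the chat template, so format prompts through the tokenizer.
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```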

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows this list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
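
For reference, these settings map onto a transformers.TrainingArguments configuration roughly as below. This is a sketch only: the Adam betas and epsilon listed above are the AdamW defaults, the mixed-precision setting is an assumption (the card does not state it), and the alignment-handbook DPO recipe would pass such arguments to TRL's DPOTrainer together with a LoRA peft_config, neither of which is detailed here.

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments
# (argument names from transformers 4.40; the actual recipe config may differ).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-lora",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size: 8 * 2 = 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8 (defaults)
    bf16=True,                       # assumption: precision is not stated in the card
)
```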

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6929 | 0.0262 | 100  | 0.6930 | 0.0001 | -0.0001 | 0.5135 | 0.0002 | -261.4512 | -282.7630 | -2.8381 | -2.8655 |
| 0.693  | 0.0523 | 200  | 0.6928 | 0.0001 | -0.0005 | 0.5470 | 0.0007 | -261.4925 | -282.7611 | -2.8349 | -2.8626 |
| 0.692  | 0.0785 | 300  | 0.6921 | 0.0010 | -0.0011 | 0.6050 | 0.0021 | -261.5461 | -282.6746 | -2.8378 | -2.8650 |
| 0.6913 | 0.1047 | 400  | 0.6910 | 0.0036 | -0.0008 | 0.6395 | 0.0044 | -261.5211 | -282.4127 | -2.8349 | -2.8622 |
| 0.689  | 0.1309 | 500  | 0.6895 | 0.0049 | -0.0024 | 0.6700 | 0.0073 | -261.6805 | -282.2831 | -2.8389 | -2.8656 |
| 0.6875 | 0.1570 | 600  | 0.6880 | 0.0059 | -0.0047 | 0.6690 | 0.0106 | -261.9060 | -282.1841 | -2.8332 | -2.8603 |
| 0.6874 | 0.1832 | 700  | 0.6864 | 0.0084 | -0.0055 | 0.6785 | 0.0138 | -261.9842 | -281.9370 | -2.8342 | -2.8610 |
| 0.682  | 0.2094 | 800  | 0.6850 | 0.0107 | -0.0060 | 0.6800 | 0.0167 | -262.0419 | -281.7033 | -2.8307 | -2.8578 |
| 0.6837 | 0.2355 | 900  | 0.6840 | 0.0136 | -0.0054 | 0.6840 | 0.0190 | -261.9797 | -281.4180 | -2.8304 | -2.8573 |
| 0.6819 | 0.2617 | 1000 | 0.6828 | 0.0161 | -0.0054 | 0.6810 | 0.0215 | -261.9830 | -281.1678 | -2.8269 | -2.8540 |
| 0.6836 | 0.2879 | 1100 | 0.6818 | 0.0179 | -0.0057 | 0.6785 | 0.0236 | -262.0052 | -280.9853 | -2.8258 | -2.8529 |
| 0.685  | 0.3141 | 1200 | 0.6810 | 0.0221 | -0.0032 | 0.6810 | 0.0253 | -261.7610 | -280.5679 | -2.8238 | -2.8510 |
| 0.6785 | 0.3402 | 1300 | 0.6803 | 0.0209 | -0.0061 | 0.6840 | 0.0270 | -262.0453 | -280.6852 | -2.8259 | -2.8529 |
| 0.6828 | 0.3664 | 1400 | 0.6796 | 0.0217 | -0.0066 | 0.6865 | 0.0283 | -262.1007 | -280.6062 | -2.8233 | -2.8505 |
| 0.6795 | 0.3926 | 1500 | 0.6792 | 0.0226 | -0.0068 | 0.6830 | 0.0293 | -262.1143 | -280.5175 | -2.8250 | -2.8520 |
| 0.6801 | 0.4187 | 1600 | 0.6788 | 0.0194 | -0.0107 | 0.6845 | 0.0301 | -262.5066 | -280.8286 | -2.8245 | -2.8516 |
| 0.6839 | 0.4449 | 1700 | 0.6785 | 0.0204 | -0.0104 | 0.6855 | 0.0308 | -262.4770 | -280.7289 | -2.8261 | -2.8530 |
| 0.6793 | 0.4711 | 1800 | 0.6782 | 0.0188 | -0.0126 | 0.6870 | 0.0314 | -262.6961 | -280.8936 | -2.8248 | -2.8519 |
| 0.6766 | 0.4973 | 1900 | 0.6781 | 0.0188 | -0.0129 | 0.6810 | 0.0317 | -262.7311 | -280.8921 | -2.8281 | -2.8548 |
| 0.6762 | 0.5234 | 2000 | 0.6778 | 0.0190 | -0.0133 | 0.6840 | 0.0323 | -262.7651 | -280.8749 | -2.8270 | -2.8538 |
| 0.6796 | 0.5496 | 2100 | 0.6777 | 0.0184 | -0.0141 | 0.6795 | 0.0325 | -262.8513 | -280.9321 | -2.8299 | -2.8564 |
| 0.6736 | 0.5758 | 2200 | 0.6777 | 0.0181 | -0.0145 | 0.6825 | 0.0326 | -262.8893 | -280.9635 | -2.8306 | -2.8571 |
| 0.6779 | 0.6019 | 2300 | 0.6776 | 0.0176 | -0.0152 | 0.6875 | 0.0327 | -262.9558 | -281.0184 | -2.8281 | -2.8548 |
| 0.6782 | 0.6281 | 2400 | 0.6777 | 0.0179 | -0.0148 | 0.6835 | 0.0327 | -262.9155 | -280.9810 | -2.8273 | -2.8540 |
| 0.6753 | 0.6543 | 2500 | 0.6776 | 0.0181 | -0.0147 | 0.6805 | 0.0328 | -262.9074 | -280.9631 | -2.8256 | -2.8525 |
| 0.6776 | 0.6805 | 2600 | 0.6776 | 0.0181 | -0.0148 | 0.6775 | 0.0329 | -262.9167 | -280.9641 | -2.8226 | -2.8498 |
| 0.6774 | 0.7066 | 2700 | 0.6775 | 0.0182 | -0.0149 | 0.6860 | 0.0331 | -262.9263 | -280.9553 | -2.8261 | -2.8530 |
| 0.679  | 0.7328 | 2800 | 0.6774 | 0.0184 | -0.0148 | 0.6850 | 0.0332 | -262.9162 | -280.9359 | -2.8271 | -2.8539 |
| 0.6782 | 0.7590 | 2900 | 0.6775 | 0.0181 | -0.0150 | 0.6845 | 0.0330 | -262.9336 | -280.9681 | -2.8260 | -2.8529 |
| 0.6784 | 0.7851 | 3000 | 0.6774 | 0.0180 | -0.0152 | 0.6890 | 0.0332 | -262.9586 | -280.9731 | -2.8283 | -2.8550 |
| 0.6713 | 0.8113 | 3100 | 0.6775 | 0.0181 | -0.0149 | 0.6825 | 0.0330 | -262.9238 | -280.9596 | -2.8280 | -2.8547 |
| 0.6774 | 0.8375 | 3200 | 0.6774 | 0.0182 | -0.0150 | 0.6830 | 0.0332 | -262.9411 | -280.9583 | -2.8275 | -2.8543 |
| 0.6781 | 0.8636 | 3300 | 0.6775 | 0.0182 | -0.0148 | 0.6810 | 0.0329 | -262.9146 | -280.9559 | -2.8293 | -2.8559 |
| 0.6733 | 0.8898 | 3400 | 0.6775 | 0.0180 | -0.0150 | 0.6825 | 0.0330 | -262.9403 | -280.9770 | -2.8237 | -2.8508 |
| 0.6739 | 0.9160 | 3500 | 0.6775 | 0.0180 | -0.0150 | 0.6850 | 0.0331 | -262.9413 | -280.9686 | -2.8311 | -2.8575 |
| 0.6807 | 0.9422 | 3600 | 0.6775 | 0.0182 | -0.0148 | 0.6855 | 0.0330 | -262.9205 | -280.9524 | -2.8257 | -2.8527 |
| 0.6731 | 0.9683 | 3700 | 0.6775 | 0.0182 | -0.0147 | 0.6835 | 0.0330 | -262.9113 | -280.9514 | -2.8239 | -2.8510 |
| 0.675  | 0.9945 | 3800 | 0.6776 | 0.0182 | -0.0146 | 0.6855 | 0.0328 | -262.9002 | -280.9546 | -2.8233 | -2.8504 |
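
The reward columns follow the usual DPO bookkeeping: with the implicit reward defined as beta * (log pi_theta(y|x) - log pi_ref(y|x)), Rewards/chosen and Rewards/rejected are the mean implicit rewards of the preferred and dispreferred completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one. A small sketch of that computation follows; beta = 0.1 is TRL's default and is an assumption here, since the card does not report it.

```python
# Sketch: how the Rewards/* columns are derived from policy and reference log-probs
# for a batch of (chosen, rejected) pairs. Tensor names are illustrative only.
import torch

beta = 0.1  # assumed DPO beta (TRL default); not reported in this card

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    loss = -torch.nn.functional.logsigmoid(margins).mean()  # DPO loss on this batch
    return {
        "loss": loss.item(),
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }
```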

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.2
  • Pytorch 2.2.0
  • Datasets 2.16.1
  • Tokenizers 0.19.1