---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-lora
  results: []
---
# zephyr-7b-dpo-lora

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6776
- Rewards/chosen: 0.0182
- Rewards/rejected: -0.0146
- Rewards/accuracies: 0.6855
- Rewards/margins: 0.0328
- Logps/rejected: -262.9002
- Logps/chosen: -280.9537
- Logits/rejected: -2.8233
- Logits/chosen: -2.8504
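
For reference, the reward columns above are the implicit DPO rewards that TRL's `DPOTrainer` logs: the β-scaled log-probability ratio between the trained policy and the frozen SFT reference model,

$$ r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right), $$

so that the training objective is

$$ \mathcal{L}_{\mathrm{DPO}} = -\mathbb{E}_{(x,\, y_w,\, y_l)} \left[ \log \sigma\!\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right) \right]. $$

`Rewards/margins` is the mean of `Rewards/chosen - Rewards/rejected`, and `Rewards/accuracies` is the fraction of pairs for which the chosen reward exceeds the rejected one. The β used for this run is not recorded in the card.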
## Model description

More information needed
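
Since the repository is a PEFT (LoRA) adapter trained on top of the SFT base model, it can be loaded with `peft` and `transformers` roughly as follows. This is a minimal sketch: `adapter_id` is a placeholder for wherever this adapter is actually hosted.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Placeholder repo id; substitute the actual path of this adapter.
adapter_id = "zephyr-7b-dpo-lora"

# Loads the base model recorded in the adapter config and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")

messages = [{"role": "user", "content": "What is Direct Preference Optimization?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```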
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
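
The dataset is public and can be inspected directly. A minimal sketch, assuming the `train_prefs`/`test_prefs` preference splits that the alignment-handbook DPO recipes conventionally use:

```python
from datasets import load_dataset

# Preference splits of UltraFeedback, binarized into chosen/rejected pairs.
# Split names are an assumption based on the alignment-handbook convention.
train = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
eval_ = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="test_prefs")

print(train[0].keys())  # expect fields such as "prompt", "chosen", "rejected"
```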
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
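
These values map onto `transformers.TrainingArguments` roughly as below. A minimal sketch: `output_dir` is a placeholder, and DPO-specific settings (such as β) are not recorded in this card, so they are omitted.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-lora",  # placeholder output path
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,    # 8 x 2 = 16 total train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the transformers defaults.
)
```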
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6929 | 0.0262 | 100 | 0.6930 | 0.0001 | -0.0001 | 0.5135 | 0.0002 | -261.4512 | -282.7630 | -2.8381 | -2.8655 |
| 0.693 | 0.0523 | 200 | 0.6928 | 0.0001 | -0.0005 | 0.5470 | 0.0007 | -261.4925 | -282.7611 | -2.8349 | -2.8626 |
| 0.692 | 0.0785 | 300 | 0.6921 | 0.0010 | -0.0011 | 0.6050 | 0.0021 | -261.5461 | -282.6746 | -2.8378 | -2.8650 |
| 0.6913 | 0.1047 | 400 | 0.6910 | 0.0036 | -0.0008 | 0.6395 | 0.0044 | -261.5211 | -282.4127 | -2.8349 | -2.8622 |
| 0.689 | 0.1309 | 500 | 0.6895 | 0.0049 | -0.0024 | 0.6700 | 0.0073 | -261.6805 | -282.2831 | -2.8389 | -2.8656 |
| 0.6875 | 0.1570 | 600 | 0.6880 | 0.0059 | -0.0047 | 0.6690 | 0.0106 | -261.9060 | -282.1841 | -2.8332 | -2.8603 |
| 0.6874 | 0.1832 | 700 | 0.6864 | 0.0084 | -0.0055 | 0.6785 | 0.0138 | -261.9842 | -281.9370 | -2.8342 | -2.8610 |
| 0.682 | 0.2094 | 800 | 0.6850 | 0.0107 | -0.0060 | 0.6800 | 0.0167 | -262.0419 | -281.7033 | -2.8307 | -2.8578 |
| 0.6837 | 0.2355 | 900 | 0.6840 | 0.0136 | -0.0054 | 0.6840 | 0.0190 | -261.9797 | -281.4180 | -2.8304 | -2.8573 |
| 0.6819 | 0.2617 | 1000 | 0.6828 | 0.0161 | -0.0054 | 0.6810 | 0.0215 | -261.9830 | -281.1678 | -2.8269 | -2.8540 |
| 0.6836 | 0.2879 | 1100 | 0.6818 | 0.0179 | -0.0057 | 0.6785 | 0.0236 | -262.0052 | -280.9853 | -2.8258 | -2.8529 |
| 0.685 | 0.3141 | 1200 | 0.6810 | 0.0221 | -0.0032 | 0.6810 | 0.0253 | -261.7610 | -280.5679 | -2.8238 | -2.8510 |
| 0.6785 | 0.3402 | 1300 | 0.6803 | 0.0209 | -0.0061 | 0.6840 | 0.0270 | -262.0453 | -280.6852 | -2.8259 | -2.8529 |
| 0.6828 | 0.3664 | 1400 | 0.6796 | 0.0217 | -0.0066 | 0.6865 | 0.0283 | -262.1007 | -280.6062 | -2.8233 | -2.8505 |
| 0.6795 | 0.3926 | 1500 | 0.6792 | 0.0226 | -0.0068 | 0.6830 | 0.0293 | -262.1143 | -280.5175 | -2.8250 | -2.8520 |
| 0.6801 | 0.4187 | 1600 | 0.6788 | 0.0194 | -0.0107 | 0.6845 | 0.0301 | -262.5066 | -280.8286 | -2.8245 | -2.8516 |
| 0.6839 | 0.4449 | 1700 | 0.6785 | 0.0204 | -0.0104 | 0.6855 | 0.0308 | -262.4770 | -280.7289 | -2.8261 | -2.8530 |
| 0.6793 | 0.4711 | 1800 | 0.6782 | 0.0188 | -0.0126 | 0.6870 | 0.0314 | -262.6961 | -280.8936 | -2.8248 | -2.8519 |
| 0.6766 | 0.4973 | 1900 | 0.6781 | 0.0188 | -0.0129 | 0.6810 | 0.0317 | -262.7311 | -280.8921 | -2.8281 | -2.8548 |
| 0.6762 | 0.5234 | 2000 | 0.6778 | 0.0190 | -0.0133 | 0.6840 | 0.0323 | -262.7651 | -280.8749 | -2.8270 | -2.8538 |
| 0.6796 | 0.5496 | 2100 | 0.6777 | 0.0184 | -0.0141 | 0.6795 | 0.0325 | -262.8513 | -280.9321 | -2.8299 | -2.8564 |
| 0.6736 | 0.5758 | 2200 | 0.6777 | 0.0181 | -0.0145 | 0.6825 | 0.0326 | -262.8893 | -280.9635 | -2.8306 | -2.8571 |
| 0.6779 | 0.6019 | 2300 | 0.6776 | 0.0176 | -0.0152 | 0.6875 | 0.0327 | -262.9558 | -281.0184 | -2.8281 | -2.8548 |
| 0.6782 | 0.6281 | 2400 | 0.6777 | 0.0179 | -0.0148 | 0.6835 | 0.0327 | -262.9155 | -280.9810 | -2.8273 | -2.8540 |
| 0.6753 | 0.6543 | 2500 | 0.6776 | 0.0181 | -0.0147 | 0.6805 | 0.0328 | -262.9074 | -280.9631 | -2.8256 | -2.8525 |
| 0.6776 | 0.6805 | 2600 | 0.6776 | 0.0181 | -0.0148 | 0.6775 | 0.0329 | -262.9167 | -280.9641 | -2.8226 | -2.8498 |
| 0.6774 | 0.7066 | 2700 | 0.6775 | 0.0182 | -0.0149 | 0.6860 | 0.0331 | -262.9263 | -280.9553 | -2.8261 | -2.8530 |
| 0.679 | 0.7328 | 2800 | 0.6774 | 0.0184 | -0.0148 | 0.6850 | 0.0332 | -262.9162 | -280.9359 | -2.8271 | -2.8539 |
| 0.6782 | 0.7590 | 2900 | 0.6775 | 0.0181 | -0.0150 | 0.6845 | 0.0330 | -262.9336 | -280.9681 | -2.8260 | -2.8529 |
| 0.6784 | 0.7851 | 3000 | 0.6774 | 0.0180 | -0.0152 | 0.6890 | 0.0332 | -262.9586 | -280.9731 | -2.8283 | -2.8550 |
| 0.6713 | 0.8113 | 3100 | 0.6775 | 0.0181 | -0.0149 | 0.6825 | 0.0330 | -262.9238 | -280.9596 | -2.8280 | -2.8547 |
| 0.6774 | 0.8375 | 3200 | 0.6774 | 0.0182 | -0.0150 | 0.6830 | 0.0332 | -262.9411 | -280.9583 | -2.8275 | -2.8543 |
| 0.6781 | 0.8636 | 3300 | 0.6775 | 0.0182 | -0.0148 | 0.6810 | 0.0329 | -262.9146 | -280.9559 | -2.8293 | -2.8559 |
| 0.6733 | 0.8898 | 3400 | 0.6775 | 0.0180 | -0.0150 | 0.6825 | 0.0330 | -262.9403 | -280.9770 | -2.8237 | -2.8508 |
| 0.6739 | 0.9160 | 3500 | 0.6775 | 0.0180 | -0.0150 | 0.6850 | 0.0331 | -262.9413 | -280.9686 | -2.8311 | -2.8575 |
| 0.6807 | 0.9422 | 3600 | 0.6775 | 0.0182 | -0.0148 | 0.6855 | 0.0330 | -262.9205 | -280.9524 | -2.8257 | -2.8527 |
| 0.6731 | 0.9683 | 3700 | 0.6775 | 0.0182 | -0.0147 | 0.6835 | 0.0330 | -262.9113 | -280.9514 | -2.8239 | -2.8510 |
| 0.675 | 0.9945 | 3800 | 0.6776 | 0.0182 | -0.0146 | 0.6855 | 0.0328 | -262.9002 | -280.9546 | -2.8233 | -2.8504 |
### Framework versions

- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.2.0
- Datasets 2.16.1
- Tokenizers 0.19.1