|
--- |
|
license: apache-2.0 |
|
library_name: peft |
|
tags: |
|
- alignment-handbook |
|
- trl |
|
- dpo |
|
- generated_from_trainer |
|
base_model: alignment-handbook/zephyr-7b-sft-full |
|
datasets: |
|
- HuggingFaceH4/ultrafeedback_binarized |
|
model-index: |
|
- name: zephyr-7b-dpo-lora |
|
results: [] |
|
--- |
|
|
|
|
|
|
# zephyr-7b-dpo-lora |
|
|
|
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset. |
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.6776 |
|
- Rewards/chosen: 0.0182 |
|
- Rewards/rejected: -0.0146 |
|
- Rewards/accuracies: 0.6855 |
|
- Rewards/margins: 0.0328 |
|
- Logps/rejected: -262.9002 |
|
- Logps/chosen: -280.9537 |
|
- Logits/rejected: -2.8233 |
|
- Logits/chosen: -2.8504 |
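These reward metrics follow TRL's DPO bookkeeping: the implicit reward of a completion is the scaled log-probability ratio between the policy and the frozen reference (SFT) model, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. As a sketch (the $\beta$ scaling factor is not recorded in this card; TRL's default is 0.1):

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right]
$$

`Rewards/chosen` and `Rewards/rejected` are the mean implicit rewards of the chosen and rejected completions, `Rewards/margins` is their mean difference, and `Rewards/accuracies` is the fraction of pairs in which the chosen completion receives the higher reward.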
|
|
|
## Model description |
|
|
|
This is a LoRA (PEFT) adapter for [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), trained with Direct Preference Optimization (DPO) via TRL on the UltraFeedback binarized preference data, following the alignment-handbook Zephyr recipe. The repository contains the adapter weights only; the base model must be loaded alongside them.
|
|
|
## Intended uses & limitations |
|
|
|
The adapter is intended for chat-style assistant use on top of the base SFT model, as a preference-tuned variant of that checkpoint. It has not been systematically evaluated for safety or factuality and can produce incorrect or harmful outputs; the known limitations of the base model and of the UltraFeedback preference data apply. A loading sketch follows.
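A minimal loading sketch with `transformers` and `peft`; the adapter repository id below is a placeholder, substitute the repository that hosts this card:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "<your-namespace>/zephyr-7b-dpo-lora"  # placeholder: the repo hosting this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter

# Zephyr checkpoints ship a chat template; use it to build the prompt.
messages = [{"role": "user", "content": "What is Direct Preference Optimization?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```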
|
|
|
## Training and evaluation data |
|
|
|
Training and evaluation used [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), which pairs each prompt with a preferred (chosen) and a dispreferred (rejected) completion, as required by DPO.
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 5e-07 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 2 |
|
- total_train_batch_size: 16 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_ratio: 0.1 |
|
- num_epochs: 1 |
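For orientation, a hedged sketch of how these hyperparameters map onto a TRL `DPOTrainer` run. The TRL version and the LoRA configuration are not recorded in this card, so the adapter settings below (rank, alpha, dropout, target modules) and `beta=0.1` are assumptions, and argument names vary across TRL releases:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

# LoRA settings are assumptions; the card does not record the adapter config.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Mirrors the hyperparameter list above; beta is TRL's 0.1 default (assumption).
args = DPOConfig(
    output_dir="zephyr-7b-dpo-lora",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    processing_class=tokenizer,  # named `tokenizer` in older TRL releases
    peft_config=peft_config,  # with a peft_config, the base weights serve as the frozen reference
)
trainer.train()
```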
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | |
|
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:| |
|
| 0.6929 | 0.0262 | 100 | 0.6930 | 0.0001 | -0.0001 | 0.5135 | 0.0002 | -261.4512 | -282.7630 | -2.8381 | -2.8655 | |
|
| 0.693 | 0.0523 | 200 | 0.6928 | 0.0001 | -0.0005 | 0.5470 | 0.0007 | -261.4925 | -282.7611 | -2.8349 | -2.8626 | |
|
| 0.692 | 0.0785 | 300 | 0.6921 | 0.0010 | -0.0011 | 0.6050 | 0.0021 | -261.5461 | -282.6746 | -2.8378 | -2.8650 | |
|
| 0.6913 | 0.1047 | 400 | 0.6910 | 0.0036 | -0.0008 | 0.6395 | 0.0044 | -261.5211 | -282.4127 | -2.8349 | -2.8622 | |
|
| 0.689 | 0.1309 | 500 | 0.6895 | 0.0049 | -0.0024 | 0.6700 | 0.0073 | -261.6805 | -282.2831 | -2.8389 | -2.8656 | |
|
| 0.6875 | 0.1570 | 600 | 0.6880 | 0.0059 | -0.0047 | 0.6690 | 0.0106 | -261.9060 | -282.1841 | -2.8332 | -2.8603 | |
|
| 0.6874 | 0.1832 | 700 | 0.6864 | 0.0084 | -0.0055 | 0.6785 | 0.0138 | -261.9842 | -281.9370 | -2.8342 | -2.8610 | |
|
| 0.682 | 0.2094 | 800 | 0.6850 | 0.0107 | -0.0060 | 0.6800 | 0.0167 | -262.0419 | -281.7033 | -2.8307 | -2.8578 | |
|
| 0.6837 | 0.2355 | 900 | 0.6840 | 0.0136 | -0.0054 | 0.6840 | 0.0190 | -261.9797 | -281.4180 | -2.8304 | -2.8573 | |
|
| 0.6819 | 0.2617 | 1000 | 0.6828 | 0.0161 | -0.0054 | 0.6810 | 0.0215 | -261.9830 | -281.1678 | -2.8269 | -2.8540 | |
|
| 0.6836 | 0.2879 | 1100 | 0.6818 | 0.0179 | -0.0057 | 0.6785 | 0.0236 | -262.0052 | -280.9853 | -2.8258 | -2.8529 | |
|
| 0.685 | 0.3141 | 1200 | 0.6810 | 0.0221 | -0.0032 | 0.6810 | 0.0253 | -261.7610 | -280.5679 | -2.8238 | -2.8510 | |
|
| 0.6785 | 0.3402 | 1300 | 0.6803 | 0.0209 | -0.0061 | 0.6840 | 0.0270 | -262.0453 | -280.6852 | -2.8259 | -2.8529 | |
|
| 0.6828 | 0.3664 | 1400 | 0.6796 | 0.0217 | -0.0066 | 0.6865 | 0.0283 | -262.1007 | -280.6062 | -2.8233 | -2.8505 | |
|
| 0.6795 | 0.3926 | 1500 | 0.6792 | 0.0226 | -0.0068 | 0.6830 | 0.0293 | -262.1143 | -280.5175 | -2.8250 | -2.8520 | |
|
| 0.6801 | 0.4187 | 1600 | 0.6788 | 0.0194 | -0.0107 | 0.6845 | 0.0301 | -262.5066 | -280.8286 | -2.8245 | -2.8516 | |
|
| 0.6839 | 0.4449 | 1700 | 0.6785 | 0.0204 | -0.0104 | 0.6855 | 0.0308 | -262.4770 | -280.7289 | -2.8261 | -2.8530 | |
|
| 0.6793 | 0.4711 | 1800 | 0.6782 | 0.0188 | -0.0126 | 0.6870 | 0.0314 | -262.6961 | -280.8936 | -2.8248 | -2.8519 | |
|
| 0.6766 | 0.4973 | 1900 | 0.6781 | 0.0188 | -0.0129 | 0.6810 | 0.0317 | -262.7311 | -280.8921 | -2.8281 | -2.8548 | |
|
| 0.6762 | 0.5234 | 2000 | 0.6778 | 0.0190 | -0.0133 | 0.6840 | 0.0323 | -262.7651 | -280.8749 | -2.8270 | -2.8538 | |
|
| 0.6796 | 0.5496 | 2100 | 0.6777 | 0.0184 | -0.0141 | 0.6795 | 0.0325 | -262.8513 | -280.9321 | -2.8299 | -2.8564 | |
|
| 0.6736 | 0.5758 | 2200 | 0.6777 | 0.0181 | -0.0145 | 0.6825 | 0.0326 | -262.8893 | -280.9635 | -2.8306 | -2.8571 | |
|
| 0.6779 | 0.6019 | 2300 | 0.6776 | 0.0176 | -0.0152 | 0.6875 | 0.0327 | -262.9558 | -281.0184 | -2.8281 | -2.8548 | |
|
| 0.6782 | 0.6281 | 2400 | 0.6777 | 0.0179 | -0.0148 | 0.6835 | 0.0327 | -262.9155 | -280.9810 | -2.8273 | -2.8540 | |
|
| 0.6753 | 0.6543 | 2500 | 0.6776 | 0.0181 | -0.0147 | 0.6805 | 0.0328 | -262.9074 | -280.9631 | -2.8256 | -2.8525 | |
|
| 0.6776 | 0.6805 | 2600 | 0.6776 | 0.0181 | -0.0148 | 0.6775 | 0.0329 | -262.9167 | -280.9641 | -2.8226 | -2.8498 | |
|
| 0.6774 | 0.7066 | 2700 | 0.6775 | 0.0182 | -0.0149 | 0.6860 | 0.0331 | -262.9263 | -280.9553 | -2.8261 | -2.8530 | |
|
| 0.679 | 0.7328 | 2800 | 0.6774 | 0.0184 | -0.0148 | 0.6850 | 0.0332 | -262.9162 | -280.9359 | -2.8271 | -2.8539 | |
|
| 0.6782 | 0.7590 | 2900 | 0.6775 | 0.0181 | -0.0150 | 0.6845 | 0.0330 | -262.9336 | -280.9681 | -2.8260 | -2.8529 | |
|
| 0.6784 | 0.7851 | 3000 | 0.6774 | 0.0180 | -0.0152 | 0.6890 | 0.0332 | -262.9586 | -280.9731 | -2.8283 | -2.8550 | |
|
| 0.6713 | 0.8113 | 3100 | 0.6775 | 0.0181 | -0.0149 | 0.6825 | 0.0330 | -262.9238 | -280.9596 | -2.8280 | -2.8547 | |
|
| 0.6774 | 0.8375 | 3200 | 0.6774 | 0.0182 | -0.0150 | 0.6830 | 0.0332 | -262.9411 | -280.9583 | -2.8275 | -2.8543 | |
|
| 0.6781 | 0.8636 | 3300 | 0.6775 | 0.0182 | -0.0148 | 0.6810 | 0.0329 | -262.9146 | -280.9559 | -2.8293 | -2.8559 | |
|
| 0.6733 | 0.8898 | 3400 | 0.6775 | 0.0180 | -0.0150 | 0.6825 | 0.0330 | -262.9403 | -280.9770 | -2.8237 | -2.8508 | |
|
| 0.6739 | 0.9160 | 3500 | 0.6775 | 0.0180 | -0.0150 | 0.6850 | 0.0331 | -262.9413 | -280.9686 | -2.8311 | -2.8575 | |
|
| 0.6807 | 0.9422 | 3600 | 0.6775 | 0.0182 | -0.0148 | 0.6855 | 0.0330 | -262.9205 | -280.9524 | -2.8257 | -2.8527 | |
|
| 0.6731 | 0.9683 | 3700 | 0.6775 | 0.0182 | -0.0147 | 0.6835 | 0.0330 | -262.9113 | -280.9514 | -2.8239 | -2.8510 | |
|
| 0.675 | 0.9945 | 3800 | 0.6776 | 0.0182 | -0.0146 | 0.6855 | 0.0328 | -262.9002 | -280.9546 | -2.8233 | -2.8504 | |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.10.0 |
|
- Transformers 4.40.2 |
|
- PyTorch 2.2.0
|
- Datasets 2.16.1 |
|
- Tokenizers 0.19.1 |