---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-lora
  results: []
---

# zephyr-7b-dpo-lora

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5894
- Rewards/chosen: -0.2738
- Rewards/rejected: -0.6020
- Rewards/accuracies: 0.7035
- Rewards/margins: 0.3282
- Logps/rejected: -321.6407
- Logps/chosen: -310.1199
- Logits/rejected: -2.7529
- Logits/chosen: -2.7746

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
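For reference, below is a minimal sketch of how a DPO + LoRA run with the hyperparameters above might be wired up using TRL's `DPOTrainer` and PEFT. The LoRA rank and target modules, the DPO `beta`, and the `bf16` setting are assumptions not recorded in this card, the message-flattening step is a simplification (the alignment-handbook recipe applies the full chat template instead), and `DPOTrainer`'s exact signature varies across TRL versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed LoRA configuration; the card does not record rank, alpha, or targets.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Simplified preprocessing: keep only the final assistant turn as plain text.
# (The alignment-handbook recipe applies the full chat template instead.)
def to_dpo_format(example):
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train_ds = dataset["train_prefs"].map(to_dpo_format)
eval_ds = dataset["test_prefs"].map(to_dpo_format)

# Mirrors the hyperparameters listed above; bf16 is assumed.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-lora",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, the frozen base weights serve as the reference
    args=training_args,
    beta=0.1,        # assumed; the DPO beta is not recorded in this card
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Because training uses a PEFT adapter, passing `ref_model=None` lets TRL fall back to the frozen base weights as the DPO reference model, avoiding a second full copy of the model in memory.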
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6929        | 0.0262 | 100  | 0.6930          | -0.0001        | -0.0004          | 0.5250             | 0.0003          | -261.4788      | -282.7496    | -2.8388         | -2.8661       |
| 0.6923        | 0.0523 | 200  | 0.6923          | 0.0008         | -0.0009          | 0.6050             | 0.0017          | -261.5316      | -282.6624    | -2.8380         | -2.8653       |
| 0.6898        | 0.0785 | 300  | 0.6903          | 0.0035         | -0.0024          | 0.6640             | 0.0058          | -261.6760      | -282.3918    | -2.8350         | -2.8623       |
| 0.6872        | 0.1047 | 400  | 0.6862          | 0.0165         | 0.0021           | 0.6670             | 0.0144          | -261.2256      | -281.0900    | -2.8308         | -2.8577       |
| 0.6783        | 0.1309 | 500  | 0.6804          | 0.0209         | -0.0059          | 0.6835             | 0.0267          | -262.0230      | -280.6481    | -2.8215         | -2.8486       |
| 0.6729        | 0.1570 | 600  | 0.6733          | 0.0154         | -0.0272          | 0.6840             | 0.0426          | -264.1608      | -281.1958    | -2.8138         | -2.8410       |
| 0.6665        | 0.1832 | 700  | 0.6638          | -0.0035        | -0.0689          | 0.6755             | 0.0654          | -268.3266      | -283.0863    | -2.8060         | -2.8327       |
| 0.6427        | 0.2094 | 800  | 0.6546          | -0.0214        | -0.1104          | 0.6815             | 0.0889          | -272.4747      | -284.8825    | -2.8020         | -2.8283       |
| 0.6428        | 0.2355 | 900  | 0.6458          | -0.0247        | -0.1383          | 0.6770             | 0.1136          | -275.2685      | -285.2050    | -2.7942         | -2.8199       |
| 0.6381        | 0.2617 | 1000 | 0.6358          | -0.0638        | -0.2074          | 0.6785             | 0.1436          | -282.1761      | -289.1206    | -2.7887         | -2.8138       |
| 0.6488        | 0.2879 | 1100 | 0.6284          | -0.1378        | -0.3055          | 0.6790             | 0.1677          | -291.9890      | -296.5138    | -2.7826         | -2.8071       |
| 0.6427        | 0.3141 | 1200 | 0.6223          | -0.1104        | -0.2986          | 0.6835             | 0.1882          | -291.3028      | -293.7785    | -2.7931         | -2.8165       |
| 0.6131        | 0.3402 | 1300 | 0.6172          | -0.1466        | -0.3514          | 0.6865             | 0.2049          | -296.5806      | -297.3945    | -2.7951         | -2.8180       |
| 0.6326        | 0.3664 | 1400 | 0.6155          | -0.1752        | -0.3896          | 0.6860             | 0.2144          | -300.3966      | -300.2597    | -2.7920         | -2.8147       |
| 0.6128        | 0.3926 | 1500 | 0.6180          | -0.0630        | -0.2687          | 0.6890             | 0.2057          | -288.3090      | -289.0369    | -2.7980         | -2.8198       |
| 0.6223        | 0.4187 | 1600 | 0.6088          | -0.1688        | -0.4097          | 0.6945             | 0.2409          | -302.4074      | -299.6220    | -2.7926         | -2.8148       |
| 0.6338        | 0.4449 | 1700 | 0.6061          | -0.2152        | -0.4665          | 0.6925             | 0.2513          | -308.0869      | -304.2535    | -2.7961         | -2.8181       |
| 0.585         | 0.4711 | 1800 | 0.6050          | -0.1327        | -0.3850          | 0.6915             | 0.2523          | -299.9368      | -296.0054    | -2.7949         | -2.8174       |
| 0.577         | 0.4973 | 1900 | 0.6013          | -0.2170        | -0.4883          | 0.6965             | 0.2713          | -310.2670      | -304.4333    | -2.7954         | -2.8176       |
| 0.5945        | 0.5234 | 2000 | 0.5992          | -0.2107        | -0.4899          | 0.6995             | 0.2793          | -310.4293      | -303.8028    | -2.7903         | -2.8122       |
| 0.5913        | 0.5496 | 2100 | 0.5981          | -0.2373        | -0.5251          | 0.7025             | 0.2879          | -313.9529      | -306.4641    | -2.7863         | -2.8085       |
| 0.5816        | 0.5758 | 2200 | 0.5989          | -0.2688        | -0.5570          | 0.6970             | 0.2883          | -317.1411      | -309.6146    | -2.7849         | -2.8070       |
| 0.5824        | 0.6019 | 2300 | 0.5961          | -0.2227        | -0.5189          | 0.6955             | 0.2961          | -313.3233      | -305.0098    | -2.7821         | -2.8037       |
| 0.602         | 0.6281 | 2400 | 0.5969          | -0.2683        | -0.5669          | 0.6990             | 0.2986          | -318.1251      | -309.5652    | -2.7744         | -2.7961       |
| 0.5792        | 0.6543 | 2500 | 0.5963          | -0.2102        | -0.5041          | 0.6975             | 0.2938          | -311.8429      | -303.7615    | -2.7763         | -2.7980       |
| 0.6028        | 0.6805 | 2600 | 0.5974          | -0.1896        | -0.4790          | 0.6920             | 0.2895          | -309.3417      | -301.6964    | -2.7717         | -2.7933       |
| 0.5854        | 0.7066 | 2700 | 0.5930          | -0.2517        | -0.5615          | 0.7020             | 0.3098          | -317.5864      | -307.9027    | -2.7676         | -2.7892       |
| 0.5994        | 0.7328 | 2800 | 0.5920          | -0.2607        | -0.5775          | 0.7045             | 0.3167          | -319.1838      | -308.8107    | -2.7636         | -2.7851       |
| 0.5837        | 0.7590 | 2900 | 0.5913          | -0.2540        | -0.5721          | 0.7055             | 0.3181          | -318.6511      | -308.1379    | -2.7619         | -2.7834       |
| 0.5858        | 0.7851 | 3000 | 0.5910          | -0.2625        | -0.5835          | 0.7055             | 0.3210          | -319.7853      | -308.9898    | -2.7605         | -2.7819       |
| 0.5685        | 0.8113 | 3100 | 0.5914          | -0.2383        | -0.5571          | 0.7040             | 0.3188          | -317.1507      | -306.5707    | -2.7558         | -2.7777       |
| 0.5753        | 0.8375 | 3200 | 0.5903          | -0.2623        | -0.5868          | 0.7020             | 0.3246          | -320.1224      | -308.9666    | -2.7567         | -2.7783       |
| 0.5769        | 0.8636 | 3300 | 0.5900          | -0.2673        | -0.5934          | 0.7030             | 0.3260          | -320.7757      | -309.4716    | -2.7555         | -2.7771       |
| 0.5608        | 0.8898 | 3400 | 0.5896          | -0.2716        | -0.5988          | 0.7020             | 0.3273          | -321.3196      | -309.8930    | -2.7520         | -2.7739       |
| 0.6008        | 0.9160 | 3500 | 0.5895          | -0.2716        | -0.5994          | 0.7035             | 0.3277          | -321.3745      | -309.9000    | -2.7539         | -2.7755       |
| 0.585         | 0.9422 | 3600 | 0.5895          | -0.2722        | -0.6000          | 0.7020             | 0.3279          | -321.4418      | -309.9531    | -2.7549         | -2.7764       |
| 0.567         | 0.9683 | 3700 | 0.5893          | -0.2738        | -0.6022          | 0.7015             | 0.3284          | -321.6555      | -310.1171    | -2.7539         | -2.7755       |
| 0.5834        | 0.9945 | 3800 | 0.5893          | -0.2740        | -0.6023          | 0.7025             | 0.3283          | -321.6666      | -310.1333    | -2.7525         | -2.7742       |

### Framework versions

- PEFT 0.10.0
- Transformers 4.40.0
- Pytorch 2.2.0
- Datasets 2.16.1
- Tokenizers 0.19.1
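To try the adapter, it can be loaded on top of the base SFT model with PEFT; a minimal sketch follows. The `adapter_id` below is assumed from this card's name, so substitute the actual hub path or a local checkpoint directory.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "zephyr-7b-dpo-lora"  # hypothetical hub id or local path to this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the DPO LoRA weights to the frozen base model.
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```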