---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF
- generation/UFfull2
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-ours-uffull-5e-6
  results: []
---

# zephyr-dpop-qlora-uf-ours-uffull-5e-6

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF and generation/UFfull2 datasets.
It achieves the following results on the evaluation set:
- Loss: 0.6950
- Positive Losses: 0.5820
- DPO Losses: 0.6380
- Rewards/chosen: 0.2290
- Rewards/rejected: 0.0996
- Rewards/accuracies: 0.7060
- Rewards/margins: 0.1294
- Rewards/margins Max: 0.5134
- Rewards/margins Min: -0.1814
- Rewards/margins Std: 0.2328
- Logps/rejected: -255.8980
- Logps/chosen: -261.5583
- Logits/rejected: -2.6096
- Logits/chosen: -2.6435
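The reward columns follow the implicit reward used by DPO: the β-scaled log-probability ratio between the policy and the frozen reference model, so that Rewards/margins = Rewards/chosen − Rewards/rejected. As a reference point (the card itself does not record β), the standard formulation is:

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right],
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)
$$

The "dpop" in the model name, together with the separate Positive Losses metric, suggests a DPOP-style (DPO-Positive) objective that additionally penalizes the chosen log-probability falling below the reference; this is inferred from the naming and is not documented in the card.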
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of a matching trainer configuration follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
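The sketch below shows how these hyperparameters could map onto a TRL `DPOTrainer` setup from the same era as the framework versions listed at the end of this card (trl 0.7.x API, where `beta` is passed directly to the trainer). The dataset file, `beta`, and the LoRA settings are illustrative assumptions; they are not recorded in this card.

```python
# Hypothetical reconstruction of the training setup. beta, the LoRA shape,
# and the dataset path are assumptions for illustration, not values from this card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# Effective train batch size 16 = 4 per device x 2 GPUs x 2 accumulation steps.
args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-ours-uffull-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a peft_config, the adapter-disabled base model acts as the reference
    args=args,
    beta=0.1,        # assumed; the card does not record the DPO temperature
    # DPOTrainer expects prompt/chosen/rejected columns; placeholder file name.
    train_dataset=load_dataset("json", data_files="preference_pairs.jsonl")["train"],
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"),  # assumed
)
trainer.train()
```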
### Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | DPO Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6915 | 0.02 | 100 | 0.6917 | 0.0059 | 0.6910 | 0.0266 | 0.0222 | 0.6170 | 0.0043 | 0.0246 | -0.0134 | 0.0126 | -263.6297 | -281.7968 | -2.7663 | -2.8014 |
| 0.6797 | 0.05 | 200 | 0.6897 | 0.0702 | 0.6800 | 0.0886 | 0.0608 | 0.6570 | 0.0278 | 0.1378 | -0.0648 | 0.0675 | -259.7737 | -275.5939 | -2.7413 | -2.7759 |
| 0.6804 | 0.07 | 300 | 0.6845 | 0.0848 | 0.6724 | 0.1325 | 0.0877 | 0.6675 | 0.0448 | 0.2086 | -0.0924 | 0.1004 | -257.0813 | -271.2012 | -2.7504 | -2.7853 |
| 0.6951 | 0.1 | 400 | 0.6829 | 0.1179 | 0.6671 | 0.1575 | 0.1005 | 0.6715 | 0.0570 | 0.2589 | -0.1125 | 0.1237 | -255.7986 | -268.7028 | -2.6989 | -2.7337 |
| 0.6599 | 0.12 | 500 | 0.6868 | 0.1747 | 0.6620 | 0.1717 | 0.1030 | 0.6805 | 0.0688 | 0.2913 | -0.1240 | 0.1393 | -255.5571 | -267.2820 | -2.6656 | -2.7019 |
| 0.6899 | 0.14 | 600 | 0.6773 | 0.1322 | 0.6631 | 0.1930 | 0.1265 | 0.6805 | 0.0665 | 0.2912 | -0.1245 | 0.1385 | -253.2036 | -265.1512 | -2.6976 | -2.7346 |
| 0.6596 | 0.17 | 700 | 0.6841 | 0.2476 | 0.6579 | 0.1952 | 0.1160 | 0.6790 | 0.0792 | 0.3399 | -0.1420 | 0.1603 | -254.2511 | -264.9378 | -2.6481 | -2.6842 |
| 0.6618 | 0.19 | 800 | 0.7055 | 0.6819 | 0.6582 | 0.1938 | 0.1128 | 0.6725 | 0.0810 | 0.3642 | -0.1653 | 0.1763 | -254.5748 | -265.0780 | -2.6749 | -2.7097 |
| 0.6742 | 0.22 | 900 | 0.7031 | 0.6125 | 0.6568 | 0.1979 | 0.1141 | 0.6810 | 0.0839 | 0.3706 | -0.1651 | 0.1783 | -254.4471 | -264.6613 | -2.6218 | -2.6566 |
| 0.6751 | 0.24 | 1000 | 0.7010 | 0.6677 | 0.6601 | 0.2068 | 0.1295 | 0.6755 | 0.0773 | 0.3517 | -0.1632 | 0.1718 | -252.9047 | -263.7737 | -2.6192 | -2.6553 |
| 0.7098 | 0.26 | 1100 | 0.7131 | 0.8234 | 0.6548 | 0.1971 | 0.1068 | 0.6775 | 0.0903 | 0.3961 | -0.1800 | 0.1920 | -255.1729 | -264.7435 | -2.6144 | -2.6518 |
| 0.6678 | 0.29 | 1200 | 0.7126 | 0.8054 | 0.6533 | 0.2007 | 0.1068 | 0.6810 | 0.0938 | 0.4066 | -0.1769 | 0.1949 | -255.1695 | -264.3879 | -2.5888 | -2.6260 |
| 0.6611 | 0.31 | 1300 | 0.7072 | 0.7968 | 0.6584 | 0.2114 | 0.1291 | 0.6725 | 0.0823 | 0.3729 | -0.1733 | 0.1825 | -252.9392 | -263.3107 | -2.5893 | -2.6265 |
| 0.6852 | 0.34 | 1400 | 0.7117 | 0.8828 | 0.6578 | 0.2125 | 0.1283 | 0.6865 | 0.0842 | 0.3801 | -0.1702 | 0.1839 | -253.0243 | -263.2099 | -2.5908 | -2.6269 |
| 0.7148 | 0.36 | 1500 | 0.7147 | 0.8994 | 0.6537 | 0.2082 | 0.1146 | 0.6775 | 0.0936 | 0.4107 | -0.1826 | 0.1980 | -254.3940 | -263.6350 | -2.5606 | -2.5971 |
| 0.734 | 0.38 | 1600 | 0.7263 | 0.9562 | 0.6467 | 0.1975 | 0.0887 | 0.7005 | 0.1088 | 0.4496 | -0.1881 | 0.2128 | -256.9880 | -264.7073 | -2.5414 | -2.5748 |
| 0.68 | 0.41 | 1700 | 0.6886 | 0.4934 | 0.6531 | 0.2201 | 0.1281 | 0.6895 | 0.0920 | 0.3890 | -0.1655 | 0.1858 | -253.0398 | -262.4442 | -2.6144 | -2.6469 |
| 0.9221 | 0.43 | 1800 | 0.6972 | 0.5938 | 0.6479 | 0.2127 | 0.1083 | 0.6855 | 0.1044 | 0.4219 | -0.1737 | 0.2001 | -255.0207 | -263.1860 | -2.6572 | -2.6883 |
| 0.6965 | 0.45 | 1900 | 0.7029 | 0.5493 | 0.6415 | 0.2047 | 0.0857 | 0.6980 | 0.1190 | 0.4554 | -0.1734 | 0.2113 | -257.2836 | -263.9902 | -2.6385 | -2.6680 |
| 0.6754 | 0.48 | 2000 | 0.6736 | 0.2085 | 0.6476 | 0.2262 | 0.1217 | 0.6960 | 0.1045 | 0.4193 | -0.1652 | 0.1960 | -253.6813 | -261.8383 | -2.6573 | -2.6879 |
| 0.6527 | 0.5 | 2100 | 0.6734 | 0.1901 | 0.6479 | 0.2309 | 0.1262 | 0.6940 | 0.1046 | 0.4298 | -0.1721 | 0.2013 | -253.2316 | -261.3691 | -2.6274 | -2.6587 |
| 0.6693 | 0.53 | 2200 | 0.6811 | 0.3594 | 0.6470 | 0.2250 | 0.1186 | 0.6885 | 0.1064 | 0.4311 | -0.1714 | 0.2022 | -253.9932 | -261.9567 | -2.6328 | -2.6644 |
| 0.6652 | 0.55 | 2300 | 0.6946 | 0.5078 | 0.6431 | 0.2178 | 0.1017 | 0.6895 | 0.1161 | 0.4629 | -0.1818 | 0.2158 | -255.6816 | -262.6781 | -2.6122 | -2.6429 |
| 0.6511 | 0.57 | 2400 | 0.6755 | 0.2132 | 0.6463 | 0.2309 | 0.1228 | 0.6960 | 0.1081 | 0.4351 | -0.1715 | 0.2030 | -253.5698 | -261.3663 | -2.6075 | -2.6392 |
| 0.6512 | 0.6 | 2500 | 0.7102 | 0.5940 | 0.6370 | 0.2139 | 0.0822 | 0.6990 | 0.1318 | 0.5141 | -0.1918 | 0.2364 | -257.6378 | -263.0636 | -2.6184 | -2.6519 |
| 0.7342 | 0.62 | 2600 | 0.6884 | 0.3826 | 0.6413 | 0.2233 | 0.1023 | 0.7040 | 0.1210 | 0.4842 | -0.1791 | 0.2219 | -255.6233 | -262.1221 | -2.6165 | -2.6506 |
| 0.6754 | 0.65 | 2700 | 0.6847 | 0.3415 | 0.6419 | 0.2283 | 0.1092 | 0.7055 | 0.1192 | 0.4752 | -0.1765 | 0.2181 | -254.9368 | -261.6212 | -2.6158 | -2.6511 |
| 0.7445 | 0.67 | 2800 | 0.6769 | 0.2621 | 0.6445 | 0.2313 | 0.1188 | 0.7020 | 0.1125 | 0.4532 | -0.1690 | 0.2084 | -253.9747 | -261.3299 | -2.6176 | -2.6513 |
| 0.6656 | 0.69 | 2900 | 0.6867 | 0.4407 | 0.6412 | 0.2299 | 0.1090 | 0.7045 | 0.1208 | 0.4813 | -0.1757 | 0.2199 | -254.9489 | -261.4680 | -2.6212 | -2.6566 |
| 0.6641 | 0.72 | 3000 | 0.6918 | 0.5290 | 0.6395 | 0.2278 | 0.1026 | 0.7025 | 0.1252 | 0.4930 | -0.1780 | 0.2250 | -255.5911 | -261.6767 | -2.6344 | -2.6687 |
| 0.6752 | 0.74 | 3100 | 0.6963 | 0.6115 | 0.6398 | 0.2272 | 0.1021 | 0.7030 | 0.1252 | 0.5000 | -0.1806 | 0.2279 | -255.6473 | -261.7339 | -2.6282 | -2.6628 |
| 0.6417 | 0.77 | 3200 | 0.7057 | 0.7185 | 0.6364 | 0.2246 | 0.0908 | 0.7040 | 0.1338 | 0.5276 | -0.1863 | 0.2394 | -256.7738 | -261.9981 | -2.6277 | -2.6619 |
| 0.6436 | 0.79 | 3300 | 0.7146 | 0.8124 | 0.6342 | 0.2203 | 0.0808 | 0.7040 | 0.1395 | 0.5452 | -0.1905 | 0.2463 | -257.7732 | -262.4228 | -2.6190 | -2.6530 |
| 0.7092 | 0.81 | 3400 | 0.6972 | 0.6209 | 0.6389 | 0.2266 | 0.0993 | 0.7015 | 0.1273 | 0.5073 | -0.1826 | 0.2310 | -255.9223 | -261.7928 | -2.6091 | -2.6431 |
| 0.6491 | 0.84 | 3500 | 0.6972 | 0.6241 | 0.6390 | 0.2273 | 0.1003 | 0.7020 | 0.1270 | 0.5062 | -0.1824 | 0.2306 | -255.8255 | -261.7234 | -2.6038 | -2.6383 |
| 0.6879 | 0.86 | 3600 | 0.7091 | 0.7585 | 0.6353 | 0.2220 | 0.0856 | 0.7060 | 0.1364 | 0.5352 | -0.1870 | 0.2418 | -257.2982 | -262.2594 | -2.6103 | -2.6440 |
| 0.6129 | 0.89 | 3700 | 0.7033 | 0.6942 | 0.6366 | 0.2255 | 0.0924 | 0.7065 | 0.1331 | 0.5254 | -0.1849 | 0.2379 | -256.6156 | -261.9067 | -2.6075 | -2.6417 |
| 0.6578 | 0.91 | 3800 | 0.6956 | 0.5982 | 0.6385 | 0.2286 | 0.1002 | 0.7040 | 0.1284 | 0.5109 | -0.1818 | 0.2321 | -255.8333 | -261.5916 | -2.6073 | -2.6413 |
| 0.6535 | 0.93 | 3900 | 0.6949 | 0.5854 | 0.6383 | 0.2289 | 0.1000 | 0.7045 | 0.1288 | 0.5118 | -0.1813 | 0.2323 | -255.8504 | -261.5681 | -2.6069 | -2.6411 |
| 0.6876 | 0.96 | 4000 | 0.6951 | 0.5831 | 0.6380 | 0.2289 | 0.0994 | 0.7035 | 0.1295 | 0.5141 | -0.1813 | 0.2330 | -255.9116 | -261.5652 | -2.6055 | -2.6398 |
| 0.6531 | 0.98 | 4100 | 0.6952 | 0.5853 | 0.6381 | 0.2289 | 0.0995 | 0.7040 | 0.1294 | 0.5136 | -0.1815 | 0.2329 | -255.9032 | -261.5644 | -2.6099 | -2.6438 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
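Since this repository contains a PEFT (QLoRA) adapter rather than full model weights, it is loaded on top of the base model. A minimal inference sketch, assuming the adapter is published under the repo id matching this card's title:

```python
# Minimal usage sketch; the adapter path is assumed to match this card's title
# (substitute the actual hub id or local directory).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "zephyr-dpop-qlora-uf-ours-uffull-5e-6"  # hypothetical path
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")

# Zephyr models use a chat template; build the prompt through it.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is Direct Preference Optimization?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```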