# zephyr-dpop-qlora-uf-ours-uffull-5e-6
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF and the generation/UFfull2 datasets. It achieves the following results on the evaluation set:
- Loss: 0.6950
- Positive Losses: 0.5820
- Dpo Losses: 0.6380
- Rewards/chosen: 0.2290
- Rewards/rejected: 0.0996
- Rewards/accuracies: 0.7060
- Rewards/margins: 0.1294
- Rewards/margins Max: 0.5134
- Rewards/margins Min: -0.1814
- Rewards/margins Std: 0.2328
- Logps/rejected: -255.8980
- Logps/chosen: -261.5583
- Logits/rejected: -2.6096
- Logits/chosen: -2.6435
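The separate `Dpo Losses` and `Positive Losses` metrics (and the "dpop" in the model name) suggest a DPO-Positive (DPOP)-style objective, which adds a penalty whenever the policy assigns lower log-probability to the chosen response than the reference model does. The following is a minimal sketch of how such a loss and the reported reward metrics could be computed, not the actual training code; `beta`, `lambda_dpop`, and the function name are assumptions:

```python
import torch
import torch.nn.functional as F

def dpop_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              beta=0.1, lambda_dpop=0.1):
    """Sketch of a DPOP-style objective from summed per-response log-probs.

    Rewards/chosen and Rewards/rejected in the tables correspond to
    beta * (policy logp - reference logp) for each response.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards            # Rewards/margins

    dpo_losses = -F.logsigmoid(margins)                    # Dpo Losses
    # Positive penalty: nonzero only when the policy's log-prob of the
    # chosen response drops below the reference model's.
    positive_losses = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)

    loss = (dpo_losses + lambda_dpop * positive_losses).mean()
    return loss, chosen_rewards, rejected_rewards, margins
```

Under this additive reading, the reported Validation Loss is roughly consistent with `Dpo Losses` plus a small multiple of `Positive Losses`, though the exact combination used in training is not documented here.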
## Model description
More information needed
## Intended uses & limitations
More information needed
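No usage guidance is provided. As a minimal inference sketch, assuming this repository hosts a QLoRA/PEFT adapter for the base SFT model named above (the dtype and device placement are assumptions):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpop-qlora-uf-ours-uffull-5e-6"  # this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the adapter

messages = [{"role": "user", "content": "What is direct preference optimization?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```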
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
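The list above maps directly onto Hugging Face `TrainingArguments`. A configuration sketch with these values follows; the `output_dir`, `optim`, and `bf16` settings are assumptions, and the trainer, datasets, and LoRA config are elided:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters above. Run on 2 GPUs so that
# 4 (per device) x 2 (devices) x 2 (grad accumulation) = 16 total train batch.
training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-ours-uffull-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",  # Adam with betas=(0.9, 0.999), epsilon=1e-8
    seed=42,
    bf16=True,            # assumption: bfloat16 mixed precision
)
```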
### Training results
Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6915 | 0.02 | 100 | 0.6917 | 0.0059 | 0.6910 | 0.0266 | 0.0222 | 0.6170 | 0.0043 | 0.0246 | -0.0134 | 0.0126 | -263.6297 | -281.7968 | -2.7663 | -2.8014 |
0.6797 | 0.05 | 200 | 0.6897 | 0.0702 | 0.6800 | 0.0886 | 0.0608 | 0.6570 | 0.0278 | 0.1378 | -0.0648 | 0.0675 | -259.7737 | -275.5939 | -2.7413 | -2.7759 |
0.6804 | 0.07 | 300 | 0.6845 | 0.0848 | 0.6724 | 0.1325 | 0.0877 | 0.6675 | 0.0448 | 0.2086 | -0.0924 | 0.1004 | -257.0813 | -271.2012 | -2.7504 | -2.7853 |
0.6951 | 0.1 | 400 | 0.6829 | 0.1179 | 0.6671 | 0.1575 | 0.1005 | 0.6715 | 0.0570 | 0.2589 | -0.1125 | 0.1237 | -255.7986 | -268.7028 | -2.6989 | -2.7337 |
0.6599 | 0.12 | 500 | 0.6868 | 0.1747 | 0.6620 | 0.1717 | 0.1030 | 0.6805 | 0.0688 | 0.2913 | -0.1240 | 0.1393 | -255.5571 | -267.2820 | -2.6656 | -2.7019 |
0.6899 | 0.14 | 600 | 0.6773 | 0.1322 | 0.6631 | 0.1930 | 0.1265 | 0.6805 | 0.0665 | 0.2912 | -0.1245 | 0.1385 | -253.2036 | -265.1512 | -2.6976 | -2.7346 |
0.6596 | 0.17 | 700 | 0.6841 | 0.2476 | 0.6579 | 0.1952 | 0.1160 | 0.6790 | 0.0792 | 0.3399 | -0.1420 | 0.1603 | -254.2511 | -264.9378 | -2.6481 | -2.6842 |
0.6618 | 0.19 | 800 | 0.7055 | 0.6819 | 0.6582 | 0.1938 | 0.1128 | 0.6725 | 0.0810 | 0.3642 | -0.1653 | 0.1763 | -254.5748 | -265.0780 | -2.6749 | -2.7097 |
0.6742 | 0.22 | 900 | 0.7031 | 0.6125 | 0.6568 | 0.1979 | 0.1141 | 0.6810 | 0.0839 | 0.3706 | -0.1651 | 0.1783 | -254.4471 | -264.6613 | -2.6218 | -2.6566 |
0.6751 | 0.24 | 1000 | 0.7010 | 0.6677 | 0.6601 | 0.2068 | 0.1295 | 0.6755 | 0.0773 | 0.3517 | -0.1632 | 0.1718 | -252.9047 | -263.7737 | -2.6192 | -2.6553 |
0.7098 | 0.26 | 1100 | 0.7131 | 0.8234 | 0.6548 | 0.1971 | 0.1068 | 0.6775 | 0.0903 | 0.3961 | -0.1800 | 0.1920 | -255.1729 | -264.7435 | -2.6144 | -2.6518 |
0.6678 | 0.29 | 1200 | 0.7126 | 0.8054 | 0.6533 | 0.2007 | 0.1068 | 0.6810 | 0.0938 | 0.4066 | -0.1769 | 0.1949 | -255.1695 | -264.3879 | -2.5888 | -2.6260 |
0.6611 | 0.31 | 1300 | 0.7072 | 0.7968 | 0.6584 | 0.2114 | 0.1291 | 0.6725 | 0.0823 | 0.3729 | -0.1733 | 0.1825 | -252.9392 | -263.3107 | -2.5893 | -2.6265 |
0.6852 | 0.34 | 1400 | 0.7117 | 0.8828 | 0.6578 | 0.2125 | 0.1283 | 0.6865 | 0.0842 | 0.3801 | -0.1702 | 0.1839 | -253.0243 | -263.2099 | -2.5908 | -2.6269 |
0.7148 | 0.36 | 1500 | 0.7147 | 0.8994 | 0.6537 | 0.2082 | 0.1146 | 0.6775 | 0.0936 | 0.4107 | -0.1826 | 0.1980 | -254.3940 | -263.6350 | -2.5606 | -2.5971 |
0.734 | 0.38 | 1600 | 0.7263 | 0.9562 | 0.6467 | 0.1975 | 0.0887 | 0.7005 | 0.1088 | 0.4496 | -0.1881 | 0.2128 | -256.9880 | -264.7073 | -2.5414 | -2.5748 |
0.68 | 0.41 | 1700 | 0.6886 | 0.4934 | 0.6531 | 0.2201 | 0.1281 | 0.6895 | 0.0920 | 0.3890 | -0.1655 | 0.1858 | -253.0398 | -262.4442 | -2.6144 | -2.6469 |
0.9221 | 0.43 | 1800 | 0.6972 | 0.5938 | 0.6479 | 0.2127 | 0.1083 | 0.6855 | 0.1044 | 0.4219 | -0.1737 | 0.2001 | -255.0207 | -263.1860 | -2.6572 | -2.6883 |
0.6965 | 0.45 | 1900 | 0.7029 | 0.5493 | 0.6415 | 0.2047 | 0.0857 | 0.6980 | 0.1190 | 0.4554 | -0.1734 | 0.2113 | -257.2836 | -263.9902 | -2.6385 | -2.6680 |
0.6754 | 0.48 | 2000 | 0.6736 | 0.2085 | 0.6476 | 0.2262 | 0.1217 | 0.6960 | 0.1045 | 0.4193 | -0.1652 | 0.1960 | -253.6813 | -261.8383 | -2.6573 | -2.6879 |
0.6527 | 0.5 | 2100 | 0.6734 | 0.1901 | 0.6479 | 0.2309 | 0.1262 | 0.6940 | 0.1046 | 0.4298 | -0.1721 | 0.2013 | -253.2316 | -261.3691 | -2.6274 | -2.6587 |
0.6693 | 0.53 | 2200 | 0.6811 | 0.3594 | 0.6470 | 0.2250 | 0.1186 | 0.6885 | 0.1064 | 0.4311 | -0.1714 | 0.2022 | -253.9932 | -261.9567 | -2.6328 | -2.6644 |
0.6652 | 0.55 | 2300 | 0.6946 | 0.5078 | 0.6431 | 0.2178 | 0.1017 | 0.6895 | 0.1161 | 0.4629 | -0.1818 | 0.2158 | -255.6816 | -262.6781 | -2.6122 | -2.6429 |
0.6511 | 0.57 | 2400 | 0.6755 | 0.2132 | 0.6463 | 0.2309 | 0.1228 | 0.6960 | 0.1081 | 0.4351 | -0.1715 | 0.2030 | -253.5698 | -261.3663 | -2.6075 | -2.6392 |
0.6512 | 0.6 | 2500 | 0.7102 | 0.5940 | 0.6370 | 0.2139 | 0.0822 | 0.6990 | 0.1318 | 0.5141 | -0.1918 | 0.2364 | -257.6378 | -263.0636 | -2.6184 | -2.6519 |
0.7342 | 0.62 | 2600 | 0.6884 | 0.3826 | 0.6413 | 0.2233 | 0.1023 | 0.7040 | 0.1210 | 0.4842 | -0.1791 | 0.2219 | -255.6233 | -262.1221 | -2.6165 | -2.6506 |
0.6754 | 0.65 | 2700 | 0.6847 | 0.3415 | 0.6419 | 0.2283 | 0.1092 | 0.7055 | 0.1192 | 0.4752 | -0.1765 | 0.2181 | -254.9368 | -261.6212 | -2.6158 | -2.6511 |
0.7445 | 0.67 | 2800 | 0.6769 | 0.2621 | 0.6445 | 0.2313 | 0.1188 | 0.7020 | 0.1125 | 0.4532 | -0.1690 | 0.2084 | -253.9747 | -261.3299 | -2.6176 | -2.6513 |
0.6656 | 0.69 | 2900 | 0.6867 | 0.4407 | 0.6412 | 0.2299 | 0.1090 | 0.7045 | 0.1208 | 0.4813 | -0.1757 | 0.2199 | -254.9489 | -261.4680 | -2.6212 | -2.6566 |
0.6641 | 0.72 | 3000 | 0.6918 | 0.5290 | 0.6395 | 0.2278 | 0.1026 | 0.7025 | 0.1252 | 0.4930 | -0.1780 | 0.2250 | -255.5911 | -261.6767 | -2.6344 | -2.6687 |
0.6752 | 0.74 | 3100 | 0.6963 | 0.6115 | 0.6398 | 0.2272 | 0.1021 | 0.7030 | 0.1252 | 0.5000 | -0.1806 | 0.2279 | -255.6473 | -261.7339 | -2.6282 | -2.6628 |
0.6417 | 0.77 | 3200 | 0.7057 | 0.7185 | 0.6364 | 0.2246 | 0.0908 | 0.7040 | 0.1338 | 0.5276 | -0.1863 | 0.2394 | -256.7738 | -261.9981 | -2.6277 | -2.6619 |
0.6436 | 0.79 | 3300 | 0.7146 | 0.8124 | 0.6342 | 0.2203 | 0.0808 | 0.7040 | 0.1395 | 0.5452 | -0.1905 | 0.2463 | -257.7732 | -262.4228 | -2.6190 | -2.6530 |
0.7092 | 0.81 | 3400 | 0.6972 | 0.6209 | 0.6389 | 0.2266 | 0.0993 | 0.7015 | 0.1273 | 0.5073 | -0.1826 | 0.2310 | -255.9223 | -261.7928 | -2.6091 | -2.6431 |
0.6491 | 0.84 | 3500 | 0.6972 | 0.6241 | 0.6390 | 0.2273 | 0.1003 | 0.7020 | 0.1270 | 0.5062 | -0.1824 | 0.2306 | -255.8255 | -261.7234 | -2.6038 | -2.6383 |
0.6879 | 0.86 | 3600 | 0.7091 | 0.7585 | 0.6353 | 0.2220 | 0.0856 | 0.7060 | 0.1364 | 0.5352 | -0.1870 | 0.2418 | -257.2982 | -262.2594 | -2.6103 | -2.6440 |
0.6129 | 0.89 | 3700 | 0.7033 | 0.6942 | 0.6366 | 0.2255 | 0.0924 | 0.7065 | 0.1331 | 0.5254 | -0.1849 | 0.2379 | -256.6156 | -261.9067 | -2.6075 | -2.6417 |
0.6578 | 0.91 | 3800 | 0.6956 | 0.5982 | 0.6385 | 0.2286 | 0.1002 | 0.7040 | 0.1284 | 0.5109 | -0.1818 | 0.2321 | -255.8333 | -261.5916 | -2.6073 | -2.6413 |
0.6535 | 0.93 | 3900 | 0.6949 | 0.5854 | 0.6383 | 0.2289 | 0.1000 | 0.7045 | 0.1288 | 0.5118 | -0.1813 | 0.2323 | -255.8504 | -261.5681 | -2.6069 | -2.6411 |
0.6876 | 0.96 | 4000 | 0.6951 | 0.5831 | 0.6380 | 0.2289 | 0.0994 | 0.7035 | 0.1295 | 0.5141 | -0.1813 | 0.2330 | -255.9116 | -261.5652 | -2.6055 | -2.6398 |
0.6531 | 0.98 | 4100 | 0.6952 | 0.5853 | 0.6381 | 0.2289 | 0.0995 | 0.7040 | 0.1294 | 0.5136 | -0.1815 | 0.2329 | -255.9032 | -261.5644 | -2.6099 | -2.6438 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2