zephyr-7b-dpo-qlora-v1
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.4853
- Rewards/chosen: -1.9997
- Rewards/rejected: -3.0850
- Rewards/accuracies: 0.6725
- Rewards/margins: 1.0854
- Logps/rejected: -520.1135
- Logps/chosen: -431.9709
- Logits/rejected: -0.9261
- Logits/chosen: -1.0556
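The reward figures above are related to each other: the margin is the chosen reward minus the rejected reward. The sketch below checks that relation against the reported numbers. It also evaluates a per-pair loss at the mean margin, assuming the standard sigmoid DPO loss was used (the card does not state the loss type, so this is an assumption). The helper name `sigmoid_dpo_loss` is hypothetical and is not taken from any library.

```python
import math

# Values copied from the evaluation results listed above.
rewards_chosen = -1.9997
rewards_rejected = -3.0850
reported_margin = 1.0854
reported_loss = 0.4853

# The margin is the difference between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f} (reported: {reported_margin})")


def sigmoid_dpo_loss(reward_margin: float) -> float:
    """Per-pair loss -log(sigmoid(margin)); ASSUMES the sigmoid DPO loss."""
    return math.log1p(math.exp(-reward_margin))


# Loss evaluated at the *mean* margin. Because -log(sigmoid(x)) is convex,
# this is only a lower bound on the mean per-pair loss (Jensen's inequality),
# so it is expected to sit below the reported evaluation loss.
loss_at_mean_margin = sigmoid_dpo_loss(margin)
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported loss: {reported_loss})")
```

The recomputed margin matches the reported one up to rounding. The gap between the loss at the mean margin and the reported loss suggests that per-pair margins vary considerably across the evaluation set, which is in line with a reward accuracy of about 0.67.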
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
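To make the schedule settings above concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the listed `learning_rate` of 5e-06 and `lr_scheduler_warmup_ratio` of 0.1. The function `lr_at_step` and the `total_steps` value are hypothetical: the card does not report the total number of optimizer steps, so the figure used below is only a placeholder. The shape follows the usual linear-warmup-then-half-cosine form; it is an illustration, not the exact code used for this run.

```python
import math


def lr_at_step(step: int, total_steps: int,
               base_lr: float = 5e-6, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical step count, chosen only for illustration.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps))
```

With these settings the rate peaks at 5e-06 once warmup ends (after the first 10% of steps) and decays to zero by the final step.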
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6933 | 0.01 | 100 | 0.6927 | 0.0023 | 0.0014 | 0.4950 | 0.0009 | -211.4760 | -231.7798 | -2.1609 | -2.3494 |
0.691 | 0.01 | 200 | 0.6900 | 0.0094 | 0.0031 | 0.5825 | 0.0063 | -211.3033 | -231.0670 | -2.1586 | -2.3468 |
0.6796 | 0.02 | 300 | 0.6832 | 0.0364 | 0.0156 | 0.5785 | 0.0208 | -210.0561 | -228.3676 | -2.1598 | -2.3479 |
0.6558 | 0.03 | 400 | 0.6709 | 0.0348 | -0.0139 | 0.6030 | 0.0487 | -213.0039 | -228.5253 | -2.1556 | -2.3431 |
0.6509 | 0.03 | 500 | 0.6525 | -0.0685 | -0.1665 | 0.6060 | 0.0980 | -228.2622 | -238.8526 | -2.1523 | -2.3397 |
0.6521 | 0.04 | 600 | 0.6306 | -0.1447 | -0.3161 | 0.6010 | 0.1714 | -243.2220 | -246.4779 | -2.2043 | -2.3956 |
0.6828 | 0.05 | 700 | 0.6355 | -0.4797 | -0.6338 | 0.5995 | 0.1541 | -274.9947 | -279.9760 | -2.2205 | -2.4135 |
0.6578 | 0.05 | 800 | 0.6070 | -0.4183 | -0.6993 | 0.6050 | 0.2810 | -281.5427 | -273.8341 | -2.2567 | -2.4512 |
0.6272 | 0.06 | 900 | 0.6149 | -0.2798 | -0.5197 | 0.6060 | 0.2398 | -263.5772 | -259.9874 | -2.1332 | -2.3184 |
0.6772 | 0.07 | 1000 | 0.5979 | -0.5950 | -0.8996 | 0.6125 | 0.3045 | -301.5699 | -291.5083 | -2.0915 | -2.2731 |
0.629 | 0.07 | 1100 | 0.5842 | -1.1663 | -1.5846 | 0.6255 | 0.4183 | -370.0742 | -348.6391 | -1.8959 | -2.0642 |
0.6763 | 0.08 | 1200 | 0.5800 | -1.2262 | -1.6772 | 0.625 | 0.4510 | -379.3279 | -354.6231 | -1.7782 | -1.9453 |
0.6468 | 0.09 | 1300 | 0.5959 | -1.4323 | -1.7335 | 0.6265 | 0.3012 | -384.9624 | -375.2356 | -1.8355 | -2.0034 |
0.5302 | 0.09 | 1400 | 0.5790 | -1.0222 | -1.4230 | 0.6370 | 0.4008 | -353.9126 | -334.2268 | -1.8706 | -2.0396 |
0.5512 | 0.1 | 1500 | 0.5627 | -0.8389 | -1.3789 | 0.6370 | 0.5400 | -349.4973 | -315.8946 | -1.6729 | -1.8295 |
0.6386 | 0.1 | 1600 | 0.5758 | -0.8213 | -1.2877 | 0.6245 | 0.4664 | -340.3790 | -314.1301 | -1.5010 | -1.6500 |
0.5515 | 0.11 | 1700 | 0.5789 | -0.6172 | -1.0478 | 0.6155 | 0.4306 | -316.3881 | -293.7214 | -1.4651 | -1.6102 |
0.5693 | 0.12 | 1800 | 0.5637 | -0.9140 | -1.3485 | 0.6435 | 0.4346 | -346.4651 | -323.4023 | -1.5711 | -1.7296 |
0.4312 | 0.12 | 1900 | 0.5713 | -1.6389 | -2.2013 | 0.6300 | 0.5624 | -431.7438 | -395.8936 | -1.3446 | -1.4912 |
0.6104 | 0.13 | 2000 | 0.5692 | -2.5833 | -3.1248 | 0.6295 | 0.5416 | -524.0952 | -490.3331 | -1.1864 | -1.3215 |
0.589 | 0.14 | 2100 | 0.5548 | -1.2062 | -1.8842 | 0.6355 | 0.6780 | -400.0314 | -352.6257 | -1.4682 | -1.6258 |
0.632 | 0.14 | 2200 | 0.5550 | -1.7218 | -2.4957 | 0.6340 | 0.7739 | -461.1841 | -404.1832 | -0.9609 | -1.0862 |
0.5211 | 0.15 | 2300 | 0.5417 | -0.9631 | -1.6396 | 0.6375 | 0.6765 | -375.5683 | -328.3126 | -1.2698 | -1.4156 |
0.4854 | 0.16 | 2400 | 0.5439 | -1.4291 | -2.0590 | 0.6405 | 0.6299 | -417.5105 | -374.9135 | -1.1047 | -1.2360 |
0.4768 | 0.16 | 2500 | 0.5402 | -2.0118 | -2.7496 | 0.6360 | 0.7377 | -486.5682 | -433.1884 | -0.8693 | -0.9927 |
0.562 | 0.17 | 2600 | 0.5278 | -2.0156 | -2.7483 | 0.6605 | 0.7326 | -486.4391 | -433.5695 | -0.8911 | -1.0129 |
0.4748 | 0.18 | 2700 | 0.5315 | -1.4482 | -2.1044 | 0.6515 | 0.6562 | -422.0545 | -376.8264 | -1.1406 | -1.2759 |
0.5099 | 0.18 | 2800 | 0.5306 | -1.6029 | -2.2872 | 0.6550 | 0.6843 | -440.3303 | -392.2982 | -0.9484 | -1.0749 |
0.4184 | 0.19 | 2900 | 0.5267 | -1.6154 | -2.4104 | 0.6515 | 0.7949 | -452.6504 | -393.5496 | -0.7930 | -0.9077 |
0.468 | 0.2 | 3000 | 0.5223 | -1.7343 | -2.5635 | 0.6555 | 0.8291 | -467.9596 | -405.4379 | -0.8916 | -1.0169 |
0.5857 | 0.2 | 3100 | 0.5290 | -1.2637 | -1.9922 | 0.6520 | 0.7284 | -410.8308 | -358.3795 | -1.1037 | -1.2386 |
0.4504 | 0.21 | 3200 | 0.5196 | -2.6280 | -3.5656 | 0.6515 | 0.9376 | -568.1714 | -494.8058 | -0.9832 | -1.1167 |
0.5336 | 0.22 | 3300 | 0.5212 | -1.3201 | -2.1095 | 0.6515 | 0.7894 | -422.5596 | -364.0115 | -1.0917 | -1.2265 |
0.5781 | 0.22 | 3400 | 0.5176 | -1.7501 | -2.6224 | 0.6575 | 0.8723 | -473.8530 | -407.0196 | -0.9397 | -1.0673 |
0.4228 | 0.23 | 3500 | 0.5153 | -1.7241 | -2.5518 | 0.6590 | 0.8277 | -466.7913 | -404.4118 | -1.0211 | -1.1501 |
0.5345 | 0.24 | 3600 | 0.5146 | -1.9883 | -2.7936 | 0.6580 | 0.8054 | -490.9767 | -430.8306 | -0.7439 | -0.8562 |
0.6089 | 0.24 | 3700 | 0.5182 | -2.4209 | -3.3002 | 0.6505 | 0.8794 | -541.6331 | -474.0902 | -1.0100 | -1.1421 |
0.4123 | 0.25 | 3800 | 0.5434 | -3.5880 | -4.2465 | 0.6360 | 0.6585 | -636.2662 | -590.8090 | -0.5056 | -0.6039 |
0.6359 | 0.26 | 3900 | 0.5269 | -2.6651 | -3.5331 | 0.6410 | 0.8680 | -564.9203 | -498.5152 | -0.6802 | -0.7944 |
0.5634 | 0.26 | 4000 | 0.5224 | -2.3672 | -3.1722 | 0.6515 | 0.8050 | -528.8313 | -468.7206 | -0.9063 | -1.0345 |
0.7537 | 0.27 | 4100 | 0.5229 | -1.2274 | -2.0411 | 0.6525 | 0.8138 | -415.7260 | -354.7430 | -1.3053 | -1.4554 |
0.5164 | 0.27 | 4200 | 0.5161 | -2.2621 | -3.1010 | 0.6490 | 0.8389 | -521.7140 | -458.2183 | -0.9361 | -1.0663 |
0.6486 | 0.28 | 4300 | 0.5247 | -0.7764 | -1.5282 | 0.6550 | 0.7518 | -364.4350 | -309.6467 | -1.3301 | -1.4797 |
0.4663 | 0.29 | 4400 | 0.5215 | -1.6682 | -2.6407 | 0.6525 | 0.9725 | -475.6791 | -398.8208 | -0.9512 | -1.0872 |
0.5322 | 0.29 | 4500 | 0.5166 | -2.3459 | -3.2929 | 0.6485 | 0.9470 | -540.9030 | -466.5963 | -0.9451 | -1.0830 |
0.5485 | 0.3 | 4600 | 0.5371 | -1.2907 | -1.8740 | 0.6510 | 0.5833 | -399.0143 | -361.0744 | -1.2451 | -1.3869 |
0.4012 | 0.31 | 4700 | 0.5190 | -2.6301 | -3.6818 | 0.6515 | 1.0518 | -579.7961 | -495.0129 | -0.8302 | -0.9635 |
0.4963 | 0.31 | 4800 | 0.5126 | -1.9284 | -3.0117 | 0.6540 | 1.0832 | -512.7780 | -424.8492 | -1.0117 | -1.1538 |
0.5004 | 0.32 | 4900 | 0.5151 | -2.9464 | -3.7231 | 0.6615 | 0.7767 | -583.9199 | -526.6473 | -0.7704 | -0.8908 |
0.465 | 0.33 | 5000 | 0.5096 | -2.3399 | -3.2128 | 0.6675 | 0.8729 | -532.8920 | -465.9922 | -0.9343 | -1.0639 |
0.4609 | 0.33 | 5100 | 0.5073 | -1.9864 | -2.8868 | 0.6655 | 0.9004 | -500.2922 | -430.6409 | -0.9175 | -1.0513 |
0.4666 | 0.34 | 5200 | 0.5154 | -1.5968 | -2.3504 | 0.6600 | 0.7536 | -446.6525 | -391.6843 | -1.0364 | -1.1704 |
0.6107 | 0.35 | 5300 | 0.5146 | -2.2432 | -3.1008 | 0.6570 | 0.8577 | -521.6948 | -456.3209 | -0.8068 | -0.9357 |
0.5853 | 0.35 | 5400 | 0.5090 | -1.6956 | -2.5963 | 0.6625 | 0.9008 | -471.2449 | -401.5629 | -0.9616 | -1.0984 |
0.5086 | 0.36 | 5500 | 0.5214 | -1.7374 | -2.4619 | 0.6595 | 0.7245 | -457.7994 | -405.7403 | -0.9733 | -1.1007 |
0.4764 | 0.37 | 5600 | 0.5124 | -1.6197 | -2.4123 | 0.6625 | 0.7927 | -452.8468 | -393.9726 | -0.9317 | -1.0609 |
0.6562 | 0.37 | 5700 | 0.5097 | -1.3717 | -2.1420 | 0.6710 | 0.7703 | -425.8073 | -369.1749 | -1.0711 | -1.2060 |
0.5178 | 0.38 | 5800 | 0.5039 | -1.3554 | -2.3601 | 0.6615 | 1.0047 | -447.6251 | -367.5433 | -1.1354 | -1.2822 |
0.5391 | 0.39 | 5900 | 0.5039 | -1.3774 | -2.2739 | 0.6615 | 0.8965 | -439.0063 | -369.7460 | -1.1068 | -1.2484 |
0.4757 | 0.39 | 6000 | 0.5028 | -1.5428 | -2.4713 | 0.6655 | 0.9286 | -458.7466 | -386.2829 | -0.9611 | -1.0946 |
0.5633 | 0.4 | 6100 | 0.5061 | -1.4468 | -2.3254 | 0.6605 | 0.8786 | -444.1477 | -376.6841 | -0.8871 | -1.0140 |
0.4512 | 0.41 | 6200 | 0.5027 | -1.1960 | -2.0747 | 0.6590 | 0.8787 | -419.0789 | -351.6017 | -0.9586 | -1.0898 |
0.4765 | 0.41 | 6300 | 0.5008 | -2.1828 | -3.1237 | 0.6655 | 0.9408 | -523.9770 | -450.2899 | -0.7242 | -0.8425 |
0.5056 | 0.42 | 6400 | 0.5051 | -1.7258 | -2.6125 | 0.6590 | 0.8868 | -472.8661 | -404.5825 | -0.9811 | -1.1095 |
0.5037 | 0.43 | 6500 | 0.5053 | -2.3741 | -3.2980 | 0.6645 | 0.9240 | -541.4145 | -469.4124 | -0.9467 | -1.0773 |
0.5839 | 0.43 | 6600 | 0.5009 | -1.4314 | -2.3462 | 0.6710 | 0.9149 | -446.2347 | -375.1405 | -1.2409 | -1.3891 |
0.6173 | 0.44 | 6700 | 0.5004 | -1.8395 | -2.7068 | 0.6695 | 0.8673 | -482.2916 | -415.9502 | -1.2478 | -1.3958 |
0.4917 | 0.44 | 6800 | 0.4987 | -1.8070 | -2.6650 | 0.6670 | 0.8580 | -478.1150 | -412.7094 | -1.1952 | -1.3386 |
0.4834 | 0.45 | 6900 | 0.4964 | -2.4167 | -3.3898 | 0.6680 | 0.9731 | -550.5955 | -473.6739 | -0.8230 | -0.9490 |
0.4668 | 0.46 | 7000 | 0.5033 | -1.6735 | -2.5449 | 0.6700 | 0.8714 | -466.1047 | -399.3541 | -1.1272 | -1.2659 |
0.4544 | 0.46 | 7100 | 0.4963 | -1.5912 | -2.5910 | 0.6715 | 0.9997 | -470.7080 | -391.1266 | -0.9393 | -1.0685 |
0.5048 | 0.47 | 7200 | 0.5001 | -1.6418 | -2.4761 | 0.6675 | 0.8344 | -459.2229 | -396.1804 | -0.9988 | -1.1263 |
0.5141 | 0.48 | 7300 | 0.4977 | -2.0855 | -3.2272 | 0.6680 | 1.1416 | -534.3281 | -440.5570 | -0.8169 | -0.9431 |
0.646 | 0.48 | 7400 | 0.4976 | -1.9253 | -2.8543 | 0.6680 | 0.9290 | -497.0415 | -424.5315 | -0.9287 | -1.0571 |
0.3417 | 0.49 | 7500 | 0.4937 | -1.7911 | -2.8197 | 0.6715 | 1.0286 | -493.5840 | -411.1139 | -1.0098 | -1.1436 |
0.4662 | 0.5 | 7600 | 0.5001 | -1.5015 | -2.5022 | 0.6670 | 1.0007 | -461.8301 | -382.1551 | -1.1592 | -1.2992 |
0.5059 | 0.5 | 7700 | 0.4979 | -1.4138 | -2.3752 | 0.6710 | 0.9614 | -449.1288 | -373.3851 | -1.1849 | -1.3246 |
0.4464 | 0.51 | 7800 | 0.5017 | -2.2094 | -3.1960 | 0.6740 | 0.9866 | -531.2133 | -452.9458 | -0.9725 | -1.0978 |
0.3597 | 0.52 | 7900 | 0.4956 | -1.7191 | -2.8268 | 0.6725 | 1.1077 | -494.2937 | -403.9176 | -0.9468 | -1.0762 |
0.6685 | 0.52 | 8000 | 0.4940 | -2.1435 | -3.1275 | 0.6695 | 0.9839 | -524.3576 | -446.3575 | -0.7171 | -0.8314 |
0.5494 | 0.53 | 8100 | 0.4914 | -2.1363 | -3.2125 | 0.6655 | 1.0762 | -532.8622 | -445.6346 | -0.8910 | -1.0210 |
0.4703 | 0.54 | 8200 | 0.4949 | -2.0165 | -2.9677 | 0.6660 | 0.9512 | -508.3776 | -433.6510 | -1.0550 | -1.1886 |
0.4901 | 0.54 | 8300 | 0.4976 | -1.8477 | -2.7569 | 0.6635 | 0.9092 | -487.3053 | -416.7779 | -1.0724 | -1.2041 |
0.4759 | 0.55 | 8400 | 0.4949 | -2.4730 | -3.5475 | 0.6655 | 1.0744 | -566.3603 | -479.3096 | -0.8860 | -1.0123 |
0.5511 | 0.56 | 8500 | 0.4967 | -2.6613 | -3.8456 | 0.6690 | 1.1843 | -596.1694 | -498.1316 | -0.8653 | -0.9928 |
0.4126 | 0.56 | 8600 | 0.4945 | -1.8268 | -2.8529 | 0.6665 | 1.0261 | -496.9024 | -414.6831 | -1.1029 | -1.2387 |
0.4881 | 0.57 | 8700 | 0.4980 | -1.5900 | -2.6377 | 0.6620 | 1.0477 | -475.3844 | -391.0065 | -1.0996 | -1.2381 |
0.4813 | 0.58 | 8800 | 0.4959 | -1.8619 | -2.9832 | 0.6620 | 1.1213 | -509.9336 | -418.1949 | -1.0136 | -1.1491 |
0.535 | 0.58 | 8900 | 0.4916 | -2.0436 | -3.1481 | 0.6660 | 1.1045 | -526.4249 | -436.3648 | -0.9509 | -1.0819 |
0.5399 | 0.59 | 9000 | 0.4938 | -1.9094 | -3.0372 | 0.6630 | 1.1278 | -515.3349 | -422.9481 | -0.9098 | -1.0398 |
0.512 | 0.6 | 9100 | 0.4937 | -1.5132 | -2.4976 | 0.6730 | 0.9844 | -461.3710 | -383.3268 | -1.0658 | -1.2002 |
0.5069 | 0.6 | 9200 | 0.4931 | -1.7907 | -2.7553 | 0.6715 | 0.9646 | -487.1392 | -411.0757 | -0.9101 | -1.0346 |
0.4272 | 0.61 | 9300 | 0.4919 | -1.8152 | -2.8886 | 0.6730 | 1.0734 | -500.4742 | -413.5278 | -0.9300 | -1.0575 |
0.4398 | 0.62 | 9400 | 0.4936 | -2.0627 | -3.0248 | 0.6705 | 0.9621 | -514.0956 | -438.2756 | -0.8459 | -0.9658 |
0.498 | 0.62 | 9500 | 0.4930 | -2.5316 | -3.6053 | 0.6645 | 1.0737 | -572.1414 | -485.1664 | -0.6523 | -0.7637 |
0.4865 | 0.63 | 9600 | 0.4916 | -2.4312 | -3.5934 | 0.6685 | 1.1621 | -570.9479 | -475.1278 | -0.6562 | -0.7693 |
0.5823 | 0.63 | 9700 | 0.4904 | -2.5963 | -3.6784 | 0.6705 | 1.0821 | -579.4501 | -491.6361 | -0.6136 | -0.7246 |
0.5332 | 0.64 | 9800 | 0.4906 | -2.5457 | -3.6787 | 0.6705 | 1.1330 | -579.4781 | -486.5714 | -0.5180 | -0.6230 |
0.524 | 0.65 | 9900 | 0.4901 | -2.1327 | -3.1507 | 0.6750 | 1.0180 | -526.6770 | -445.2742 | -0.6355 | -0.7448 |
0.4316 | 0.65 | 10000 | 0.4896 | -1.9944 | -3.0402 | 0.6725 | 1.0458 | -515.6310 | -431.4487 | -0.7432 | -0.8593 |
0.3164 | 0.66 | 10100 | 0.4900 | -1.8657 | -2.9973 | 0.6715 | 1.1316 | -511.3380 | -418.5705 | -0.8276 | -0.9510 |
0.517 | 0.67 | 10200 | 0.4926 | -2.3350 | -3.3238 | 0.6680 | 0.9887 | -543.9870 | -465.5092 | -0.7372 | -0.8519 |
0.4479 | 0.67 | 10300 | 0.4911 | -2.3958 | -3.4309 | 0.6640 | 1.0351 | -554.7045 | -471.5843 | -0.7681 | -0.8859 |
0.4663 | 0.68 | 10400 | 0.4915 | -2.0540 | -3.1053 | 0.6675 | 1.0513 | -522.1436 | -437.4019 | -0.8684 | -0.9939 |
0.5752 | 0.69 | 10500 | 0.4915 | -2.0426 | -3.1656 | 0.6680 | 1.1230 | -528.1689 | -436.2607 | -0.9209 | -1.0516 |
0.463 | 0.69 | 10600 | 0.4911 | -1.9536 | -3.0610 | 0.6655 | 1.1073 | -517.7099 | -427.3689 | -0.8792 | -1.0066 |
0.5865 | 0.7 | 10700 | 0.4881 | -2.2678 | -3.3722 | 0.6680 | 1.1044 | -548.8290 | -458.7841 | -0.7627 | -0.8827 |
0.3972 | 0.71 | 10800 | 0.4904 | -2.3637 | -3.4886 | 0.6690 | 1.1249 | -560.4706 | -468.3778 | -0.7830 | -0.9055 |
0.5572 | 0.71 | 10900 | 0.4892 | -2.3609 | -3.5063 | 0.6680 | 1.1454 | -562.2438 | -468.0954 | -0.7710 | -0.8925 |
0.6689 | 0.72 | 11000 | 0.4884 | -2.2106 | -3.2813 | 0.6685 | 1.0707 | -539.7462 | -453.0659 | -0.8341 | -0.9571 |
0.4435 | 0.73 | 11100 | 0.4877 | -2.1188 | -3.2148 | 0.6705 | 1.0960 | -533.0965 | -443.8869 | -0.8864 | -1.0134 |
0.5282 | 0.73 | 11200 | 0.4871 | -2.0567 | -3.1524 | 0.6715 | 1.0957 | -526.8535 | -437.6731 | -0.9027 | -1.0309 |
0.4652 | 0.74 | 11300 | 0.4870 | -1.8621 | -2.9346 | 0.6690 | 1.0725 | -505.0730 | -418.2159 | -0.9259 | -1.0542 |
0.4956 | 0.75 | 11400 | 0.4867 | -2.0149 | -3.1930 | 0.6725 | 1.1781 | -530.9140 | -433.4950 | -0.8660 | -0.9940 |
0.5636 | 0.75 | 11500 | 0.4873 | -2.1217 | -3.2145 | 0.6705 | 1.0928 | -533.0626 | -444.1773 | -0.8628 | -0.9883 |
0.4554 | 0.76 | 11600 | 0.4888 | -2.2988 | -3.3917 | 0.6705 | 1.0929 | -550.7822 | -461.8896 | -0.8416 | -0.9660 |
0.4871 | 0.77 | 11700 | 0.4900 | -2.3167 | -3.3673 | 0.6655 | 1.0507 | -548.3438 | -463.6716 | -0.8322 | -0.9553 |
0.527 | 0.77 | 11800 | 0.4890 | -1.9018 | -2.9657 | 0.6690 | 1.0639 | -508.1792 | -422.1820 | -0.9603 | -1.0908 |
0.569 | 0.78 | 11900 | 0.4888 | -2.0736 | -3.1962 | 0.6670 | 1.1225 | -531.2298 | -439.3680 | -0.9052 | -1.0341 |
0.4233 | 0.79 | 12000 | 0.4888 | -2.0965 | -3.1915 | 0.6705 | 1.0950 | -530.7664 | -441.6599 | -0.9173 | -1.0466 |
0.3903 | 0.79 | 12100 | 0.4903 | -1.6617 | -2.7032 | 0.6665 | 1.0414 | -481.9285 | -398.1773 | -1.0563 | -1.1908 |
0.4775 | 0.8 | 12200 | 0.4900 | -1.6698 | -2.7266 | 0.6680 | 1.0568 | -484.2725 | -398.9855 | -1.0601 | -1.1954 |
0.4513 | 0.8 | 12300 | 0.4890 | -1.6321 | -2.6987 | 0.6705 | 1.0666 | -481.4833 | -395.2168 | -1.0618 | -1.1973 |
0.5514 | 0.81 | 12400 | 0.4893 | -1.6054 | -2.6422 | 0.6665 | 1.0368 | -475.8312 | -392.5486 | -1.0565 | -1.1916 |
0.4187 | 0.82 | 12500 | 0.4877 | -1.6813 | -2.7806 | 0.6685 | 1.0993 | -489.6676 | -400.1340 | -1.0093 | -1.1437 |
0.549 | 0.82 | 12600 | 0.4874 | -1.6772 | -2.7981 | 0.6695 | 1.1209 | -491.4220 | -399.7243 | -1.0171 | -1.1529 |
0.5839 | 0.83 | 12700 | 0.4880 | -1.6149 | -2.7051 | 0.6690 | 1.0903 | -482.1238 | -393.4917 | -1.0345 | -1.1701 |
0.6596 | 0.84 | 12800 | 0.4864 | -1.7916 | -2.8825 | 0.6705 | 1.0909 | -499.8600 | -411.1650 | -0.9965 | -1.1303 |
0.5277 | 0.84 | 12900 | 0.4859 | -1.8558 | -2.9500 | 0.6695 | 1.0942 | -506.6070 | -417.5810 | -0.9771 | -1.1100 |
0.4608 | 0.85 | 13000 | 0.4859 | -1.8954 | -2.9737 | 0.6735 | 1.0783 | -508.9827 | -421.5428 | -0.9614 | -1.0929 |
0.5661 | 0.86 | 13100 | 0.4860 | -1.8942 | -2.9630 | 0.6725 | 1.0688 | -507.9122 | -421.4239 | -0.9514 | -1.0824 |
0.4732 | 0.86 | 13200 | 0.4857 | -1.8424 | -2.9279 | 0.6705 | 1.0855 | -504.4016 | -416.2484 | -0.9614 | -1.0934 |
0.5427 | 0.87 | 13300 | 0.4858 | -1.9079 | -3.0019 | 0.6710 | 1.0941 | -511.8058 | -422.7933 | -0.9451 | -1.0766 |
0.5223 | 0.88 | 13400 | 0.4863 | -1.9008 | -2.9681 | 0.6720 | 1.0673 | -508.4213 | -422.0847 | -0.9559 | -1.0872 |
0.4808 | 0.88 | 13500 | 0.4859 | -1.9388 | -3.0281 | 0.6735 | 1.0893 | -514.4193 | -425.8812 | -0.9376 | -1.0681 |
0.5138 | 0.89 | 13600 | 0.4856 | -1.9843 | -3.0731 | 0.6715 | 1.0888 | -518.9196 | -430.4352 | -0.9361 | -1.0668 |
0.5878 | 0.9 | 13700 | 0.4855 | -2.0426 | -3.1226 | 0.6695 | 1.0800 | -523.8743 | -436.2664 | -0.9280 | -1.0581 |
0.4051 | 0.9 | 13800 | 0.4853 | -2.0332 | -3.1257 | 0.6725 | 1.0925 | -524.1822 | -435.3295 | -0.9284 | -1.0587 |
0.5562 | 0.91 | 13900 | 0.4854 | -2.0142 | -3.0992 | 0.6725 | 1.0850 | -521.5326 | -433.4284 | -0.9257 | -1.0554 |
0.4542 | 0.92 | 14000 | 0.4857 | -2.0204 | -3.0943 | 0.6715 | 1.0739 | -521.0421 | -434.0428 | -0.9270 | -1.0565 |
0.4657 | 0.92 | 14100 | 0.4855 | -2.0038 | -3.0783 | 0.6695 | 1.0745 | -519.4431 | -432.3822 | -0.9273 | -1.0567 |
0.3963 | 0.93 | 14200 | 0.4853 | -1.9858 | -3.0706 | 0.6710 | 1.0848 | -518.6724 | -430.5839 | -0.9247 | -1.0540 |
0.4414 | 0.94 | 14300 | 0.4855 | -1.9946 | -3.0790 | 0.6715 | 1.0844 | -519.5145 | -431.4666 | -0.9262 | -1.0557 |
0.5011 | 0.94 | 14400 | 0.4854 | -1.9991 | -3.0852 | 0.6725 | 1.0861 | -520.1354 | -431.9193 | -0.9237 | -1.0528 |
0.4677 | 0.95 | 14500 | 0.4853 | -2.0012 | -3.0897 | 0.6715 | 1.0885 | -520.5853 | -432.1261 | -0.9249 | -1.0543 |
0.4234 | 0.96 | 14600 | 0.4854 | -2.0010 | -3.0866 | 0.6710 | 1.0856 | -520.2672 | -432.1037 | -0.9283 | -1.0579 |
0.4681 | 0.96 | 14700 | 0.4855 | -1.9998 | -3.0848 | 0.6700 | 1.0851 | -520.0927 | -431.9801 | -0.9267 | -1.0560 |
0.4417 | 0.97 | 14800 | 0.4853 | -2.0018 | -3.0877 | 0.6715 | 1.0859 | -520.3868 | -432.1882 | -0.9254 | -1.0549 |
0.516 | 0.97 | 14900 | 0.4854 | -2.0013 | -3.0874 | 0.6700 | 1.0861 | -520.3481 | -432.1320 | -0.9249 | -1.0543 |
0.5369 | 0.98 | 15000 | 0.4854 | -2.0014 | -3.0872 | 0.6705 | 1.0857 | -520.3271 | -432.1479 | -0.9244 | -1.0537 |
0.442 | 0.99 | 15100 | 0.4853 | -2.0000 | -3.0858 | 0.6715 | 1.0857 | -520.1915 | -432.0099 | -0.9254 | -1.0546 |
0.4814 | 0.99 | 15200 | 0.4854 | -1.9998 | -3.0852 | 0.6720 | 1.0854 | -520.1320 | -431.9893 | -0.9286 | -1.0581 |
Framework versions
- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
Model tree for DUAL-GPO/zephyr-7b-dpo-qlora-v1
- Base model: mistralai/Mistral-7B-v0.1