# mistral-7b-dpo-qlora-2ep
This model is a fine-tuned version of mimicheng/mistral-7b-sft-qlora-2ep (a QLoRA SFT checkpoint of mistralai/Mistral-7B-v0.1), aligned with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (the metrics are explained just after the list):
- Loss: 0.6446
- Rewards/chosen: -0.4217
- Rewards/rejected: -0.5814
- Rewards/accuracies: 0.6290
- Rewards/margins: 0.1596
- Logps/rejected: -1409.8003
- Logps/chosen: -1604.7235
- Logits/rejected: -2.6937
- Logits/chosen: -2.7021
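These are the metrics conventionally logged for DPO runs (e.g. by TRL's `DPOTrainer`; the card does not name the training tool): Rewards/chosen and Rewards/rejected are the implicit rewards $\beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}$ of the preferred and dispreferred completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs whose chosen reward exceeds the rejected one. For reference, the DPO objective (Rafailov et al., 2023), with $\pi_{\text{ref}}$ the frozen SFT policy and $\beta$ the KL penalty coefficient (its value is not reported on this card):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} \;-\; \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
$$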
## Model description
More information needed
## Intended uses & limitations
More information needed
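Pending a fuller description, a minimal inference sketch follows. It assumes this repository hosts a PEFT (QLoRA) adapter together with its tokenizer, consistent with the framework versions listed below; everything beyond the repo name is illustrative.

```python
# Minimal inference sketch; assumes the repo contains a PEFT adapter + tokenizer.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "mimicheng/mistral-7b-dpo-qlora-2ep"

# AutoPeftModelForCausalLM reads the adapter config, loads the base model it
# points to, and attaches the LoRA weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

prompt = "Explain the difference between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For a chat-tuned policy it is usually better to format the prompt with `tokenizer.apply_chat_template` before generating; the plain string above just keeps the sketch short.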
## Training and evaluation data
More information needed
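The preference data named in the introduction can at least be inspected directly. A small sketch, assuming the published `train_prefs`/`test_prefs` splits and `prompt`/`chosen`/`rejected` columns of HuggingFaceH4/ultrafeedback_binarized:

```python
# Peek at the DPO preference pairs (split/column names from the public dataset).
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = ds[0]

# Each record pairs one prompt with a preferred ("chosen") and a dispreferred
# ("rejected") assistant response, stored as lists of chat messages.
print(example["prompt"])
print(example["chosen"][-1]["content"][:200])
print(example["rejected"][-1]["content"][:200])
```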
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch reconstructing them follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
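The training script itself is not published on this card, so the sketch below is a hedged reconstruction in the style of a TRL (0.7.x) `DPOTrainer` + PEFT recipe. The LoRA rank/alpha/dropout, the DPO `beta`, the 4-bit quantization details, and bf16 are assumptions, marked as such in the comments; everything else mirrors the list above.

```python
# Hedged reconstruction of the DPO training run; not the original script.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base_id = "mimicheng/mistral-7b-sft-qlora-2ep"

# QLoRA: load the SFT policy 4-bit quantized (quantization settings assumed).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb)
tokenizer = AutoTokenizer.from_pretrained(base_id)

def to_text(ex):
    # The raw splits store chat-message lists; TRL 0.7.x DPOTrainer expects
    # plain strings for prompt/chosen/rejected (chat templating omitted here).
    return {"prompt": ex["prompt"],
            "chosen": ex["chosen"][-1]["content"],
            "rejected": ex["rejected"][-1]["content"]}

train_ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized",
                        split="train_prefs").map(to_text)
eval_ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized",
                       split="test_prefs").map(to_text)

args = TrainingArguments(
    output_dir="mistral-7b-dpo-qlora-2ep",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # x4 GPUs = total train batch size 16
    per_device_eval_batch_size=8,    # x4 GPUs = total eval batch size 32
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumed; Adam betas/eps are the defaults listed
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,            # with a PEFT adapter, the frozen base is the reference
    beta=0.1,                  # assumed: the common DPO default, not reported above
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05,
                           task_type="CAUSAL_LM"),  # assumed LoRA settings
)
trainer.train()
```

Passing `ref_model=None` together with a `peft_config` lets TRL compute reference log-probabilities by temporarily disabling the adapter, so no second full copy of the 7B model has to be kept in memory.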
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6932 | 0.03 | 100 | 0.6931 | 0.0001 | 0.0002 | 0.4940 | -0.0001 | -1351.6440 | -1562.5353 | -2.7909 | -2.7984 |
0.6923 | 0.05 | 200 | 0.6925 | 0.0045 | 0.0029 | 0.5119 | 0.0016 | -1351.3734 | -1562.0991 | -2.7899 | -2.7974 |
0.6937 | 0.08 | 300 | 0.6909 | 0.0097 | 0.0052 | 0.5377 | 0.0045 | -1351.1462 | -1561.5815 | -2.7872 | -2.7945 |
0.6867 | 0.1 | 400 | 0.6893 | 0.0145 | 0.0060 | 0.5595 | 0.0085 | -1351.0632 | -1561.1024 | -2.7853 | -2.7923 |
0.6921 | 0.13 | 500 | 0.6867 | 0.0007 | -0.0122 | 0.5734 | 0.0129 | -1352.8849 | -1562.4756 | -2.7829 | -2.7893 |
0.6895 | 0.16 | 600 | 0.6838 | 0.0046 | -0.0162 | 0.5913 | 0.0208 | -1353.2866 | -1562.0875 | -2.7740 | -2.7806 |
0.6792 | 0.18 | 700 | 0.6819 | -0.0194 | -0.0440 | 0.5992 | 0.0246 | -1356.0621 | -1564.4910 | -2.7592 | -2.7657 |
0.6802 | 0.21 | 800 | 0.6791 | -0.0527 | -0.0819 | 0.5813 | 0.0293 | -1359.8597 | -1567.8170 | -2.7551 | -2.7611 |
0.6812 | 0.24 | 900 | 0.6772 | -0.0403 | -0.0826 | 0.5714 | 0.0423 | -1359.9243 | -1566.5771 | -2.7588 | -2.7655 |
0.6714 | 0.26 | 1000 | 0.6746 | -0.0886 | -0.1361 | 0.5714 | 0.0475 | -1365.2759 | -1571.4064 | -2.7418 | -2.7476 |
0.676 | 0.29 | 1100 | 0.6744 | -0.1141 | -0.1733 | 0.5893 | 0.0592 | -1368.9943 | -1573.9617 | -2.7433 | -2.7505 |
0.6779 | 0.31 | 1200 | 0.6703 | -0.1056 | -0.1703 | 0.5933 | 0.0647 | -1368.6935 | -1573.1090 | -2.7431 | -2.7511 |
0.6888 | 0.34 | 1300 | 0.6676 | -0.1136 | -0.1850 | 0.5972 | 0.0713 | -1370.1599 | -1573.9121 | -2.7375 | -2.7452 |
0.6664 | 0.37 | 1400 | 0.6669 | -0.1425 | -0.2165 | 0.6071 | 0.0739 | -1373.3110 | -1576.8027 | -2.7302 | -2.7375 |
0.6705 | 0.39 | 1500 | 0.6665 | -0.1804 | -0.2701 | 0.6071 | 0.0897 | -1378.6722 | -1580.5913 | -2.7481 | -2.7546 |
0.6411 | 0.42 | 1600 | 0.6653 | -0.1924 | -0.2728 | 0.6329 | 0.0804 | -1378.9417 | -1581.7911 | -2.7249 | -2.7317 |
0.665 | 0.44 | 1700 | 0.6644 | -0.1967 | -0.2789 | 0.6131 | 0.0823 | -1379.5565 | -1582.2147 | -2.7355 | -2.7422 |
0.6563 | 0.47 | 1800 | 0.6639 | -0.2073 | -0.2940 | 0.6210 | 0.0867 | -1381.0635 | -1583.2751 | -2.7257 | -2.7325 |
0.6668 | 0.5 | 1900 | 0.6620 | -0.2260 | -0.3252 | 0.6171 | 0.0992 | -1384.1846 | -1585.1470 | -2.7350 | -2.7426 |
0.6632 | 0.52 | 2000 | 0.6605 | -0.1924 | -0.2828 | 0.6329 | 0.0904 | -1379.9453 | -1581.7920 | -2.7371 | -2.7449 |
0.6427 | 0.55 | 2100 | 0.6597 | -0.2106 | -0.3114 | 0.6230 | 0.1007 | -1382.8007 | -1583.6138 | -2.7260 | -2.7333 |
0.6923 | 0.58 | 2200 | 0.6592 | -0.2129 | -0.3178 | 0.6230 | 0.1049 | -1383.4486 | -1583.8400 | -2.7175 | -2.7243 |
0.6496 | 0.6 | 2300 | 0.6581 | -0.2352 | -0.3443 | 0.6290 | 0.1091 | -1386.0916 | -1586.0706 | -2.7159 | -2.7235 |
0.6668 | 0.63 | 2400 | 0.6577 | -0.2503 | -0.3563 | 0.6290 | 0.1061 | -1387.2981 | -1587.5769 | -2.7321 | -2.7410 |
0.6477 | 0.65 | 2500 | 0.6560 | -0.2661 | -0.3858 | 0.6310 | 0.1196 | -1390.2400 | -1589.1620 | -2.7287 | -2.7370 |
0.6444 | 0.68 | 2600 | 0.6550 | -0.2830 | -0.3993 | 0.6270 | 0.1163 | -1391.5975 | -1590.8505 | -2.7240 | -2.7330 |
0.6594 | 0.71 | 2700 | 0.6566 | -0.3546 | -0.4862 | 0.6190 | 0.1316 | -1400.2867 | -1598.0084 | -2.6748 | -2.6818 |
0.6329 | 0.73 | 2800 | 0.6544 | -0.2748 | -0.3936 | 0.6250 | 0.1189 | -1391.0292 | -1590.0247 | -2.6985 | -2.7063 |
0.6351 | 0.76 | 2900 | 0.6545 | -0.2928 | -0.4152 | 0.6270 | 0.1224 | -1393.1847 | -1591.8256 | -2.7050 | -2.7136 |
0.6724 | 0.79 | 3000 | 0.6528 | -0.3067 | -0.4418 | 0.6448 | 0.1351 | -1395.8458 | -1593.2202 | -2.6986 | -2.7069 |
0.6413 | 0.81 | 3100 | 0.6514 | -0.3153 | -0.4541 | 0.6548 | 0.1388 | -1397.0781 | -1594.0812 | -2.6892 | -2.6985 |
0.6242 | 0.84 | 3200 | 0.6523 | -0.3197 | -0.4618 | 0.6349 | 0.1421 | -1397.8459 | -1594.5162 | -2.7123 | -2.7206 |
0.6773 | 0.86 | 3300 | 0.6506 | -0.3038 | -0.4433 | 0.6508 | 0.1395 | -1395.9939 | -1592.9280 | -2.7042 | -2.7136 |
0.6531 | 0.89 | 3400 | 0.6505 | -0.3036 | -0.4426 | 0.6329 | 0.1390 | -1395.9207 | -1592.9099 | -2.6620 | -2.6712 |
0.6499 | 0.92 | 3500 | 0.6504 | -0.3509 | -0.4975 | 0.6448 | 0.1467 | -1401.4177 | -1597.6368 | -2.6611 | -2.6701 |
0.6439 | 0.94 | 3600 | 0.6509 | -0.3522 | -0.4975 | 0.6349 | 0.1453 | -1401.4176 | -1597.7729 | -2.6758 | -2.6841 |
0.6279 | 0.97 | 3700 | 0.6505 | -0.4035 | -0.5500 | 0.6310 | 0.1466 | -1406.6675 | -1602.8950 | -2.6918 | -2.7012 |
0.6443 | 0.99 | 3800 | 0.6497 | -0.3970 | -0.5441 | 0.6290 | 0.1471 | -1406.0728 | -1602.2509 | -2.6876 | -2.6965 |
0.6355 | 1.02 | 3900 | 0.6484 | -0.3538 | -0.4986 | 0.6349 | 0.1449 | -1401.5294 | -1597.9247 | -2.6950 | -2.7039 |
0.6683 | 1.05 | 4000 | 0.6482 | -0.3608 | -0.5119 | 0.6349 | 0.1511 | -1402.8545 | -1598.6262 | -2.6992 | -2.7080 |
0.6459 | 1.07 | 4100 | 0.6475 | -0.3305 | -0.4760 | 0.6448 | 0.1455 | -1399.2634 | -1595.5988 | -2.6852 | -2.6944 |
0.6451 | 1.1 | 4200 | 0.6471 | -0.3471 | -0.4991 | 0.6369 | 0.1519 | -1401.5713 | -1597.2633 | -2.6954 | -2.7042 |
0.6744 | 1.13 | 4300 | 0.6483 | -0.3619 | -0.5112 | 0.6429 | 0.1493 | -1402.7870 | -1598.7428 | -2.7008 | -2.7095 |
0.6355 | 1.15 | 4400 | 0.6477 | -0.4040 | -0.5558 | 0.6270 | 0.1518 | -1407.2480 | -1602.9531 | -2.6916 | -2.7001 |
0.6187 | 1.18 | 4500 | 0.6472 | -0.4050 | -0.5534 | 0.6349 | 0.1485 | -1407.0084 | -1603.0441 | -2.6883 | -2.6963 |
0.6555 | 1.2 | 4600 | 0.6472 | -0.3883 | -0.5354 | 0.6310 | 0.1471 | -1405.2079 | -1601.3826 | -2.7075 | -2.7168 |
0.6178 | 1.23 | 4700 | 0.6476 | -0.3993 | -0.5414 | 0.6190 | 0.1422 | -1405.8092 | -1602.4763 | -2.6912 | -2.7006 |
0.6242 | 1.26 | 4800 | 0.6477 | -0.4302 | -0.5746 | 0.6250 | 0.1444 | -1409.1267 | -1605.5714 | -2.6917 | -2.7016 |
0.6221 | 1.28 | 4900 | 0.6464 | -0.3848 | -0.5302 | 0.6349 | 0.1454 | -1404.6871 | -1601.0272 | -2.7073 | -2.7167 |
0.6582 | 1.31 | 5000 | 0.6460 | -0.3995 | -0.5463 | 0.6310 | 0.1468 | -1406.2927 | -1602.5012 | -2.7174 | -2.7268 |
0.6276 | 1.33 | 5100 | 0.6458 | -0.4048 | -0.5543 | 0.6310 | 0.1495 | -1407.0914 | -1603.0245 | -2.7192 | -2.7281 |
0.6573 | 1.36 | 5200 | 0.6452 | -0.4069 | -0.5580 | 0.6290 | 0.1512 | -1407.4680 | -1603.2344 | -2.7142 | -2.7230 |
0.6672 | 1.39 | 5300 | 0.6458 | -0.4020 | -0.5504 | 0.6329 | 0.1485 | -1406.7059 | -1602.7441 | -2.6997 | -2.7080 |
0.6112 | 1.41 | 5400 | 0.6460 | -0.4035 | -0.5510 | 0.6290 | 0.1475 | -1406.7632 | -1602.8997 | -2.6953 | -2.7036 |
0.6421 | 1.44 | 5500 | 0.6449 | -0.3915 | -0.5414 | 0.6409 | 0.1499 | -1405.8010 | -1601.6963 | -2.6991 | -2.7081 |
0.658 | 1.47 | 5600 | 0.6451 | -0.4023 | -0.5553 | 0.6429 | 0.1530 | -1407.1986 | -1602.7803 | -2.6938 | -2.7027 |
0.6437 | 1.49 | 5700 | 0.6454 | -0.4050 | -0.5555 | 0.6389 | 0.1505 | -1407.2163 | -1603.0527 | -2.6883 | -2.6972 |
0.6289 | 1.52 | 5800 | 0.6443 | -0.3986 | -0.5520 | 0.6468 | 0.1534 | -1406.8611 | -1602.4105 | -2.7007 | -2.7094 |
0.6361 | 1.54 | 5900 | 0.6442 | -0.4036 | -0.5574 | 0.6409 | 0.1538 | -1407.4087 | -1602.9125 | -2.6962 | -2.7047 |
0.6374 | 1.57 | 6000 | 0.6446 | -0.4164 | -0.5717 | 0.6429 | 0.1553 | -1408.8311 | -1604.1853 | -2.6963 | -2.7048 |
0.6423 | 1.6 | 6100 | 0.6448 | -0.4212 | -0.5781 | 0.6349 | 0.1569 | -1409.4735 | -1604.6692 | -2.6905 | -2.6992 |
0.6611 | 1.62 | 6200 | 0.6453 | -0.4344 | -0.5916 | 0.6250 | 0.1572 | -1410.8239 | -1605.9866 | -2.6925 | -2.7010 |
0.6355 | 1.65 | 6300 | 0.6451 | -0.4325 | -0.5909 | 0.6250 | 0.1584 | -1410.7570 | -1605.8035 | -2.6922 | -2.7008 |
0.6555 | 1.67 | 6400 | 0.6451 | -0.4326 | -0.5912 | 0.6230 | 0.1586 | -1410.7894 | -1605.8125 | -2.6935 | -2.7021 |
0.6584 | 1.7 | 6500 | 0.6449 | -0.4310 | -0.5905 | 0.6270 | 0.1595 | -1410.7151 | -1605.6461 | -2.6900 | -2.6987 |
0.6371 | 1.73 | 6600 | 0.6448 | -0.4266 | -0.5864 | 0.6310 | 0.1598 | -1410.3033 | -1605.2112 | -2.6897 | -2.6985 |
0.6051 | 1.75 | 6700 | 0.6446 | -0.4220 | -0.5821 | 0.6329 | 0.1601 | -1409.8746 | -1604.7469 | -2.6927 | -2.7012 |
0.6136 | 1.78 | 6800 | 0.6446 | -0.4219 | -0.5822 | 0.6310 | 0.1603 | -1409.8861 | -1604.7394 | -2.6940 | -2.7024 |
0.6503 | 1.81 | 6900 | 0.6445 | -0.4222 | -0.5826 | 0.6349 | 0.1603 | -1409.9208 | -1604.7736 | -2.6947 | -2.7030 |
0.6318 | 1.83 | 7000 | 0.6445 | -0.4216 | -0.5817 | 0.6329 | 0.1601 | -1409.8387 | -1604.7111 | -2.6925 | -2.7010 |
0.6493 | 1.86 | 7100 | 0.6445 | -0.4215 | -0.5815 | 0.6329 | 0.1600 | -1409.8179 | -1604.7026 | -2.6940 | -2.7025 |
0.6292 | 1.88 | 7200 | 0.6446 | -0.4217 | -0.5816 | 0.6329 | 0.1599 | -1409.8223 | -1604.7195 | -2.6943 | -2.7027 |
0.625 | 1.91 | 7300 | 0.6445 | -0.4215 | -0.5816 | 0.6329 | 0.1600 | -1409.8219 | -1604.7013 | -2.6937 | -2.7022 |
0.6306 | 1.94 | 7400 | 0.6446 | -0.4218 | -0.5814 | 0.6290 | 0.1596 | -1409.8014 | -1604.7244 | -2.6937 | -2.7021 |
0.6446 | 1.96 | 7500 | 0.6446 | -0.4217 | -0.5814 | 0.6290 | 0.1596 | -1409.8003 | -1604.7235 | -2.6937 | -2.7021 |
0.6394 | 1.99 | 7600 | 0.6446 | -0.4217 | -0.5814 | 0.6290 | 0.1596 | -1409.8003 | -1604.7235 | -2.6937 | -2.7021 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.0