mimicheng committed on
Commit 47d1a60
1 Parent(s): 3c4508f

Model save

README.md ADDED
@@ -0,0 +1,149 @@
+ ---
+ license: apache-2.0
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: mistralai/Mistral-7B-v0.1
+ model-index:
+ - name: mistral-7b-dpo-qlora-2ep
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # mistral-7b-dpo-qlora-2ep
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6446
+ - Rewards/chosen: -0.4217
+ - Rewards/rejected: -0.5814
+ - Rewards/accuracies: 0.6290
+ - Rewards/margins: 0.1596
+ - Logps/rejected: -1409.8003
+ - Logps/chosen: -1604.7235
+ - Logits/rejected: -2.6937
+ - Logits/chosen: -2.7021
+
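+ Under the DPO formulation used by trl, these rewards are implicit rewards: each completion is scored by the beta-scaled log-likelihood ratio of the policy against the frozen reference model,
+
+ $$ r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} $$
+
+ so Rewards/margins is the mean of Rewards/chosen minus Rewards/rejected, and Rewards/accuracies is the fraction of evaluation pairs in which the chosen completion receives the higher implicit reward.
+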
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
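+ A minimal sketch for loading the adapter with PEFT is shown below; the repo id `mimicheng/mistral-7b-dpo-qlora-2ep` and the 4-bit settings are assumptions, so adjust them to wherever the adapter is actually hosted and to the original quantization setup:
+
+ ```python
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ # Quantize the base model the way a QLoRA run typically would (assumed settings).
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+ base = AutoModelForCausalLM.from_pretrained(
+     "mistralai/Mistral-7B-v0.1",
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+ # Attach the DPO-trained LoRA adapter (hypothetical repo id).
+ model = PeftModel.from_pretrained(base, "mimicheng/mistral-7b-dpo-qlora-2ep")
+ tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
+
+ inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
+ print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
+ ```
+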
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 4
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - total_train_batch_size: 16
+ - total_eval_batch_size: 32
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 2
+
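+ These settings map onto a trl `DPOTrainer` run roughly as sketched below. This is an illustrative reconstruction rather than the original training script: exact arguments vary across trl releases (a ~0.7.x release matches the framework versions listed at the end of this card), and the LoRA hyperparameters, `beta`, and the toy dataset are assumptions the card does not record.
+
+ ```python
+ from datasets import Dataset
+ from peft import LoraConfig
+ from transformers import TrainingArguments
+ from trl import DPOTrainer
+
+ # `model` (the 4-bit base) and `tokenizer` are as in the loading sketch above.
+ # Tiny stand-in preference dataset; the real dataset is not recorded here.
+ train_dataset = eval_dataset = Dataset.from_dict({
+     "prompt": ["Q: What is 2 + 2? A:"],
+     "chosen": [" 4"],
+     "rejected": [" 5"],
+ })
+
+ # LoRA hyperparameters are assumptions; the card does not record them.
+ peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
+
+ # Optimizer defaults (AdamW, betas=(0.9, 0.999), epsilon=1e-08) match the card.
+ args = TrainingArguments(
+     output_dir="mistral-7b-dpo-qlora-2ep",
+     learning_rate=5e-6,             # as listed above
+     per_device_train_batch_size=4,  # x 4 GPUs -> total train batch size 16
+     per_device_eval_batch_size=8,   # x 4 GPUs -> total eval batch size 32
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     num_train_epochs=2,
+     seed=42,
+ )
+
+ trainer = DPOTrainer(
+     model,
+     ref_model=None,    # with peft_config set, trl derives the reference by disabling the adapter
+     beta=0.1,          # assumption; beta is not recorded in the card
+     args=args,
+     train_dataset=train_dataset,
+     eval_dataset=eval_dataset,
+     tokenizer=tokenizer,
+     peft_config=peft_config,
+ )
+ trainer.train()
+ ```
+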
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6932 | 0.03 | 100 | 0.6931 | 0.0001 | 0.0002 | 0.4940 | -0.0001 | -1351.6440 | -1562.5353 | -2.7909 | -2.7984 |
+ | 0.6923 | 0.05 | 200 | 0.6925 | 0.0045 | 0.0029 | 0.5119 | 0.0016 | -1351.3734 | -1562.0991 | -2.7899 | -2.7974 |
+ | 0.6937 | 0.08 | 300 | 0.6909 | 0.0097 | 0.0052 | 0.5377 | 0.0045 | -1351.1462 | -1561.5815 | -2.7872 | -2.7945 |
+ | 0.6867 | 0.10 | 400 | 0.6893 | 0.0145 | 0.0060 | 0.5595 | 0.0085 | -1351.0632 | -1561.1024 | -2.7853 | -2.7923 |
+ | 0.6921 | 0.13 | 500 | 0.6867 | 0.0007 | -0.0122 | 0.5734 | 0.0129 | -1352.8849 | -1562.4756 | -2.7829 | -2.7893 |
+ | 0.6895 | 0.16 | 600 | 0.6838 | 0.0046 | -0.0162 | 0.5913 | 0.0208 | -1353.2866 | -1562.0875 | -2.7740 | -2.7806 |
+ | 0.6792 | 0.18 | 700 | 0.6819 | -0.0194 | -0.0440 | 0.5992 | 0.0246 | -1356.0621 | -1564.4910 | -2.7592 | -2.7657 |
+ | 0.6802 | 0.21 | 800 | 0.6791 | -0.0527 | -0.0819 | 0.5813 | 0.0293 | -1359.8597 | -1567.8170 | -2.7551 | -2.7611 |
+ | 0.6812 | 0.24 | 900 | 0.6772 | -0.0403 | -0.0826 | 0.5714 | 0.0423 | -1359.9243 | -1566.5771 | -2.7588 | -2.7655 |
+ | 0.6714 | 0.26 | 1000 | 0.6746 | -0.0886 | -0.1361 | 0.5714 | 0.0475 | -1365.2759 | -1571.4064 | -2.7418 | -2.7476 |
+ | 0.6760 | 0.29 | 1100 | 0.6744 | -0.1141 | -0.1733 | 0.5893 | 0.0592 | -1368.9943 | -1573.9617 | -2.7433 | -2.7505 |
+ | 0.6779 | 0.31 | 1200 | 0.6703 | -0.1056 | -0.1703 | 0.5933 | 0.0647 | -1368.6935 | -1573.1090 | -2.7431 | -2.7511 |
+ | 0.6888 | 0.34 | 1300 | 0.6676 | -0.1136 | -0.1850 | 0.5972 | 0.0713 | -1370.1599 | -1573.9121 | -2.7375 | -2.7452 |
+ | 0.6664 | 0.37 | 1400 | 0.6669 | -0.1425 | -0.2165 | 0.6071 | 0.0739 | -1373.3110 | -1576.8027 | -2.7302 | -2.7375 |
+ | 0.6705 | 0.39 | 1500 | 0.6665 | -0.1804 | -0.2701 | 0.6071 | 0.0897 | -1378.6722 | -1580.5913 | -2.7481 | -2.7546 |
+ | 0.6411 | 0.42 | 1600 | 0.6653 | -0.1924 | -0.2728 | 0.6329 | 0.0804 | -1378.9417 | -1581.7911 | -2.7249 | -2.7317 |
+ | 0.6650 | 0.44 | 1700 | 0.6644 | -0.1967 | -0.2789 | 0.6131 | 0.0823 | -1379.5565 | -1582.2147 | -2.7355 | -2.7422 |
+ | 0.6563 | 0.47 | 1800 | 0.6639 | -0.2073 | -0.2940 | 0.6210 | 0.0867 | -1381.0635 | -1583.2751 | -2.7257 | -2.7325 |
+ | 0.6668 | 0.50 | 1900 | 0.6620 | -0.2260 | -0.3252 | 0.6171 | 0.0992 | -1384.1846 | -1585.1470 | -2.7350 | -2.7426 |
+ | 0.6632 | 0.52 | 2000 | 0.6605 | -0.1924 | -0.2828 | 0.6329 | 0.0904 | -1379.9453 | -1581.7920 | -2.7371 | -2.7449 |
+ | 0.6427 | 0.55 | 2100 | 0.6597 | -0.2106 | -0.3114 | 0.6230 | 0.1007 | -1382.8007 | -1583.6138 | -2.7260 | -2.7333 |
+ | 0.6923 | 0.58 | 2200 | 0.6592 | -0.2129 | -0.3178 | 0.6230 | 0.1049 | -1383.4486 | -1583.8400 | -2.7175 | -2.7243 |
+ | 0.6496 | 0.60 | 2300 | 0.6581 | -0.2352 | -0.3443 | 0.6290 | 0.1091 | -1386.0916 | -1586.0706 | -2.7159 | -2.7235 |
+ | 0.6668 | 0.63 | 2400 | 0.6577 | -0.2503 | -0.3563 | 0.6290 | 0.1061 | -1387.2981 | -1587.5769 | -2.7321 | -2.7410 |
+ | 0.6477 | 0.65 | 2500 | 0.6560 | -0.2661 | -0.3858 | 0.6310 | 0.1196 | -1390.2400 | -1589.1620 | -2.7287 | -2.7370 |
+ | 0.6444 | 0.68 | 2600 | 0.6550 | -0.2830 | -0.3993 | 0.6270 | 0.1163 | -1391.5975 | -1590.8505 | -2.7240 | -2.7330 |
+ | 0.6594 | 0.71 | 2700 | 0.6566 | -0.3546 | -0.4862 | 0.6190 | 0.1316 | -1400.2867 | -1598.0084 | -2.6748 | -2.6818 |
+ | 0.6329 | 0.73 | 2800 | 0.6544 | -0.2748 | -0.3936 | 0.6250 | 0.1189 | -1391.0292 | -1590.0247 | -2.6985 | -2.7063 |
+ | 0.6351 | 0.76 | 2900 | 0.6545 | -0.2928 | -0.4152 | 0.6270 | 0.1224 | -1393.1847 | -1591.8256 | -2.7050 | -2.7136 |
+ | 0.6724 | 0.79 | 3000 | 0.6528 | -0.3067 | -0.4418 | 0.6448 | 0.1351 | -1395.8458 | -1593.2202 | -2.6986 | -2.7069 |
+ | 0.6413 | 0.81 | 3100 | 0.6514 | -0.3153 | -0.4541 | 0.6548 | 0.1388 | -1397.0781 | -1594.0812 | -2.6892 | -2.6985 |
+ | 0.6242 | 0.84 | 3200 | 0.6523 | -0.3197 | -0.4618 | 0.6349 | 0.1421 | -1397.8459 | -1594.5162 | -2.7123 | -2.7206 |
+ | 0.6773 | 0.86 | 3300 | 0.6506 | -0.3038 | -0.4433 | 0.6508 | 0.1395 | -1395.9939 | -1592.9280 | -2.7042 | -2.7136 |
+ | 0.6531 | 0.89 | 3400 | 0.6505 | -0.3036 | -0.4426 | 0.6329 | 0.1390 | -1395.9207 | -1592.9099 | -2.6620 | -2.6712 |
+ | 0.6499 | 0.92 | 3500 | 0.6504 | -0.3509 | -0.4975 | 0.6448 | 0.1467 | -1401.4177 | -1597.6368 | -2.6611 | -2.6701 |
+ | 0.6439 | 0.94 | 3600 | 0.6509 | -0.3522 | -0.4975 | 0.6349 | 0.1453 | -1401.4176 | -1597.7729 | -2.6758 | -2.6841 |
+ | 0.6279 | 0.97 | 3700 | 0.6505 | -0.4035 | -0.5500 | 0.6310 | 0.1466 | -1406.6675 | -1602.8950 | -2.6918 | -2.7012 |
+ | 0.6443 | 0.99 | 3800 | 0.6497 | -0.3970 | -0.5441 | 0.6290 | 0.1471 | -1406.0728 | -1602.2509 | -2.6876 | -2.6965 |
+ | 0.6355 | 1.02 | 3900 | 0.6484 | -0.3538 | -0.4986 | 0.6349 | 0.1449 | -1401.5294 | -1597.9247 | -2.6950 | -2.7039 |
+ | 0.6683 | 1.05 | 4000 | 0.6482 | -0.3608 | -0.5119 | 0.6349 | 0.1511 | -1402.8545 | -1598.6262 | -2.6992 | -2.7080 |
+ | 0.6459 | 1.07 | 4100 | 0.6475 | -0.3305 | -0.4760 | 0.6448 | 0.1455 | -1399.2634 | -1595.5988 | -2.6852 | -2.6944 |
+ | 0.6451 | 1.10 | 4200 | 0.6471 | -0.3471 | -0.4991 | 0.6369 | 0.1519 | -1401.5713 | -1597.2633 | -2.6954 | -2.7042 |
+ | 0.6744 | 1.13 | 4300 | 0.6483 | -0.3619 | -0.5112 | 0.6429 | 0.1493 | -1402.7870 | -1598.7428 | -2.7008 | -2.7095 |
+ | 0.6355 | 1.15 | 4400 | 0.6477 | -0.4040 | -0.5558 | 0.6270 | 0.1518 | -1407.2480 | -1602.9531 | -2.6916 | -2.7001 |
+ | 0.6187 | 1.18 | 4500 | 0.6472 | -0.4050 | -0.5534 | 0.6349 | 0.1485 | -1407.0084 | -1603.0441 | -2.6883 | -2.6963 |
+ | 0.6555 | 1.20 | 4600 | 0.6472 | -0.3883 | -0.5354 | 0.6310 | 0.1471 | -1405.2079 | -1601.3826 | -2.7075 | -2.7168 |
+ | 0.6178 | 1.23 | 4700 | 0.6476 | -0.3993 | -0.5414 | 0.6190 | 0.1422 | -1405.8092 | -1602.4763 | -2.6912 | -2.7006 |
+ | 0.6242 | 1.26 | 4800 | 0.6477 | -0.4302 | -0.5746 | 0.6250 | 0.1444 | -1409.1267 | -1605.5714 | -2.6917 | -2.7016 |
+ | 0.6221 | 1.28 | 4900 | 0.6464 | -0.3848 | -0.5302 | 0.6349 | 0.1454 | -1404.6871 | -1601.0272 | -2.7073 | -2.7167 |
+ | 0.6582 | 1.31 | 5000 | 0.6460 | -0.3995 | -0.5463 | 0.6310 | 0.1468 | -1406.2927 | -1602.5012 | -2.7174 | -2.7268 |
+ | 0.6276 | 1.33 | 5100 | 0.6458 | -0.4048 | -0.5543 | 0.6310 | 0.1495 | -1407.0914 | -1603.0245 | -2.7192 | -2.7281 |
+ | 0.6573 | 1.36 | 5200 | 0.6452 | -0.4069 | -0.5580 | 0.6290 | 0.1512 | -1407.4680 | -1603.2344 | -2.7142 | -2.7230 |
+ | 0.6672 | 1.39 | 5300 | 0.6458 | -0.4020 | -0.5504 | 0.6329 | 0.1485 | -1406.7059 | -1602.7441 | -2.6997 | -2.7080 |
+ | 0.6112 | 1.41 | 5400 | 0.6460 | -0.4035 | -0.5510 | 0.6290 | 0.1475 | -1406.7632 | -1602.8997 | -2.6953 | -2.7036 |
+ | 0.6421 | 1.44 | 5500 | 0.6449 | -0.3915 | -0.5414 | 0.6409 | 0.1499 | -1405.8010 | -1601.6963 | -2.6991 | -2.7081 |
+ | 0.6580 | 1.47 | 5600 | 0.6451 | -0.4023 | -0.5553 | 0.6429 | 0.1530 | -1407.1986 | -1602.7803 | -2.6938 | -2.7027 |
+ | 0.6437 | 1.49 | 5700 | 0.6454 | -0.4050 | -0.5555 | 0.6389 | 0.1505 | -1407.2163 | -1603.0527 | -2.6883 | -2.6972 |
+ | 0.6289 | 1.52 | 5800 | 0.6443 | -0.3986 | -0.5520 | 0.6468 | 0.1534 | -1406.8611 | -1602.4105 | -2.7007 | -2.7094 |
+ | 0.6361 | 1.54 | 5900 | 0.6442 | -0.4036 | -0.5574 | 0.6409 | 0.1538 | -1407.4087 | -1602.9125 | -2.6962 | -2.7047 |
+ | 0.6374 | 1.57 | 6000 | 0.6446 | -0.4164 | -0.5717 | 0.6429 | 0.1553 | -1408.8311 | -1604.1853 | -2.6963 | -2.7048 |
+ | 0.6423 | 1.60 | 6100 | 0.6448 | -0.4212 | -0.5781 | 0.6349 | 0.1569 | -1409.4735 | -1604.6692 | -2.6905 | -2.6992 |
+ | 0.6611 | 1.62 | 6200 | 0.6453 | -0.4344 | -0.5916 | 0.6250 | 0.1572 | -1410.8239 | -1605.9866 | -2.6925 | -2.7010 |
+ | 0.6355 | 1.65 | 6300 | 0.6451 | -0.4325 | -0.5909 | 0.6250 | 0.1584 | -1410.7570 | -1605.8035 | -2.6922 | -2.7008 |
+ | 0.6555 | 1.67 | 6400 | 0.6451 | -0.4326 | -0.5912 | 0.6230 | 0.1586 | -1410.7894 | -1605.8125 | -2.6935 | -2.7021 |
+ | 0.6584 | 1.70 | 6500 | 0.6449 | -0.4310 | -0.5905 | 0.6270 | 0.1595 | -1410.7151 | -1605.6461 | -2.6900 | -2.6987 |
+ | 0.6371 | 1.73 | 6600 | 0.6448 | -0.4266 | -0.5864 | 0.6310 | 0.1598 | -1410.3033 | -1605.2112 | -2.6897 | -2.6985 |
+ | 0.6051 | 1.75 | 6700 | 0.6446 | -0.4220 | -0.5821 | 0.6329 | 0.1601 | -1409.8746 | -1604.7469 | -2.6927 | -2.7012 |
+ | 0.6136 | 1.78 | 6800 | 0.6446 | -0.4219 | -0.5822 | 0.6310 | 0.1603 | -1409.8861 | -1604.7394 | -2.6940 | -2.7024 |
+ | 0.6503 | 1.81 | 6900 | 0.6445 | -0.4222 | -0.5826 | 0.6349 | 0.1603 | -1409.9208 | -1604.7736 | -2.6947 | -2.7030 |
+ | 0.6318 | 1.83 | 7000 | 0.6445 | -0.4216 | -0.5817 | 0.6329 | 0.1601 | -1409.8387 | -1604.7111 | -2.6925 | -2.7010 |
+ | 0.6493 | 1.86 | 7100 | 0.6445 | -0.4215 | -0.5815 | 0.6329 | 0.1600 | -1409.8179 | -1604.7026 | -2.6940 | -2.7025 |
+ | 0.6292 | 1.88 | 7200 | 0.6446 | -0.4217 | -0.5816 | 0.6329 | 0.1599 | -1409.8223 | -1604.7195 | -2.6943 | -2.7027 |
+ | 0.6250 | 1.91 | 7300 | 0.6445 | -0.4215 | -0.5816 | 0.6329 | 0.1600 | -1409.8219 | -1604.7013 | -2.6937 | -2.7022 |
+ | 0.6306 | 1.94 | 7400 | 0.6446 | -0.4218 | -0.5814 | 0.6290 | 0.1596 | -1409.8014 | -1604.7244 | -2.6937 | -2.7021 |
+ | 0.6446 | 1.96 | 7500 | 0.6446 | -0.4217 | -0.5814 | 0.6290 | 0.1596 | -1409.8003 | -1604.7235 | -2.6937 | -2.7021 |
+ | 0.6394 | 1.99 | 7600 | 0.6446 | -0.4217 | -0.5814 | 0.6290 | 0.1596 | -1409.8003 | -1604.7235 | -2.6937 | -2.7021 |
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.0
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1ba564201aad0496c6d7d491f4dabe78e0b436e6ba204c3ed378187d62ec50b3
+ oid sha256:3f2685d832e358847f3e3db9cdf8fa17a43c296df854aef025e8a781c5596ffb
  size 83945744
all_results.json ADDED
@@ -0,0 +1,21 @@
+ {
+     "epoch": 2.0,
+     "eval_logits/chosen": -2.702141284942627,
+     "eval_logits/rejected": -2.6936912536621094,
+     "eval_logps/chosen": -1604.7235107421875,
+     "eval_logps/rejected": -1409.80029296875,
+     "eval_loss": 0.6445866227149963,
+     "eval_rewards/accuracies": 0.6289682388305664,
+     "eval_rewards/chosen": -0.4217440187931061,
+     "eval_rewards/margins": 0.15961241722106934,
+     "eval_rewards/rejected": -0.5813564658164978,
+     "eval_runtime": 221.9698,
+     "eval_samples": 2000,
+     "eval_samples_per_second": 9.01,
+     "eval_steps_per_second": 0.284,
+     "train_loss": 0.6517634629204897,
+     "train_runtime": 44544.264,
+     "train_samples": 61135,
+     "train_samples_per_second": 2.745,
+     "train_steps_per_second": 0.172
+ }
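
As a quick sanity check, the throughput figures above are internally consistent: 61,135 training samples over 2.0 epochs in 44,544 s is about 2.745 samples/s, which at a total batch size of 16 is about 0.172 steps/s.

```python
# Cross-check the throughput numbers reported in all_results.json.
train_samples = 61_135
num_epochs = 2.0
train_runtime_s = 44_544.264
total_train_batch_size = 16  # 4 per device x 4 GPUs

samples_per_second = train_samples * num_epochs / train_runtime_s
steps_per_second = samples_per_second / total_train_batch_size

print(round(samples_per_second, 3))  # 2.745, matches "train_samples_per_second"
print(round(steps_per_second, 3))    # 0.172, matches "train_steps_per_second"
```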
eval_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+     "epoch": 2.0,
+     "eval_logits/chosen": -2.702141284942627,
+     "eval_logits/rejected": -2.6936912536621094,
+     "eval_logps/chosen": -1604.7235107421875,
+     "eval_logps/rejected": -1409.80029296875,
+     "eval_loss": 0.6445866227149963,
+     "eval_rewards/accuracies": 0.6289682388305664,
+     "eval_rewards/chosen": -0.4217440187931061,
+     "eval_rewards/margins": 0.15961241722106934,
+     "eval_rewards/rejected": -0.5813564658164978,
+     "eval_runtime": 221.9698,
+     "eval_samples": 2000,
+     "eval_samples_per_second": 9.01,
+     "eval_steps_per_second": 0.284
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 2.0,
+     "train_loss": 0.6517634629204897,
+     "train_runtime": 44544.264,
+     "train_samples": 61135,
+     "train_samples_per_second": 2.745,
+     "train_steps_per_second": 0.172
+ }
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff