---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: mistral-7b-dpo-qlora-2ep
  results: []
---


# mistral-7b-dpo-qlora-2ep

This model is a fine-tuned version of [mimicheng/mistral-7b-sft-qlora-2ep](https://huggingface.co/mimicheng/mistral-7b-sft-qlora-2ep) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6446
- Rewards/chosen: -0.4217
- Rewards/rejected: -0.5814
- Rewards/accuracies: 0.6290
- Rewards/margins: 0.1596
- Logps/rejected: -1409.8003
- Logps/chosen: -1604.7235
- Logits/rejected: -2.6937
- Logits/chosen: -2.7021
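The reward metrics above are the implicit DPO rewards: β times the log-probability gap between the policy and the frozen SFT reference on each response, with the margin being chosen minus rejected and the loss being the negative log-sigmoid of that margin. A minimal NumPy sketch of the computation on toy per-example sequence log-probabilities (the values, β=0.1, and the helper name are illustrative assumptions, not read from this run):

```python
import numpy as np

def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Implicit DPO rewards: beta * (policy log-prob - reference log-prob)
    rewards_chosen = beta * (policy_chosen - ref_chosen)
    rewards_rejected = beta * (policy_rejected - ref_rejected)
    margins = rewards_chosen - rewards_rejected
    # DPO loss: -log sigmoid(margin), averaged over the batch
    loss = -np.log(1.0 / (1.0 + np.exp(-margins))).mean()
    # Accuracy: fraction of pairs where the chosen response gets the higher reward
    accuracy = (rewards_chosen > rewards_rejected).mean()
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy

# Toy sequence log-probabilities for two preference pairs (assumed values)
pc = np.array([-1600.0, -1610.0])  # policy, chosen
pr = np.array([-1410.0, -1405.0])  # policy, rejected
rc = np.array([-1598.0, -1612.0])  # reference, chosen
rr = np.array([-1404.0, -1402.0])  # reference, rejected
loss, r_c, r_r, margin, acc = dpo_metrics(pc, pr, rc, rr)
```

Note that the reported Logps columns are raw sequence log-probabilities, while the Rewards columns are already scaled by β relative to the reference model.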

## Model description

This model is a QLoRA (PEFT) adapter for Mistral-7B, trained with Direct Preference Optimization (DPO) on top of the supervised fine-tuned checkpoint [mimicheng/mistral-7b-sft-qlora-2ep](https://huggingface.co/mimicheng/mistral-7b-sft-qlora-2ep), using preference pairs from HuggingFaceH4/ultrafeedback_binarized for two epochs.

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained and evaluated on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) preference dataset; the results above are reported on its evaluation split.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
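The effective batch sizes follow from per-device size × number of devices (4 × 4 = 16 for training, 8 × 4 = 32 for eval), and the learning rate warms up linearly over the first 10% of steps before decaying along a cosine curve. A sketch of that schedule (the helper name is ours, and the total of roughly 7,600 steps is taken from the results table below):

```python
import math

def cosine_lr(step, total_steps, base_lr=5e-6, warmup_ratio=0.1):
    """Linear warmup over the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 7600  # ~2 epochs at an effective train batch size of 16
lr_start = cosine_lr(0, total_steps)            # warmup begins at 0
lr_peak = cosine_lr(760, total_steps)           # end of warmup: full 5e-6
lr_final = cosine_lr(total_steps, total_steps)  # decayed to ~0
```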

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6932        | 0.03  | 100  | 0.6931          | 0.0001         | 0.0002           | 0.4940             | -0.0001         | -1351.6440     | -1562.5353   | -2.7909         | -2.7984       |
| 0.6923        | 0.05  | 200  | 0.6925          | 0.0045         | 0.0029           | 0.5119             | 0.0016          | -1351.3734     | -1562.0991   | -2.7899         | -2.7974       |
| 0.6937        | 0.08  | 300  | 0.6909          | 0.0097         | 0.0052           | 0.5377             | 0.0045          | -1351.1462     | -1561.5815   | -2.7872         | -2.7945       |
| 0.6867        | 0.1   | 400  | 0.6893          | 0.0145         | 0.0060           | 0.5595             | 0.0085          | -1351.0632     | -1561.1024   | -2.7853         | -2.7923       |
| 0.6921        | 0.13  | 500  | 0.6867          | 0.0007         | -0.0122          | 0.5734             | 0.0129          | -1352.8849     | -1562.4756   | -2.7829         | -2.7893       |
| 0.6895        | 0.16  | 600  | 0.6838          | 0.0046         | -0.0162          | 0.5913             | 0.0208          | -1353.2866     | -1562.0875   | -2.7740         | -2.7806       |
| 0.6792        | 0.18  | 700  | 0.6819          | -0.0194        | -0.0440          | 0.5992             | 0.0246          | -1356.0621     | -1564.4910   | -2.7592         | -2.7657       |
| 0.6802        | 0.21  | 800  | 0.6791          | -0.0527        | -0.0819          | 0.5813             | 0.0293          | -1359.8597     | -1567.8170   | -2.7551         | -2.7611       |
| 0.6812        | 0.24  | 900  | 0.6772          | -0.0403        | -0.0826          | 0.5714             | 0.0423          | -1359.9243     | -1566.5771   | -2.7588         | -2.7655       |
| 0.6714        | 0.26  | 1000 | 0.6746          | -0.0886        | -0.1361          | 0.5714             | 0.0475          | -1365.2759     | -1571.4064   | -2.7418         | -2.7476       |
| 0.676         | 0.29  | 1100 | 0.6744          | -0.1141        | -0.1733          | 0.5893             | 0.0592          | -1368.9943     | -1573.9617   | -2.7433         | -2.7505       |
| 0.6779        | 0.31  | 1200 | 0.6703          | -0.1056        | -0.1703          | 0.5933             | 0.0647          | -1368.6935     | -1573.1090   | -2.7431         | -2.7511       |
| 0.6888        | 0.34  | 1300 | 0.6676          | -0.1136        | -0.1850          | 0.5972             | 0.0713          | -1370.1599     | -1573.9121   | -2.7375         | -2.7452       |
| 0.6664        | 0.37  | 1400 | 0.6669          | -0.1425        | -0.2165          | 0.6071             | 0.0739          | -1373.3110     | -1576.8027   | -2.7302         | -2.7375       |
| 0.6705        | 0.39  | 1500 | 0.6665          | -0.1804        | -0.2701          | 0.6071             | 0.0897          | -1378.6722     | -1580.5913   | -2.7481         | -2.7546       |
| 0.6411        | 0.42  | 1600 | 0.6653          | -0.1924        | -0.2728          | 0.6329             | 0.0804          | -1378.9417     | -1581.7911   | -2.7249         | -2.7317       |
| 0.665         | 0.44  | 1700 | 0.6644          | -0.1967        | -0.2789          | 0.6131             | 0.0823          | -1379.5565     | -1582.2147   | -2.7355         | -2.7422       |
| 0.6563        | 0.47  | 1800 | 0.6639          | -0.2073        | -0.2940          | 0.6210             | 0.0867          | -1381.0635     | -1583.2751   | -2.7257         | -2.7325       |
| 0.6668        | 0.5   | 1900 | 0.6620          | -0.2260        | -0.3252          | 0.6171             | 0.0992          | -1384.1846     | -1585.1470   | -2.7350         | -2.7426       |
| 0.6632        | 0.52  | 2000 | 0.6605          | -0.1924        | -0.2828          | 0.6329             | 0.0904          | -1379.9453     | -1581.7920   | -2.7371         | -2.7449       |
| 0.6427        | 0.55  | 2100 | 0.6597          | -0.2106        | -0.3114          | 0.6230             | 0.1007          | -1382.8007     | -1583.6138   | -2.7260         | -2.7333       |
| 0.6923        | 0.58  | 2200 | 0.6592          | -0.2129        | -0.3178          | 0.6230             | 0.1049          | -1383.4486     | -1583.8400   | -2.7175         | -2.7243       |
| 0.6496        | 0.6   | 2300 | 0.6581          | -0.2352        | -0.3443          | 0.6290             | 0.1091          | -1386.0916     | -1586.0706   | -2.7159         | -2.7235       |
| 0.6668        | 0.63  | 2400 | 0.6577          | -0.2503        | -0.3563          | 0.6290             | 0.1061          | -1387.2981     | -1587.5769   | -2.7321         | -2.7410       |
| 0.6477        | 0.65  | 2500 | 0.6560          | -0.2661        | -0.3858          | 0.6310             | 0.1196          | -1390.2400     | -1589.1620   | -2.7287         | -2.7370       |
| 0.6444        | 0.68  | 2600 | 0.6550          | -0.2830        | -0.3993          | 0.6270             | 0.1163          | -1391.5975     | -1590.8505   | -2.7240         | -2.7330       |
| 0.6594        | 0.71  | 2700 | 0.6566          | -0.3546        | -0.4862          | 0.6190             | 0.1316          | -1400.2867     | -1598.0084   | -2.6748         | -2.6818       |
| 0.6329        | 0.73  | 2800 | 0.6544          | -0.2748        | -0.3936          | 0.6250             | 0.1189          | -1391.0292     | -1590.0247   | -2.6985         | -2.7063       |
| 0.6351        | 0.76  | 2900 | 0.6545          | -0.2928        | -0.4152          | 0.6270             | 0.1224          | -1393.1847     | -1591.8256   | -2.7050         | -2.7136       |
| 0.6724        | 0.79  | 3000 | 0.6528          | -0.3067        | -0.4418          | 0.6448             | 0.1351          | -1395.8458     | -1593.2202   | -2.6986         | -2.7069       |
| 0.6413        | 0.81  | 3100 | 0.6514          | -0.3153        | -0.4541          | 0.6548             | 0.1388          | -1397.0781     | -1594.0812   | -2.6892         | -2.6985       |
| 0.6242        | 0.84  | 3200 | 0.6523          | -0.3197        | -0.4618          | 0.6349             | 0.1421          | -1397.8459     | -1594.5162   | -2.7123         | -2.7206       |
| 0.6773        | 0.86  | 3300 | 0.6506          | -0.3038        | -0.4433          | 0.6508             | 0.1395          | -1395.9939     | -1592.9280   | -2.7042         | -2.7136       |
| 0.6531        | 0.89  | 3400 | 0.6505          | -0.3036        | -0.4426          | 0.6329             | 0.1390          | -1395.9207     | -1592.9099   | -2.6620         | -2.6712       |
| 0.6499        | 0.92  | 3500 | 0.6504          | -0.3509        | -0.4975          | 0.6448             | 0.1467          | -1401.4177     | -1597.6368   | -2.6611         | -2.6701       |
| 0.6439        | 0.94  | 3600 | 0.6509          | -0.3522        | -0.4975          | 0.6349             | 0.1453          | -1401.4176     | -1597.7729   | -2.6758         | -2.6841       |
| 0.6279        | 0.97  | 3700 | 0.6505          | -0.4035        | -0.5500          | 0.6310             | 0.1466          | -1406.6675     | -1602.8950   | -2.6918         | -2.7012       |
| 0.6443        | 0.99  | 3800 | 0.6497          | -0.3970        | -0.5441          | 0.6290             | 0.1471          | -1406.0728     | -1602.2509   | -2.6876         | -2.6965       |
| 0.6355        | 1.02  | 3900 | 0.6484          | -0.3538        | -0.4986          | 0.6349             | 0.1449          | -1401.5294     | -1597.9247   | -2.6950         | -2.7039       |
| 0.6683        | 1.05  | 4000 | 0.6482          | -0.3608        | -0.5119          | 0.6349             | 0.1511          | -1402.8545     | -1598.6262   | -2.6992         | -2.7080       |
| 0.6459        | 1.07  | 4100 | 0.6475          | -0.3305        | -0.4760          | 0.6448             | 0.1455          | -1399.2634     | -1595.5988   | -2.6852         | -2.6944       |
| 0.6451        | 1.1   | 4200 | 0.6471          | -0.3471        | -0.4991          | 0.6369             | 0.1519          | -1401.5713     | -1597.2633   | -2.6954         | -2.7042       |
| 0.6744        | 1.13  | 4300 | 0.6483          | -0.3619        | -0.5112          | 0.6429             | 0.1493          | -1402.7870     | -1598.7428   | -2.7008         | -2.7095       |
| 0.6355        | 1.15  | 4400 | 0.6477          | -0.4040        | -0.5558          | 0.6270             | 0.1518          | -1407.2480     | -1602.9531   | -2.6916         | -2.7001       |
| 0.6187        | 1.18  | 4500 | 0.6472          | -0.4050        | -0.5534          | 0.6349             | 0.1485          | -1407.0084     | -1603.0441   | -2.6883         | -2.6963       |
| 0.6555        | 1.2   | 4600 | 0.6472          | -0.3883        | -0.5354          | 0.6310             | 0.1471          | -1405.2079     | -1601.3826   | -2.7075         | -2.7168       |
| 0.6178        | 1.23  | 4700 | 0.6476          | -0.3993        | -0.5414          | 0.6190             | 0.1422          | -1405.8092     | -1602.4763   | -2.6912         | -2.7006       |
| 0.6242        | 1.26  | 4800 | 0.6477          | -0.4302        | -0.5746          | 0.6250             | 0.1444          | -1409.1267     | -1605.5714   | -2.6917         | -2.7016       |
| 0.6221        | 1.28  | 4900 | 0.6464          | -0.3848        | -0.5302          | 0.6349             | 0.1454          | -1404.6871     | -1601.0272   | -2.7073         | -2.7167       |
| 0.6582        | 1.31  | 5000 | 0.6460          | -0.3995        | -0.5463          | 0.6310             | 0.1468          | -1406.2927     | -1602.5012   | -2.7174         | -2.7268       |
| 0.6276        | 1.33  | 5100 | 0.6458          | -0.4048        | -0.5543          | 0.6310             | 0.1495          | -1407.0914     | -1603.0245   | -2.7192         | -2.7281       |
| 0.6573        | 1.36  | 5200 | 0.6452          | -0.4069        | -0.5580          | 0.6290             | 0.1512          | -1407.4680     | -1603.2344   | -2.7142         | -2.7230       |
| 0.6672        | 1.39  | 5300 | 0.6458          | -0.4020        | -0.5504          | 0.6329             | 0.1485          | -1406.7059     | -1602.7441   | -2.6997         | -2.7080       |
| 0.6112        | 1.41  | 5400 | 0.6460          | -0.4035        | -0.5510          | 0.6290             | 0.1475          | -1406.7632     | -1602.8997   | -2.6953         | -2.7036       |
| 0.6421        | 1.44  | 5500 | 0.6449          | -0.3915        | -0.5414          | 0.6409             | 0.1499          | -1405.8010     | -1601.6963   | -2.6991         | -2.7081       |
| 0.658         | 1.47  | 5600 | 0.6451          | -0.4023        | -0.5553          | 0.6429             | 0.1530          | -1407.1986     | -1602.7803   | -2.6938         | -2.7027       |
| 0.6437        | 1.49  | 5700 | 0.6454          | -0.4050        | -0.5555          | 0.6389             | 0.1505          | -1407.2163     | -1603.0527   | -2.6883         | -2.6972       |
| 0.6289        | 1.52  | 5800 | 0.6443          | -0.3986        | -0.5520          | 0.6468             | 0.1534          | -1406.8611     | -1602.4105   | -2.7007         | -2.7094       |
| 0.6361        | 1.54  | 5900 | 0.6442          | -0.4036        | -0.5574          | 0.6409             | 0.1538          | -1407.4087     | -1602.9125   | -2.6962         | -2.7047       |
| 0.6374        | 1.57  | 6000 | 0.6446          | -0.4164        | -0.5717          | 0.6429             | 0.1553          | -1408.8311     | -1604.1853   | -2.6963         | -2.7048       |
| 0.6423        | 1.6   | 6100 | 0.6448          | -0.4212        | -0.5781          | 0.6349             | 0.1569          | -1409.4735     | -1604.6692   | -2.6905         | -2.6992       |
| 0.6611        | 1.62  | 6200 | 0.6453          | -0.4344        | -0.5916          | 0.6250             | 0.1572          | -1410.8239     | -1605.9866   | -2.6925         | -2.7010       |
| 0.6355        | 1.65  | 6300 | 0.6451          | -0.4325        | -0.5909          | 0.6250             | 0.1584          | -1410.7570     | -1605.8035   | -2.6922         | -2.7008       |
| 0.6555        | 1.67  | 6400 | 0.6451          | -0.4326        | -0.5912          | 0.6230             | 0.1586          | -1410.7894     | -1605.8125   | -2.6935         | -2.7021       |
| 0.6584        | 1.7   | 6500 | 0.6449          | -0.4310        | -0.5905          | 0.6270             | 0.1595          | -1410.7151     | -1605.6461   | -2.6900         | -2.6987       |
| 0.6371        | 1.73  | 6600 | 0.6448          | -0.4266        | -0.5864          | 0.6310             | 0.1598          | -1410.3033     | -1605.2112   | -2.6897         | -2.6985       |
| 0.6051        | 1.75  | 6700 | 0.6446          | -0.4220        | -0.5821          | 0.6329             | 0.1601          | -1409.8746     | -1604.7469   | -2.6927         | -2.7012       |
| 0.6136        | 1.78  | 6800 | 0.6446          | -0.4219        | -0.5822          | 0.6310             | 0.1603          | -1409.8861     | -1604.7394   | -2.6940         | -2.7024       |
| 0.6503        | 1.81  | 6900 | 0.6445          | -0.4222        | -0.5826          | 0.6349             | 0.1603          | -1409.9208     | -1604.7736   | -2.6947         | -2.7030       |
| 0.6318        | 1.83  | 7000 | 0.6445          | -0.4216        | -0.5817          | 0.6329             | 0.1601          | -1409.8387     | -1604.7111   | -2.6925         | -2.7010       |
| 0.6493        | 1.86  | 7100 | 0.6445          | -0.4215        | -0.5815          | 0.6329             | 0.1600          | -1409.8179     | -1604.7026   | -2.6940         | -2.7025       |
| 0.6292        | 1.88  | 7200 | 0.6446          | -0.4217        | -0.5816          | 0.6329             | 0.1599          | -1409.8223     | -1604.7195   | -2.6943         | -2.7027       |
| 0.625         | 1.91  | 7300 | 0.6445          | -0.4215        | -0.5816          | 0.6329             | 0.1600          | -1409.8219     | -1604.7013   | -2.6937         | -2.7022       |
| 0.6306        | 1.94  | 7400 | 0.6446          | -0.4218        | -0.5814          | 0.6290             | 0.1596          | -1409.8014     | -1604.7244   | -2.6937         | -2.7021       |
| 0.6446        | 1.96  | 7500 | 0.6446          | -0.4217        | -0.5814          | 0.6290             | 0.1596          | -1409.8003     | -1604.7235   | -2.6937         | -2.7021       |
| 0.6394        | 1.99  | 7600 | 0.6446          | -0.4217        | -0.5814          | 0.6290             | 0.1596          | -1409.8003     | -1604.7235   | -2.6937         | -2.7021       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.0