just1nseo committed on
Commit
3ec56ee
1 Parent(s): 4bbf97f

Model save

README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ base_model: alignment-handbook/zephyr-7b-sft-full
+ library_name: peft
+ license: apache-2.0
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: zephyr-dpo-qlora-uf6k-5e-7
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-dpo-qlora-uf6k-5e-7
+
+ This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6889
+ - Rewards/chosen: 0.0031
+ - Rewards/rejected: -0.0066
+ - Rewards/accuracies: 0.6760
+ - Rewards/margins: 0.0098
+ - Rewards/margins Max: 0.0449
+ - Rewards/margins Min: -0.0216
+ - Rewards/margins Std: 0.0219
+ - Logps/rejected: -259.2438
+ - Logps/chosen: -284.2815
+ - Logits/rejected: -2.7640
+ - Logits/chosen: -2.8026
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-07
+ - train_batch_size: 4
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 16
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6914 | 0.3 | 100 | 0.6915 | -0.0004 | -0.0041 | 0.6460 | 0.0037 | 0.0187 | -0.0095 | 0.0093 | -258.9901 | -284.6319 | -2.7677 | -2.8065 |
+ | 0.6884 | 0.61 | 200 | 0.6895 | 0.0023 | -0.0061 | 0.6850 | 0.0084 | 0.0389 | -0.0189 | 0.0190 | -259.1880 | -284.3611 | -2.7665 | -2.8049 |
+ | 0.6873 | 0.91 | 300 | 0.6889 | 0.0031 | -0.0066 | 0.6760 | 0.0098 | 0.0449 | -0.0216 | 0.0219 | -259.2438 | -284.2815 | -2.7640 | -2.8026 |
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.39.0.dev0
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
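
The evaluation numbers in the model card above can be sanity-checked with a little arithmetic. The sketch below assumes the run used the standard sigmoid DPO loss, `-log(sigmoid(margin))`, where the margin is `rewards/chosen - rewards/rejected`; the commit does not state the loss type, so treat that as an assumption. The helper name `dpo_sigmoid_loss` is hypothetical and only for illustration.

```python
import math


def dpo_sigmoid_loss(margin: float) -> float:
    """Sigmoid DPO loss for one pair: -log(sigmoid(margin)) = log(1 + exp(-margin))."""
    return math.log1p(math.exp(-margin))


# Rounded evaluation values reported in the model card (step 300).
reward_chosen = 0.0031
reward_rejected = -0.0066

# The reward margin is the chosen reward minus the rejected reward.
margin = reward_chosen - reward_rejected

# With a zero margin (policy identical to the reference) the loss is ln(2),
# which is why training starts at a loss of about 0.6931.
loss_at_zero = dpo_sigmoid_loss(0.0)

# Loss evaluated at the mean margin. The reported eval loss is an average of
# per-pair losses, so it need not equal this value exactly.
loss_at_mean_margin = dpo_sigmoid_loss(margin)

print(f"margin              = {margin:.4f}")
print(f"loss at zero margin = {loss_at_zero:.4f}")
print(f"loss at mean margin = {loss_at_mean_margin:.4f}")
```

Under this assumption, a final loss only slightly below ln(2) corresponds to the very small reward margins reported, which is in line with the low learning rate of 5e-07 and the single training epoch.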
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9bf8f4d6e628f8c2bed2d64a9d682a4a0e4198588bdce1f515df1c751daa4f66
+ oid sha256:af91c8567e56a7a7d7a09a5beac038fd988d31153db35e7f94e82025afe71566
  size 671150064
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.6900796745323483,
+ "train_runtime": 3893.3874,
+ "train_samples": 5263,
+ "train_samples_per_second": 1.352,
+ "train_steps_per_second": 0.085
+ }
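
The throughput figures in `all_results.json` follow from the sample count, the runtime, and the batch settings listed in the model card. The sketch below recomputes them; the step count is approximated as `ceil(samples / effective_batch)`, which is an assumption about how the Trainer rounds and happens to match the `global_step` of 329 recorded in `trainer_state.json`.

```python
import math

# Values copied from all_results.json and the model card in this commit.
train_samples = 5263
train_runtime_s = 3893.3874
per_device_batch = 4
num_devices = 2
grad_accum = 2

# Effective (total) train batch size: per-device batch x devices x accumulation.
effective_batch = per_device_batch * num_devices * grad_accum

# Approximate number of optimizer steps in one epoch (assumed rounding).
steps = math.ceil(train_samples / effective_batch)

samples_per_second = train_samples / train_runtime_s
steps_per_second = steps / train_runtime_s

print(f"effective batch size = {effective_batch}")
print(f"optimizer steps      = {steps}")
print(f"samples per second   = {samples_per_second:.3f}")
print(f"steps per second     = {steps_per_second:.3f}")
```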
runs/Jul25_08-24-43_notebook-deployment-48-7d9b6c99-khd85/events.out.tfevents.1721895976.notebook-deployment-48-7d9b6c99-khd85.3153936.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d8f7c04ac0807fca1e8aba3c1a04938bf3f7f49d5631fbd1ca64d4fd5f506621
- size 35193
+ oid sha256:f64095a2cf0a5cbe4b46c1acb4cc7651c7e7c4c6308eb27d95cd8f0f6a50f253
+ size 37307
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.6900796745323483,
+ "train_runtime": 3893.3874,
+ "train_samples": 5263,
+ "train_samples_per_second": 1.352,
+ "train_steps_per_second": 0.085
+ }
trainer_state.json ADDED
@@ -0,0 +1,681 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
+ "eval_steps": 100,
6
+ "global_step": 329,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.0,
13
+ "grad_norm": 2.0906090021590473,
14
+ "learning_rate": 1.5151515151515152e-08,
15
+ "logits/chosen": -2.6820077896118164,
16
+ "logits/rejected": -2.6930205821990967,
17
+ "logps/chosen": -281.2528381347656,
18
+ "logps/rejected": -258.0622253417969,
19
+ "loss": 0.6931,
20
+ "rewards/accuracies": 0.0,
21
+ "rewards/chosen": 0.0,
22
+ "rewards/margins": 0.0,
23
+ "rewards/margins_max": 0.0,
24
+ "rewards/margins_min": 0.0,
25
+ "rewards/margins_std": 0.0,
26
+ "rewards/rejected": 0.0,
27
+ "step": 1
28
+ },
29
+ {
30
+ "epoch": 0.03,
31
+ "grad_norm": 2.1561337868153565,
32
+ "learning_rate": 1.5151515151515152e-07,
33
+ "logits/chosen": -2.7683067321777344,
34
+ "logits/rejected": -2.7538461685180664,
35
+ "logps/chosen": -284.59912109375,
36
+ "logps/rejected": -249.83580017089844,
37
+ "loss": 0.6931,
38
+ "rewards/accuracies": 0.3888888955116272,
39
+ "rewards/chosen": 5.0317983550485224e-05,
40
+ "rewards/margins": -0.00015015894314274192,
41
+ "rewards/margins_max": 0.0020335663575679064,
42
+ "rewards/margins_min": -0.0025187418796122074,
43
+ "rewards/margins_std": 0.0020784963853657246,
44
+ "rewards/rejected": 0.00020047693396918476,
45
+ "step": 10
46
+ },
47
+ {
48
+ "epoch": 0.06,
49
+ "grad_norm": 1.9751920510122447,
50
+ "learning_rate": 3.0303030303030305e-07,
51
+ "logits/chosen": -2.8347439765930176,
52
+ "logits/rejected": -2.7819018363952637,
53
+ "logps/chosen": -291.50921630859375,
54
+ "logps/rejected": -270.4449768066406,
55
+ "loss": 0.693,
56
+ "rewards/accuracies": 0.5625,
57
+ "rewards/chosen": 0.00022530628484673798,
58
+ "rewards/margins": 0.0007139825029298663,
59
+ "rewards/margins_max": 0.004345210734754801,
60
+ "rewards/margins_min": -0.002943573985248804,
61
+ "rewards/margins_std": 0.0032876902259886265,
62
+ "rewards/rejected": -0.0004886762471869588,
63
+ "step": 20
64
+ },
65
+ {
66
+ "epoch": 0.09,
67
+ "grad_norm": 1.6654288543237405,
68
+ "learning_rate": 4.545454545454545e-07,
69
+ "logits/chosen": -2.8627753257751465,
70
+ "logits/rejected": -2.8151745796203613,
71
+ "logps/chosen": -259.2825927734375,
72
+ "logps/rejected": -227.37350463867188,
73
+ "loss": 0.6932,
74
+ "rewards/accuracies": 0.5,
75
+ "rewards/chosen": -0.00037297833478078246,
76
+ "rewards/margins": -0.00020202626183163375,
77
+ "rewards/margins_max": 0.0030375297646969557,
78
+ "rewards/margins_min": -0.0033374775666743517,
79
+ "rewards/margins_std": 0.0028792533557862043,
80
+ "rewards/rejected": -0.00017095205839723349,
81
+ "step": 30
82
+ },
83
+ {
84
+ "epoch": 0.12,
85
+ "grad_norm": 1.6057052994928989,
86
+ "learning_rate": 4.993103596812268e-07,
87
+ "logits/chosen": -2.8291430473327637,
88
+ "logits/rejected": -2.7638001441955566,
89
+ "logps/chosen": -317.513916015625,
90
+ "logps/rejected": -224.7698211669922,
91
+ "loss": 0.6927,
92
+ "rewards/accuracies": 0.625,
93
+ "rewards/chosen": 0.000282805209280923,
94
+ "rewards/margins": 0.0011336280731484294,
95
+ "rewards/margins_max": 0.005270844791084528,
96
+ "rewards/margins_min": -0.002321633044630289,
97
+ "rewards/margins_std": 0.003387246746569872,
98
+ "rewards/rejected": -0.000850822776556015,
99
+ "step": 40
100
+ },
101
+ {
102
+ "epoch": 0.15,
103
+ "grad_norm": 1.75475734119915,
104
+ "learning_rate": 4.959416858332709e-07,
105
+ "logits/chosen": -2.79063081741333,
106
+ "logits/rejected": -2.804368495941162,
107
+ "logps/chosen": -242.9667510986328,
108
+ "logps/rejected": -280.0011901855469,
109
+ "loss": 0.6926,
110
+ "rewards/accuracies": 0.574999988079071,
111
+ "rewards/chosen": -0.00031891773687675595,
112
+ "rewards/margins": 0.0008532041683793068,
113
+ "rewards/margins_max": 0.004698004573583603,
114
+ "rewards/margins_min": -0.002724443329498172,
115
+ "rewards/margins_std": 0.00329922279343009,
116
+ "rewards/rejected": -0.0011721218470484018,
117
+ "step": 50
118
+ },
119
+ {
120
+ "epoch": 0.18,
121
+ "grad_norm": 1.91854731579599,
122
+ "learning_rate": 4.898051734555674e-07,
123
+ "logits/chosen": -2.8335373401641846,
124
+ "logits/rejected": -2.8440303802490234,
125
+ "logps/chosen": -321.90625,
126
+ "logps/rejected": -283.37994384765625,
127
+ "loss": 0.6921,
128
+ "rewards/accuracies": 0.6499999761581421,
129
+ "rewards/chosen": 0.00014833270688541234,
130
+ "rewards/margins": 0.0021868678741157055,
131
+ "rewards/margins_max": 0.008168894797563553,
132
+ "rewards/margins_min": -0.0031033740378916264,
133
+ "rewards/margins_std": 0.005018442869186401,
134
+ "rewards/rejected": -0.0020385351963341236,
135
+ "step": 60
136
+ },
137
+ {
138
+ "epoch": 0.21,
139
+ "grad_norm": 1.5964843630528198,
140
+ "learning_rate": 4.809698831278217e-07,
141
+ "logits/chosen": -2.748741865158081,
142
+ "logits/rejected": -2.735199213027954,
143
+ "logps/chosen": -266.52606201171875,
144
+ "logps/rejected": -246.6175079345703,
145
+ "loss": 0.6922,
146
+ "rewards/accuracies": 0.6499999761581421,
147
+ "rewards/chosen": -0.0004858696775045246,
148
+ "rewards/margins": 0.0019440820906311274,
149
+ "rewards/margins_max": 0.008044283837080002,
150
+ "rewards/margins_min": -0.003816543845459819,
151
+ "rewards/margins_std": 0.005177702754735947,
152
+ "rewards/rejected": -0.002429951447993517,
153
+ "step": 70
154
+ },
155
+ {
156
+ "epoch": 0.24,
157
+ "grad_norm": 2.1014040068240663,
158
+ "learning_rate": 4.6953524759527053e-07,
159
+ "logits/chosen": -2.8426356315612793,
160
+ "logits/rejected": -2.8158562183380127,
161
+ "logps/chosen": -282.353515625,
162
+ "logps/rejected": -275.220458984375,
163
+ "loss": 0.6918,
164
+ "rewards/accuracies": 0.6499999761581421,
165
+ "rewards/chosen": -0.000800526118837297,
166
+ "rewards/margins": 0.0022392510436475277,
167
+ "rewards/margins_max": 0.00997895933687687,
168
+ "rewards/margins_min": -0.005192113574594259,
169
+ "rewards/margins_std": 0.0067059798166155815,
170
+ "rewards/rejected": -0.0030397772789001465,
171
+ "step": 80
172
+ },
173
+ {
174
+ "epoch": 0.27,
175
+ "grad_norm": 1.9914287244112958,
176
+ "learning_rate": 4.5562995274820283e-07,
177
+ "logits/chosen": -2.7992029190063477,
178
+ "logits/rejected": -2.746138095855713,
179
+ "logps/chosen": -295.78399658203125,
180
+ "logps/rejected": -291.9333190917969,
181
+ "loss": 0.6919,
182
+ "rewards/accuracies": 0.5625,
183
+ "rewards/chosen": -0.002320217899978161,
184
+ "rewards/margins": 0.001351921702735126,
185
+ "rewards/margins_max": 0.010480575263500214,
186
+ "rewards/margins_min": -0.009322223253548145,
187
+ "rewards/margins_std": 0.008863108232617378,
188
+ "rewards/rejected": -0.0036721397191286087,
189
+ "step": 90
190
+ },
191
+ {
192
+ "epoch": 0.3,
193
+ "grad_norm": 1.6705180570693485,
194
+ "learning_rate": 4.394104893853007e-07,
195
+ "logits/chosen": -2.896794557571411,
196
+ "logits/rejected": -2.85756254196167,
197
+ "logps/chosen": -273.5906982421875,
198
+ "logps/rejected": -257.73284912109375,
199
+ "loss": 0.6914,
200
+ "rewards/accuracies": 0.7875000238418579,
201
+ "rewards/chosen": -0.0012845676392316818,
202
+ "rewards/margins": 0.005008908919990063,
203
+ "rewards/margins_max": 0.013928805477917194,
204
+ "rewards/margins_min": -0.003106380347162485,
205
+ "rewards/margins_std": 0.007625125348567963,
206
+ "rewards/rejected": -0.006293477024883032,
207
+ "step": 100
208
+ },
209
+ {
210
+ "epoch": 0.3,
211
+ "eval_logits/chosen": -2.806475877761841,
212
+ "eval_logits/rejected": -2.767702102661133,
213
+ "eval_logps/chosen": -284.6319274902344,
214
+ "eval_logps/rejected": -258.9901123046875,
215
+ "eval_loss": 0.691525399684906,
216
+ "eval_rewards/accuracies": 0.6460000276565552,
217
+ "eval_rewards/chosen": -0.00038527295691892505,
218
+ "eval_rewards/margins": 0.003726556431502104,
219
+ "eval_rewards/margins_max": 0.01873905211687088,
220
+ "eval_rewards/margins_min": -0.009459242224693298,
221
+ "eval_rewards/margins_std": 0.00931489747017622,
222
+ "eval_rewards/rejected": -0.004111829213798046,
223
+ "eval_runtime": 428.4684,
224
+ "eval_samples_per_second": 4.668,
225
+ "eval_steps_per_second": 0.292,
226
+ "step": 100
227
+ },
228
+ {
229
+ "epoch": 0.33,
230
+ "grad_norm": 2.1453641236519685,
231
+ "learning_rate": 4.2105939205932005e-07,
232
+ "logits/chosen": -2.7631096839904785,
233
+ "logits/rejected": -2.746663808822632,
234
+ "logps/chosen": -311.8393249511719,
235
+ "logps/rejected": -235.84280395507812,
236
+ "loss": 0.6911,
237
+ "rewards/accuracies": 0.625,
238
+ "rewards/chosen": -0.0008311712299473584,
239
+ "rewards/margins": 0.0033794320188462734,
240
+ "rewards/margins_max": 0.013278109021484852,
241
+ "rewards/margins_min": -0.00541637372225523,
242
+ "rewards/margins_std": 0.008299448527395725,
243
+ "rewards/rejected": -0.0042106034234166145,
244
+ "step": 110
245
+ },
246
+ {
247
+ "epoch": 0.36,
248
+ "grad_norm": 2.024896986425123,
249
+ "learning_rate": 4.0078318482522114e-07,
250
+ "logits/chosen": -2.7521708011627197,
251
+ "logits/rejected": -2.750868082046509,
252
+ "logps/chosen": -323.51666259765625,
253
+ "logps/rejected": -274.75970458984375,
254
+ "loss": 0.6909,
255
+ "rewards/accuracies": 0.75,
256
+ "rewards/chosen": 0.0004785500350408256,
257
+ "rewards/margins": 0.004080395679920912,
258
+ "rewards/margins_max": 0.015328818932175636,
259
+ "rewards/margins_min": -0.0073760440573096275,
260
+ "rewards/margins_std": 0.00990099273622036,
261
+ "rewards/rejected": -0.0036018460523337126,
262
+ "step": 120
263
+ },
264
+ {
265
+ "epoch": 0.4,
266
+ "grad_norm": 1.6346525930252072,
267
+ "learning_rate": 3.7881005700938627e-07,
268
+ "logits/chosen": -2.8206729888916016,
269
+ "logits/rejected": -2.8308663368225098,
270
+ "logps/chosen": -266.37469482421875,
271
+ "logps/rejected": -234.52035522460938,
272
+ "loss": 0.6906,
273
+ "rewards/accuracies": 0.6625000238418579,
274
+ "rewards/chosen": 0.00018907712365034968,
275
+ "rewards/margins": 0.00421659741550684,
276
+ "rewards/margins_max": 0.015676384791731834,
277
+ "rewards/margins_min": -0.007556927390396595,
278
+ "rewards/margins_std": 0.010246575810015202,
279
+ "rewards/rejected": -0.004027520306408405,
280
+ "step": 130
281
+ },
282
+ {
283
+ "epoch": 0.43,
284
+ "grad_norm": 1.9044203185149946,
285
+ "learning_rate": 3.5538729515692354e-07,
286
+ "logits/chosen": -2.780360460281372,
287
+ "logits/rejected": -2.7639012336730957,
288
+ "logps/chosen": -294.11309814453125,
289
+ "logps/rejected": -270.84710693359375,
290
+ "loss": 0.6896,
291
+ "rewards/accuracies": 0.7124999761581421,
292
+ "rewards/chosen": 0.0028018890880048275,
293
+ "rewards/margins": 0.007480897009372711,
294
+ "rewards/margins_max": 0.021374408155679703,
295
+ "rewards/margins_min": -0.0061719887889921665,
296
+ "rewards/margins_std": 0.01222699973732233,
297
+ "rewards/rejected": -0.004679008387029171,
298
+ "step": 140
299
+ },
300
+ {
301
+ "epoch": 0.46,
302
+ "grad_norm": 1.4256559970133287,
303
+ "learning_rate": 3.3077850005803125e-07,
304
+ "logits/chosen": -2.8410263061523438,
305
+ "logits/rejected": -2.8195314407348633,
306
+ "logps/chosen": -270.49615478515625,
307
+ "logps/rejected": -245.65200805664062,
308
+ "loss": 0.6903,
309
+ "rewards/accuracies": 0.699999988079071,
310
+ "rewards/chosen": 0.0014524383004754782,
311
+ "rewards/margins": 0.006768654100596905,
312
+ "rewards/margins_max": 0.025039460510015488,
313
+ "rewards/margins_min": -0.01076546311378479,
314
+ "rewards/margins_std": 0.015858832746744156,
315
+ "rewards/rejected": -0.0053162164986133575,
316
+ "step": 150
317
+ },
318
+ {
319
+ "epoch": 0.49,
320
+ "grad_norm": 2.1265057109843077,
321
+ "learning_rate": 3.0526062017313247e-07,
322
+ "logits/chosen": -2.79884672164917,
323
+ "logits/rejected": -2.7815585136413574,
324
+ "logps/chosen": -255.3964080810547,
325
+ "logps/rejected": -241.00271606445312,
326
+ "loss": 0.6909,
327
+ "rewards/accuracies": 0.637499988079071,
328
+ "rewards/chosen": 0.0010771710658445954,
329
+ "rewards/margins": 0.005134746432304382,
330
+ "rewards/margins_max": 0.022996146231889725,
331
+ "rewards/margins_min": -0.009730304591357708,
332
+ "rewards/margins_std": 0.014861812815070152,
333
+ "rewards/rejected": -0.004057575948536396,
334
+ "step": 160
335
+ },
336
+ {
337
+ "epoch": 0.52,
338
+ "grad_norm": 1.59020242230242,
339
+ "learning_rate": 2.791208348427426e-07,
340
+ "logits/chosen": -2.814671039581299,
341
+ "logits/rejected": -2.732504367828369,
342
+ "logps/chosen": -303.4354553222656,
343
+ "logps/rejected": -273.4683837890625,
344
+ "loss": 0.6887,
345
+ "rewards/accuracies": 0.699999988079071,
346
+ "rewards/chosen": 0.002831272780895233,
347
+ "rewards/margins": 0.008308259770274162,
348
+ "rewards/margins_max": 0.02398056350648403,
349
+ "rewards/margins_min": -0.007372391410171986,
350
+ "rewards/margins_std": 0.014078010804951191,
351
+ "rewards/rejected": -0.005476987920701504,
352
+ "step": 170
353
+ },
354
+ {
355
+ "epoch": 0.55,
356
+ "grad_norm": 1.791424187804565,
357
+ "learning_rate": 2.526533223585641e-07,
358
+ "logits/chosen": -2.8398988246917725,
359
+ "logits/rejected": -2.775310754776001,
360
+ "logps/chosen": -256.0347595214844,
361
+ "logps/rejected": -229.332763671875,
362
+ "loss": 0.6897,
363
+ "rewards/accuracies": 0.6499999761581421,
364
+ "rewards/chosen": 0.0009899451397359371,
365
+ "rewards/margins": 0.005457731895148754,
366
+ "rewards/margins_max": 0.021505217999219894,
367
+ "rewards/margins_min": -0.008436702191829681,
368
+ "rewards/margins_std": 0.013385000638663769,
369
+ "rewards/rejected": -0.00446778628975153,
370
+ "step": 180
371
+ },
372
+ {
373
+ "epoch": 0.58,
374
+ "grad_norm": 1.7305988753672774,
375
+ "learning_rate": 2.261559492680755e-07,
376
+ "logits/chosen": -2.781790256500244,
377
+ "logits/rejected": -2.7643322944641113,
378
+ "logps/chosen": -300.09393310546875,
379
+ "logps/rejected": -271.13116455078125,
380
+ "loss": 0.6891,
381
+ "rewards/accuracies": 0.7250000238418579,
382
+ "rewards/chosen": 0.004870180506259203,
383
+ "rewards/margins": 0.0101470947265625,
384
+ "rewards/margins_max": 0.03561341017484665,
385
+ "rewards/margins_min": -0.00919102318584919,
386
+ "rewards/margins_std": 0.019990354776382446,
387
+ "rewards/rejected": -0.005276912357658148,
388
+ "step": 190
389
+ },
390
+ {
391
+ "epoch": 0.61,
392
+ "grad_norm": 2.169958133205736,
393
+ "learning_rate": 1.9992691817133024e-07,
394
+ "logits/chosen": -2.7859396934509277,
395
+ "logits/rejected": -2.755178213119507,
396
+ "logps/chosen": -281.18170166015625,
397
+ "logps/rejected": -288.84930419921875,
398
+ "loss": 0.6884,
399
+ "rewards/accuracies": 0.699999988079071,
400
+ "rewards/chosen": 0.0041890377178788185,
401
+ "rewards/margins": 0.009892629459500313,
402
+ "rewards/margins_max": 0.03310906141996384,
403
+ "rewards/margins_min": -0.012689237482845783,
404
+ "rewards/margins_std": 0.02011021040380001,
405
+ "rewards/rejected": -0.005703592207282782,
406
+ "step": 200
407
+ },
408
+ {
409
+ "epoch": 0.61,
410
+ "eval_logits/chosen": -2.804927349090576,
411
+ "eval_logits/rejected": -2.766470432281494,
412
+ "eval_logps/chosen": -284.3610534667969,
413
+ "eval_logps/rejected": -259.1879577636719,
414
+ "eval_loss": 0.6895014643669128,
415
+ "eval_rewards/accuracies": 0.6850000023841858,
416
+ "eval_rewards/chosen": 0.0023234861437231302,
417
+ "eval_rewards/margins": 0.008414038456976414,
418
+ "eval_rewards/margins_max": 0.038945525884628296,
419
+ "eval_rewards/margins_min": -0.018883490934967995,
420
+ "eval_rewards/margins_std": 0.018971558660268784,
421
+ "eval_rewards/rejected": -0.006090551149100065,
422
+ "eval_runtime": 427.7798,
423
+ "eval_samples_per_second": 4.675,
424
+ "eval_steps_per_second": 0.292,
425
+ "step": 200
426
+ },
427
+ {
428
+ "epoch": 0.64,
429
+ "grad_norm": 1.9906194761669704,
430
+ "learning_rate": 1.742614117358029e-07,
431
+ "logits/chosen": -2.80131196975708,
432
+ "logits/rejected": -2.7576537132263184,
433
+ "logps/chosen": -304.849853515625,
434
+ "logps/rejected": -289.08197021484375,
435
+ "loss": 0.6877,
436
+ "rewards/accuracies": 0.699999988079071,
437
+ "rewards/chosen": 0.0046735843643546104,
438
+ "rewards/margins": 0.012557747773826122,
439
+ "rewards/margins_max": 0.03481978923082352,
440
+ "rewards/margins_min": -0.00802917592227459,
441
+ "rewards/margins_std": 0.019201457500457764,
442
+ "rewards/rejected": -0.007884165272116661,
443
+ "step": 210
444
+ },
445
+ {
446
+ "epoch": 0.67,
447
+ "grad_norm": 1.9658311065665528,
448
+ "learning_rate": 1.4944827069769122e-07,
449
+ "logits/chosen": -2.851292133331299,
450
+ "logits/rejected": -2.8257217407226562,
451
+ "logps/chosen": -312.4863586425781,
452
+ "logps/rejected": -266.73626708984375,
453
+ "loss": 0.6891,
454
+ "rewards/accuracies": 0.675000011920929,
455
+ "rewards/chosen": 0.004867873154580593,
456
+ "rewards/margins": 0.008174732327461243,
457
+ "rewards/margins_max": 0.028449540957808495,
458
+ "rewards/margins_min": -0.011079356074333191,
459
+ "rewards/margins_std": 0.01735488697886467,
460
+ "rewards/rejected": -0.003306858241558075,
461
+ "step": 220
462
+ },
463
+ {
464
+ "epoch": 0.7,
465
+ "grad_norm": 1.8987994692738805,
466
+ "learning_rate": 1.2576674323558928e-07,
467
+ "logits/chosen": -2.821254014968872,
468
+ "logits/rejected": -2.8421223163604736,
469
+ "logps/chosen": -288.6875,
470
+ "logps/rejected": -263.0277099609375,
471
+ "loss": 0.6906,
472
+ "rewards/accuracies": 0.574999988079071,
473
+ "rewards/chosen": -0.00041991579928435385,
474
+ "rewards/margins": 0.0022654212079942226,
475
+ "rewards/margins_max": 0.024668732658028603,
476
+ "rewards/margins_min": -0.022191215306520462,
477
+ "rewards/margins_std": 0.020731808617711067,
478
+ "rewards/rejected": -0.002685337094590068,
479
+ "step": 230
480
+ },
481
+ {
482
+ "epoch": 0.73,
483
+ "grad_norm": 2.049682090113544,
484
+ "learning_rate": 1.0348334229922676e-07,
485
+ "logits/chosen": -2.877260684967041,
486
+ "logits/rejected": -2.8300554752349854,
487
+ "logps/chosen": -290.80633544921875,
488
+ "logps/rejected": -275.846435546875,
489
+ "loss": 0.6893,
490
+ "rewards/accuracies": 0.7124999761581421,
491
+ "rewards/chosen": 0.0021143355406820774,
492
+ "rewards/margins": 0.00877899769693613,
493
+ "rewards/margins_max": 0.03138250857591629,
494
+ "rewards/margins_min": -0.01147426012903452,
495
+ "rewards/margins_std": 0.019380424171686172,
496
+ "rewards/rejected": -0.006664662156254053,
497
+ "step": 240
498
+ },
499
+ {
500
+ "epoch": 0.76,
501
+ "grad_norm": 2.008481756505904,
502
+ "learning_rate": 8.284884626103164e-08,
503
+ "logits/chosen": -2.817871570587158,
504
+ "logits/rejected": -2.786424398422241,
505
+ "logps/chosen": -300.6135559082031,
506
+ "logps/rejected": -305.0606994628906,
507
+ "loss": 0.6882,
508
+ "rewards/accuracies": 0.7124999761581421,
509
+ "rewards/chosen": 0.0047633713111281395,
510
+ "rewards/margins": 0.009853017516434193,
511
+ "rewards/margins_max": 0.034555986523628235,
512
+ "rewards/margins_min": -0.011890431866049767,
513
+ "rewards/margins_std": 0.020635981112718582,
514
+ "rewards/rejected": -0.0050896452739834785,
515
+ "step": 250
516
+ },
517
+ {
518
+ "epoch": 0.79,
519
+ "grad_norm": 2.120277542804363,
520
+ "learning_rate": 6.409547664531733e-08,
521
+ "logits/chosen": -2.844655752182007,
522
+ "logits/rejected": -2.811575412750244,
523
+ "logps/chosen": -333.072265625,
524
+ "logps/rejected": -312.94317626953125,
525
+ "loss": 0.6874,
526
+ "rewards/accuracies": 0.762499988079071,
527
+ "rewards/chosen": 0.009096643887460232,
528
+ "rewards/margins": 0.013471168465912342,
529
+ "rewards/margins_max": 0.0355917289853096,
530
+ "rewards/margins_min": -0.005764602217823267,
531
+ "rewards/margins_std": 0.018313560634851456,
532
+ "rewards/rejected": -0.00437452457845211,
533
+ "step": 260
534
+ },
535
+ {
536
+ "epoch": 0.82,
537
+ "grad_norm": 2.015529778175616,
538
+ "learning_rate": 4.743428469705335e-08,
539
+ "logits/chosen": -2.7949509620666504,
540
+ "logits/rejected": -2.7894396781921387,
541
+ "logps/chosen": -303.4598693847656,
542
+ "logps/rejected": -308.66522216796875,
543
+ "loss": 0.6889,
544
+ "rewards/accuracies": 0.6625000238418579,
545
+ "rewards/chosen": 0.003480118466541171,
546
+ "rewards/margins": 0.010316994972527027,
547
+ "rewards/margins_max": 0.033331625163555145,
548
+ "rewards/margins_min": -0.010707234963774681,
549
+ "rewards/margins_std": 0.01953895017504692,
550
+ "rewards/rejected": -0.006836875341832638,
551
+ "step": 270
552
+ },
553
+ {
554
+ "epoch": 0.85,
555
+ "grad_norm": 2.1162209644793024,
556
+ "learning_rate": 3.305277620188826e-08,
557
+ "logits/chosen": -2.844252347946167,
558
+ "logits/rejected": -2.8254075050354004,
559
+ "logps/chosen": -324.8486633300781,
560
+ "logps/rejected": -270.613037109375,
561
+ "loss": 0.6865,
562
+ "rewards/accuracies": 0.75,
563
+ "rewards/chosen": 0.0071704513393342495,
564
+ "rewards/margins": 0.015561411157250404,
565
+ "rewards/margins_max": 0.041363365948200226,
566
+ "rewards/margins_min": -0.010495706461369991,
567
+ "rewards/margins_std": 0.0231755543500185,
568
+ "rewards/rejected": -0.008390960283577442,
569
+ "step": 280
570
+ },
571
+ {
572
+ "epoch": 0.88,
573
+ "grad_norm": 1.7280929479679055,
574
+ "learning_rate": 2.1112801287806375e-08,
575
+ "logits/chosen": -2.783881187438965,
576
+ "logits/rejected": -2.747999668121338,
577
+ "logps/chosen": -273.90185546875,
578
+ "logps/rejected": -246.3704833984375,
579
+ "loss": 0.6881,
580
+ "rewards/accuracies": 0.699999988079071,
581
+ "rewards/chosen": 0.0028805662877857685,
582
+ "rewards/margins": 0.011384439654648304,
583
+ "rewards/margins_max": 0.036729536950588226,
584
+ "rewards/margins_min": -0.009171558544039726,
585
+ "rewards/margins_std": 0.021134525537490845,
586
+ "rewards/rejected": -0.008503873832523823,
587
+ "step": 290
588
+ },
589
+ {
590
+ "epoch": 0.91,
591
+ "grad_norm": 1.8137366581326853,
592
+ "learning_rate": 1.1748732956682023e-08,
593
+ "logits/chosen": -2.878770351409912,
594
+ "logits/rejected": -2.8104898929595947,
595
+ "logps/chosen": -323.51312255859375,
596
+ "logps/rejected": -286.44964599609375,
597
+ "loss": 0.6873,
598
+ "rewards/accuracies": 0.699999988079071,
599
+ "rewards/chosen": 0.0020009407307952642,
600
+ "rewards/margins": 0.010605795308947563,
601
+ "rewards/margins_max": 0.03404298424720764,
602
+ "rewards/margins_min": -0.010880110785365105,
603
+ "rewards/margins_std": 0.020114842802286148,
604
+ "rewards/rejected": -0.008604854345321655,
605
+ "step": 300
606
+ },
607
+ {
608
+ "epoch": 0.91,
609
+ "eval_logits/chosen": -2.802642822265625,
610
+ "eval_logits/rejected": -2.7640159130096436,
611
+ "eval_logps/chosen": -284.2815246582031,
612
+ "eval_logps/rejected": -259.24383544921875,
613
+ "eval_loss": 0.6889453530311584,
614
+ "eval_rewards/accuracies": 0.6759999990463257,
615
+ "eval_rewards/chosen": 0.0031190679874271154,
616
+ "eval_rewards/margins": 0.009768038988113403,
617
+ "eval_rewards/margins_max": 0.044922519475221634,
618
+ "eval_rewards/margins_min": -0.021590130403637886,
619
+ "eval_rewards/margins_std": 0.021896740421652794,
620
+ "eval_rewards/rejected": -0.006648970767855644,
621
+ "eval_runtime": 427.9336,
622
+ "eval_samples_per_second": 4.674,
623
+ "eval_steps_per_second": 0.292,
624
+ "step": 300
625
+ },
626
+ {
627
+ "epoch": 0.94,
628
+ "grad_norm": 1.5476563917272619,
629
+ "learning_rate": 5.065954844616721e-09,
630
+ "logits/chosen": -2.8241655826568604,
631
+ "logits/rejected": -2.7778286933898926,
632
+ "logps/chosen": -276.5940856933594,
633
+ "logps/rejected": -281.5748596191406,
634
+ "loss": 0.6885,
635
+ "rewards/accuracies": 0.699999988079071,
636
+ "rewards/chosen": 0.005276383366435766,
637
+ "rewards/margins": 0.010391583666205406,
638
+ "rewards/margins_max": 0.036186523735523224,
639
+ "rewards/margins_min": -0.010774780064821243,
640
+ "rewards/margins_std": 0.02108721435070038,
641
+ "rewards/rejected": -0.005115201231092215,
642
+ "step": 310
643
+ },
644
+ {
645
+ "epoch": 0.97,
646
+ "grad_norm": 1.9217088208809332,
647
+ "learning_rate": 1.1396752298723499e-09,
648
+ "logits/chosen": -2.8640575408935547,
649
+ "logits/rejected": -2.8119149208068848,
650
+ "logps/chosen": -249.0362548828125,
651
+ "logps/rejected": -258.521484375,
652
+ "loss": 0.6879,
653
+ "rewards/accuracies": 0.6625000238418579,
654
+ "rewards/chosen": -0.0009104462224058807,
655
+ "rewards/margins": 0.008900880813598633,
656
+ "rewards/margins_max": 0.02946281060576439,
657
+ "rewards/margins_min": -0.010788346640765667,
658
+ "rewards/margins_std": 0.017393799498677254,
659
+ "rewards/rejected": -0.009811325930058956,
660
+ "step": 320
661
+ },
662
+ {
663
+ "epoch": 1.0,
664
+ "step": 329,
665
+ "total_flos": 0.0,
666
+ "train_loss": 0.6900796745323483,
667
+ "train_runtime": 3893.3874,
668
+ "train_samples_per_second": 1.352,
669
+ "train_steps_per_second": 0.085
670
+ }
671
+ ],
672
+ "logging_steps": 10,
673
+ "max_steps": 329,
674
+ "num_input_tokens_seen": 0,
675
+ "num_train_epochs": 1,
676
+ "save_steps": 100,
677
+ "total_flos": 0.0,
678
+ "train_batch_size": 4,
679
+ "trial_name": null,
680
+ "trial_params": null
681
+ }
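
The `learning_rate` values logged in `trainer_state.json` are consistent with a linear warmup followed by cosine decay, as configured by `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio: 0.1`. The sketch below is a plain-Python reconstruction rather than the library's code: the function name `lr_at` is hypothetical, and the warmup length of `ceil(329 * 0.1) = 33` steps is inferred from the logged values, not stated in the commit.

```python
import math


def lr_at(step: int, peak_lr: float = 5e-7, max_steps: int = 329,
          warmup_ratio: float = 0.1) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    # Assumed warmup length: 33 steps for this run.
    warmup_steps = math.ceil(max_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    # Half-cosine decay from the peak learning rate down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against a few learning rates logged in trainer_state.json.
for step in (1, 10, 40, 100, 320):
    print(f"step {step:3d}: lr = {lr_at(step):.6e}")
```

Under these assumptions the peak of 5e-07 is reached at step 33, and the rate has decayed to roughly 1.1e-09 by step 320, matching the log.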