nemik committed · Commit c56f9ea · verified · 1 Parent(s): 6e14f3f

End of training

README.md CHANGED
@@ -25,13 +25,13 @@ model-index:
  metrics:
  - name: Accuracy
  type: accuracy
- value: 0.9283185840707965
+ value: 0.9309734513274336
  - name: F1
  type: f1
- value: 0.8171557562076749
+ value: 0.8227272727272726
  - name: Precision
  type: precision
- value: 0.8341013824884793
+ value: 0.8457943925233645
  - name: Recall
  type: recall
  value: 0.8008849557522124
@@ -44,10 +44,10 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [apple/mobilevitv2-1.0-imagenet1k-256](https://huggingface.co/apple/mobilevitv2-1.0-imagenet1k-256) on the webdataset dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1955
- - Accuracy: 0.9283
- - F1: 0.8172
- - Precision: 0.8341
+ - Loss: 0.1896
+ - Accuracy: 0.9310
+ - F1: 0.8227
+ - Precision: 0.8458
  - Recall: 0.8009
 
  ## Model description
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 30.0,
+ "eval_accuracy": 0.9309734513274336,
+ "eval_f1": 0.8227272727272726,
+ "eval_loss": 0.18961240351200104,
+ "eval_precision": 0.8457943925233645,
+ "eval_recall": 0.8008849557522124,
+ "eval_runtime": 1.0942,
+ "eval_samples_per_second": 103.267,
+ "eval_steps_per_second": 13.708,
+ "total_flos": 1.9916656541540352e+17,
+ "train_loss": 0.1991663834390541,
+ "train_runtime": 388.9258,
+ "train_samples_per_second": 78.061,
+ "train_steps_per_second": 4.937
+ }
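The evaluation metrics above can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that check, assuming the scores are reported for a single positive class (the commit does not state the averaging mode); `f1_from` is a hypothetical helper name, not part of any library.

```python
import math

# Values copied from all_results.json in this commit.
precision = 0.8457943925233645
recall = 0.8008849557522124
reported_f1 = 0.8227272727272726


def f1_from(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


computed_f1 = f1_from(precision, recall)
# Compare with a loose relative tolerance to absorb floating-point rounding.
f1_matches = math.isclose(computed_f1, reported_f1, rel_tol=1e-9)
print(f"computed F1 = {computed_f1:.6f}, matches reported: {f1_matches}")
```

If the check fails for some other run, the metrics were probably computed with a different averaging mode (for example macro or weighted), in which case this identity need not hold.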
eval_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 30.0,
+ "eval_accuracy": 0.9309734513274336,
+ "eval_f1": 0.8227272727272726,
+ "eval_loss": 0.18961240351200104,
+ "eval_precision": 0.8457943925233645,
+ "eval_recall": 0.8008849557522124,
+ "eval_runtime": 1.0942,
+ "eval_samples_per_second": 103.267,
+ "eval_steps_per_second": 13.708
+ }
runs/Jul25_17-40-11_8840352ede30/events.out.tfevents.1721929633.8840352ede30.506.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:95e54f885eef0946742d39a8ef61697298dcba7ec66c3e33fb7a0bc72a94b7b2
+ size 560
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 30.0,
+ "total_flos": 1.9916656541540352e+17,
+ "train_loss": 0.1991663834390541,
+ "train_runtime": 388.9258,
+ "train_samples_per_second": 78.061,
+ "train_steps_per_second": 4.937
+ }
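The throughput figures in train_results.json are enough to recover the rough shape of the run. The sketch below multiplies the reported rates by the runtime; the rates are rounded in the file, so the results are approximate. The batch size it suggests (about 16) is an inference from these numbers, not something the commit states.

```python
# Values copied from train_results.json in this commit.
train_runtime = 388.9258      # seconds
steps_per_second = 4.937
samples_per_second = 78.061
epochs = 30.0

# Rate x time gives totals; round the step count because the rate is rounded.
total_steps = round(steps_per_second * train_runtime)
steps_per_epoch = total_steps / epochs
samples_per_epoch = samples_per_second * train_runtime / epochs
# Average samples consumed per optimizer step (an estimate of the batch size).
samples_per_step = samples_per_second / steps_per_second

print(f"total steps       ~ {total_steps}")
print(f"steps per epoch   ~ {steps_per_epoch:.1f}")
print(f"samples per epoch ~ {samples_per_epoch:.0f}")
print(f"samples per step  ~ {samples_per_step:.2f}")
```

The step count obtained this way agrees with the `global_step` of 1920 recorded in trainer_state.json, and 64 steps per epoch matches the logged epoch of 0.15625 at step 10.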
trainer_state.json ADDED
@@ -0,0 +1,1614 @@
1
+ {
2
+ "best_metric": 0.18961240351200104,
3
+ "best_model_checkpoint": "mobilevitv2-1.0-imagenet1k-256-finetuned_v2024-7-25-frost/checkpoint-1000",
4
+ "epoch": 30.0,
5
+ "eval_steps": 100,
6
+ "global_step": 1920,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.15625,
13
+ "grad_norm": 0.2519198954105377,
14
+ "learning_rate": 1.0416666666666668e-05,
15
+ "loss": 0.6958,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.3125,
20
+ "grad_norm": 0.21561957895755768,
21
+ "learning_rate": 2.0833333333333336e-05,
22
+ "loss": 0.6958,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.46875,
27
+ "grad_norm": 0.2186066061258316,
28
+ "learning_rate": 3.125e-05,
29
+ "loss": 0.6948,
30
+ "step": 30
31
+ },
32
+ {
33
+ "epoch": 0.625,
34
+ "grad_norm": 0.22916783392429352,
35
+ "learning_rate": 4.166666666666667e-05,
36
+ "loss": 0.6934,
37
+ "step": 40
38
+ },
39
+ {
40
+ "epoch": 0.78125,
41
+ "grad_norm": 0.23298271000385284,
42
+ "learning_rate": 5.208333333333334e-05,
43
+ "loss": 0.691,
44
+ "step": 50
45
+ },
46
+ {
47
+ "epoch": 0.9375,
48
+ "grad_norm": 0.24950967729091644,
49
+ "learning_rate": 6.25e-05,
50
+ "loss": 0.6895,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 1.09375,
55
+ "grad_norm": 0.23937764763832092,
56
+ "learning_rate": 7.291666666666667e-05,
57
+ "loss": 0.684,
58
+ "step": 70
59
+ },
60
+ {
61
+ "epoch": 1.25,
62
+ "grad_norm": 0.23057785630226135,
63
+ "learning_rate": 8.333333333333334e-05,
64
+ "loss": 0.6794,
65
+ "step": 80
66
+ },
67
+ {
68
+ "epoch": 1.40625,
69
+ "grad_norm": 0.2983661890029907,
70
+ "learning_rate": 9.375e-05,
71
+ "loss": 0.6751,
72
+ "step": 90
73
+ },
74
+ {
75
+ "epoch": 1.5625,
76
+ "grad_norm": 0.2652019262313843,
77
+ "learning_rate": 0.00010416666666666667,
78
+ "loss": 0.6687,
79
+ "step": 100
80
+ },
81
+ {
82
+ "epoch": 1.5625,
83
+ "eval_accuracy": 0.7230088495575221,
84
+ "eval_f1": 0.5335320417287631,
85
+ "eval_loss": 0.6623277068138123,
86
+ "eval_precision": 0.40224719101123596,
87
+ "eval_recall": 0.7920353982300885,
88
+ "eval_runtime": 1.2295,
89
+ "eval_samples_per_second": 91.909,
90
+ "eval_steps_per_second": 12.2,
91
+ "step": 100
92
+ },
93
+ {
94
+ "epoch": 1.71875,
95
+ "grad_norm": 0.34594935178756714,
96
+ "learning_rate": 0.00011458333333333333,
97
+ "loss": 0.6586,
98
+ "step": 110
99
+ },
100
+ {
101
+ "epoch": 1.875,
102
+ "grad_norm": 0.26900964975357056,
103
+ "learning_rate": 0.000125,
104
+ "loss": 0.6476,
105
+ "step": 120
106
+ },
107
+ {
108
+ "epoch": 2.03125,
109
+ "grad_norm": 0.30181896686553955,
110
+ "learning_rate": 0.0001354166666666667,
111
+ "loss": 0.6336,
112
+ "step": 130
113
+ },
114
+ {
115
+ "epoch": 2.1875,
116
+ "grad_norm": 0.33757150173187256,
117
+ "learning_rate": 0.00014583333333333335,
118
+ "loss": 0.6079,
119
+ "step": 140
120
+ },
121
+ {
122
+ "epoch": 2.34375,
123
+ "grad_norm": 0.54989093542099,
124
+ "learning_rate": 0.00015625,
125
+ "loss": 0.5823,
126
+ "step": 150
127
+ },
128
+ {
129
+ "epoch": 2.5,
130
+ "grad_norm": 0.6273936629295349,
131
+ "learning_rate": 0.0001666666666666667,
132
+ "loss": 0.5598,
133
+ "step": 160
134
+ },
135
+ {
136
+ "epoch": 2.65625,
137
+ "grad_norm": 0.46115073561668396,
138
+ "learning_rate": 0.00017708333333333335,
139
+ "loss": 0.5239,
140
+ "step": 170
141
+ },
142
+ {
143
+ "epoch": 2.8125,
144
+ "grad_norm": 0.47255411744117737,
145
+ "learning_rate": 0.0001875,
146
+ "loss": 0.4972,
147
+ "step": 180
148
+ },
149
+ {
150
+ "epoch": 2.96875,
151
+ "grad_norm": 0.522071361541748,
152
+ "learning_rate": 0.0001979166666666667,
153
+ "loss": 0.4617,
154
+ "step": 190
155
+ },
156
+ {
157
+ "epoch": 3.125,
158
+ "grad_norm": 1.0379081964492798,
159
+ "learning_rate": 0.0001990740740740741,
160
+ "loss": 0.4454,
161
+ "step": 200
162
+ },
163
+ {
164
+ "epoch": 3.125,
165
+ "eval_accuracy": 0.8831858407079646,
166
+ "eval_f1": 0.7490494296577946,
167
+ "eval_loss": 0.41519004106521606,
168
+ "eval_precision": 0.6566666666666666,
169
+ "eval_recall": 0.8716814159292036,
170
+ "eval_runtime": 0.9401,
171
+ "eval_samples_per_second": 120.203,
172
+ "eval_steps_per_second": 15.956,
173
+ "step": 200
174
+ },
175
+ {
176
+ "epoch": 3.28125,
177
+ "grad_norm": 0.545850932598114,
178
+ "learning_rate": 0.0001979166666666667,
179
+ "loss": 0.4247,
180
+ "step": 210
181
+ },
182
+ {
183
+ "epoch": 3.4375,
184
+ "grad_norm": 0.7891977429389954,
185
+ "learning_rate": 0.00019675925925925926,
186
+ "loss": 0.4041,
187
+ "step": 220
188
+ },
189
+ {
190
+ "epoch": 3.59375,
191
+ "grad_norm": 1.542927861213684,
192
+ "learning_rate": 0.00019560185185185186,
193
+ "loss": 0.3845,
194
+ "step": 230
195
+ },
196
+ {
197
+ "epoch": 3.75,
198
+ "grad_norm": 0.44223108887672424,
199
+ "learning_rate": 0.00019444444444444446,
200
+ "loss": 0.3278,
201
+ "step": 240
202
+ },
203
+ {
204
+ "epoch": 3.90625,
205
+ "grad_norm": 0.8207859396934509,
206
+ "learning_rate": 0.00019328703703703706,
207
+ "loss": 0.3507,
208
+ "step": 250
209
+ },
210
+ {
211
+ "epoch": 4.0625,
212
+ "grad_norm": 0.42037203907966614,
213
+ "learning_rate": 0.00019212962962962963,
214
+ "loss": 0.3481,
215
+ "step": 260
216
+ },
217
+ {
218
+ "epoch": 4.21875,
219
+ "grad_norm": 1.829210877418518,
220
+ "learning_rate": 0.00019097222222222223,
221
+ "loss": 0.3133,
222
+ "step": 270
223
+ },
224
+ {
225
+ "epoch": 4.375,
226
+ "grad_norm": 1.213773250579834,
227
+ "learning_rate": 0.00018981481481481483,
228
+ "loss": 0.2934,
229
+ "step": 280
230
+ },
231
+ {
232
+ "epoch": 4.53125,
233
+ "grad_norm": 0.6154431104660034,
234
+ "learning_rate": 0.00018865740740740743,
235
+ "loss": 0.2923,
236
+ "step": 290
237
+ },
238
+ {
239
+ "epoch": 4.6875,
240
+ "grad_norm": 0.3814193606376648,
241
+ "learning_rate": 0.0001875,
242
+ "loss": 0.2835,
243
+ "step": 300
244
+ },
245
+ {
246
+ "epoch": 4.6875,
247
+ "eval_accuracy": 0.9097345132743363,
248
+ "eval_f1": 0.7660550458715596,
249
+ "eval_loss": 0.26609960198402405,
250
+ "eval_precision": 0.7952380952380952,
251
+ "eval_recall": 0.7389380530973452,
252
+ "eval_runtime": 1.4414,
253
+ "eval_samples_per_second": 78.394,
254
+ "eval_steps_per_second": 10.406,
255
+ "step": 300
256
+ },
257
+ {
258
+ "epoch": 4.84375,
259
+ "grad_norm": 0.4515719711780548,
260
+ "learning_rate": 0.0001863425925925926,
261
+ "loss": 0.2672,
262
+ "step": 310
263
+ },
264
+ {
265
+ "epoch": 5.0,
266
+ "grad_norm": 2.077721357345581,
267
+ "learning_rate": 0.0001851851851851852,
268
+ "loss": 0.2914,
269
+ "step": 320
270
+ },
271
+ {
272
+ "epoch": 5.15625,
273
+ "grad_norm": 0.9644371867179871,
274
+ "learning_rate": 0.00018402777777777778,
275
+ "loss": 0.2527,
276
+ "step": 330
277
+ },
278
+ {
279
+ "epoch": 5.3125,
280
+ "grad_norm": 0.8245725631713867,
281
+ "learning_rate": 0.00018287037037037038,
282
+ "loss": 0.2477,
283
+ "step": 340
284
+ },
285
+ {
286
+ "epoch": 5.46875,
287
+ "grad_norm": 0.5262947082519531,
288
+ "learning_rate": 0.00018171296296296297,
289
+ "loss": 0.2452,
290
+ "step": 350
291
+ },
292
+ {
293
+ "epoch": 5.625,
294
+ "grad_norm": 0.6464282870292664,
295
+ "learning_rate": 0.00018055555555555557,
296
+ "loss": 0.2308,
297
+ "step": 360
298
+ },
299
+ {
300
+ "epoch": 5.78125,
301
+ "grad_norm": 0.6029626131057739,
302
+ "learning_rate": 0.00017939814814814815,
303
+ "loss": 0.233,
304
+ "step": 370
305
+ },
306
+ {
307
+ "epoch": 5.9375,
308
+ "grad_norm": 0.683201789855957,
309
+ "learning_rate": 0.00017824074074074075,
310
+ "loss": 0.2258,
311
+ "step": 380
312
+ },
313
+ {
314
+ "epoch": 6.09375,
315
+ "grad_norm": 0.5622811317443848,
316
+ "learning_rate": 0.00017708333333333335,
317
+ "loss": 0.2342,
318
+ "step": 390
319
+ },
320
+ {
321
+ "epoch": 6.25,
322
+ "grad_norm": 0.5229126214981079,
323
+ "learning_rate": 0.00017592592592592595,
324
+ "loss": 0.2197,
325
+ "step": 400
326
+ },
327
+ {
328
+ "epoch": 6.25,
329
+ "eval_accuracy": 0.9194690265486726,
330
+ "eval_f1": 0.7868852459016394,
331
+ "eval_loss": 0.21510393917560577,
332
+ "eval_precision": 0.835820895522388,
333
+ "eval_recall": 0.7433628318584071,
334
+ "eval_runtime": 0.9737,
335
+ "eval_samples_per_second": 116.048,
336
+ "eval_steps_per_second": 15.405,
337
+ "step": 400
338
+ },
339
+ {
340
+ "epoch": 6.40625,
341
+ "grad_norm": 0.5103662610054016,
342
+ "learning_rate": 0.00017476851851851852,
343
+ "loss": 0.2084,
344
+ "step": 410
345
+ },
346
+ {
347
+ "epoch": 6.5625,
348
+ "grad_norm": 1.2655210494995117,
349
+ "learning_rate": 0.00017361111111111112,
350
+ "loss": 0.2005,
351
+ "step": 420
352
+ },
353
+ {
354
+ "epoch": 6.71875,
355
+ "grad_norm": 0.5232699513435364,
356
+ "learning_rate": 0.00017245370370370372,
357
+ "loss": 0.2296,
358
+ "step": 430
359
+ },
360
+ {
361
+ "epoch": 6.875,
362
+ "grad_norm": 0.8142613172531128,
363
+ "learning_rate": 0.00017129629629629632,
364
+ "loss": 0.187,
365
+ "step": 440
366
+ },
367
+ {
368
+ "epoch": 7.03125,
369
+ "grad_norm": 0.9919219017028809,
370
+ "learning_rate": 0.0001701388888888889,
371
+ "loss": 0.2027,
372
+ "step": 450
373
+ },
374
+ {
375
+ "epoch": 7.1875,
376
+ "grad_norm": 1.2590153217315674,
377
+ "learning_rate": 0.0001689814814814815,
378
+ "loss": 0.1873,
379
+ "step": 460
380
+ },
381
+ {
382
+ "epoch": 7.34375,
383
+ "grad_norm": 0.6513100266456604,
384
+ "learning_rate": 0.0001678240740740741,
385
+ "loss": 0.1939,
386
+ "step": 470
387
+ },
388
+ {
389
+ "epoch": 7.5,
390
+ "grad_norm": 1.0872722864151,
391
+ "learning_rate": 0.0001666666666666667,
392
+ "loss": 0.2156,
393
+ "step": 480
394
+ },
395
+ {
396
+ "epoch": 7.65625,
397
+ "grad_norm": 0.3712750971317291,
398
+ "learning_rate": 0.00016550925925925926,
399
+ "loss": 0.1968,
400
+ "step": 490
401
+ },
402
+ {
403
+ "epoch": 7.8125,
404
+ "grad_norm": 0.8672028183937073,
405
+ "learning_rate": 0.00016435185185185186,
406
+ "loss": 0.1613,
407
+ "step": 500
408
+ },
409
+ {
410
+ "epoch": 7.8125,
411
+ "eval_accuracy": 0.9292035398230089,
412
+ "eval_f1": 0.813953488372093,
413
+ "eval_loss": 0.20068036019802094,
414
+ "eval_precision": 0.8578431372549019,
415
+ "eval_recall": 0.7743362831858407,
416
+ "eval_runtime": 0.9681,
417
+ "eval_samples_per_second": 116.722,
418
+ "eval_steps_per_second": 15.494,
419
+ "step": 500
420
+ },
421
+ {
422
+ "epoch": 7.96875,
423
+ "grad_norm": 1.2281358242034912,
424
+ "learning_rate": 0.00016319444444444446,
425
+ "loss": 0.1864,
426
+ "step": 510
427
+ },
428
+ {
429
+ "epoch": 8.125,
430
+ "grad_norm": 1.1462125778198242,
431
+ "learning_rate": 0.00016203703703703706,
432
+ "loss": 0.1743,
433
+ "step": 520
434
+ },
435
+ {
436
+ "epoch": 8.28125,
437
+ "grad_norm": 0.5552182197570801,
438
+ "learning_rate": 0.00016087962962962963,
439
+ "loss": 0.1963,
440
+ "step": 530
441
+ },
442
+ {
443
+ "epoch": 8.4375,
444
+ "grad_norm": 0.8015744686126709,
445
+ "learning_rate": 0.00015972222222222223,
446
+ "loss": 0.1987,
447
+ "step": 540
448
+ },
449
+ {
450
+ "epoch": 8.59375,
451
+ "grad_norm": 0.8516111969947815,
452
+ "learning_rate": 0.00015856481481481483,
453
+ "loss": 0.1753,
454
+ "step": 550
455
+ },
456
+ {
457
+ "epoch": 8.75,
458
+ "grad_norm": 1.3942711353302002,
459
+ "learning_rate": 0.00015740740740740743,
460
+ "loss": 0.1577,
461
+ "step": 560
462
+ },
463
+ {
464
+ "epoch": 8.90625,
465
+ "grad_norm": 0.812676191329956,
466
+ "learning_rate": 0.00015625,
467
+ "loss": 0.1726,
468
+ "step": 570
469
+ },
470
+ {
471
+ "epoch": 9.0625,
472
+ "grad_norm": 0.567040205001831,
473
+ "learning_rate": 0.0001550925925925926,
474
+ "loss": 0.1665,
475
+ "step": 580
476
+ },
477
+ {
478
+ "epoch": 9.21875,
479
+ "grad_norm": 0.7389497756958008,
480
+ "learning_rate": 0.0001539351851851852,
481
+ "loss": 0.1445,
482
+ "step": 590
483
+ },
484
+ {
485
+ "epoch": 9.375,
486
+ "grad_norm": 0.6939622163772583,
487
+ "learning_rate": 0.00015277777777777777,
488
+ "loss": 0.1655,
489
+ "step": 600
490
+ },
491
+ {
492
+ "epoch": 9.375,
493
+ "eval_accuracy": 0.9309734513274336,
494
+ "eval_f1": 0.8227272727272726,
495
+ "eval_loss": 0.1935483068227768,
496
+ "eval_precision": 0.8457943925233645,
497
+ "eval_recall": 0.8008849557522124,
498
+ "eval_runtime": 1.422,
499
+ "eval_samples_per_second": 79.467,
500
+ "eval_steps_per_second": 10.549,
501
+ "step": 600
502
+ },
503
+ {
504
+ "epoch": 9.53125,
505
+ "grad_norm": 0.6073923110961914,
506
+ "learning_rate": 0.00015162037037037037,
507
+ "loss": 0.159,
508
+ "step": 610
509
+ },
510
+ {
511
+ "epoch": 9.6875,
512
+ "grad_norm": 0.8762220740318298,
513
+ "learning_rate": 0.00015046296296296297,
514
+ "loss": 0.1959,
515
+ "step": 620
516
+ },
517
+ {
518
+ "epoch": 9.84375,
519
+ "grad_norm": 0.7490831017494202,
520
+ "learning_rate": 0.00014930555555555557,
521
+ "loss": 0.1465,
522
+ "step": 630
523
+ },
524
+ {
525
+ "epoch": 10.0,
526
+ "grad_norm": 1.0123506784439087,
527
+ "learning_rate": 0.00014814814814814815,
528
+ "loss": 0.1683,
529
+ "step": 640
530
+ },
531
+ {
532
+ "epoch": 10.15625,
533
+ "grad_norm": 0.5325204133987427,
534
+ "learning_rate": 0.00014699074074074075,
535
+ "loss": 0.1636,
536
+ "step": 650
537
+ },
538
+ {
539
+ "epoch": 10.3125,
540
+ "grad_norm": 0.5814504623413086,
541
+ "learning_rate": 0.00014583333333333335,
542
+ "loss": 0.1729,
543
+ "step": 660
544
+ },
545
+ {
546
+ "epoch": 10.46875,
547
+ "grad_norm": 1.0156935453414917,
548
+ "learning_rate": 0.00014467592592592594,
549
+ "loss": 0.1569,
550
+ "step": 670
551
+ },
552
+ {
553
+ "epoch": 10.625,
554
+ "grad_norm": 1.257921576499939,
555
+ "learning_rate": 0.00014351851851851852,
556
+ "loss": 0.1429,
557
+ "step": 680
558
+ },
559
+ {
560
+ "epoch": 10.78125,
561
+ "grad_norm": 0.929108202457428,
562
+ "learning_rate": 0.00014236111111111112,
563
+ "loss": 0.1554,
564
+ "step": 690
565
+ },
566
+ {
567
+ "epoch": 10.9375,
568
+ "grad_norm": 0.5256894826889038,
569
+ "learning_rate": 0.00014120370370370372,
570
+ "loss": 0.1815,
571
+ "step": 700
572
+ },
573
+ {
574
+ "epoch": 10.9375,
575
+ "eval_accuracy": 0.9265486725663716,
576
+ "eval_f1": 0.8074245939675174,
577
+ "eval_loss": 0.18833249807357788,
578
+ "eval_precision": 0.848780487804878,
579
+ "eval_recall": 0.7699115044247787,
580
+ "eval_runtime": 0.9484,
581
+ "eval_samples_per_second": 119.151,
582
+ "eval_steps_per_second": 15.817,
583
+ "step": 700
584
+ },
585
+ {
586
+ "epoch": 11.09375,
587
+ "grad_norm": 1.1333953142166138,
588
+ "learning_rate": 0.00014004629629629632,
589
+ "loss": 0.1703,
590
+ "step": 710
591
+ },
592
+ {
593
+ "epoch": 11.25,
594
+ "grad_norm": 0.6658828854560852,
595
+ "learning_rate": 0.0001388888888888889,
596
+ "loss": 0.1475,
597
+ "step": 720
598
+ },
599
+ {
600
+ "epoch": 11.40625,
601
+ "grad_norm": 1.04364812374115,
602
+ "learning_rate": 0.0001377314814814815,
603
+ "loss": 0.1598,
604
+ "step": 730
605
+ },
606
+ {
607
+ "epoch": 11.5625,
608
+ "grad_norm": 0.8811527490615845,
609
+ "learning_rate": 0.0001365740740740741,
610
+ "loss": 0.1678,
611
+ "step": 740
612
+ },
613
+ {
614
+ "epoch": 11.71875,
615
+ "grad_norm": 0.8651083111763,
616
+ "learning_rate": 0.0001354166666666667,
617
+ "loss": 0.1681,
618
+ "step": 750
619
+ },
620
+ {
621
+ "epoch": 11.875,
622
+ "grad_norm": 0.833223283290863,
623
+ "learning_rate": 0.00013425925925925926,
624
+ "loss": 0.1616,
625
+ "step": 760
626
+ },
627
+ {
628
+ "epoch": 12.03125,
629
+ "grad_norm": 0.5667290687561035,
630
+ "learning_rate": 0.00013310185185185186,
631
+ "loss": 0.1185,
632
+ "step": 770
633
+ },
634
+ {
635
+ "epoch": 12.1875,
636
+ "grad_norm": 1.3427128791809082,
637
+ "learning_rate": 0.00013194444444444446,
638
+ "loss": 0.1442,
639
+ "step": 780
640
+ },
641
+ {
642
+ "epoch": 12.34375,
643
+ "grad_norm": 0.859018087387085,
644
+ "learning_rate": 0.00013078703703703706,
645
+ "loss": 0.1552,
646
+ "step": 790
647
+ },
648
+ {
649
+ "epoch": 12.5,
650
+ "grad_norm": 0.6311579942703247,
651
+ "learning_rate": 0.00012962962962962963,
652
+ "loss": 0.1316,
653
+ "step": 800
654
+ },
655
+ {
656
+ "epoch": 12.5,
657
+ "eval_accuracy": 0.9327433628318584,
658
+ "eval_f1": 0.8272727272727272,
659
+ "eval_loss": 0.18246687948703766,
660
+ "eval_precision": 0.8504672897196262,
661
+ "eval_recall": 0.8053097345132744,
662
+ "eval_runtime": 0.9594,
663
+ "eval_samples_per_second": 117.786,
664
+ "eval_steps_per_second": 15.635,
665
+ "step": 800
666
+ },
667
+ {
668
+ "epoch": 12.65625,
669
+ "grad_norm": 0.8464061617851257,
670
+ "learning_rate": 0.00012847222222222223,
671
+ "loss": 0.1344,
672
+ "step": 810
673
+ },
674
+ {
675
+ "epoch": 12.8125,
676
+ "grad_norm": 0.6711329221725464,
677
+ "learning_rate": 0.00012731481481481483,
678
+ "loss": 0.1602,
679
+ "step": 820
680
+ },
681
+ {
682
+ "epoch": 12.96875,
683
+ "grad_norm": 1.0340158939361572,
684
+ "learning_rate": 0.00012615740740740743,
685
+ "loss": 0.1483,
686
+ "step": 830
687
+ },
688
+ {
689
+ "epoch": 13.125,
690
+ "grad_norm": 0.711726725101471,
691
+ "learning_rate": 0.000125,
692
+ "loss": 0.1507,
693
+ "step": 840
694
+ },
695
+ {
696
+ "epoch": 13.28125,
697
+ "grad_norm": 0.8784794211387634,
698
+ "learning_rate": 0.00012384259259259258,
699
+ "loss": 0.1515,
700
+ "step": 850
701
+ },
702
+ {
703
+ "epoch": 13.4375,
704
+ "grad_norm": 0.9908888339996338,
705
+ "learning_rate": 0.0001226851851851852,
706
+ "loss": 0.1583,
707
+ "step": 860
708
+ },
709
+ {
710
+ "epoch": 13.59375,
711
+ "grad_norm": 0.5473937392234802,
712
+ "learning_rate": 0.00012152777777777777,
713
+ "loss": 0.1433,
714
+ "step": 870
715
+ },
716
+ {
717
+ "epoch": 13.75,
718
+ "grad_norm": 1.6888905763626099,
719
+ "learning_rate": 0.00012037037037037037,
720
+ "loss": 0.1371,
721
+ "step": 880
722
+ },
723
+ {
724
+ "epoch": 13.90625,
725
+ "grad_norm": 1.0640438795089722,
726
+ "learning_rate": 0.00011921296296296296,
727
+ "loss": 0.1376,
728
+ "step": 890
729
+ },
730
+ {
731
+ "epoch": 14.0625,
732
+ "grad_norm": 1.9941257238388062,
733
+ "learning_rate": 0.00011805555555555556,
734
+ "loss": 0.1612,
735
+ "step": 900
736
+ },
737
+ {
738
+ "epoch": 14.0625,
739
+ "eval_accuracy": 0.9256637168141593,
740
+ "eval_f1": 0.8099547511312217,
741
+ "eval_loss": 0.18371373414993286,
742
+ "eval_precision": 0.8287037037037037,
743
+ "eval_recall": 0.7920353982300885,
744
+ "eval_runtime": 1.4255,
745
+ "eval_samples_per_second": 79.27,
746
+ "eval_steps_per_second": 10.523,
747
+ "step": 900
748
+ },
749
+ {
750
+ "epoch": 14.21875,
751
+ "grad_norm": 0.650867760181427,
752
+ "learning_rate": 0.00011689814814814815,
753
+ "loss": 0.1468,
754
+ "step": 910
755
+ },
756
+ {
757
+ "epoch": 14.375,
758
+ "grad_norm": 0.7459059357643127,
759
+ "learning_rate": 0.00011574074074074075,
760
+ "loss": 0.1468,
761
+ "step": 920
762
+ },
763
+ {
764
+ "epoch": 14.53125,
765
+ "grad_norm": 0.7468872666358948,
766
+ "learning_rate": 0.00011458333333333333,
767
+ "loss": 0.1169,
768
+ "step": 930
769
+ },
770
+ {
771
+ "epoch": 14.6875,
772
+ "grad_norm": 0.6512945890426636,
773
+ "learning_rate": 0.00011342592592592593,
774
+ "loss": 0.1373,
775
+ "step": 940
776
+ },
777
+ {
778
+ "epoch": 14.84375,
779
+ "grad_norm": 0.710382878780365,
780
+ "learning_rate": 0.00011226851851851852,
781
+ "loss": 0.1223,
782
+ "step": 950
783
+ },
784
+ {
785
+ "epoch": 15.0,
786
+ "grad_norm": 1.2112369537353516,
787
+ "learning_rate": 0.00011111111111111112,
788
+ "loss": 0.1257,
789
+ "step": 960
790
+ },
791
+ {
792
+ "epoch": 15.15625,
793
+ "grad_norm": 4.069777965545654,
794
+ "learning_rate": 0.0001099537037037037,
795
+ "loss": 0.1369,
796
+ "step": 970
797
+ },
798
+ {
799
+ "epoch": 15.3125,
800
+ "grad_norm": 0.9751072525978088,
801
+ "learning_rate": 0.0001087962962962963,
802
+ "loss": 0.1316,
803
+ "step": 980
804
+ },
805
+ {
806
+ "epoch": 15.46875,
807
+ "grad_norm": 0.49943211674690247,
808
+ "learning_rate": 0.00010763888888888889,
809
+ "loss": 0.1088,
810
+ "step": 990
811
+ },
812
+ {
813
+ "epoch": 15.625,
814
+ "grad_norm": 0.7845533490180969,
815
+ "learning_rate": 0.00010648148148148149,
816
+ "loss": 0.118,
817
+ "step": 1000
818
+ },
819
+ {
820
+ "epoch": 15.625,
821
+ "eval_accuracy": 0.9309734513274336,
822
+ "eval_f1": 0.8227272727272726,
823
+ "eval_loss": 0.18961240351200104,
824
+ "eval_precision": 0.8457943925233645,
825
+ "eval_recall": 0.8008849557522124,
826
+ "eval_runtime": 0.9541,
827
+ "eval_samples_per_second": 118.435,
828
+ "eval_steps_per_second": 15.721,
829
+ "step": 1000
830
+ },
831
+ {
832
+ "epoch": 15.78125,
833
+ "grad_norm": 0.5193383693695068,
834
+ "learning_rate": 0.00010532407407407407,
835
+ "loss": 0.1233,
836
+ "step": 1010
837
+ },
838
+ {
839
+ "epoch": 15.9375,
840
+ "grad_norm": 0.5976629257202148,
841
+ "learning_rate": 0.00010416666666666667,
842
+ "loss": 0.1351,
843
+ "step": 1020
844
+ },
845
+ {
846
+ "epoch": 16.09375,
847
+ "grad_norm": 1.0629384517669678,
848
+ "learning_rate": 0.00010300925925925926,
849
+ "loss": 0.1597,
850
+ "step": 1030
851
+ },
852
+ {
853
+ "epoch": 16.25,
854
+ "grad_norm": 0.8576996326446533,
855
+ "learning_rate": 0.00010185185185185186,
856
+ "loss": 0.1268,
857
+ "step": 1040
858
+ },
859
+ {
860
+ "epoch": 16.40625,
861
+ "grad_norm": 0.7236841917037964,
862
+ "learning_rate": 0.00010069444444444445,
863
+ "loss": 0.1411,
864
+ "step": 1050
865
+ },
866
+ {
867
+ "epoch": 16.5625,
868
+ "grad_norm": 1.1142785549163818,
869
+ "learning_rate": 9.953703703703704e-05,
870
+ "loss": 0.1297,
871
+ "step": 1060
872
+ },
873
+ {
874
+ "epoch": 16.71875,
875
+ "grad_norm": 0.8304411768913269,
876
+ "learning_rate": 9.837962962962963e-05,
877
+ "loss": 0.1231,
878
+ "step": 1070
879
+ },
880
+ {
881
+ "epoch": 16.875,
882
+ "grad_norm": 0.8226402997970581,
883
+ "learning_rate": 9.722222222222223e-05,
884
+ "loss": 0.1399,
885
+ "step": 1080
886
+ },
887
+ {
888
+ "epoch": 17.03125,
889
+ "grad_norm": 0.6692397594451904,
890
+ "learning_rate": 9.606481481481482e-05,
891
+ "loss": 0.1833,
892
+ "step": 1090
893
+ },
894
+ {
895
+ "epoch": 17.1875,
896
+ "grad_norm": 0.6689762473106384,
897
+ "learning_rate": 9.490740740740742e-05,
898
+ "loss": 0.1178,
899
+ "step": 1100
900
+ },
901
+ {
902
+ "epoch": 17.1875,
903
+ "eval_accuracy": 0.9238938053097345,
904
+ "eval_f1": 0.8027522935779817,
905
+ "eval_loss": 0.19371576607227325,
906
+ "eval_precision": 0.8333333333333334,
907
+ "eval_recall": 0.7743362831858407,
908
+ "eval_runtime": 0.9499,
909
+ "eval_samples_per_second": 118.958,
910
+ "eval_steps_per_second": 15.791,
911
+ "step": 1100
912
+ },
913
+ {
914
+ "epoch": 17.34375,
915
+ "grad_norm": 0.6079881191253662,
916
+ "learning_rate": 9.375e-05,
917
+ "loss": 0.1259,
918
+ "step": 1110
919
+ },
920
+ {
921
+ "epoch": 17.5,
922
+ "grad_norm": 0.37670084834098816,
923
+ "learning_rate": 9.25925925925926e-05,
924
+ "loss": 0.1101,
925
+ "step": 1120
926
+ },
927
+ {
928
+ "epoch": 17.65625,
929
+ "grad_norm": 0.7734571695327759,
930
+ "learning_rate": 9.143518518518519e-05,
931
+ "loss": 0.1279,
932
+ "step": 1130
933
+ },
934
+ {
935
+ "epoch": 17.8125,
936
+ "grad_norm": 1.0208630561828613,
937
+ "learning_rate": 9.027777777777779e-05,
938
+ "loss": 0.1258,
939
+ "step": 1140
940
+ },
941
+ {
942
+ "epoch": 17.96875,
943
+ "grad_norm": 0.5698951482772827,
944
+ "learning_rate": 8.912037037037037e-05,
945
+ "loss": 0.1111,
946
+ "step": 1150
947
+ },
948
+ {
949
+ "epoch": 18.125,
950
+ "grad_norm": 1.77188241481781,
951
+ "learning_rate": 8.796296296296297e-05,
952
+ "loss": 0.1254,
953
+ "step": 1160
954
+ },
955
+ {
956
+ "epoch": 18.28125,
957
+ "grad_norm": 0.8389852643013,
958
+ "learning_rate": 8.680555555555556e-05,
959
+ "loss": 0.1042,
960
+ "step": 1170
961
+ },
962
+ {
963
+ "epoch": 18.4375,
964
+ "grad_norm": 0.6655524969100952,
965
+ "learning_rate": 8.564814814814816e-05,
966
+ "loss": 0.1272,
967
+ "step": 1180
968
+ },
969
+ {
970
+ "epoch": 18.59375,
971
+ "grad_norm": 0.4668845236301422,
972
+ "learning_rate": 8.449074074074074e-05,
973
+ "loss": 0.1096,
974
+ "step": 1190
975
+ },
976
+ {
977
+ "epoch": 18.75,
978
+ "grad_norm": 0.8379706740379333,
979
+ "learning_rate": 8.333333333333334e-05,
980
+ "loss": 0.1248,
981
+ "step": 1200
982
+ },
983
+ {
984
+ "epoch": 18.75,
985
+ "eval_accuracy": 0.9300884955752212,
986
+ "eval_f1": 0.8192219679633868,
987
+ "eval_loss": 0.19132623076438904,
988
+ "eval_precision": 0.8483412322274881,
989
+ "eval_recall": 0.7920353982300885,
990
+ "eval_runtime": 1.2549,
991
+ "eval_samples_per_second": 90.047,
992
+ "eval_steps_per_second": 11.953,
993
+ "step": 1200
994
+ },
+ {
+ "epoch": 18.90625,
+ "grad_norm": 0.9271652698516846,
+ "learning_rate": 8.217592592592593e-05,
+ "loss": 0.1126,
+ "step": 1210
+ },
+ {
+ "epoch": 19.0625,
+ "grad_norm": 1.1356163024902344,
+ "learning_rate": 8.113425925925926e-05,
+ "loss": 0.129,
+ "step": 1220
+ },
+ {
+ "epoch": 19.21875,
+ "grad_norm": 0.4993898570537567,
+ "learning_rate": 7.997685185185186e-05,
+ "loss": 0.1385,
+ "step": 1230
+ },
+ {
+ "epoch": 19.375,
+ "grad_norm": 1.2999491691589355,
+ "learning_rate": 7.881944444444444e-05,
+ "loss": 0.1242,
+ "step": 1240
+ },
+ {
+ "epoch": 19.53125,
+ "grad_norm": 1.1871651411056519,
+ "learning_rate": 7.766203703703704e-05,
+ "loss": 0.113,
+ "step": 1250
+ },
+ {
+ "epoch": 19.6875,
+ "grad_norm": 1.5129660367965698,
+ "learning_rate": 7.650462962962963e-05,
+ "loss": 0.1024,
+ "step": 1260
+ },
+ {
+ "epoch": 19.84375,
+ "grad_norm": 0.7286781072616577,
+ "learning_rate": 7.534722222222223e-05,
+ "loss": 0.0994,
+ "step": 1270
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 1.0448476076126099,
+ "learning_rate": 7.418981481481481e-05,
+ "loss": 0.1115,
+ "step": 1280
+ },
+ {
+ "epoch": 20.15625,
+ "grad_norm": 0.6149379014968872,
+ "learning_rate": 7.303240740740741e-05,
+ "loss": 0.1243,
+ "step": 1290
+ },
+ {
+ "epoch": 20.3125,
+ "grad_norm": 0.7020682692527771,
+ "learning_rate": 7.1875e-05,
+ "loss": 0.1169,
+ "step": 1300
+ },
+ {
+ "epoch": 20.3125,
+ "eval_accuracy": 0.9300884955752212,
+ "eval_f1": 0.8167053364269141,
+ "eval_loss": 0.19162432849407196,
+ "eval_precision": 0.8585365853658536,
+ "eval_recall": 0.7787610619469026,
+ "eval_runtime": 0.958,
+ "eval_samples_per_second": 117.951,
+ "eval_steps_per_second": 15.657,
+ "step": 1300
+ },
+ {
+ "epoch": 20.46875,
+ "grad_norm": 1.8554919958114624,
+ "learning_rate": 7.07175925925926e-05,
+ "loss": 0.1175,
+ "step": 1310
+ },
+ {
+ "epoch": 20.625,
+ "grad_norm": 1.1227444410324097,
+ "learning_rate": 6.956018518518518e-05,
+ "loss": 0.1004,
+ "step": 1320
+ },
+ {
+ "epoch": 20.78125,
+ "grad_norm": 1.376546025276184,
+ "learning_rate": 6.840277777777778e-05,
+ "loss": 0.1099,
+ "step": 1330
+ },
+ {
+ "epoch": 20.9375,
+ "grad_norm": 0.86075758934021,
+ "learning_rate": 6.724537037037037e-05,
+ "loss": 0.133,
+ "step": 1340
+ },
+ {
+ "epoch": 21.09375,
+ "grad_norm": 1.093257188796997,
+ "learning_rate": 6.608796296296297e-05,
+ "loss": 0.1143,
+ "step": 1350
+ },
+ {
+ "epoch": 21.25,
+ "grad_norm": 0.5665271282196045,
+ "learning_rate": 6.493055555555556e-05,
+ "loss": 0.1065,
+ "step": 1360
+ },
+ {
+ "epoch": 21.40625,
+ "grad_norm": 0.607912003993988,
+ "learning_rate": 6.377314814814816e-05,
+ "loss": 0.1221,
+ "step": 1370
+ },
+ {
+ "epoch": 21.5625,
+ "grad_norm": 0.4708748161792755,
+ "learning_rate": 6.261574074074074e-05,
+ "loss": 0.0941,
+ "step": 1380
+ },
+ {
+ "epoch": 21.71875,
+ "grad_norm": 0.8719390630722046,
+ "learning_rate": 6.145833333333334e-05,
+ "loss": 0.112,
+ "step": 1390
+ },
+ {
+ "epoch": 21.875,
+ "grad_norm": 0.45299583673477173,
+ "learning_rate": 6.0300925925925934e-05,
+ "loss": 0.1094,
+ "step": 1400
+ },
+ {
+ "epoch": 21.875,
+ "eval_accuracy": 0.9292035398230089,
+ "eval_f1": 0.8181818181818182,
+ "eval_loss": 0.19246041774749756,
+ "eval_precision": 0.8411214953271028,
+ "eval_recall": 0.7964601769911505,
+ "eval_runtime": 1.2007,
+ "eval_samples_per_second": 94.111,
+ "eval_steps_per_second": 12.493,
+ "step": 1400
+ },
+ {
+ "epoch": 22.03125,
+ "grad_norm": 0.7893074154853821,
+ "learning_rate": 5.9143518518518527e-05,
+ "loss": 0.1142,
+ "step": 1410
+ },
+ {
+ "epoch": 22.1875,
+ "grad_norm": 1.0832182168960571,
+ "learning_rate": 5.798611111111112e-05,
+ "loss": 0.096,
+ "step": 1420
+ },
+ {
+ "epoch": 22.34375,
+ "grad_norm": 0.5880953669548035,
+ "learning_rate": 5.682870370370371e-05,
+ "loss": 0.1142,
+ "step": 1430
+ },
+ {
+ "epoch": 22.5,
+ "grad_norm": 0.6071570515632629,
+ "learning_rate": 5.567129629629629e-05,
+ "loss": 0.1146,
+ "step": 1440
+ },
+ {
+ "epoch": 22.65625,
+ "grad_norm": 1.7968199253082275,
+ "learning_rate": 5.4513888888888884e-05,
+ "loss": 0.1141,
+ "step": 1450
+ },
+ {
+ "epoch": 22.8125,
+ "grad_norm": 0.6409327983856201,
+ "learning_rate": 5.335648148148148e-05,
+ "loss": 0.0907,
+ "step": 1460
+ },
+ {
+ "epoch": 22.96875,
+ "grad_norm": 1.28959321975708,
+ "learning_rate": 5.219907407407407e-05,
+ "loss": 0.0945,
+ "step": 1470
+ },
+ {
+ "epoch": 23.125,
+ "grad_norm": 1.0384379625320435,
+ "learning_rate": 5.115740740740741e-05,
+ "loss": 0.1062,
+ "step": 1480
+ },
+ {
+ "epoch": 23.28125,
+ "grad_norm": 0.8010191917419434,
+ "learning_rate": 5e-05,
+ "loss": 0.1029,
+ "step": 1490
+ },
+ {
+ "epoch": 23.4375,
+ "grad_norm": 1.078620195388794,
+ "learning_rate": 4.8842592592592595e-05,
+ "loss": 0.1108,
+ "step": 1500
+ },
+ {
+ "epoch": 23.4375,
+ "eval_accuracy": 0.9345132743362832,
+ "eval_f1": 0.8333333333333334,
+ "eval_loss": 0.19605357944965363,
+ "eval_precision": 0.8486238532110092,
+ "eval_recall": 0.8185840707964602,
+ "eval_runtime": 0.9739,
+ "eval_samples_per_second": 116.033,
+ "eval_steps_per_second": 15.403,
+ "step": 1500
+ },
+ {
+ "epoch": 23.59375,
+ "grad_norm": 0.9104486703872681,
+ "learning_rate": 4.768518518518519e-05,
+ "loss": 0.1039,
+ "step": 1510
+ },
+ {
+ "epoch": 23.75,
+ "grad_norm": 1.1187772750854492,
+ "learning_rate": 4.652777777777778e-05,
+ "loss": 0.1179,
+ "step": 1520
+ },
+ {
+ "epoch": 23.90625,
+ "grad_norm": 0.6038283109664917,
+ "learning_rate": 4.5370370370370374e-05,
+ "loss": 0.0981,
+ "step": 1530
+ },
+ {
+ "epoch": 24.0625,
+ "grad_norm": 1.483780860900879,
+ "learning_rate": 4.4212962962962966e-05,
+ "loss": 0.1287,
+ "step": 1540
+ },
+ {
+ "epoch": 24.21875,
+ "grad_norm": 0.9008955359458923,
+ "learning_rate": 4.305555555555556e-05,
+ "loss": 0.1038,
+ "step": 1550
+ },
+ {
+ "epoch": 24.375,
+ "grad_norm": 1.3843752145767212,
+ "learning_rate": 4.1898148148148145e-05,
+ "loss": 0.1177,
+ "step": 1560
+ },
+ {
+ "epoch": 24.53125,
+ "grad_norm": 1.3227291107177734,
+ "learning_rate": 4.074074074074074e-05,
+ "loss": 0.0992,
+ "step": 1570
+ },
+ {
+ "epoch": 24.6875,
+ "grad_norm": 0.4530428349971771,
+ "learning_rate": 3.958333333333333e-05,
+ "loss": 0.1064,
+ "step": 1580
+ },
+ {
+ "epoch": 24.84375,
+ "grad_norm": 1.476251244544983,
+ "learning_rate": 3.8425925925925924e-05,
+ "loss": 0.1008,
+ "step": 1590
+ },
+ {
+ "epoch": 25.0,
+ "grad_norm": 1.4163908958435059,
+ "learning_rate": 3.726851851851852e-05,
+ "loss": 0.1089,
+ "step": 1600
+ },
+ {
+ "epoch": 25.0,
+ "eval_accuracy": 0.9283185840707965,
+ "eval_f1": 0.8171557562076749,
+ "eval_loss": 0.1992984265089035,
+ "eval_precision": 0.8341013824884793,
+ "eval_recall": 0.8008849557522124,
+ "eval_runtime": 0.9786,
+ "eval_samples_per_second": 115.465,
+ "eval_steps_per_second": 15.327,
+ "step": 1600
+ },
+ {
+ "epoch": 25.15625,
+ "grad_norm": 0.805210292339325,
+ "learning_rate": 3.611111111111111e-05,
+ "loss": 0.0922,
+ "step": 1610
+ },
+ {
+ "epoch": 25.3125,
+ "grad_norm": 0.7946519255638123,
+ "learning_rate": 3.49537037037037e-05,
+ "loss": 0.1033,
+ "step": 1620
+ },
+ {
+ "epoch": 25.46875,
+ "grad_norm": 0.7051573991775513,
+ "learning_rate": 3.3796296296296295e-05,
+ "loss": 0.1031,
+ "step": 1630
+ },
+ {
+ "epoch": 25.625,
+ "grad_norm": 0.6867948174476624,
+ "learning_rate": 3.263888888888889e-05,
+ "loss": 0.1203,
+ "step": 1640
+ },
+ {
+ "epoch": 25.78125,
+ "grad_norm": 0.9575832486152649,
+ "learning_rate": 3.148148148148148e-05,
+ "loss": 0.101,
+ "step": 1650
+ },
+ {
+ "epoch": 25.9375,
+ "grad_norm": 1.125503420829773,
+ "learning_rate": 3.0324074074074077e-05,
+ "loss": 0.0868,
+ "step": 1660
+ },
+ {
+ "epoch": 26.09375,
+ "grad_norm": 0.694492757320404,
+ "learning_rate": 2.916666666666667e-05,
+ "loss": 0.0916,
+ "step": 1670
+ },
+ {
+ "epoch": 26.25,
+ "grad_norm": 0.606955885887146,
+ "learning_rate": 2.8009259259259263e-05,
+ "loss": 0.0978,
+ "step": 1680
+ },
+ {
+ "epoch": 26.40625,
+ "grad_norm": 0.855603814125061,
+ "learning_rate": 2.6851851851851855e-05,
+ "loss": 0.0988,
+ "step": 1690
+ },
+ {
+ "epoch": 26.5625,
+ "grad_norm": 0.6119447946548462,
+ "learning_rate": 2.5694444444444445e-05,
+ "loss": 0.0919,
+ "step": 1700
+ },
+ {
+ "epoch": 26.5625,
+ "eval_accuracy": 0.9318584070796461,
+ "eval_f1": 0.8261851015801355,
+ "eval_loss": 0.19360247254371643,
+ "eval_precision": 0.8433179723502304,
+ "eval_recall": 0.8097345132743363,
+ "eval_runtime": 1.5343,
+ "eval_samples_per_second": 73.651,
+ "eval_steps_per_second": 9.777,
+ "step": 1700
+ },
+ {
+ "epoch": 26.71875,
+ "grad_norm": 0.9873837828636169,
+ "learning_rate": 2.4537037037037038e-05,
+ "loss": 0.0829,
+ "step": 1710
+ },
+ {
+ "epoch": 26.875,
+ "grad_norm": 0.9287075996398926,
+ "learning_rate": 2.337962962962963e-05,
+ "loss": 0.1128,
+ "step": 1720
+ },
+ {
+ "epoch": 27.03125,
+ "grad_norm": 0.8201906681060791,
+ "learning_rate": 2.2222222222222223e-05,
+ "loss": 0.1084,
+ "step": 1730
+ },
+ {
+ "epoch": 27.1875,
+ "grad_norm": 1.1874263286590576,
+ "learning_rate": 2.1064814814814816e-05,
+ "loss": 0.1065,
+ "step": 1740
+ },
+ {
+ "epoch": 27.34375,
+ "grad_norm": 1.0616997480392456,
+ "learning_rate": 1.990740740740741e-05,
+ "loss": 0.1014,
+ "step": 1750
+ },
+ {
+ "epoch": 27.5,
+ "grad_norm": 0.8941544890403748,
+ "learning_rate": 1.8750000000000002e-05,
+ "loss": 0.1008,
+ "step": 1760
+ },
+ {
+ "epoch": 27.65625,
+ "grad_norm": 0.9521628022193909,
+ "learning_rate": 1.7592592592592595e-05,
+ "loss": 0.0908,
+ "step": 1770
+ },
+ {
+ "epoch": 27.8125,
+ "grad_norm": 0.79527348279953,
+ "learning_rate": 1.6435185185185187e-05,
+ "loss": 0.1137,
+ "step": 1780
+ },
+ {
+ "epoch": 27.96875,
+ "grad_norm": 0.8606336116790771,
+ "learning_rate": 1.527777777777778e-05,
+ "loss": 0.0929,
+ "step": 1790
+ },
+ {
+ "epoch": 28.125,
+ "grad_norm": 1.2438716888427734,
+ "learning_rate": 1.412037037037037e-05,
+ "loss": 0.0969,
+ "step": 1800
+ },
+ {
+ "epoch": 28.125,
+ "eval_accuracy": 0.9309734513274336,
+ "eval_f1": 0.8227272727272726,
+ "eval_loss": 0.19780634343624115,
+ "eval_precision": 0.8457943925233645,
+ "eval_recall": 0.8008849557522124,
+ "eval_runtime": 0.9685,
+ "eval_samples_per_second": 116.67,
+ "eval_steps_per_second": 15.487,
+ "step": 1800
+ },
+ {
+ "epoch": 28.28125,
+ "grad_norm": 0.9101031422615051,
+ "learning_rate": 1.2962962962962962e-05,
+ "loss": 0.0977,
+ "step": 1810
+ },
+ {
+ "epoch": 28.4375,
+ "grad_norm": 0.6852589249610901,
+ "learning_rate": 1.1805555555555555e-05,
+ "loss": 0.1018,
+ "step": 1820
+ },
+ {
+ "epoch": 28.59375,
+ "grad_norm": 0.8925357460975647,
+ "learning_rate": 1.0648148148148148e-05,
+ "loss": 0.0985,
+ "step": 1830
+ },
+ {
+ "epoch": 28.75,
+ "grad_norm": 0.8219801783561707,
+ "learning_rate": 9.490740740740741e-06,
+ "loss": 0.1014,
+ "step": 1840
+ },
+ {
+ "epoch": 28.90625,
+ "grad_norm": 0.6065762639045715,
+ "learning_rate": 8.333333333333334e-06,
+ "loss": 0.1161,
+ "step": 1850
+ },
+ {
+ "epoch": 29.0625,
+ "grad_norm": 0.7718455791473389,
+ "learning_rate": 7.1759259259259266e-06,
+ "loss": 0.1101,
+ "step": 1860
+ },
+ {
+ "epoch": 29.21875,
+ "grad_norm": 1.0950359106063843,
+ "learning_rate": 6.0185185185185185e-06,
+ "loss": 0.1042,
+ "step": 1870
+ },
+ {
+ "epoch": 29.375,
+ "grad_norm": 0.7298617362976074,
+ "learning_rate": 4.861111111111111e-06,
+ "loss": 0.0823,
+ "step": 1880
+ },
+ {
+ "epoch": 29.53125,
+ "grad_norm": 0.782146692276001,
+ "learning_rate": 3.7037037037037037e-06,
+ "loss": 0.1051,
+ "step": 1890
+ },
+ {
+ "epoch": 29.6875,
+ "grad_norm": 0.7693170309066772,
+ "learning_rate": 2.546296296296296e-06,
+ "loss": 0.1093,
+ "step": 1900
+ },
+ {
+ "epoch": 29.6875,
+ "eval_accuracy": 0.9283185840707965,
+ "eval_f1": 0.8171557562076749,
+ "eval_loss": 0.19546246528625488,
+ "eval_precision": 0.8341013824884793,
+ "eval_recall": 0.8008849557522124,
+ "eval_runtime": 1.3969,
+ "eval_samples_per_second": 80.891,
+ "eval_steps_per_second": 10.738,
+ "step": 1900
+ },
+ {
+ "epoch": 29.84375,
+ "grad_norm": 1.2105658054351807,
+ "learning_rate": 1.388888888888889e-06,
+ "loss": 0.0836,
+ "step": 1910
+ },
+ {
+ "epoch": 30.0,
+ "grad_norm": 2.1491291522979736,
+ "learning_rate": 2.3148148148148148e-07,
+ "loss": 0.1122,
+ "step": 1920
+ },
+ {
+ "epoch": 30.0,
+ "step": 1920,
+ "total_flos": 1.9916656541540352e+17,
+ "train_loss": 0.1991663834390541,
+ "train_runtime": 388.9258,
+ "train_samples_per_second": 78.061,
+ "train_steps_per_second": 4.937
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 1920,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 30,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.9916656541540352e+17,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": null
+ }
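
The log above mixes training entries (which carry `loss` and `grad_norm`) with evaluation entries (which carry `eval_*` keys). The following is a minimal sketch of how such a list could be scanned for the best evaluation step. It assumes the entries are available as a list of dicts; the names `LOG_ENTRIES` and `best_eval` are hypothetical and not part of the commit, and the sample below contains only a few entries copied from steps 1200 to 1900, not the full run.

```python
import json

# A few entries copied from the log above (steps 1200-1900 only).
# LOG_ENTRIES is a hypothetical name used for illustration.
LOG_ENTRIES = json.loads("""
[
  {"epoch": 18.75, "loss": 0.1248, "step": 1200},
  {"epoch": 18.75, "eval_f1": 0.8192219679633868, "eval_loss": 0.19132623076438904, "step": 1200},
  {"epoch": 20.3125, "eval_f1": 0.8167053364269141, "eval_loss": 0.19162432849407196, "step": 1300},
  {"epoch": 21.875, "eval_f1": 0.8181818181818182, "eval_loss": 0.19246041774749756, "step": 1400},
  {"epoch": 23.4375, "eval_f1": 0.8333333333333334, "eval_loss": 0.19605357944965363, "step": 1500},
  {"epoch": 25.0, "eval_f1": 0.8171557562076749, "eval_loss": 0.1992984265089035, "step": 1600},
  {"epoch": 26.5625, "eval_f1": 0.8261851015801355, "eval_loss": 0.19360247254371643, "step": 1700},
  {"epoch": 28.125, "eval_f1": 0.8227272727272726, "eval_loss": 0.19780634343624115, "step": 1800},
  {"epoch": 29.6875, "eval_f1": 0.8171557562076749, "eval_loss": 0.19546246528625488, "step": 1900}
]
""")


def best_eval(entries, key, higher_is_better=True):
    """Return the entry with the best value for `key`.

    Entries that lack `key` (for example training-loss entries) are skipped.
    """
    candidates = [entry for entry in entries if key in entry]
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda entry: entry[key])


best_f1 = best_eval(LOG_ENTRIES, "eval_f1")
lowest_loss = best_eval(LOG_ENTRIES, "eval_loss", higher_is_better=False)
print("best eval_f1 in sample:", best_f1["step"], best_f1["eval_f1"])
print("lowest eval_loss in sample:", lowest_loss["step"], lowest_loss["eval_loss"])
```

Within this sample the highest `eval_f1` and the lowest `eval_loss` fall on different steps, so which checkpoint counts as "best" depends on the metric chosen.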