pjox committed
Commit 915e922 · 1 Parent(s): 091ba5c

Uploaded model

Files changed (6)
  1. dev.tsv +0 -0
  2. final-model.pt +3 -0
  3. loss.tsv +11 -0
  4. test.tsv +0 -0
  5. training.log +499 -0
  6. weights.txt +0 -0
dev.tsv ADDED
The diff for this file is too large to render.
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc3208dc8d7d34302e550643da037c4e08e941bd59cfe33ec4d4792c5d0bcb61
+ size 442654125
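The three added lines are a Git LFS pointer rather than the model weights themselves. As a minimal sketch, assuming each pointer line is a key and a value separated by a single space (as shown above), the pointer can be parsed like this; `parse_lfs_pointer` is a hypothetical helper name, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bc3208dc8d7d34302e550643da037c4e08e941bd59cfe33ec4d4792c5d0bcb61
size 442654125
"""

info = parse_lfs_pointer(POINTER)
size_mib = info["size_bytes"] / 2**20
print(f"{info['hash_algo']} object, {size_mib:.1f} MiB")
```

The `size` field is the byte size of the real `final-model.pt` object stored in LFS, so the checkpoint is a little over 422 MiB.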
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 13:32:00 4 0.0001 0.3934547606673132 0.038586683571338654 0.759 0.8903 0.8195 0.7139
+ 2 14:47:41 4 0.0000 0.1352322201632861 0.015217592008411884 0.9081 0.9248 0.9164 0.8626
+ 3 16:03:24 4 0.0000 0.10858782178342327 0.015040190890431404 0.9266 0.9286 0.9276 0.879
+ 4 17:18:55 4 0.0000 0.0878958630160346 0.015710221603512764 0.9289 0.9327 0.9308 0.8838
+ 5 18:33:23 4 0.0000 0.07165857778550887 0.017801353707909584 0.9277 0.9361 0.9319 0.8864
+ 6 19:48:52 4 0.0000 0.05868402400697055 0.018429730087518692 0.9306 0.9438 0.9371 0.8922
+ 7 21:04:47 4 0.0000 0.049209113448846445 0.02109825611114502 0.9344 0.938 0.9362 0.8926
+ 8 22:20:46 4 0.0000 0.042763134030078184 0.02112417109310627 0.9347 0.9446 0.9396 0.8985
+ 9 23:35:14 4 0.0000 0.03838577379283954 0.02171432413160801 0.9391 0.9446 0.9419 0.9008
+ 10 00:49:11 4 0.0000 0.0361115163669216 0.023424603044986725 0.9389 0.9444 0.9417 0.9019
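To make the trend in `loss.tsv` easier to read, here is a small sketch that loads the rows above and picks out the epochs with the highest dev F1 and the lowest dev loss. It assumes whitespace-separated columns (splitting on any whitespace also handles the tab-separated original file), and `load_rows` is a hypothetical helper name.

```python
# Rows copied from the loss.tsv diff above.
LOSS_ROWS = """\
EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 13:32:00 4 0.0001 0.3934547606673132 0.038586683571338654 0.759 0.8903 0.8195 0.7139
2 14:47:41 4 0.0000 0.1352322201632861 0.015217592008411884 0.9081 0.9248 0.9164 0.8626
3 16:03:24 4 0.0000 0.10858782178342327 0.015040190890431404 0.9266 0.9286 0.9276 0.879
4 17:18:55 4 0.0000 0.0878958630160346 0.015710221603512764 0.9289 0.9327 0.9308 0.8838
5 18:33:23 4 0.0000 0.07165857778550887 0.017801353707909584 0.9277 0.9361 0.9319 0.8864
6 19:48:52 4 0.0000 0.05868402400697055 0.018429730087518692 0.9306 0.9438 0.9371 0.8922
7 21:04:47 4 0.0000 0.049209113448846445 0.02109825611114502 0.9344 0.938 0.9362 0.8926
8 22:20:46 4 0.0000 0.042763134030078184 0.02112417109310627 0.9347 0.9446 0.9396 0.8985
9 23:35:14 4 0.0000 0.03838577379283954 0.02171432413160801 0.9391 0.9446 0.9419 0.9008
10 00:49:11 4 0.0000 0.0361115163669216 0.023424603044986725 0.9389 0.9444 0.9417 0.9019
"""


def load_rows(text: str) -> list[dict]:
    """Turn the header line plus data lines into a list of per-epoch dicts."""
    header, *lines = text.strip().splitlines()
    names = header.split()
    return [dict(zip(names, line.split())) for line in lines]


rows = load_rows(LOSS_ROWS)
best_f1 = max(rows, key=lambda r: float(r["DEV_F1"]))
lowest_dev_loss = min(rows, key=lambda r: float(r["DEV_LOSS"]))

print("best DEV_F1:", best_f1["DEV_F1"], "at epoch", best_f1["EPOCH"])
print("lowest DEV_LOSS:", lowest_dev_loss["DEV_LOSS"], "at epoch", lowest_dev_loss["EPOCH"])
```

In these rows, dev loss bottoms out at epoch 3 while dev F1 keeps improving until epoch 9, so the two criteria would select different checkpoints.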
test.tsv ADDED
The diff for this file is too large to render.
 
training.log ADDED
@@ -0,0 +1,499 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 2022-02-04 12:18:14,159 ----------------------------------------------------------------------------------------------------
2
+ 2022-02-04 12:18:14,161 Model: "SequenceTagger(
3
+ (embeddings): TransformerWordEmbeddings(
4
+ (model): CamembertModel(
5
+ (embeddings): RobertaEmbeddings(
6
+ (word_embeddings): Embedding(32005, 768, padding_idx=1)
7
+ (position_embeddings): Embedding(514, 768, padding_idx=1)
8
+ (token_type_embeddings): Embedding(1, 768)
9
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
10
+ (dropout): Dropout(p=0.1, inplace=False)
11
+ )
12
+ (encoder): RobertaEncoder(
13
+ (layer): ModuleList(
14
+ (0): RobertaLayer(
15
+ (attention): RobertaAttention(
16
+ (self): RobertaSelfAttention(
17
+ (query): Linear(in_features=768, out_features=768, bias=True)
18
+ (key): Linear(in_features=768, out_features=768, bias=True)
19
+ (value): Linear(in_features=768, out_features=768, bias=True)
20
+ (dropout): Dropout(p=0.1, inplace=False)
21
+ )
22
+ (output): RobertaSelfOutput(
23
+ (dense): Linear(in_features=768, out_features=768, bias=True)
24
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
25
+ (dropout): Dropout(p=0.1, inplace=False)
26
+ )
27
+ )
28
+ (intermediate): RobertaIntermediate(
29
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
30
+ )
31
+ (output): RobertaOutput(
32
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
33
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
34
+ (dropout): Dropout(p=0.1, inplace=False)
35
+ )
36
+ )
37
+ (1): RobertaLayer(
38
+ (attention): RobertaAttention(
39
+ (self): RobertaSelfAttention(
40
+ (query): Linear(in_features=768, out_features=768, bias=True)
41
+ (key): Linear(in_features=768, out_features=768, bias=True)
42
+ (value): Linear(in_features=768, out_features=768, bias=True)
43
+ (dropout): Dropout(p=0.1, inplace=False)
44
+ )
45
+ (output): RobertaSelfOutput(
46
+ (dense): Linear(in_features=768, out_features=768, bias=True)
47
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
48
+ (dropout): Dropout(p=0.1, inplace=False)
49
+ )
50
+ )
51
+ (intermediate): RobertaIntermediate(
52
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
53
+ )
54
+ (output): RobertaOutput(
55
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
56
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
57
+ (dropout): Dropout(p=0.1, inplace=False)
58
+ )
59
+ )
60
+ (2): RobertaLayer(
61
+ (attention): RobertaAttention(
62
+ (self): RobertaSelfAttention(
63
+ (query): Linear(in_features=768, out_features=768, bias=True)
64
+ (key): Linear(in_features=768, out_features=768, bias=True)
65
+ (value): Linear(in_features=768, out_features=768, bias=True)
66
+ (dropout): Dropout(p=0.1, inplace=False)
67
+ )
68
+ (output): RobertaSelfOutput(
69
+ (dense): Linear(in_features=768, out_features=768, bias=True)
70
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
71
+ (dropout): Dropout(p=0.1, inplace=False)
72
+ )
73
+ )
74
+ (intermediate): RobertaIntermediate(
75
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
76
+ )
77
+ (output): RobertaOutput(
78
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
79
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
80
+ (dropout): Dropout(p=0.1, inplace=False)
81
+ )
82
+ )
83
+ (3): RobertaLayer(
84
+ (attention): RobertaAttention(
85
+ (self): RobertaSelfAttention(
86
+ (query): Linear(in_features=768, out_features=768, bias=True)
87
+ (key): Linear(in_features=768, out_features=768, bias=True)
88
+ (value): Linear(in_features=768, out_features=768, bias=True)
89
+ (dropout): Dropout(p=0.1, inplace=False)
90
+ )
91
+ (output): RobertaSelfOutput(
92
+ (dense): Linear(in_features=768, out_features=768, bias=True)
93
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
94
+ (dropout): Dropout(p=0.1, inplace=False)
95
+ )
96
+ )
97
+ (intermediate): RobertaIntermediate(
98
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
99
+ )
100
+ (output): RobertaOutput(
101
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
102
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
103
+ (dropout): Dropout(p=0.1, inplace=False)
104
+ )
105
+ )
106
+ (4): RobertaLayer(
107
+ (attention): RobertaAttention(
108
+ (self): RobertaSelfAttention(
109
+ (query): Linear(in_features=768, out_features=768, bias=True)
110
+ (key): Linear(in_features=768, out_features=768, bias=True)
111
+ (value): Linear(in_features=768, out_features=768, bias=True)
112
+ (dropout): Dropout(p=0.1, inplace=False)
113
+ )
114
+ (output): RobertaSelfOutput(
115
+ (dense): Linear(in_features=768, out_features=768, bias=True)
116
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
117
+ (dropout): Dropout(p=0.1, inplace=False)
118
+ )
119
+ )
120
+ (intermediate): RobertaIntermediate(
121
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
122
+ )
123
+ (output): RobertaOutput(
124
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
125
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
126
+ (dropout): Dropout(p=0.1, inplace=False)
127
+ )
128
+ )
129
+ (5): RobertaLayer(
130
+ (attention): RobertaAttention(
131
+ (self): RobertaSelfAttention(
132
+ (query): Linear(in_features=768, out_features=768, bias=True)
133
+ (key): Linear(in_features=768, out_features=768, bias=True)
134
+ (value): Linear(in_features=768, out_features=768, bias=True)
135
+ (dropout): Dropout(p=0.1, inplace=False)
136
+ )
137
+ (output): RobertaSelfOutput(
138
+ (dense): Linear(in_features=768, out_features=768, bias=True)
139
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
140
+ (dropout): Dropout(p=0.1, inplace=False)
141
+ )
142
+ )
143
+ (intermediate): RobertaIntermediate(
144
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
145
+ )
146
+ (output): RobertaOutput(
147
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
148
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
149
+ (dropout): Dropout(p=0.1, inplace=False)
150
+ )
151
+ )
152
+ (6): RobertaLayer(
153
+ (attention): RobertaAttention(
154
+ (self): RobertaSelfAttention(
155
+ (query): Linear(in_features=768, out_features=768, bias=True)
156
+ (key): Linear(in_features=768, out_features=768, bias=True)
157
+ (value): Linear(in_features=768, out_features=768, bias=True)
158
+ (dropout): Dropout(p=0.1, inplace=False)
159
+ )
160
+ (output): RobertaSelfOutput(
161
+ (dense): Linear(in_features=768, out_features=768, bias=True)
162
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
163
+ (dropout): Dropout(p=0.1, inplace=False)
164
+ )
165
+ )
166
+ (intermediate): RobertaIntermediate(
167
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
168
+ )
169
+ (output): RobertaOutput(
170
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
171
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
172
+ (dropout): Dropout(p=0.1, inplace=False)
173
+ )
174
+ )
175
+ (7): RobertaLayer(
176
+ (attention): RobertaAttention(
177
+ (self): RobertaSelfAttention(
178
+ (query): Linear(in_features=768, out_features=768, bias=True)
179
+ (key): Linear(in_features=768, out_features=768, bias=True)
180
+ (value): Linear(in_features=768, out_features=768, bias=True)
181
+ (dropout): Dropout(p=0.1, inplace=False)
182
+ )
183
+ (output): RobertaSelfOutput(
184
+ (dense): Linear(in_features=768, out_features=768, bias=True)
185
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
186
+ (dropout): Dropout(p=0.1, inplace=False)
187
+ )
188
+ )
189
+ (intermediate): RobertaIntermediate(
190
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
191
+ )
192
+ (output): RobertaOutput(
193
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
194
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
195
+ (dropout): Dropout(p=0.1, inplace=False)
196
+ )
197
+ )
198
+ (8): RobertaLayer(
199
+ (attention): RobertaAttention(
200
+ (self): RobertaSelfAttention(
201
+ (query): Linear(in_features=768, out_features=768, bias=True)
202
+ (key): Linear(in_features=768, out_features=768, bias=True)
203
+ (value): Linear(in_features=768, out_features=768, bias=True)
204
+ (dropout): Dropout(p=0.1, inplace=False)
205
+ )
206
+ (output): RobertaSelfOutput(
207
+ (dense): Linear(in_features=768, out_features=768, bias=True)
208
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
209
+ (dropout): Dropout(p=0.1, inplace=False)
210
+ )
211
+ )
212
+ (intermediate): RobertaIntermediate(
213
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
214
+ )
215
+ (output): RobertaOutput(
216
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
217
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
218
+ (dropout): Dropout(p=0.1, inplace=False)
219
+ )
220
+ )
221
+ (9): RobertaLayer(
222
+ (attention): RobertaAttention(
223
+ (self): RobertaSelfAttention(
224
+ (query): Linear(in_features=768, out_features=768, bias=True)
225
+ (key): Linear(in_features=768, out_features=768, bias=True)
226
+ (value): Linear(in_features=768, out_features=768, bias=True)
227
+ (dropout): Dropout(p=0.1, inplace=False)
228
+ )
229
+ (output): RobertaSelfOutput(
230
+ (dense): Linear(in_features=768, out_features=768, bias=True)
231
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
232
+ (dropout): Dropout(p=0.1, inplace=False)
233
+ )
234
+ )
235
+ (intermediate): RobertaIntermediate(
236
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
237
+ )
238
+ (output): RobertaOutput(
239
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
240
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
241
+ (dropout): Dropout(p=0.1, inplace=False)
242
+ )
243
+ )
244
+ (10): RobertaLayer(
245
+ (attention): RobertaAttention(
246
+ (self): RobertaSelfAttention(
247
+ (query): Linear(in_features=768, out_features=768, bias=True)
248
+ (key): Linear(in_features=768, out_features=768, bias=True)
249
+ (value): Linear(in_features=768, out_features=768, bias=True)
250
+ (dropout): Dropout(p=0.1, inplace=False)
251
+ )
252
+ (output): RobertaSelfOutput(
253
+ (dense): Linear(in_features=768, out_features=768, bias=True)
254
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
255
+ (dropout): Dropout(p=0.1, inplace=False)
256
+ )
257
+ )
258
+ (intermediate): RobertaIntermediate(
259
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
260
+ )
261
+ (output): RobertaOutput(
262
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
263
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
264
+ (dropout): Dropout(p=0.1, inplace=False)
265
+ )
266
+ )
267
+ (11): RobertaLayer(
268
+ (attention): RobertaAttention(
269
+ (self): RobertaSelfAttention(
270
+ (query): Linear(in_features=768, out_features=768, bias=True)
271
+ (key): Linear(in_features=768, out_features=768, bias=True)
272
+ (value): Linear(in_features=768, out_features=768, bias=True)
273
+ (dropout): Dropout(p=0.1, inplace=False)
274
+ )
275
+ (output): RobertaSelfOutput(
276
+ (dense): Linear(in_features=768, out_features=768, bias=True)
277
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
278
+ (dropout): Dropout(p=0.1, inplace=False)
279
+ )
280
+ )
281
+ (intermediate): RobertaIntermediate(
282
+ (dense): Linear(in_features=768, out_features=3072, bias=True)
283
+ )
284
+ (output): RobertaOutput(
285
+ (dense): Linear(in_features=3072, out_features=768, bias=True)
286
+ (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
287
+ (dropout): Dropout(p=0.1, inplace=False)
288
+ )
289
+ )
290
+ )
291
+ )
292
+ (pooler): RobertaPooler(
293
+ (dense): Linear(in_features=768, out_features=768, bias=True)
294
+ (activation): Tanh()
295
+ )
296
+ )
297
+ )
298
+ (word_dropout): WordDropout(p=0.05)
299
+ (locked_dropout): LockedDropout(p=0.5)
300
+ (linear): Linear(in_features=768, out_features=18, bias=True)
301
+ (beta): 1.0
302
+ (weights): None
303
+ (weight_tensor) None
304
+ )"
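The layer shapes printed above are enough to estimate the number of trainable parameters. The following is a rough sketch that only counts the weights and biases visible in the printout (embeddings, `Linear` layers, and `LayerNorm` layers with `elementwise_affine=True`); the helper names `linear` and `layer_norm` are hypothetical and used only for this illustration.

```python
def linear(n_in: int, n_out: int) -> int:
    """Parameters of a Linear layer with bias: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Parameters of an affine LayerNorm: one scale and one shift per feature."""
    return 2 * dim


# Sizes read off the model printout above.
HIDDEN, FFN, LAYERS = 768, 3072, 12
VOCAB, POSITIONS, TOKEN_TYPES, TAGS = 32005, 514, 1, 18

embeddings = (VOCAB + POSITIONS + TOKEN_TYPES) * HIDDEN + layer_norm(HIDDEN)
per_layer = (
    4 * linear(HIDDEN, HIDDEN)  # query, key, value, attention output dense
    + layer_norm(HIDDEN)        # LayerNorm after attention
    + linear(HIDDEN, FFN)       # intermediate dense
    + linear(FFN, HIDDEN)       # output dense
    + layer_norm(HIDDEN)        # LayerNorm after feed-forward
)
pooler = linear(HIDDEN, HIDDEN)
tagger_head = linear(HIDDEN, TAGS)  # the final (linear) layer of the SequenceTagger

total = embeddings + LAYERS * per_layer + pooler + tagger_head
print(f"encoder layer: {per_layer:,}  tagger head: {tagger_head:,}  total: {total:,}")
```

Under these assumptions the total is about 110.6 million parameters. At 4 bytes per float32 value that is roughly 442.5 MB, which is in line with the 442654125-byte size recorded for `final-model.pt`, though the checkpoint may also contain other data.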
+ 2022-02-04 12:18:14,167 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:18:14,167 Corpus: "Corpus: 126973 train + 7037 dev + 7090 test sentences"
+ 2022-02-04 12:18:14,167 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:18:14,167 Parameters:
+ 2022-02-04 12:18:14,167 - learning_rate: "5e-05"
+ 2022-02-04 12:18:14,167 - mini_batch_size: "16"
+ 2022-02-04 12:18:14,167 - patience: "3"
+ 2022-02-04 12:18:14,167 - anneal_factor: "0.5"
+ 2022-02-04 12:18:14,167 - max_epochs: "10"
+ 2022-02-04 12:18:14,167 - shuffle: "True"
+ 2022-02-04 12:18:14,167 - train_with_dev: "False"
+ 2022-02-04 12:18:14,167 - batch_growth_annealing: "False"
+ 2022-02-04 12:18:14,167 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:18:14,167 Model training base path: "resources/taggers/ner-camembert"
+ 2022-02-04 12:18:14,167 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:18:14,167 Device: cuda:0
+ 2022-02-04 12:18:14,167 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:18:14,167 Embeddings storage mode: none
+ 2022-02-04 12:18:14,170 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 12:25:23,397 epoch 1 - iter 793/7936 - loss 1.64849782 - samples/sec: 29.56 - lr: 0.000005
+ 2022-02-04 12:33:59,649 epoch 1 - iter 1586/7936 - loss 1.11222779 - samples/sec: 24.58 - lr: 0.000010
+ 2022-02-04 12:41:09,132 epoch 1 - iter 2379/7936 - loss 0.85257016 - samples/sec: 29.55 - lr: 0.000015
+ 2022-02-04 12:47:44,896 epoch 1 - iter 3172/7936 - loss 0.71981753 - samples/sec: 32.07 - lr: 0.000020
+ 2022-02-04 12:55:15,449 epoch 1 - iter 3965/7936 - loss 0.60512907 - samples/sec: 28.16 - lr: 0.000025
+ 2022-02-04 13:02:35,238 epoch 1 - iter 4758/7936 - loss 0.52903622 - samples/sec: 28.85 - lr: 0.000030
+ 2022-02-04 13:09:27,012 epoch 1 - iter 5551/7936 - loss 0.48171220 - samples/sec: 30.82 - lr: 0.000035
+ 2022-02-04 13:15:53,083 epoch 1 - iter 6344/7936 - loss 0.44948661 - samples/sec: 32.87 - lr: 0.000040
+ 2022-02-04 13:22:02,650 epoch 1 - iter 7137/7936 - loss 0.42228564 - samples/sec: 34.34 - lr: 0.000045
+ 2022-02-04 13:28:59,445 epoch 1 - iter 7930/7936 - loss 0.39366725 - samples/sec: 30.45 - lr: 0.000050
+ 2022-02-04 13:29:03,026 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:29:03,028 EPOCH 1 done: loss 0.3935 - lr 0.0000500
+ 2022-02-04 13:32:00,102 DEV : loss 0.038586683571338654 - f1-score (micro avg) 0.8195
+ 2022-02-04 13:32:00,155 BAD EPOCHS (no improvement): 4
+ 2022-02-04 13:32:00,156 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 13:39:12,612 epoch 2 - iter 793/7936 - loss 0.14931520 - samples/sec: 29.34 - lr: 0.000049
+ 2022-02-04 13:46:36,550 epoch 2 - iter 1586/7936 - loss 0.14672871 - samples/sec: 28.58 - lr: 0.000049
+ 2022-02-04 13:53:49,885 epoch 2 - iter 2379/7936 - loss 0.14547274 - samples/sec: 29.28 - lr: 0.000048
+ 2022-02-04 14:01:13,739 epoch 2 - iter 3172/7936 - loss 0.14418846 - samples/sec: 28.59 - lr: 0.000048
+ 2022-02-04 14:08:30,985 epoch 2 - iter 3965/7936 - loss 0.14265825 - samples/sec: 29.02 - lr: 0.000047
+ 2022-02-04 14:15:46,742 epoch 2 - iter 4758/7936 - loss 0.14086599 - samples/sec: 29.12 - lr: 0.000047
+ 2022-02-04 14:23:11,181 epoch 2 - iter 5551/7936 - loss 0.13927378 - samples/sec: 28.55 - lr: 0.000046
+ 2022-02-04 14:30:19,706 epoch 2 - iter 6344/7936 - loss 0.13799042 - samples/sec: 29.61 - lr: 0.000046
+ 2022-02-04 14:37:30,554 epoch 2 - iter 7137/7936 - loss 0.13666296 - samples/sec: 29.45 - lr: 0.000045
+ 2022-02-04 14:44:52,886 epoch 2 - iter 7930/7936 - loss 0.13525042 - samples/sec: 28.69 - lr: 0.000044
+ 2022-02-04 14:44:56,060 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 14:44:56,062 EPOCH 2 done: loss 0.1352 - lr 0.0000444
+ 2022-02-04 14:47:40,950 DEV : loss 0.015217592008411884 - f1-score (micro avg) 0.9164
+ 2022-02-04 14:47:41,011 BAD EPOCHS (no improvement): 4
+ 2022-02-04 14:47:41,014 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 14:55:04,697 epoch 3 - iter 793/7936 - loss 0.11742558 - samples/sec: 28.60 - lr: 0.000044
+ 2022-02-04 15:02:16,388 epoch 3 - iter 1586/7936 - loss 0.11679901 - samples/sec: 29.40 - lr: 0.000043
+ 2022-02-04 15:09:29,924 epoch 3 - iter 2379/7936 - loss 0.11557918 - samples/sec: 29.27 - lr: 0.000043
+ 2022-02-04 15:16:54,356 epoch 3 - iter 3172/7936 - loss 0.11469700 - samples/sec: 28.55 - lr: 0.000042
+ 2022-02-04 15:24:11,817 epoch 3 - iter 3965/7936 - loss 0.11351908 - samples/sec: 29.01 - lr: 0.000042
+ 2022-02-04 15:31:20,620 epoch 3 - iter 4758/7936 - loss 0.11266101 - samples/sec: 29.59 - lr: 0.000041
+ 2022-02-04 15:38:42,882 epoch 3 - iter 5551/7936 - loss 0.11158730 - samples/sec: 28.69 - lr: 0.000041
+ 2022-02-04 15:45:50,317 epoch 3 - iter 6344/7936 - loss 0.11067669 - samples/sec: 29.69 - lr: 0.000040
+ 2022-02-04 15:53:16,035 epoch 3 - iter 7137/7936 - loss 0.10955013 - samples/sec: 28.47 - lr: 0.000039
+ 2022-02-04 16:00:25,858 epoch 3 - iter 7930/7936 - loss 0.10859645 - samples/sec: 29.52 - lr: 0.000039
+ 2022-02-04 16:00:29,034 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 16:00:29,035 EPOCH 3 done: loss 0.1086 - lr 0.0000389
+ 2022-02-04 16:03:24,201 DEV : loss 0.015040190890431404 - f1-score (micro avg) 0.9276
+ 2022-02-04 16:03:24,261 BAD EPOCHS (no improvement): 4
+ 2022-02-04 16:03:24,262 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 16:10:35,356 epoch 4 - iter 793/7936 - loss 0.09491620 - samples/sec: 29.44 - lr: 0.000038
+ 2022-02-04 16:17:46,476 epoch 4 - iter 1586/7936 - loss 0.09400900 - samples/sec: 29.43 - lr: 0.000038
+ 2022-02-04 16:25:10,503 epoch 4 - iter 2379/7936 - loss 0.09355228 - samples/sec: 28.58 - lr: 0.000037
+ 2022-02-04 16:32:21,829 epoch 4 - iter 3172/7936 - loss 0.09257257 - samples/sec: 29.42 - lr: 0.000037
+ 2022-02-04 16:39:34,717 epoch 4 - iter 3965/7936 - loss 0.09178491 - samples/sec: 29.31 - lr: 0.000036
+ 2022-02-04 16:46:54,536 epoch 4 - iter 4758/7936 - loss 0.09102086 - samples/sec: 28.85 - lr: 0.000036
+ 2022-02-04 16:54:08,674 epoch 4 - iter 5551/7936 - loss 0.09026061 - samples/sec: 29.23 - lr: 0.000035
+ 2022-02-04 17:01:24,799 epoch 4 - iter 6344/7936 - loss 0.08942621 - samples/sec: 29.10 - lr: 0.000034
+ 2022-02-04 17:08:44,577 epoch 4 - iter 7137/7936 - loss 0.08868927 - samples/sec: 28.85 - lr: 0.000034
+ 2022-02-04 17:15:57,678 epoch 4 - iter 7930/7936 - loss 0.08790466 - samples/sec: 29.30 - lr: 0.000033
+ 2022-02-04 17:16:00,787 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 17:16:00,790 EPOCH 4 done: loss 0.0879 - lr 0.0000333
+ 2022-02-04 17:18:55,805 DEV : loss 0.015710221603512764 - f1-score (micro avg) 0.9308
+ 2022-02-04 17:18:55,865 BAD EPOCHS (no improvement): 4
+ 2022-02-04 17:18:55,873 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 17:26:02,969 epoch 5 - iter 793/7936 - loss 0.07683748 - samples/sec: 29.71 - lr: 0.000033
+ 2022-02-04 17:33:13,355 epoch 5 - iter 1586/7936 - loss 0.07621969 - samples/sec: 29.49 - lr: 0.000032
+ 2022-02-04 17:40:38,247 epoch 5 - iter 2379/7936 - loss 0.07573593 - samples/sec: 28.52 - lr: 0.000032
+ 2022-02-04 17:47:40,269 epoch 5 - iter 3172/7936 - loss 0.07524740 - samples/sec: 30.07 - lr: 0.000031
+ 2022-02-04 17:54:59,036 epoch 5 - iter 3965/7936 - loss 0.07449799 - samples/sec: 28.92 - lr: 0.000031
+ 2022-02-04 18:02:03,686 epoch 5 - iter 4758/7936 - loss 0.07405311 - samples/sec: 29.88 - lr: 0.000030
+ 2022-02-04 18:09:11,646 epoch 5 - iter 5551/7936 - loss 0.07340830 - samples/sec: 29.65 - lr: 0.000029
+ 2022-02-04 18:16:27,240 epoch 5 - iter 6344/7936 - loss 0.07271787 - samples/sec: 29.13 - lr: 0.000029
+ 2022-02-04 18:23:29,669 epoch 5 - iter 7137/7936 - loss 0.07217288 - samples/sec: 30.04 - lr: 0.000028
+ 2022-02-04 18:30:30,597 epoch 5 - iter 7930/7936 - loss 0.07166288 - samples/sec: 30.15 - lr: 0.000028
+ 2022-02-04 18:30:33,919 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 18:30:33,920 EPOCH 5 done: loss 0.0717 - lr 0.0000278
+ 2022-02-04 18:33:23,923 DEV : loss 0.017801353707909584 - f1-score (micro avg) 0.9319
+ 2022-02-04 18:33:23,983 BAD EPOCHS (no improvement): 4
+ 2022-02-04 18:33:23,983 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 18:40:28,017 epoch 6 - iter 793/7936 - loss 0.06265627 - samples/sec: 29.93 - lr: 0.000027
+ 2022-02-04 18:47:46,740 epoch 6 - iter 1586/7936 - loss 0.06168821 - samples/sec: 28.92 - lr: 0.000027
+ 2022-02-04 18:54:59,429 epoch 6 - iter 2379/7936 - loss 0.06137959 - samples/sec: 29.33 - lr: 0.000026
+ 2022-02-04 19:02:08,367 epoch 6 - iter 3172/7936 - loss 0.06101991 - samples/sec: 29.58 - lr: 0.000026
+ 2022-02-04 19:09:34,369 epoch 6 - iter 3965/7936 - loss 0.06073221 - samples/sec: 28.45 - lr: 0.000025
+ 2022-02-04 19:16:53,646 epoch 6 - iter 4758/7936 - loss 0.06031513 - samples/sec: 28.89 - lr: 0.000024
+ 2022-02-04 19:24:05,427 epoch 6 - iter 5551/7936 - loss 0.05997466 - samples/sec: 29.39 - lr: 0.000024
+ 2022-02-04 19:31:27,470 epoch 6 - iter 6344/7936 - loss 0.05952743 - samples/sec: 28.71 - lr: 0.000023
+ 2022-02-04 19:38:37,449 epoch 6 - iter 7137/7936 - loss 0.05906427 - samples/sec: 29.51 - lr: 0.000023
+ 2022-02-04 19:46:02,608 epoch 6 - iter 7930/7936 - loss 0.05868560 - samples/sec: 28.51 - lr: 0.000022
+ 2022-02-04 19:46:05,790 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 19:46:05,791 EPOCH 6 done: loss 0.0587 - lr 0.0000222
+ 2022-02-04 19:48:52,058 DEV : loss 0.018429730087518692 - f1-score (micro avg) 0.9371
+ 2022-02-04 19:48:52,117 BAD EPOCHS (no improvement): 4
+ 2022-02-04 19:48:52,118 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 19:56:15,841 epoch 7 - iter 793/7936 - loss 0.05186660 - samples/sec: 28.60 - lr: 0.000022
+ 2022-02-04 20:03:27,574 epoch 7 - iter 1586/7936 - loss 0.05230029 - samples/sec: 29.39 - lr: 0.000021
+ 2022-02-04 20:10:42,349 epoch 7 - iter 2379/7936 - loss 0.05178480 - samples/sec: 29.19 - lr: 0.000021
+ 2022-02-04 20:18:09,822 epoch 7 - iter 3172/7936 - loss 0.05114746 - samples/sec: 28.36 - lr: 0.000020
+ 2022-02-04 20:25:23,574 epoch 7 - iter 3965/7936 - loss 0.05080701 - samples/sec: 29.26 - lr: 0.000019
+ 2022-02-04 20:32:39,287 epoch 7 - iter 4758/7936 - loss 0.05039880 - samples/sec: 29.12 - lr: 0.000019
+ 2022-02-04 20:40:04,807 epoch 7 - iter 5551/7936 - loss 0.05020234 - samples/sec: 28.48 - lr: 0.000018
+ 2022-02-04 20:47:17,356 epoch 7 - iter 6344/7936 - loss 0.04984342 - samples/sec: 29.34 - lr: 0.000018
+ 2022-02-04 20:54:31,673 epoch 7 - iter 7137/7936 - loss 0.04955538 - samples/sec: 29.22 - lr: 0.000017
+ 2022-02-04 21:01:58,187 epoch 7 - iter 7930/7936 - loss 0.04921375 - samples/sec: 28.42 - lr: 0.000017
+ 2022-02-04 21:02:01,071 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 21:02:01,071 EPOCH 7 done: loss 0.0492 - lr 0.0000167
+ 2022-02-04 21:04:47,460 DEV : loss 0.02109825611114502 - f1-score (micro avg) 0.9362
+ 2022-02-04 21:04:47,519 BAD EPOCHS (no improvement): 4
+ 2022-02-04 21:04:47,519 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 21:12:13,992 epoch 8 - iter 793/7936 - loss 0.04468006 - samples/sec: 28.42 - lr: 0.000016
+ 2022-02-04 21:19:25,811 epoch 8 - iter 1586/7936 - loss 0.04434977 - samples/sec: 29.39 - lr: 0.000016
+ 2022-02-04 21:26:35,161 epoch 8 - iter 2379/7936 - loss 0.04431108 - samples/sec: 29.56 - lr: 0.000015
+ 2022-02-04 21:33:55,512 epoch 8 - iter 3172/7936 - loss 0.04408371 - samples/sec: 28.82 - lr: 0.000014
+ 2022-02-04 21:41:09,449 epoch 8 - iter 3965/7936 - loss 0.04390607 - samples/sec: 29.24 - lr: 0.000014
+ 2022-02-04 21:48:30,449 epoch 8 - iter 4758/7936 - loss 0.04368218 - samples/sec: 28.77 - lr: 0.000013
+ 2022-02-04 21:55:47,346 epoch 8 - iter 5551/7936 - loss 0.04350544 - samples/sec: 29.05 - lr: 0.000013
+ 2022-02-04 22:03:02,107 epoch 8 - iter 6344/7936 - loss 0.04321482 - samples/sec: 29.19 - lr: 0.000012
+ 2022-02-04 22:10:29,225 epoch 8 - iter 7137/7936 - loss 0.04299359 - samples/sec: 28.38 - lr: 0.000012
+ 2022-02-04 22:17:46,915 epoch 8 - iter 7930/7936 - loss 0.04275655 - samples/sec: 28.99 - lr: 0.000011
+ 2022-02-04 22:17:50,251 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 22:17:50,252 EPOCH 8 done: loss 0.0428 - lr 0.0000111
+ 2022-02-04 22:20:46,443 DEV : loss 0.02112417109310627 - f1-score (micro avg) 0.9396
+ 2022-02-04 22:20:46,502 BAD EPOCHS (no improvement): 4
+ 2022-02-04 22:20:46,502 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 22:27:54,677 epoch 9 - iter 793/7936 - loss 0.03874630 - samples/sec: 29.64 - lr: 0.000011
+ 2022-02-04 22:35:07,034 epoch 9 - iter 1586/7936 - loss 0.03916791 - samples/sec: 29.35 - lr: 0.000010
+ 2022-02-04 22:42:33,861 epoch 9 - iter 2379/7936 - loss 0.03903771 - samples/sec: 28.40 - lr: 0.000009
+ 2022-02-04 22:49:45,768 epoch 9 - iter 3172/7936 - loss 0.03915089 - samples/sec: 29.38 - lr: 0.000009
+ 2022-02-04 22:56:49,271 epoch 9 - iter 3965/7936 - loss 0.03903752 - samples/sec: 29.96 - lr: 0.000008
+ 2022-02-04 23:04:02,033 epoch 9 - iter 4758/7936 - loss 0.03886980 - samples/sec: 29.32 - lr: 0.000008
+ 2022-02-04 23:11:05,006 epoch 9 - iter 5551/7936 - loss 0.03870274 - samples/sec: 30.00 - lr: 0.000007
+ 2022-02-04 23:18:05,622 epoch 9 - iter 6344/7936 - loss 0.03860323 - samples/sec: 30.17 - lr: 0.000007
+ 2022-02-04 23:25:20,470 epoch 9 - iter 7137/7936 - loss 0.03844156 - samples/sec: 29.18 - lr: 0.000006
+ 2022-02-04 23:32:20,810 epoch 9 - iter 7930/7936 - loss 0.03839073 - samples/sec: 30.19 - lr: 0.000006
+ 2022-02-04 23:32:23,941 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 23:32:23,942 EPOCH 9 done: loss 0.0384 - lr 0.0000056
+ 2022-02-04 23:35:14,351 DEV : loss 0.02171432413160801 - f1-score (micro avg) 0.9419
+ 2022-02-04 23:35:14,411 BAD EPOCHS (no improvement): 4
+ 2022-02-04 23:35:14,412 ----------------------------------------------------------------------------------------------------
+ 2022-02-04 23:42:16,230 epoch 10 - iter 793/7936 - loss 0.03646154 - samples/sec: 30.08 - lr: 0.000005
+ 2022-02-04 23:49:27,305 epoch 10 - iter 1586/7936 - loss 0.03635515 - samples/sec: 29.44 - lr: 0.000004
+ 2022-02-04 23:56:27,850 epoch 10 - iter 2379/7936 - loss 0.03662968 - samples/sec: 30.17 - lr: 0.000004
+ 2022-02-05 00:03:30,598 epoch 10 - iter 3172/7936 - loss 0.03640152 - samples/sec: 30.02 - lr: 0.000003
+ 2022-02-05 00:10:46,058 epoch 10 - iter 3965/7936 - loss 0.03636994 - samples/sec: 29.14 - lr: 0.000003
+ 2022-02-05 00:17:50,999 epoch 10 - iter 4758/7936 - loss 0.03636800 - samples/sec: 29.86 - lr: 0.000002
+ 2022-02-05 00:24:51,167 epoch 10 - iter 5551/7936 - loss 0.03625499 - samples/sec: 30.20 - lr: 0.000002
+ 2022-02-05 00:32:07,970 epoch 10 - iter 6344/7936 - loss 0.03625737 - samples/sec: 29.05 - lr: 0.000001
+ 2022-02-05 00:39:14,867 epoch 10 - iter 7137/7936 - loss 0.03618156 - samples/sec: 29.73 - lr: 0.000001
+ 2022-02-05 00:46:17,991 epoch 10 - iter 7930/7936 - loss 0.03611184 - samples/sec: 29.99 - lr: 0.000000
+ 2022-02-05 00:46:21,120 ----------------------------------------------------------------------------------------------------
+ 2022-02-05 00:46:21,123 EPOCH 10 done: loss 0.0361 - lr 0.0000000
+ 2022-02-05 00:49:11,421 DEV : loss 0.023424603044986725 - f1-score (micro avg) 0.9417
+ 2022-02-05 00:49:11,486 BAD EPOCHS (no improvement): 4
+ 2022-02-05 00:49:12,641 ----------------------------------------------------------------------------------------------------
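The `lr` values logged for each epoch rise linearly to the configured `5e-05` during epoch 1 and then fall linearly to zero by the end of epoch 10. The log does not name the scheduler, so the following is only a sketch of a schedule that is consistent with those numbers: it assumes linear warmup over the first tenth of all steps followed by linear decay. `linear_warmup_lr` is a hypothetical function name.

```python
import math


def linear_warmup_lr(step: int, peak_lr: float, total_steps: int, warmup_steps: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


# Values taken from the "Corpus" and "Parameters" lines of the log.
TRAIN_SENTENCES, BATCH_SIZE, EPOCHS, PEAK_LR = 126973, 16, 10, 5e-5

steps_per_epoch = math.ceil(TRAIN_SENTENCES / BATCH_SIZE)  # matches "iter .../7936"
total_steps = steps_per_epoch * EPOCHS
warmup_steps = total_steps // 10  # assumption: warmup covers the first tenth of training

for epoch in range(1, EPOCHS + 1):
    lr = linear_warmup_lr(epoch * steps_per_epoch, PEAK_LR, total_steps, warmup_steps)
    print(f"EPOCH {epoch} done: lr {lr:.7f}")
```

With these assumptions the printed values agree with the `EPOCH n done` lines in the log, for example `0.0000444` after epoch 2 and `0.0000056` after epoch 9.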
+ 2022-02-05 00:49:12,643 Testing using last state of model ...
+ 2022-02-05 00:52:03,154 0.9303 0.9309 0.9306 0.8856
+ 2022-02-05 00:52:03,155
+ Results:
+ - F-score (micro) 0.9306
+ - F-score (macro) 0.9057
+ - Accuracy 0.8856
+
+ By class:
+               precision    recall  f1-score   support
+
+         pers     0.9373    0.9236    0.9304      2734
+          loc     0.9140    0.9371    0.9254      1384
+       amount     0.9840    0.9840    0.9840       250
+         time     0.9447    0.9407    0.9427       236
+         func     0.9209    0.9143    0.9176       140
+          org     0.8364    0.9388    0.8846        49
+         prod     0.7742    0.8889    0.8276        27
+        event     0.8333    0.8333    0.8333        12
+
+    micro avg     0.9303    0.9309    0.9306      4832
+    macro avg     0.8931    0.9201    0.9057      4832
+ weighted avg     0.9307    0.9309    0.9307      4832
+  samples avg     0.8856    0.8856    0.8856      4832
+
+ 2022-02-05 00:52:03,155 ----------------------------------------------------------------------------------------------------
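The macro and weighted averages in the report can be recomputed from the per-class rows, which shows how much the small classes (`org`, `prod`, `event`) pull down the macro score. This is a sketch that works from the rounded values printed in the log, so agreement is only expected to four decimal places; the variable names are illustrative.

```python
# (precision, recall, f1, support) per class, copied from the "By class" table.
BY_CLASS = {
    "pers":   (0.9373, 0.9236, 0.9304, 2734),
    "loc":    (0.9140, 0.9371, 0.9254, 1384),
    "amount": (0.9840, 0.9840, 0.9840, 250),
    "time":   (0.9447, 0.9407, 0.9427, 236),
    "func":   (0.9209, 0.9143, 0.9176, 140),
    "org":    (0.8364, 0.9388, 0.8846, 49),
    "prod":   (0.7742, 0.8889, 0.8276, 27),
    "event":  (0.8333, 0.8333, 0.8333, 12),
}

total_support = sum(support for *_, support in BY_CLASS.values())

# Macro average: every class counts equally, regardless of how often it occurs.
macro_f1 = sum(f1 for _, _, f1, _ in BY_CLASS.values()) / len(BY_CLASS)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f1 * support for _, _, f1, support in BY_CLASS.values()) / total_support

print(f"support {total_support}, macro F1 {macro_f1:.4f}, weighted F1 {weighted_f1:.4f}")
```

The weighted F1 stays close to the micro F1 because `pers` and `loc` together account for 4118 of the 4832 test entities.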
weights.txt ADDED
File without changes