2023-10-19 00:34:30,183 ----------------------------------------------------------------------------------------------------
2023-10-19 00:34:30,183 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 128)
        (position_embeddings): Embedding(512, 128)
        (token_type_embeddings): Embedding(2, 128)
        (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-1): 2 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=128, out_features=128, bias=True)
                (key): Linear(in_features=128, out_features=128, bias=True)
                (value): Linear(in_features=128, out_features=128, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=128, out_features=128, bias=True)
                (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=128, out_features=512, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=512, out_features=128, bias=True)
              (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=128, out_features=128, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=128, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-19 00:34:30,183 ----------------------------------------------------------------------------------------------------
2023-10-19 00:34:30,183 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-19 00:34:30,183 ----------------------------------------------------------------------------------------------------
2023-10-19 00:34:30,183 Train:  14465 sentences
2023-10-19 00:34:30,183         (train_with_dev=False, train_with_test=False)
2023-10-19 00:34:30,183 ----------------------------------------------------------------------------------------------------
2023-10-19 00:34:30,183 Training Params:
2023-10-19 00:34:30,183  - learning_rate: "3e-05" 
2023-10-19 00:34:30,183  - mini_batch_size: "4"
2023-10-19 00:34:30,183  - max_epochs: "10"
2023-10-19 00:34:30,184  - shuffle: "True"
2023-10-19 00:34:30,184 ----------------------------------------------------------------------------------------------------
2023-10-19 00:34:30,184 Plugins:
2023-10-19 00:34:30,184  - TensorboardLogger
2023-10-19 00:34:30,184  - LinearScheduler | warmup_fraction: '0.1'
2023-10-19 00:34:30,184 ----------------------------------------------------------------------------------------------------
2023-10-19 00:34:30,184 Final evaluation on model from best epoch (best-model.pt)
2023-10-19 00:34:30,184  - metric: "('micro avg', 'f1-score')"
2023-10-19 00:34:30,184 ----------------------------------------------------------------------------------------------------
2023-10-19 00:34:30,184 Computation:
2023-10-19 00:34:30,184  - compute on device: cuda:0
2023-10-19 00:34:30,184  - embedding storage: none
2023-10-19 00:34:30,184 ----------------------------------------------------------------------------------------------------
2023-10-19 00:34:30,184 Model training base path: "hmbench-letemps/fr-dbmdz/bert-tiny-historic-multilingual-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-19 00:34:30,184 ----------------------------------------------------------------------------------------------------
2023-10-19 00:34:30,184 ----------------------------------------------------------------------------------------------------
2023-10-19 00:34:30,184 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-19 00:34:35,918 epoch 1 - iter 361/3617 - loss 2.91805740 - time (sec): 5.73 - samples/sec: 6815.20 - lr: 0.000003 - momentum: 0.000000
2023-10-19 00:34:41,682 epoch 1 - iter 722/3617 - loss 2.25303714 - time (sec): 11.50 - samples/sec: 6622.87 - lr: 0.000006 - momentum: 0.000000
2023-10-19 00:34:47,406 epoch 1 - iter 1083/3617 - loss 1.66388576 - time (sec): 17.22 - samples/sec: 6640.71 - lr: 0.000009 - momentum: 0.000000
2023-10-19 00:34:53,049 epoch 1 - iter 1444/3617 - loss 1.33761956 - time (sec): 22.86 - samples/sec: 6653.83 - lr: 0.000012 - momentum: 0.000000
2023-10-19 00:34:58,249 epoch 1 - iter 1805/3617 - loss 1.12626075 - time (sec): 28.06 - samples/sec: 6840.05 - lr: 0.000015 - momentum: 0.000000
2023-10-19 00:35:03,842 epoch 1 - iter 2166/3617 - loss 0.98677877 - time (sec): 33.66 - samples/sec: 6847.55 - lr: 0.000018 - momentum: 0.000000
2023-10-19 00:35:09,452 epoch 1 - iter 2527/3617 - loss 0.89109170 - time (sec): 39.27 - samples/sec: 6783.21 - lr: 0.000021 - momentum: 0.000000
2023-10-19 00:35:14,571 epoch 1 - iter 2888/3617 - loss 0.81168511 - time (sec): 44.39 - samples/sec: 6859.56 - lr: 0.000024 - momentum: 0.000000
2023-10-19 00:35:19,754 epoch 1 - iter 3249/3617 - loss 0.74884197 - time (sec): 49.57 - samples/sec: 6883.09 - lr: 0.000027 - momentum: 0.000000
2023-10-19 00:35:25,488 epoch 1 - iter 3610/3617 - loss 0.69704141 - time (sec): 55.30 - samples/sec: 6861.62 - lr: 0.000030 - momentum: 0.000000
2023-10-19 00:35:25,588 ----------------------------------------------------------------------------------------------------
2023-10-19 00:35:25,588 EPOCH 1 done: loss 0.6964 - lr: 0.000030
2023-10-19 00:35:27,894 DEV : loss 0.18414448201656342 - f1-score (micro avg)  0.1723
2023-10-19 00:35:27,923 saving best model
2023-10-19 00:35:27,957 ----------------------------------------------------------------------------------------------------
2023-10-19 00:35:33,423 epoch 2 - iter 361/3617 - loss 0.20767271 - time (sec): 5.46 - samples/sec: 6898.30 - lr: 0.000030 - momentum: 0.000000
2023-10-19 00:35:39,165 epoch 2 - iter 722/3617 - loss 0.20857724 - time (sec): 11.21 - samples/sec: 6785.54 - lr: 0.000029 - momentum: 0.000000
2023-10-19 00:35:44,839 epoch 2 - iter 1083/3617 - loss 0.20034477 - time (sec): 16.88 - samples/sec: 6781.60 - lr: 0.000029 - momentum: 0.000000
2023-10-19 00:35:50,478 epoch 2 - iter 1444/3617 - loss 0.19662744 - time (sec): 22.52 - samples/sec: 6670.61 - lr: 0.000029 - momentum: 0.000000
2023-10-19 00:35:56,146 epoch 2 - iter 1805/3617 - loss 0.19587540 - time (sec): 28.19 - samples/sec: 6615.87 - lr: 0.000028 - momentum: 0.000000
2023-10-19 00:36:01,731 epoch 2 - iter 2166/3617 - loss 0.19490222 - time (sec): 33.77 - samples/sec: 6691.46 - lr: 0.000028 - momentum: 0.000000
2023-10-19 00:36:07,501 epoch 2 - iter 2527/3617 - loss 0.19331286 - time (sec): 39.54 - samples/sec: 6672.70 - lr: 0.000028 - momentum: 0.000000
2023-10-19 00:36:13,181 epoch 2 - iter 2888/3617 - loss 0.19208740 - time (sec): 45.22 - samples/sec: 6649.22 - lr: 0.000027 - momentum: 0.000000
2023-10-19 00:36:18,875 epoch 2 - iter 3249/3617 - loss 0.18952749 - time (sec): 50.92 - samples/sec: 6673.34 - lr: 0.000027 - momentum: 0.000000
2023-10-19 00:36:24,598 epoch 2 - iter 3610/3617 - loss 0.18852611 - time (sec): 56.64 - samples/sec: 6696.51 - lr: 0.000027 - momentum: 0.000000
2023-10-19 00:36:24,705 ----------------------------------------------------------------------------------------------------
2023-10-19 00:36:24,706 EPOCH 2 done: loss 0.1886 - lr: 0.000027
2023-10-19 00:36:28,635 DEV : loss 0.1664983630180359 - f1-score (micro avg)  0.381
2023-10-19 00:36:28,663 saving best model
2023-10-19 00:36:28,696 ----------------------------------------------------------------------------------------------------
2023-10-19 00:36:34,434 epoch 3 - iter 361/3617 - loss 0.15091050 - time (sec): 5.74 - samples/sec: 6577.68 - lr: 0.000026 - momentum: 0.000000
2023-10-19 00:36:40,116 epoch 3 - iter 722/3617 - loss 0.15277438 - time (sec): 11.42 - samples/sec: 6633.98 - lr: 0.000026 - momentum: 0.000000
2023-10-19 00:36:45,517 epoch 3 - iter 1083/3617 - loss 0.15890046 - time (sec): 16.82 - samples/sec: 6773.25 - lr: 0.000026 - momentum: 0.000000
2023-10-19 00:36:51,398 epoch 3 - iter 1444/3617 - loss 0.16323482 - time (sec): 22.70 - samples/sec: 6688.62 - lr: 0.000025 - momentum: 0.000000
2023-10-19 00:36:57,123 epoch 3 - iter 1805/3617 - loss 0.15977729 - time (sec): 28.43 - samples/sec: 6707.84 - lr: 0.000025 - momentum: 0.000000
2023-10-19 00:37:02,800 epoch 3 - iter 2166/3617 - loss 0.16133560 - time (sec): 34.10 - samples/sec: 6683.91 - lr: 0.000025 - momentum: 0.000000
2023-10-19 00:37:08,622 epoch 3 - iter 2527/3617 - loss 0.16171229 - time (sec): 39.92 - samples/sec: 6680.19 - lr: 0.000024 - momentum: 0.000000
2023-10-19 00:37:14,112 epoch 3 - iter 2888/3617 - loss 0.16077563 - time (sec): 45.41 - samples/sec: 6699.91 - lr: 0.000024 - momentum: 0.000000
2023-10-19 00:37:19,821 epoch 3 - iter 3249/3617 - loss 0.15940779 - time (sec): 51.12 - samples/sec: 6685.32 - lr: 0.000024 - momentum: 0.000000
2023-10-19 00:37:25,531 epoch 3 - iter 3610/3617 - loss 0.15948787 - time (sec): 56.83 - samples/sec: 6671.03 - lr: 0.000023 - momentum: 0.000000
2023-10-19 00:37:25,641 ----------------------------------------------------------------------------------------------------
2023-10-19 00:37:25,641 EPOCH 3 done: loss 0.1594 - lr: 0.000023
2023-10-19 00:37:28,812 DEV : loss 0.16962358355522156 - f1-score (micro avg)  0.3721
2023-10-19 00:37:28,839 ----------------------------------------------------------------------------------------------------
2023-10-19 00:37:34,732 epoch 4 - iter 361/3617 - loss 0.14232155 - time (sec): 5.89 - samples/sec: 6302.73 - lr: 0.000023 - momentum: 0.000000
2023-10-19 00:37:40,531 epoch 4 - iter 722/3617 - loss 0.14496426 - time (sec): 11.69 - samples/sec: 6505.66 - lr: 0.000023 - momentum: 0.000000
2023-10-19 00:37:46,264 epoch 4 - iter 1083/3617 - loss 0.15214303 - time (sec): 17.42 - samples/sec: 6523.89 - lr: 0.000022 - momentum: 0.000000
2023-10-19 00:37:52,081 epoch 4 - iter 1444/3617 - loss 0.14974326 - time (sec): 23.24 - samples/sec: 6539.88 - lr: 0.000022 - momentum: 0.000000
2023-10-19 00:37:57,503 epoch 4 - iter 1805/3617 - loss 0.15035754 - time (sec): 28.66 - samples/sec: 6641.02 - lr: 0.000022 - momentum: 0.000000
2023-10-19 00:38:02,896 epoch 4 - iter 2166/3617 - loss 0.15039641 - time (sec): 34.06 - samples/sec: 6679.55 - lr: 0.000021 - momentum: 0.000000
2023-10-19 00:38:08,575 epoch 4 - iter 2527/3617 - loss 0.14933721 - time (sec): 39.73 - samples/sec: 6636.79 - lr: 0.000021 - momentum: 0.000000
2023-10-19 00:38:14,295 epoch 4 - iter 2888/3617 - loss 0.14723742 - time (sec): 45.46 - samples/sec: 6659.06 - lr: 0.000021 - momentum: 0.000000
2023-10-19 00:38:19,742 epoch 4 - iter 3249/3617 - loss 0.14686544 - time (sec): 50.90 - samples/sec: 6721.83 - lr: 0.000020 - momentum: 0.000000
2023-10-19 00:38:25,385 epoch 4 - iter 3610/3617 - loss 0.14710156 - time (sec): 56.55 - samples/sec: 6702.85 - lr: 0.000020 - momentum: 0.000000
2023-10-19 00:38:25,499 ----------------------------------------------------------------------------------------------------
2023-10-19 00:38:25,499 EPOCH 4 done: loss 0.1470 - lr: 0.000020
2023-10-19 00:38:29,396 DEV : loss 0.16811420023441315 - f1-score (micro avg)  0.4632
2023-10-19 00:38:29,424 saving best model
2023-10-19 00:38:29,457 ----------------------------------------------------------------------------------------------------
2023-10-19 00:38:35,207 epoch 5 - iter 361/3617 - loss 0.14721800 - time (sec): 5.75 - samples/sec: 6158.21 - lr: 0.000020 - momentum: 0.000000
2023-10-19 00:38:41,041 epoch 5 - iter 722/3617 - loss 0.14279723 - time (sec): 11.58 - samples/sec: 6442.74 - lr: 0.000019 - momentum: 0.000000
2023-10-19 00:38:46,517 epoch 5 - iter 1083/3617 - loss 0.13300252 - time (sec): 17.06 - samples/sec: 6532.03 - lr: 0.000019 - momentum: 0.000000
2023-10-19 00:38:52,413 epoch 5 - iter 1444/3617 - loss 0.13201950 - time (sec): 22.95 - samples/sec: 6528.86 - lr: 0.000019 - momentum: 0.000000
2023-10-19 00:38:58,140 epoch 5 - iter 1805/3617 - loss 0.13224927 - time (sec): 28.68 - samples/sec: 6504.15 - lr: 0.000018 - momentum: 0.000000
2023-10-19 00:39:03,858 epoch 5 - iter 2166/3617 - loss 0.13286286 - time (sec): 34.40 - samples/sec: 6560.32 - lr: 0.000018 - momentum: 0.000000
2023-10-19 00:39:09,676 epoch 5 - iter 2527/3617 - loss 0.13266831 - time (sec): 40.22 - samples/sec: 6601.60 - lr: 0.000018 - momentum: 0.000000
2023-10-19 00:39:15,284 epoch 5 - iter 2888/3617 - loss 0.13352438 - time (sec): 45.83 - samples/sec: 6614.43 - lr: 0.000017 - momentum: 0.000000
2023-10-19 00:39:20,653 epoch 5 - iter 3249/3617 - loss 0.13325418 - time (sec): 51.20 - samples/sec: 6681.12 - lr: 0.000017 - momentum: 0.000000
2023-10-19 00:39:26,465 epoch 5 - iter 3610/3617 - loss 0.13392825 - time (sec): 57.01 - samples/sec: 6652.32 - lr: 0.000017 - momentum: 0.000000
2023-10-19 00:39:26,599 ----------------------------------------------------------------------------------------------------
2023-10-19 00:39:26,600 EPOCH 5 done: loss 0.1339 - lr: 0.000017
2023-10-19 00:39:29,796 DEV : loss 0.17465609312057495 - f1-score (micro avg)  0.4657
2023-10-19 00:39:29,825 saving best model
2023-10-19 00:39:29,859 ----------------------------------------------------------------------------------------------------
2023-10-19 00:39:35,592 epoch 6 - iter 361/3617 - loss 0.12127706 - time (sec): 5.73 - samples/sec: 6721.20 - lr: 0.000016 - momentum: 0.000000
2023-10-19 00:39:41,260 epoch 6 - iter 722/3617 - loss 0.11778469 - time (sec): 11.40 - samples/sec: 6597.92 - lr: 0.000016 - momentum: 0.000000
2023-10-19 00:39:47,020 epoch 6 - iter 1083/3617 - loss 0.11656336 - time (sec): 17.16 - samples/sec: 6622.72 - lr: 0.000016 - momentum: 0.000000
2023-10-19 00:39:52,551 epoch 6 - iter 1444/3617 - loss 0.12094593 - time (sec): 22.69 - samples/sec: 6627.71 - lr: 0.000015 - momentum: 0.000000
2023-10-19 00:39:58,056 epoch 6 - iter 1805/3617 - loss 0.12103503 - time (sec): 28.20 - samples/sec: 6695.63 - lr: 0.000015 - momentum: 0.000000
2023-10-19 00:40:03,764 epoch 6 - iter 2166/3617 - loss 0.12382929 - time (sec): 33.90 - samples/sec: 6676.97 - lr: 0.000015 - momentum: 0.000000
2023-10-19 00:40:09,505 epoch 6 - iter 2527/3617 - loss 0.12571836 - time (sec): 39.65 - samples/sec: 6688.00 - lr: 0.000014 - momentum: 0.000000
2023-10-19 00:40:15,239 epoch 6 - iter 2888/3617 - loss 0.12660519 - time (sec): 45.38 - samples/sec: 6656.01 - lr: 0.000014 - momentum: 0.000000
2023-10-19 00:40:21,099 epoch 6 - iter 3249/3617 - loss 0.12660343 - time (sec): 51.24 - samples/sec: 6643.48 - lr: 0.000014 - momentum: 0.000000
2023-10-19 00:40:26,893 epoch 6 - iter 3610/3617 - loss 0.12570085 - time (sec): 57.03 - samples/sec: 6643.19 - lr: 0.000013 - momentum: 0.000000
2023-10-19 00:40:27,007 ----------------------------------------------------------------------------------------------------
2023-10-19 00:40:27,008 EPOCH 6 done: loss 0.1257 - lr: 0.000013
2023-10-19 00:40:30,241 DEV : loss 0.18021412193775177 - f1-score (micro avg)  0.4823
2023-10-19 00:40:30,269 saving best model
2023-10-19 00:40:30,301 ----------------------------------------------------------------------------------------------------
2023-10-19 00:40:36,131 epoch 7 - iter 361/3617 - loss 0.12149979 - time (sec): 5.83 - samples/sec: 6622.41 - lr: 0.000013 - momentum: 0.000000
2023-10-19 00:40:41,749 epoch 7 - iter 722/3617 - loss 0.11531578 - time (sec): 11.45 - samples/sec: 6667.10 - lr: 0.000013 - momentum: 0.000000
2023-10-19 00:40:47,463 epoch 7 - iter 1083/3617 - loss 0.11569164 - time (sec): 17.16 - samples/sec: 6666.03 - lr: 0.000012 - momentum: 0.000000
2023-10-19 00:40:52,785 epoch 7 - iter 1444/3617 - loss 0.11579758 - time (sec): 22.48 - samples/sec: 6774.95 - lr: 0.000012 - momentum: 0.000000
2023-10-19 00:40:58,121 epoch 7 - iter 1805/3617 - loss 0.11667595 - time (sec): 27.82 - samples/sec: 6819.17 - lr: 0.000012 - momentum: 0.000000
2023-10-19 00:41:03,894 epoch 7 - iter 2166/3617 - loss 0.12102352 - time (sec): 33.59 - samples/sec: 6764.86 - lr: 0.000011 - momentum: 0.000000
2023-10-19 00:41:09,701 epoch 7 - iter 2527/3617 - loss 0.12119387 - time (sec): 39.40 - samples/sec: 6738.92 - lr: 0.000011 - momentum: 0.000000
2023-10-19 00:41:15,459 epoch 7 - iter 2888/3617 - loss 0.12026071 - time (sec): 45.16 - samples/sec: 6692.51 - lr: 0.000011 - momentum: 0.000000
2023-10-19 00:41:21,204 epoch 7 - iter 3249/3617 - loss 0.11944450 - time (sec): 50.90 - samples/sec: 6680.71 - lr: 0.000010 - momentum: 0.000000
2023-10-19 00:41:27,019 epoch 7 - iter 3610/3617 - loss 0.11841485 - time (sec): 56.72 - samples/sec: 6679.74 - lr: 0.000010 - momentum: 0.000000
2023-10-19 00:41:27,134 ----------------------------------------------------------------------------------------------------
2023-10-19 00:41:27,135 EPOCH 7 done: loss 0.1186 - lr: 0.000010
2023-10-19 00:41:31,039 DEV : loss 0.1851833611726761 - f1-score (micro avg)  0.4913
2023-10-19 00:41:31,067 saving best model
2023-10-19 00:41:31,106 ----------------------------------------------------------------------------------------------------
2023-10-19 00:41:36,914 epoch 8 - iter 361/3617 - loss 0.11416187 - time (sec): 5.81 - samples/sec: 6707.95 - lr: 0.000010 - momentum: 0.000000
2023-10-19 00:41:42,734 epoch 8 - iter 722/3617 - loss 0.10800957 - time (sec): 11.63 - samples/sec: 6724.86 - lr: 0.000009 - momentum: 0.000000
2023-10-19 00:41:48,476 epoch 8 - iter 1083/3617 - loss 0.11095705 - time (sec): 17.37 - samples/sec: 6711.31 - lr: 0.000009 - momentum: 0.000000
2023-10-19 00:41:54,248 epoch 8 - iter 1444/3617 - loss 0.10817209 - time (sec): 23.14 - samples/sec: 6666.37 - lr: 0.000009 - momentum: 0.000000
2023-10-19 00:42:00,062 epoch 8 - iter 1805/3617 - loss 0.11279132 - time (sec): 28.96 - samples/sec: 6669.85 - lr: 0.000008 - momentum: 0.000000
2023-10-19 00:42:05,767 epoch 8 - iter 2166/3617 - loss 0.11285781 - time (sec): 34.66 - samples/sec: 6670.09 - lr: 0.000008 - momentum: 0.000000
2023-10-19 00:42:11,517 epoch 8 - iter 2527/3617 - loss 0.11301960 - time (sec): 40.41 - samples/sec: 6634.57 - lr: 0.000008 - momentum: 0.000000
2023-10-19 00:42:17,117 epoch 8 - iter 2888/3617 - loss 0.11249704 - time (sec): 46.01 - samples/sec: 6632.10 - lr: 0.000007 - momentum: 0.000000
2023-10-19 00:42:22,519 epoch 8 - iter 3249/3617 - loss 0.11359347 - time (sec): 51.41 - samples/sec: 6671.77 - lr: 0.000007 - momentum: 0.000000
2023-10-19 00:42:28,116 epoch 8 - iter 3610/3617 - loss 0.11389018 - time (sec): 57.01 - samples/sec: 6649.67 - lr: 0.000007 - momentum: 0.000000
2023-10-19 00:42:28,224 ----------------------------------------------------------------------------------------------------
2023-10-19 00:42:28,224 EPOCH 8 done: loss 0.1137 - lr: 0.000007
2023-10-19 00:42:31,463 DEV : loss 0.19062528014183044 - f1-score (micro avg)  0.4952
2023-10-19 00:42:31,491 saving best model
2023-10-19 00:42:31,528 ----------------------------------------------------------------------------------------------------
2023-10-19 00:42:37,260 epoch 9 - iter 361/3617 - loss 0.10534011 - time (sec): 5.73 - samples/sec: 6685.34 - lr: 0.000006 - momentum: 0.000000
2023-10-19 00:42:42,983 epoch 9 - iter 722/3617 - loss 0.11077406 - time (sec): 11.45 - samples/sec: 6598.82 - lr: 0.000006 - momentum: 0.000000
2023-10-19 00:42:48,253 epoch 9 - iter 1083/3617 - loss 0.10676885 - time (sec): 16.72 - samples/sec: 6791.81 - lr: 0.000006 - momentum: 0.000000
2023-10-19 00:42:54,088 epoch 9 - iter 1444/3617 - loss 0.10622678 - time (sec): 22.56 - samples/sec: 6690.49 - lr: 0.000005 - momentum: 0.000000
2023-10-19 00:42:59,808 epoch 9 - iter 1805/3617 - loss 0.10776910 - time (sec): 28.28 - samples/sec: 6707.15 - lr: 0.000005 - momentum: 0.000000
2023-10-19 00:43:05,595 epoch 9 - iter 2166/3617 - loss 0.11026578 - time (sec): 34.07 - samples/sec: 6654.23 - lr: 0.000005 - momentum: 0.000000
2023-10-19 00:43:11,285 epoch 9 - iter 2527/3617 - loss 0.11034727 - time (sec): 39.76 - samples/sec: 6657.01 - lr: 0.000004 - momentum: 0.000000
2023-10-19 00:43:17,141 epoch 9 - iter 2888/3617 - loss 0.11070044 - time (sec): 45.61 - samples/sec: 6632.25 - lr: 0.000004 - momentum: 0.000000
2023-10-19 00:43:22,779 epoch 9 - iter 3249/3617 - loss 0.10960559 - time (sec): 51.25 - samples/sec: 6625.34 - lr: 0.000004 - momentum: 0.000000
2023-10-19 00:43:28,625 epoch 9 - iter 3610/3617 - loss 0.11083827 - time (sec): 57.10 - samples/sec: 6642.00 - lr: 0.000003 - momentum: 0.000000
2023-10-19 00:43:28,729 ----------------------------------------------------------------------------------------------------
2023-10-19 00:43:28,729 EPOCH 9 done: loss 0.1109 - lr: 0.000003
2023-10-19 00:43:31,979 DEV : loss 0.19269128143787384 - f1-score (micro avg)  0.5016
2023-10-19 00:43:32,008 saving best model
2023-10-19 00:43:32,041 ----------------------------------------------------------------------------------------------------
2023-10-19 00:43:38,495 epoch 10 - iter 361/3617 - loss 0.10547873 - time (sec): 6.45 - samples/sec: 5966.48 - lr: 0.000003 - momentum: 0.000000
2023-10-19 00:43:44,265 epoch 10 - iter 722/3617 - loss 0.10323462 - time (sec): 12.22 - samples/sec: 6292.25 - lr: 0.000003 - momentum: 0.000000
2023-10-19 00:43:49,972 epoch 10 - iter 1083/3617 - loss 0.11136052 - time (sec): 17.93 - samples/sec: 6278.56 - lr: 0.000002 - momentum: 0.000000
2023-10-19 00:43:55,711 epoch 10 - iter 1444/3617 - loss 0.10802696 - time (sec): 23.67 - samples/sec: 6385.53 - lr: 0.000002 - momentum: 0.000000
2023-10-19 00:44:01,531 epoch 10 - iter 1805/3617 - loss 0.10816177 - time (sec): 29.49 - samples/sec: 6442.13 - lr: 0.000002 - momentum: 0.000000
2023-10-19 00:44:07,303 epoch 10 - iter 2166/3617 - loss 0.10603407 - time (sec): 35.26 - samples/sec: 6485.02 - lr: 0.000001 - momentum: 0.000000
2023-10-19 00:44:13,016 epoch 10 - iter 2527/3617 - loss 0.10623744 - time (sec): 40.97 - samples/sec: 6481.74 - lr: 0.000001 - momentum: 0.000000
2023-10-19 00:44:18,445 epoch 10 - iter 2888/3617 - loss 0.10604407 - time (sec): 46.40 - samples/sec: 6574.08 - lr: 0.000001 - momentum: 0.000000
2023-10-19 00:44:24,218 epoch 10 - iter 3249/3617 - loss 0.10773950 - time (sec): 52.18 - samples/sec: 6581.15 - lr: 0.000000 - momentum: 0.000000
2023-10-19 00:44:29,920 epoch 10 - iter 3610/3617 - loss 0.10913884 - time (sec): 57.88 - samples/sec: 6551.05 - lr: 0.000000 - momentum: 0.000000
2023-10-19 00:44:30,022 ----------------------------------------------------------------------------------------------------
2023-10-19 00:44:30,023 EPOCH 10 done: loss 0.1090 - lr: 0.000000
2023-10-19 00:44:33,292 DEV : loss 0.1960730254650116 - f1-score (micro avg)  0.5019
2023-10-19 00:44:33,321 saving best model
2023-10-19 00:44:33,388 ----------------------------------------------------------------------------------------------------
2023-10-19 00:44:33,389 Loading model from best epoch ...
2023-10-19 00:44:33,469 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
2023-10-19 00:44:37,684 
Results:
- F-score (micro) 0.5164
- F-score (macro) 0.3449
- Accuracy 0.36

By class:
              precision    recall  f1-score   support

         loc     0.5194    0.6785    0.5884       591
        pers     0.3952    0.5126    0.4463       357
         org     0.0000    0.0000    0.0000        79

   micro avg     0.4729    0.5686    0.5164      1027
   macro avg     0.3049    0.3970    0.3449      1027
weighted avg     0.4363    0.5686    0.4938      1027

2023-10-19 00:44:37,685 ----------------------------------------------------------------------------------------------------