2023-10-18 22:55:56,510 ----------------------------------------------------------------------------------------------------
2023-10-18 22:55:56,510 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 128)
        (position_embeddings): Embedding(512, 128)
        (token_type_embeddings): Embedding(2, 128)
        (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-1): 2 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=128, out_features=128, bias=True)
                (key): Linear(in_features=128, out_features=128, bias=True)
                (value): Linear(in_features=128, out_features=128, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=128, out_features=128, bias=True)
                (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=128, out_features=512, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=512, out_features=128, bias=True)
              (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=128, out_features=128, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=128, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-18 22:55:56,510 ----------------------------------------------------------------------------------------------------
2023-10-18 22:55:56,510 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-18 22:55:56,510 ----------------------------------------------------------------------------------------------------
2023-10-18 22:55:56,510 Train:  5777 sentences
2023-10-18 22:55:56,510         (train_with_dev=False, train_with_test=False)
2023-10-18 22:55:56,510 ----------------------------------------------------------------------------------------------------
2023-10-18 22:55:56,510 Training Params:
2023-10-18 22:55:56,510  - learning_rate: "3e-05" 
2023-10-18 22:55:56,510  - mini_batch_size: "8"
2023-10-18 22:55:56,510  - max_epochs: "10"
2023-10-18 22:55:56,510  - shuffle: "True"
2023-10-18 22:55:56,510 ----------------------------------------------------------------------------------------------------
2023-10-18 22:55:56,510 Plugins:
2023-10-18 22:55:56,511  - TensorboardLogger
2023-10-18 22:55:56,511  - LinearScheduler | warmup_fraction: '0.1'
2023-10-18 22:55:56,511 ----------------------------------------------------------------------------------------------------
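(Note: the per-iteration `lr` values in this log can be roughly reproduced from the parameters above. This is a minimal sketch, assuming the LinearScheduler follows the standard linear warmup/decay schedule, with warmup_fraction 0.1 and the step counts taken from this log: 723 iterations/epoch over 10 epochs.)

```python
# Sketch of the linear warmup/decay schedule behind the logged lr values.
# Assumption: standard linear warmup then linear decay to 0, as in common
# transformer fine-tuning schedules; constants are taken from this log.
PEAK_LR = 3e-5
STEPS_PER_EPOCH = 723
TOTAL_STEPS = STEPS_PER_EPOCH * 10      # 7230 optimizer steps in total
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)   # 723, i.e. exactly one epoch

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates."""
    if step < WARMUP_STEPS:
        # linear ramp from 0 up to PEAK_LR over the warmup steps
        return PEAK_LR * step / WARMUP_STEPS
    # linear decay from PEAK_LR down to 0 over the remaining steps
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# epoch 1, iter 72  -> ~0.000003 (as logged); epoch 2, iter 720 -> ~0.000027
print(round(lr_at(72), 6), round(lr_at(723 + 720), 6))
```

This matches the log: lr climbs to 3e-05 by the end of epoch 1 (the warmup epoch) and decays to ~0 by the end of epoch 10.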
2023-10-18 22:55:56,511 Final evaluation on model from best epoch (best-model.pt)
2023-10-18 22:55:56,511  - metric: "('micro avg', 'f1-score')"
2023-10-18 22:55:56,511 ----------------------------------------------------------------------------------------------------
2023-10-18 22:55:56,511 Computation:
2023-10-18 22:55:56,511  - compute on device: cuda:0
2023-10-18 22:55:56,511  - embedding storage: none
2023-10-18 22:55:56,511 ----------------------------------------------------------------------------------------------------
2023-10-18 22:55:56,511 Model training base path: "hmbench-icdar/nl-dbmdz/bert-tiny-historic-multilingual-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-4"
2023-10-18 22:55:56,511 ----------------------------------------------------------------------------------------------------
2023-10-18 22:55:56,511 ----------------------------------------------------------------------------------------------------
2023-10-18 22:55:56,511 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-18 22:55:58,254 epoch 1 - iter 72/723 - loss 3.42262036 - time (sec): 1.74 - samples/sec: 9302.87 - lr: 0.000003 - momentum: 0.000000
2023-10-18 22:56:00,005 epoch 1 - iter 144/723 - loss 3.24227113 - time (sec): 3.49 - samples/sec: 9609.69 - lr: 0.000006 - momentum: 0.000000
2023-10-18 22:56:01,802 epoch 1 - iter 216/723 - loss 2.95510672 - time (sec): 5.29 - samples/sec: 9718.03 - lr: 0.000009 - momentum: 0.000000
2023-10-18 22:56:03,598 epoch 1 - iter 288/723 - loss 2.57569294 - time (sec): 7.09 - samples/sec: 9834.91 - lr: 0.000012 - momentum: 0.000000
2023-10-18 22:56:05,446 epoch 1 - iter 360/723 - loss 2.21089906 - time (sec): 8.93 - samples/sec: 9689.02 - lr: 0.000015 - momentum: 0.000000
2023-10-18 22:56:07,307 epoch 1 - iter 432/723 - loss 1.92171641 - time (sec): 10.80 - samples/sec: 9584.33 - lr: 0.000018 - momentum: 0.000000
2023-10-18 22:56:09,078 epoch 1 - iter 504/723 - loss 1.70254512 - time (sec): 12.57 - samples/sec: 9580.68 - lr: 0.000021 - momentum: 0.000000
2023-10-18 22:56:10,968 epoch 1 - iter 576/723 - loss 1.52679826 - time (sec): 14.46 - samples/sec: 9625.95 - lr: 0.000024 - momentum: 0.000000
2023-10-18 22:56:12,823 epoch 1 - iter 648/723 - loss 1.38048154 - time (sec): 16.31 - samples/sec: 9688.82 - lr: 0.000027 - momentum: 0.000000
2023-10-18 22:56:14,681 epoch 1 - iter 720/723 - loss 1.27452802 - time (sec): 18.17 - samples/sec: 9669.11 - lr: 0.000030 - momentum: 0.000000
2023-10-18 22:56:14,747 ----------------------------------------------------------------------------------------------------
2023-10-18 22:56:14,747 EPOCH 1 done: loss 1.2718 - lr: 0.000030
2023-10-18 22:56:16,024 DEV : loss 0.383542537689209 - f1-score (micro avg)  0.0
2023-10-18 22:56:16,039 ----------------------------------------------------------------------------------------------------
2023-10-18 22:56:17,801 epoch 2 - iter 72/723 - loss 0.26690832 - time (sec): 1.76 - samples/sec: 10084.86 - lr: 0.000030 - momentum: 0.000000
2023-10-18 22:56:19,552 epoch 2 - iter 144/723 - loss 0.27198191 - time (sec): 3.51 - samples/sec: 9845.28 - lr: 0.000029 - momentum: 0.000000
2023-10-18 22:56:21,325 epoch 2 - iter 216/723 - loss 0.26855355 - time (sec): 5.28 - samples/sec: 9696.90 - lr: 0.000029 - momentum: 0.000000
2023-10-18 22:56:23,331 epoch 2 - iter 288/723 - loss 0.25857843 - time (sec): 7.29 - samples/sec: 9665.14 - lr: 0.000029 - momentum: 0.000000
2023-10-18 22:56:25,245 epoch 2 - iter 360/723 - loss 0.25269809 - time (sec): 9.21 - samples/sec: 9480.00 - lr: 0.000028 - momentum: 0.000000
2023-10-18 22:56:27,132 epoch 2 - iter 432/723 - loss 0.24641116 - time (sec): 11.09 - samples/sec: 9490.38 - lr: 0.000028 - momentum: 0.000000
2023-10-18 22:56:28,913 epoch 2 - iter 504/723 - loss 0.24268049 - time (sec): 12.87 - samples/sec: 9567.70 - lr: 0.000028 - momentum: 0.000000
2023-10-18 22:56:30,665 epoch 2 - iter 576/723 - loss 0.24535974 - time (sec): 14.63 - samples/sec: 9572.90 - lr: 0.000027 - momentum: 0.000000
2023-10-18 22:56:32,433 epoch 2 - iter 648/723 - loss 0.24136581 - time (sec): 16.39 - samples/sec: 9638.28 - lr: 0.000027 - momentum: 0.000000
2023-10-18 22:56:34,226 epoch 2 - iter 720/723 - loss 0.24079891 - time (sec): 18.19 - samples/sec: 9647.42 - lr: 0.000027 - momentum: 0.000000
2023-10-18 22:56:34,291 ----------------------------------------------------------------------------------------------------
2023-10-18 22:56:34,291 EPOCH 2 done: loss 0.2406 - lr: 0.000027
2023-10-18 22:56:36,363 DEV : loss 0.2568000555038452 - f1-score (micro avg)  0.1136
2023-10-18 22:56:36,378 saving best model
2023-10-18 22:56:36,409 ----------------------------------------------------------------------------------------------------
2023-10-18 22:56:38,173 epoch 3 - iter 72/723 - loss 0.20902337 - time (sec): 1.76 - samples/sec: 10632.39 - lr: 0.000026 - momentum: 0.000000
2023-10-18 22:56:40,014 epoch 3 - iter 144/723 - loss 0.21487658 - time (sec): 3.60 - samples/sec: 10443.35 - lr: 0.000026 - momentum: 0.000000
2023-10-18 22:56:41,785 epoch 3 - iter 216/723 - loss 0.20843859 - time (sec): 5.38 - samples/sec: 10261.69 - lr: 0.000026 - momentum: 0.000000
2023-10-18 22:56:43,590 epoch 3 - iter 288/723 - loss 0.20311414 - time (sec): 7.18 - samples/sec: 10178.10 - lr: 0.000025 - momentum: 0.000000
2023-10-18 22:56:45,423 epoch 3 - iter 360/723 - loss 0.20319710 - time (sec): 9.01 - samples/sec: 10025.50 - lr: 0.000025 - momentum: 0.000000
2023-10-18 22:56:47,180 epoch 3 - iter 432/723 - loss 0.20226321 - time (sec): 10.77 - samples/sec: 9889.11 - lr: 0.000025 - momentum: 0.000000
2023-10-18 22:56:48,993 epoch 3 - iter 504/723 - loss 0.20421469 - time (sec): 12.58 - samples/sec: 9890.71 - lr: 0.000024 - momentum: 0.000000
2023-10-18 22:56:50,707 epoch 3 - iter 576/723 - loss 0.20341312 - time (sec): 14.30 - samples/sec: 9879.51 - lr: 0.000024 - momentum: 0.000000
2023-10-18 22:56:52,498 epoch 3 - iter 648/723 - loss 0.19940795 - time (sec): 16.09 - samples/sec: 9891.77 - lr: 0.000024 - momentum: 0.000000
2023-10-18 22:56:54,232 epoch 3 - iter 720/723 - loss 0.19888487 - time (sec): 17.82 - samples/sec: 9845.70 - lr: 0.000023 - momentum: 0.000000
2023-10-18 22:56:54,305 ----------------------------------------------------------------------------------------------------
2023-10-18 22:56:54,305 EPOCH 3 done: loss 0.1986 - lr: 0.000023
2023-10-18 22:56:56,067 DEV : loss 0.23752760887145996 - f1-score (micro avg)  0.2402
2023-10-18 22:56:56,081 saving best model
2023-10-18 22:56:56,117 ----------------------------------------------------------------------------------------------------
2023-10-18 22:56:57,864 epoch 4 - iter 72/723 - loss 0.18226854 - time (sec): 1.75 - samples/sec: 9748.99 - lr: 0.000023 - momentum: 0.000000
2023-10-18 22:56:59,625 epoch 4 - iter 144/723 - loss 0.17750412 - time (sec): 3.51 - samples/sec: 9415.68 - lr: 0.000023 - momentum: 0.000000
2023-10-18 22:57:01,393 epoch 4 - iter 216/723 - loss 0.18220143 - time (sec): 5.28 - samples/sec: 9528.19 - lr: 0.000022 - momentum: 0.000000
2023-10-18 22:57:03,194 epoch 4 - iter 288/723 - loss 0.18036256 - time (sec): 7.08 - samples/sec: 9602.73 - lr: 0.000022 - momentum: 0.000000
2023-10-18 22:57:04,967 epoch 4 - iter 360/723 - loss 0.18084117 - time (sec): 8.85 - samples/sec: 9642.48 - lr: 0.000022 - momentum: 0.000000
2023-10-18 22:57:06,777 epoch 4 - iter 432/723 - loss 0.18157715 - time (sec): 10.66 - samples/sec: 9629.56 - lr: 0.000021 - momentum: 0.000000
2023-10-18 22:57:08,656 epoch 4 - iter 504/723 - loss 0.17979702 - time (sec): 12.54 - samples/sec: 9608.03 - lr: 0.000021 - momentum: 0.000000
2023-10-18 22:57:10,668 epoch 4 - iter 576/723 - loss 0.18168708 - time (sec): 14.55 - samples/sec: 9652.05 - lr: 0.000021 - momentum: 0.000000
2023-10-18 22:57:12,484 epoch 4 - iter 648/723 - loss 0.18042569 - time (sec): 16.37 - samples/sec: 9619.66 - lr: 0.000020 - momentum: 0.000000
2023-10-18 22:57:14,305 epoch 4 - iter 720/723 - loss 0.18043801 - time (sec): 18.19 - samples/sec: 9660.63 - lr: 0.000020 - momentum: 0.000000
2023-10-18 22:57:14,373 ----------------------------------------------------------------------------------------------------
2023-10-18 22:57:14,373 EPOCH 4 done: loss 0.1804 - lr: 0.000020
2023-10-18 22:57:16,480 DEV : loss 0.21891361474990845 - f1-score (micro avg)  0.3434
2023-10-18 22:57:16,495 saving best model
2023-10-18 22:57:16,531 ----------------------------------------------------------------------------------------------------
2023-10-18 22:57:18,323 epoch 5 - iter 72/723 - loss 0.17790732 - time (sec): 1.79 - samples/sec: 9958.89 - lr: 0.000020 - momentum: 0.000000
2023-10-18 22:57:20,100 epoch 5 - iter 144/723 - loss 0.17861548 - time (sec): 3.57 - samples/sec: 10105.93 - lr: 0.000019 - momentum: 0.000000
2023-10-18 22:57:21,842 epoch 5 - iter 216/723 - loss 0.17749356 - time (sec): 5.31 - samples/sec: 10116.70 - lr: 0.000019 - momentum: 0.000000
2023-10-18 22:57:23,657 epoch 5 - iter 288/723 - loss 0.17769605 - time (sec): 7.13 - samples/sec: 9890.75 - lr: 0.000019 - momentum: 0.000000
2023-10-18 22:57:25,353 epoch 5 - iter 360/723 - loss 0.17341481 - time (sec): 8.82 - samples/sec: 9828.62 - lr: 0.000018 - momentum: 0.000000
2023-10-18 22:57:27,135 epoch 5 - iter 432/723 - loss 0.17512851 - time (sec): 10.60 - samples/sec: 9813.50 - lr: 0.000018 - momentum: 0.000000
2023-10-18 22:57:28,937 epoch 5 - iter 504/723 - loss 0.17373162 - time (sec): 12.40 - samples/sec: 9795.38 - lr: 0.000018 - momentum: 0.000000
2023-10-18 22:57:30,802 epoch 5 - iter 576/723 - loss 0.17323872 - time (sec): 14.27 - samples/sec: 9816.46 - lr: 0.000017 - momentum: 0.000000
2023-10-18 22:57:32,548 epoch 5 - iter 648/723 - loss 0.17259816 - time (sec): 16.02 - samples/sec: 9833.06 - lr: 0.000017 - momentum: 0.000000
2023-10-18 22:57:34,416 epoch 5 - iter 720/723 - loss 0.17201555 - time (sec): 17.88 - samples/sec: 9836.04 - lr: 0.000017 - momentum: 0.000000
2023-10-18 22:57:34,471 ----------------------------------------------------------------------------------------------------
2023-10-18 22:57:34,471 EPOCH 5 done: loss 0.1721 - lr: 0.000017
2023-10-18 22:57:36,235 DEV : loss 0.21598245203495026 - f1-score (micro avg)  0.3603
2023-10-18 22:57:36,250 saving best model
2023-10-18 22:57:36,284 ----------------------------------------------------------------------------------------------------
2023-10-18 22:57:38,054 epoch 6 - iter 72/723 - loss 0.16484753 - time (sec): 1.77 - samples/sec: 9347.04 - lr: 0.000016 - momentum: 0.000000
2023-10-18 22:57:39,819 epoch 6 - iter 144/723 - loss 0.15606069 - time (sec): 3.53 - samples/sec: 9582.22 - lr: 0.000016 - momentum: 0.000000
2023-10-18 22:57:41,637 epoch 6 - iter 216/723 - loss 0.15864873 - time (sec): 5.35 - samples/sec: 9773.71 - lr: 0.000016 - momentum: 0.000000
2023-10-18 22:57:43,473 epoch 6 - iter 288/723 - loss 0.15870321 - time (sec): 7.19 - samples/sec: 9828.73 - lr: 0.000015 - momentum: 0.000000
2023-10-18 22:57:45,196 epoch 6 - iter 360/723 - loss 0.16234323 - time (sec): 8.91 - samples/sec: 9815.47 - lr: 0.000015 - momentum: 0.000000
2023-10-18 22:57:46,945 epoch 6 - iter 432/723 - loss 0.15856690 - time (sec): 10.66 - samples/sec: 9844.04 - lr: 0.000015 - momentum: 0.000000
2023-10-18 22:57:48,750 epoch 6 - iter 504/723 - loss 0.16153975 - time (sec): 12.47 - samples/sec: 9746.37 - lr: 0.000014 - momentum: 0.000000
2023-10-18 22:57:50,516 epoch 6 - iter 576/723 - loss 0.15864405 - time (sec): 14.23 - samples/sec: 9779.22 - lr: 0.000014 - momentum: 0.000000
2023-10-18 22:57:52,302 epoch 6 - iter 648/723 - loss 0.16169816 - time (sec): 16.02 - samples/sec: 9789.70 - lr: 0.000014 - momentum: 0.000000
2023-10-18 22:57:54,492 epoch 6 - iter 720/723 - loss 0.16273531 - time (sec): 18.21 - samples/sec: 9651.21 - lr: 0.000013 - momentum: 0.000000
2023-10-18 22:57:54,553 ----------------------------------------------------------------------------------------------------
2023-10-18 22:57:54,553 EPOCH 6 done: loss 0.1627 - lr: 0.000013
2023-10-18 22:57:56,329 DEV : loss 0.20462197065353394 - f1-score (micro avg)  0.4453
2023-10-18 22:57:56,343 saving best model
2023-10-18 22:57:56,380 ----------------------------------------------------------------------------------------------------
2023-10-18 22:57:58,120 epoch 7 - iter 72/723 - loss 0.15662991 - time (sec): 1.74 - samples/sec: 9472.72 - lr: 0.000013 - momentum: 0.000000
2023-10-18 22:57:59,877 epoch 7 - iter 144/723 - loss 0.15924704 - time (sec): 3.50 - samples/sec: 9819.69 - lr: 0.000013 - momentum: 0.000000
2023-10-18 22:58:01,664 epoch 7 - iter 216/723 - loss 0.16189171 - time (sec): 5.28 - samples/sec: 9682.85 - lr: 0.000012 - momentum: 0.000000
2023-10-18 22:58:03,519 epoch 7 - iter 288/723 - loss 0.16308365 - time (sec): 7.14 - samples/sec: 9742.21 - lr: 0.000012 - momentum: 0.000000
2023-10-18 22:58:05,279 epoch 7 - iter 360/723 - loss 0.15997371 - time (sec): 8.90 - samples/sec: 9728.59 - lr: 0.000012 - momentum: 0.000000
2023-10-18 22:58:07,107 epoch 7 - iter 432/723 - loss 0.15929076 - time (sec): 10.73 - samples/sec: 9703.79 - lr: 0.000011 - momentum: 0.000000
2023-10-18 22:58:08,914 epoch 7 - iter 504/723 - loss 0.15943523 - time (sec): 12.53 - samples/sec: 9763.34 - lr: 0.000011 - momentum: 0.000000
2023-10-18 22:58:10,735 epoch 7 - iter 576/723 - loss 0.15929299 - time (sec): 14.35 - samples/sec: 9753.67 - lr: 0.000011 - momentum: 0.000000
2023-10-18 22:58:12,561 epoch 7 - iter 648/723 - loss 0.16055045 - time (sec): 16.18 - samples/sec: 9764.87 - lr: 0.000010 - momentum: 0.000000
2023-10-18 22:58:14,285 epoch 7 - iter 720/723 - loss 0.15893188 - time (sec): 17.90 - samples/sec: 9809.31 - lr: 0.000010 - momentum: 0.000000
2023-10-18 22:58:14,357 ----------------------------------------------------------------------------------------------------
2023-10-18 22:58:14,357 EPOCH 7 done: loss 0.1588 - lr: 0.000010
2023-10-18 22:58:16,151 DEV : loss 0.20282398164272308 - f1-score (micro avg)  0.4363
2023-10-18 22:58:16,166 ----------------------------------------------------------------------------------------------------
2023-10-18 22:58:17,915 epoch 8 - iter 72/723 - loss 0.17813877 - time (sec): 1.75 - samples/sec: 9433.30 - lr: 0.000010 - momentum: 0.000000
2023-10-18 22:58:19,745 epoch 8 - iter 144/723 - loss 0.16727864 - time (sec): 3.58 - samples/sec: 9705.54 - lr: 0.000009 - momentum: 0.000000
2023-10-18 22:58:21,556 epoch 8 - iter 216/723 - loss 0.16087750 - time (sec): 5.39 - samples/sec: 9840.08 - lr: 0.000009 - momentum: 0.000000
2023-10-18 22:58:23,697 epoch 8 - iter 288/723 - loss 0.16062473 - time (sec): 7.53 - samples/sec: 9269.48 - lr: 0.000009 - momentum: 0.000000
2023-10-18 22:58:25,503 epoch 8 - iter 360/723 - loss 0.15637198 - time (sec): 9.34 - samples/sec: 9412.21 - lr: 0.000008 - momentum: 0.000000
2023-10-18 22:58:27,293 epoch 8 - iter 432/723 - loss 0.15307484 - time (sec): 11.13 - samples/sec: 9499.23 - lr: 0.000008 - momentum: 0.000000
2023-10-18 22:58:29,140 epoch 8 - iter 504/723 - loss 0.15285181 - time (sec): 12.97 - samples/sec: 9527.96 - lr: 0.000008 - momentum: 0.000000
2023-10-18 22:58:30,949 epoch 8 - iter 576/723 - loss 0.15081701 - time (sec): 14.78 - samples/sec: 9483.93 - lr: 0.000007 - momentum: 0.000000
2023-10-18 22:58:32,695 epoch 8 - iter 648/723 - loss 0.15212614 - time (sec): 16.53 - samples/sec: 9541.68 - lr: 0.000007 - momentum: 0.000000
2023-10-18 22:58:34,593 epoch 8 - iter 720/723 - loss 0.15293930 - time (sec): 18.43 - samples/sec: 9539.14 - lr: 0.000007 - momentum: 0.000000
2023-10-18 22:58:34,658 ----------------------------------------------------------------------------------------------------
2023-10-18 22:58:34,658 EPOCH 8 done: loss 0.1528 - lr: 0.000007
2023-10-18 22:58:36,427 DEV : loss 0.20108488202095032 - f1-score (micro avg)  0.4437
2023-10-18 22:58:36,441 ----------------------------------------------------------------------------------------------------
2023-10-18 22:58:38,298 epoch 9 - iter 72/723 - loss 0.15671408 - time (sec): 1.86 - samples/sec: 9684.00 - lr: 0.000006 - momentum: 0.000000
2023-10-18 22:58:40,047 epoch 9 - iter 144/723 - loss 0.15606990 - time (sec): 3.61 - samples/sec: 9977.58 - lr: 0.000006 - momentum: 0.000000
2023-10-18 22:58:41,672 epoch 9 - iter 216/723 - loss 0.14775714 - time (sec): 5.23 - samples/sec: 10078.84 - lr: 0.000006 - momentum: 0.000000
2023-10-18 22:58:43,185 epoch 9 - iter 288/723 - loss 0.14736930 - time (sec): 6.74 - samples/sec: 10426.94 - lr: 0.000005 - momentum: 0.000000
2023-10-18 22:58:44,941 epoch 9 - iter 360/723 - loss 0.14937082 - time (sec): 8.50 - samples/sec: 10387.93 - lr: 0.000005 - momentum: 0.000000
2023-10-18 22:58:46,773 epoch 9 - iter 432/723 - loss 0.14931943 - time (sec): 10.33 - samples/sec: 10324.00 - lr: 0.000005 - momentum: 0.000000
2023-10-18 22:58:48,632 epoch 9 - iter 504/723 - loss 0.15130844 - time (sec): 12.19 - samples/sec: 10210.11 - lr: 0.000004 - momentum: 0.000000
2023-10-18 22:58:50,375 epoch 9 - iter 576/723 - loss 0.15312491 - time (sec): 13.93 - samples/sec: 10138.98 - lr: 0.000004 - momentum: 0.000000
2023-10-18 22:58:52,141 epoch 9 - iter 648/723 - loss 0.15285247 - time (sec): 15.70 - samples/sec: 10116.75 - lr: 0.000004 - momentum: 0.000000
2023-10-18 22:58:53,970 epoch 9 - iter 720/723 - loss 0.15229974 - time (sec): 17.53 - samples/sec: 10023.57 - lr: 0.000003 - momentum: 0.000000
2023-10-18 22:58:54,038 ----------------------------------------------------------------------------------------------------
2023-10-18 22:58:54,038 EPOCH 9 done: loss 0.1524 - lr: 0.000003
2023-10-18 22:58:55,801 DEV : loss 0.19397485256195068 - f1-score (micro avg)  0.4627
2023-10-18 22:58:55,816 saving best model
2023-10-18 22:58:55,853 ----------------------------------------------------------------------------------------------------
2023-10-18 22:58:57,695 epoch 10 - iter 72/723 - loss 0.15773780 - time (sec): 1.84 - samples/sec: 9363.32 - lr: 0.000003 - momentum: 0.000000
2023-10-18 22:58:59,247 epoch 10 - iter 144/723 - loss 0.16358437 - time (sec): 3.39 - samples/sec: 10063.93 - lr: 0.000003 - momentum: 0.000000
2023-10-18 22:59:00,997 epoch 10 - iter 216/723 - loss 0.16315365 - time (sec): 5.14 - samples/sec: 10100.77 - lr: 0.000002 - momentum: 0.000000
2023-10-18 22:59:02,867 epoch 10 - iter 288/723 - loss 0.15797071 - time (sec): 7.01 - samples/sec: 10138.78 - lr: 0.000002 - momentum: 0.000000
2023-10-18 22:59:04,640 epoch 10 - iter 360/723 - loss 0.15242034 - time (sec): 8.79 - samples/sec: 10132.19 - lr: 0.000002 - momentum: 0.000000
2023-10-18 22:59:06,374 epoch 10 - iter 432/723 - loss 0.15330741 - time (sec): 10.52 - samples/sec: 10040.08 - lr: 0.000001 - momentum: 0.000000
2023-10-18 22:59:08,160 epoch 10 - iter 504/723 - loss 0.14940519 - time (sec): 12.31 - samples/sec: 10018.59 - lr: 0.000001 - momentum: 0.000000
2023-10-18 22:59:10,041 epoch 10 - iter 576/723 - loss 0.14798129 - time (sec): 14.19 - samples/sec: 9994.37 - lr: 0.000001 - momentum: 0.000000
2023-10-18 22:59:11,814 epoch 10 - iter 648/723 - loss 0.14933500 - time (sec): 15.96 - samples/sec: 9919.85 - lr: 0.000000 - momentum: 0.000000
2023-10-18 22:59:13,565 epoch 10 - iter 720/723 - loss 0.15018807 - time (sec): 17.71 - samples/sec: 9915.89 - lr: 0.000000 - momentum: 0.000000
2023-10-18 22:59:13,636 ----------------------------------------------------------------------------------------------------
2023-10-18 22:59:13,636 EPOCH 10 done: loss 0.1503 - lr: 0.000000
2023-10-18 22:59:15,409 DEV : loss 0.19576019048690796 - f1-score (micro avg)  0.4566
2023-10-18 22:59:15,453 ----------------------------------------------------------------------------------------------------
2023-10-18 22:59:15,453 Loading model from best epoch ...
2023-10-18 22:59:15,532 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
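(Note: the 13-tag dictionary above is the BIOES encoding of the three entity types plus the outside tag `O`, which is also why the model's final linear layer has `out_features=13`. A minimal sketch of how that tag set is enumerated:)

```python
# The 13 tags above = "O" plus the BIOES variants (S-ingle, B-egin,
# E-nd, I-nside) of each of the three entity types in this corpus.
entity_types = ["LOC", "PER", "ORG"]
tags = ["O"] + [f"{p}-{t}" for t in entity_types for p in ("S", "B", "E", "I")]
print(len(tags), tags)  # 13 tags, in the same order as the log line above
```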
2023-10-18 22:59:16,857 
Results:
- F-score (micro) 0.4758
- F-score (macro) 0.3265
- Accuracy 0.322

By class:
              precision    recall  f1-score   support

         LOC     0.5332    0.5437    0.5384       458
         PER     0.6653    0.3299    0.4411       482
         ORG     0.0000    0.0000    0.0000        69

   micro avg     0.5779    0.4044    0.4758      1009
   macro avg     0.3995    0.2912    0.3265      1009
weighted avg     0.5598    0.4044    0.4551      1009

2023-10-18 22:59:16,857 ----------------------------------------------------------------------------------------------------
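(Note: the averaged rows of the final report can be sanity-checked from the per-class rows. This is a minimal sketch with values copied from the report; the micro F1 is recomputed from the rounded micro precision/recall, so agreement is to the printed precision only.)

```python
# Sanity-check the averaged F-scores in the final classification report.
# Per-class (precision, recall, f1, support) copied from the log above.
per_class = {
    "LOC": (0.5332, 0.5437, 0.5384, 458),
    "PER": (0.6653, 0.3299, 0.4411, 482),
    "ORG": (0.0000, 0.0000, 0.0000, 69),
}

def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# macro avg: unweighted mean of per-class F1 -> 0.3265
macro_f1 = sum(f for _, _, f, _ in per_class.values()) / len(per_class)

# weighted avg: support-weighted mean of per-class F1 -> 0.4551
total = sum(s for _, _, _, s in per_class.values())
weighted_f1 = sum(f * s for _, _, f, s in per_class.values()) / total

# micro avg: F1 of the pooled micro precision/recall from the report -> 0.4758
micro_f1 = f1(0.5779, 0.4044)

print(round(macro_f1, 4), round(weighted_f1, 4), round(micro_f1, 4))
```

All three recomputed values agree with the report (0.3265, 0.4551, 0.4758); the ORG class contributes zero F1, which is what pulls the macro average well below the micro average.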