2023-10-12 20:09:28,155 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,157 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 20:09:28,157 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,157 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 20:09:28,157 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,158 Train:  5777 sentences
2023-10-12 20:09:28,158         (train_with_dev=False, train_with_test=False)
2023-10-12 20:09:28,158 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,158 Training Params:
2023-10-12 20:09:28,158  - learning_rate: "0.00015"
2023-10-12 20:09:28,158  - mini_batch_size: "4"
2023-10-12 20:09:28,158  - max_epochs: "10"
2023-10-12 20:09:28,158  - shuffle: "True"
2023-10-12 20:09:28,158 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,158 Plugins:
2023-10-12 20:09:28,158  - TensorboardLogger
2023-10-12 20:09:28,158  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 20:09:28,158 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,158 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 20:09:28,158  - metric: "('micro avg', 'f1-score')"
2023-10-12 20:09:28,159 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,159 Computation:
2023-10-12 20:09:28,159  - compute on device: cuda:0
2023-10-12 20:09:28,159  - embedding storage: none
2023-10-12 20:09:28,159 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,159 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5"
2023-10-12 20:09:28,159 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,159 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 20:10:10,201 epoch 1 - iter 144/1445 - loss 2.53513164 - time (sec): 42.04 - samples/sec: 429.57 - lr: 0.000015 - momentum: 0.000000
2023-10-12 20:10:51,555 epoch 1 - iter 288/1445 - loss 2.38125034 - time (sec): 83.39 - samples/sec: 432.87 - lr: 0.000030 - momentum: 0.000000
2023-10-12 20:11:32,357 epoch 1 - iter 432/1445 - loss 2.14339240 - time (sec): 124.20 - samples/sec: 421.72 - lr: 0.000045 - momentum: 0.000000
2023-10-12 20:12:14,329 epoch 1 - iter 576/1445 - loss 1.85422411 - time (sec): 166.17 - samples/sec: 421.64 - lr: 0.000060 - momentum: 0.000000
2023-10-12 20:12:56,106 epoch 1 - iter 720/1445 - loss 1.58043777 - time (sec): 207.95 - samples/sec: 422.36 - lr: 0.000075 - momentum: 0.000000
2023-10-12 20:13:37,386 epoch 1 - iter 864/1445 - loss 1.36953584 - time (sec): 249.22 - samples/sec: 420.10 - lr: 0.000090 - momentum: 0.000000
2023-10-12 20:14:19,750 epoch 1 - iter 1008/1445 - loss 1.20108235 - time (sec): 291.59 - samples/sec: 419.86 - lr: 0.000105 - momentum: 0.000000
2023-10-12 20:14:59,913 epoch 1 - iter 1152/1445 - loss 1.07934407 - time (sec): 331.75 - samples/sec: 419.62 - lr: 0.000119 - momentum: 0.000000
2023-10-12 20:15:40,857 epoch 1 - iter 1296/1445 - loss 0.97634839 - time (sec): 372.70 - samples/sec: 421.06 - lr: 0.000134 - momentum: 0.000000
2023-10-12 20:16:23,000 epoch 1 - iter 1440/1445 - loss 0.88821258 - time (sec): 414.84 - samples/sec: 422.96 - lr: 0.000149 - momentum: 0.000000
2023-10-12 20:16:24,456 ----------------------------------------------------------------------------------------------------
2023-10-12 20:16:24,456 EPOCH 1 done: loss 0.8849 - lr: 0.000149
2023-10-12 20:16:45,374 DEV : loss 0.18604247272014618 - f1-score (micro avg)  0.3195
2023-10-12 20:16:45,410 saving best model
2023-10-12 20:16:46,297 ----------------------------------------------------------------------------------------------------
2023-10-12 20:17:28,351 epoch 2 - iter 144/1445 - loss 0.13676022 - time (sec): 42.05 - samples/sec: 414.84 - lr: 0.000148 - momentum: 0.000000
2023-10-12 20:18:10,440 epoch 2 - iter 288/1445 - loss 0.12929980 - time (sec): 84.14 - samples/sec: 421.50 - lr: 0.000147 - momentum: 0.000000
2023-10-12 20:18:51,718 epoch 2 - iter 432/1445 - loss 0.12842442 - time (sec): 125.42 - samples/sec: 414.49 - lr: 0.000145 - momentum: 0.000000
2023-10-12 20:19:33,782 epoch 2 - iter 576/1445 - loss 0.12658184 - time (sec): 167.48 - samples/sec: 416.65 - lr: 0.000143 - momentum: 0.000000
2023-10-12 20:20:17,490 epoch 2 - iter 720/1445 - loss 0.12315068 - time (sec): 211.19 - samples/sec: 413.09 - lr: 0.000142 - momentum: 0.000000
2023-10-12 20:20:59,183 epoch 2 - iter 864/1445 - loss 0.12053054 - time (sec): 252.88 - samples/sec: 414.43 - lr: 0.000140 - momentum: 0.000000
2023-10-12 20:21:41,327 epoch 2 - iter 1008/1445 - loss 0.12141328 - time (sec): 295.03 - samples/sec: 413.84 - lr: 0.000138 - momentum: 0.000000
2023-10-12 20:22:23,104 epoch 2 - iter 1152/1445 - loss 0.11874436 - time (sec): 336.80 - samples/sec: 415.42 - lr: 0.000137 - momentum: 0.000000
2023-10-12 20:23:06,226 epoch 2 - iter 1296/1445 - loss 0.11649198 - time (sec): 379.93 - samples/sec: 416.37 - lr: 0.000135 - momentum: 0.000000
2023-10-12 20:23:47,727 epoch 2 - iter 1440/1445 - loss 0.11308681 - time (sec): 421.43 - samples/sec: 416.69 - lr: 0.000133 - momentum: 0.000000
2023-10-12 20:23:49,078 ----------------------------------------------------------------------------------------------------
2023-10-12 20:23:49,079 EPOCH 2 done: loss 0.1129 - lr: 0.000133
2023-10-12 20:24:10,065 DEV : loss 0.09517565369606018 - f1-score (micro avg)  0.8125
2023-10-12 20:24:10,102 saving best model
2023-10-12 20:24:12,992 ----------------------------------------------------------------------------------------------------
2023-10-12 20:24:56,006 epoch 3 - iter 144/1445 - loss 0.07327007 - time (sec): 43.01 - samples/sec: 416.27 - lr: 0.000132 - momentum: 0.000000
2023-10-12 20:25:38,617 epoch 3 - iter 288/1445 - loss 0.07041840 - time (sec): 85.62 - samples/sec: 420.04 - lr: 0.000130 - momentum: 0.000000
2023-10-12 20:26:20,786 epoch 3 - iter 432/1445 - loss 0.06999514 - time (sec): 127.79 - samples/sec: 418.84 - lr: 0.000128 - momentum: 0.000000
2023-10-12 20:27:03,456 epoch 3 - iter 576/1445 - loss 0.07112475 - time (sec): 170.46 - samples/sec: 417.00 - lr: 0.000127 - momentum: 0.000000
2023-10-12 20:27:45,281 epoch 3 - iter 720/1445 - loss 0.07021459 - time (sec): 212.29 - samples/sec: 419.44 - lr: 0.000125 - momentum: 0.000000
2023-10-12 20:28:27,466 epoch 3 - iter 864/1445 - loss 0.07000170 - time (sec): 254.47 - samples/sec: 423.51 - lr: 0.000123 - momentum: 0.000000
2023-10-12 20:29:09,522 epoch 3 - iter 1008/1445 - loss 0.06982317 - time (sec): 296.53 - samples/sec: 422.11 - lr: 0.000122 - momentum: 0.000000
2023-10-12 20:29:50,667 epoch 3 - iter 1152/1445 - loss 0.06964722 - time (sec): 337.67 - samples/sec: 421.05 - lr: 0.000120 - momentum: 0.000000
2023-10-12 20:30:31,746 epoch 3 - iter 1296/1445 - loss 0.06839580 - time (sec): 378.75 - samples/sec: 419.07 - lr: 0.000118 - momentum: 0.000000
2023-10-12 20:31:14,591 epoch 3 - iter 1440/1445 - loss 0.06745276 - time (sec): 421.60 - samples/sec: 416.31 - lr: 0.000117 - momentum: 0.000000
2023-10-12 20:31:16,021 ----------------------------------------------------------------------------------------------------
2023-10-12 20:31:16,022 EPOCH 3 done: loss 0.0675 - lr: 0.000117
2023-10-12 20:31:37,989 DEV : loss 0.08340664207935333 - f1-score (micro avg)  0.8439
2023-10-12 20:31:38,020 saving best model
2023-10-12 20:31:40,553 ----------------------------------------------------------------------------------------------------
2023-10-12 20:32:23,274 epoch 4 - iter 144/1445 - loss 0.05297012 - time (sec): 42.72 - samples/sec: 419.86 - lr: 0.000115 - momentum: 0.000000
2023-10-12 20:33:04,940 epoch 4 - iter 288/1445 - loss 0.05061011 - time (sec): 84.38 - samples/sec: 411.81 - lr: 0.000113 - momentum: 0.000000
2023-10-12 20:33:47,048 epoch 4 - iter 432/1445 - loss 0.04713746 - time (sec): 126.49 - samples/sec: 414.49 - lr: 0.000112 - momentum: 0.000000
2023-10-12 20:34:30,755 epoch 4 - iter 576/1445 - loss 0.04639052 - time (sec): 170.20 - samples/sec: 419.88 - lr: 0.000110 - momentum: 0.000000
2023-10-12 20:35:13,049 epoch 4 - iter 720/1445 - loss 0.04446589 - time (sec): 212.49 - samples/sec: 419.88 - lr: 0.000108 - momentum: 0.000000
2023-10-12 20:35:54,366 epoch 4 - iter 864/1445 - loss 0.04321469 - time (sec): 253.81 - samples/sec: 418.40 - lr: 0.000107 - momentum: 0.000000
2023-10-12 20:36:39,400 epoch 4 - iter 1008/1445 - loss 0.04351905 - time (sec): 298.84 - samples/sec: 412.20 - lr: 0.000105 - momentum: 0.000000
2023-10-12 20:37:22,743 epoch 4 - iter 1152/1445 - loss 0.04392393 - time (sec): 342.18 - samples/sec: 413.28 - lr: 0.000103 - momentum: 0.000000
2023-10-12 20:38:05,982 epoch 4 - iter 1296/1445 - loss 0.04654178 - time (sec): 385.42 - samples/sec: 412.16 - lr: 0.000102 - momentum: 0.000000
2023-10-12 20:38:47,608 epoch 4 - iter 1440/1445 - loss 0.04601732 - time (sec): 427.05 - samples/sec: 411.73 - lr: 0.000100 - momentum: 0.000000
2023-10-12 20:38:48,759 ----------------------------------------------------------------------------------------------------
2023-10-12 20:38:48,759 EPOCH 4 done: loss 0.0460 - lr: 0.000100
2023-10-12 20:39:09,821 DEV : loss 0.09578309208154678 - f1-score (micro avg)  0.854
2023-10-12 20:39:09,854 saving best model
2023-10-12 20:39:12,430 ----------------------------------------------------------------------------------------------------
2023-10-12 20:39:55,317 epoch 5 - iter 144/1445 - loss 0.03906087 - time (sec): 42.88 - samples/sec: 440.19 - lr: 0.000098 - momentum: 0.000000
2023-10-12 20:40:37,355 epoch 5 - iter 288/1445 - loss 0.03326712 - time (sec): 84.92 - samples/sec: 427.34 - lr: 0.000097 - momentum: 0.000000
2023-10-12 20:41:17,016 epoch 5 - iter 432/1445 - loss 0.03091069 - time (sec): 124.58 - samples/sec: 416.23 - lr: 0.000095 - momentum: 0.000000
2023-10-12 20:41:56,651 epoch 5 - iter 576/1445 - loss 0.03004124 - time (sec): 164.22 - samples/sec: 414.41 - lr: 0.000093 - momentum: 0.000000
2023-10-12 20:42:38,714 epoch 5 - iter 720/1445 - loss 0.03116302 - time (sec): 206.28 - samples/sec: 419.94 - lr: 0.000092 - momentum: 0.000000
2023-10-12 20:43:20,128 epoch 5 - iter 864/1445 - loss 0.03024606 - time (sec): 247.69 - samples/sec: 419.91 - lr: 0.000090 - momentum: 0.000000
2023-10-12 20:44:02,818 epoch 5 - iter 1008/1445 - loss 0.03089293 - time (sec): 290.38 - samples/sec: 421.95 - lr: 0.000088 - momentum: 0.000000
2023-10-12 20:44:44,492 epoch 5 - iter 1152/1445 - loss 0.03113353 - time (sec): 332.06 - samples/sec: 422.52 - lr: 0.000087 - momentum: 0.000000
2023-10-12 20:45:26,335 epoch 5 - iter 1296/1445 - loss 0.03083010 - time (sec): 373.90 - samples/sec: 422.31 - lr: 0.000085 - momentum: 0.000000
2023-10-12 20:46:07,798 epoch 5 - iter 1440/1445 - loss 0.03275021 - time (sec): 415.36 - samples/sec: 422.21 - lr: 0.000083 - momentum: 0.000000
2023-10-12 20:46:09,256 ----------------------------------------------------------------------------------------------------
2023-10-12 20:46:09,256 EPOCH 5 done: loss 0.0332 - lr: 0.000083
2023-10-12 20:46:30,762 DEV : loss 0.10168028622865677 - f1-score (micro avg)  0.851
2023-10-12 20:46:30,793 ----------------------------------------------------------------------------------------------------
2023-10-12 20:47:12,084 epoch 6 - iter 144/1445 - loss 0.02321712 - time (sec): 41.29 - samples/sec: 416.75 - lr: 0.000082 - momentum: 0.000000
2023-10-12 20:47:52,553 epoch 6 - iter 288/1445 - loss 0.02481388 - time (sec): 81.76 - samples/sec: 422.31 - lr: 0.000080 - momentum: 0.000000
2023-10-12 20:48:33,748 epoch 6 - iter 432/1445 - loss 0.02757483 - time (sec): 122.95 - samples/sec: 426.21 - lr: 0.000078 - momentum: 0.000000
2023-10-12 20:49:15,702 epoch 6 - iter 576/1445 - loss 0.02650027 - time (sec): 164.91 - samples/sec: 426.98 - lr: 0.000077 - momentum: 0.000000
2023-10-12 20:49:57,116 epoch 6 - iter 720/1445 - loss 0.02588266 - time (sec): 206.32 - samples/sec: 427.16 - lr: 0.000075 - momentum: 0.000000
2023-10-12 20:50:40,206 epoch 6 - iter 864/1445 - loss 0.02373812 - time (sec): 249.41 - samples/sec: 427.26 - lr: 0.000073 - momentum: 0.000000
2023-10-12 20:51:22,906 epoch 6 - iter 1008/1445 - loss 0.02602288 - time (sec): 292.11 - samples/sec: 425.85 - lr: 0.000072 - momentum: 0.000000
2023-10-12 20:52:03,431 epoch 6 - iter 1152/1445 - loss 0.02504745 - time (sec): 332.64 - samples/sec: 423.75 - lr: 0.000070 - momentum: 0.000000
2023-10-12 20:52:43,013 epoch 6 - iter 1296/1445 - loss 0.02425352 - time (sec): 372.22 - samples/sec: 423.56 - lr: 0.000068 - momentum: 0.000000
2023-10-12 20:53:24,660 epoch 6 - iter 1440/1445 - loss 0.02500831 - time (sec): 413.86 - samples/sec: 424.47 - lr: 0.000067 - momentum: 0.000000
2023-10-12 20:53:25,858 ----------------------------------------------------------------------------------------------------
2023-10-12 20:53:25,858 EPOCH 6 done: loss 0.0249 - lr: 0.000067
2023-10-12 20:53:46,613 DEV : loss 0.11829700320959091 - f1-score (micro avg)  0.8511
2023-10-12 20:53:46,644 ----------------------------------------------------------------------------------------------------
2023-10-12 20:54:27,677 epoch 7 - iter 144/1445 - loss 0.02617530 - time (sec): 41.03 - samples/sec: 429.90 - lr: 0.000065 - momentum: 0.000000
2023-10-12 20:55:08,659 epoch 7 - iter 288/1445 - loss 0.01907134 - time (sec): 82.01 - samples/sec: 433.79 - lr: 0.000063 - momentum: 0.000000
2023-10-12 20:55:48,427 epoch 7 - iter 432/1445 - loss 0.01974084 - time (sec): 121.78 - samples/sec: 427.77 - lr: 0.000062 - momentum: 0.000000
2023-10-12 20:56:27,995 epoch 7 - iter 576/1445 - loss 0.01857623 - time (sec): 161.35 - samples/sec: 425.46 - lr: 0.000060 - momentum: 0.000000
2023-10-12 20:57:09,150 epoch 7 - iter 720/1445 - loss 0.02000500 - time (sec): 202.50 - samples/sec: 428.61 - lr: 0.000058 - momentum: 0.000000
2023-10-12 20:57:50,603 epoch 7 - iter 864/1445 - loss 0.01835280 - time (sec): 243.96 - samples/sec: 427.19 - lr: 0.000057 - momentum: 0.000000
2023-10-12 20:58:32,369 epoch 7 - iter 1008/1445 - loss 0.01795777 - time (sec): 285.72 - samples/sec: 426.74 - lr: 0.000055 - momentum: 0.000000
2023-10-12 20:59:13,015 epoch 7 - iter 1152/1445 - loss 0.01805962 - time (sec): 326.37 - samples/sec: 425.71 - lr: 0.000053 - momentum: 0.000000
2023-10-12 20:59:53,241 epoch 7 - iter 1296/1445 - loss 0.01759878 - time (sec): 366.60 - samples/sec: 426.95 - lr: 0.000052 - momentum: 0.000000
2023-10-12 21:00:34,943 epoch 7 - iter 1440/1445 - loss 0.01818970 - time (sec): 408.30 - samples/sec: 429.81 - lr: 0.000050 - momentum: 0.000000
2023-10-12 21:00:36,340 ----------------------------------------------------------------------------------------------------
2023-10-12 21:00:36,340 EPOCH 7 done: loss 0.0182 - lr: 0.000050
2023-10-12 21:00:57,548 DEV : loss 0.11870528757572174 - f1-score (micro avg)  0.8525
2023-10-12 21:00:57,579 ----------------------------------------------------------------------------------------------------
2023-10-12 21:01:39,477 epoch 8 - iter 144/1445 - loss 0.01437875 - time (sec): 41.90 - samples/sec: 443.00 - lr: 0.000048 - momentum: 0.000000
2023-10-12 21:02:19,894 epoch 8 - iter 288/1445 - loss 0.01688151 - time (sec): 82.31 - samples/sec: 436.25 - lr: 0.000047 - momentum: 0.000000
2023-10-12 21:02:59,900 epoch 8 - iter 432/1445 - loss 0.01434297 - time (sec): 122.32 - samples/sec: 431.85 - lr: 0.000045 - momentum: 0.000000
2023-10-12 21:03:41,511 epoch 8 - iter 576/1445 - loss 0.01368236 - time (sec): 163.93 - samples/sec: 438.79 - lr: 0.000043 - momentum: 0.000000
2023-10-12 21:04:21,650 epoch 8 - iter 720/1445 - loss 0.01360876 - time (sec): 204.07 - samples/sec: 437.61 - lr: 0.000042 - momentum: 0.000000
2023-10-12 21:05:01,428 epoch 8 - iter 864/1445 - loss 0.01358988 - time (sec): 243.85 - samples/sec: 433.72 - lr: 0.000040 - momentum: 0.000000
2023-10-12 21:05:42,391 epoch 8 - iter 1008/1445 - loss 0.01382289 - time (sec): 284.81 - samples/sec: 432.03 - lr: 0.000038 - momentum: 0.000000
2023-10-12 21:06:21,709 epoch 8 - iter 1152/1445 - loss 0.01364830 - time (sec): 324.13 - samples/sec: 430.45 - lr: 0.000037 - momentum: 0.000000
2023-10-12 21:07:02,459 epoch 8 - iter 1296/1445 - loss 0.01538390 - time (sec): 364.88 - samples/sec: 432.75 - lr: 0.000035 - momentum: 0.000000
2023-10-12 21:07:42,880 epoch 8 - iter 1440/1445 - loss 0.01492242 - time (sec): 405.30 - samples/sec: 433.56 - lr: 0.000033 - momentum: 0.000000
2023-10-12 21:07:44,100 ----------------------------------------------------------------------------------------------------
2023-10-12 21:07:44,100 EPOCH 8 done: loss 0.0150 - lr: 0.000033
2023-10-12 21:08:04,932 DEV : loss 0.13916537165641785 - f1-score (micro avg)  0.8524
2023-10-12 21:08:04,963 ----------------------------------------------------------------------------------------------------
2023-10-12 21:08:45,633 epoch 9 - iter 144/1445 - loss 0.00328491 - time (sec): 40.67 - samples/sec: 452.52 - lr: 0.000032 - momentum: 0.000000
2023-10-12 21:09:27,458 epoch 9 - iter 288/1445 - loss 0.01469900 - time (sec): 82.49 - samples/sec: 450.93 - lr: 0.000030 - momentum: 0.000000
2023-10-12 21:10:07,918 epoch 9 - iter 432/1445 - loss 0.01343190 - time (sec): 122.95 - samples/sec: 447.75 - lr: 0.000028 - momentum: 0.000000
2023-10-12 21:10:47,029 epoch 9 - iter 576/1445 - loss 0.01174943 - time (sec): 162.06 - samples/sec: 438.96 - lr: 0.000027 - momentum: 0.000000
2023-10-12 21:11:25,827 epoch 9 - iter 720/1445 - loss 0.01124109 - time (sec): 200.86 - samples/sec: 433.13 - lr: 0.000025 - momentum: 0.000000
2023-10-12 21:12:06,252 epoch 9 - iter 864/1445 - loss 0.01129143 - time (sec): 241.29 - samples/sec: 434.82 - lr: 0.000023 - momentum: 0.000000
2023-10-12 21:12:46,994 epoch 9 - iter 1008/1445 - loss 0.01164716 - time (sec): 282.03 - samples/sec: 434.48 - lr: 0.000022 - momentum: 0.000000
2023-10-12 21:13:28,846 epoch 9 - iter 1152/1445 - loss 0.01179582 - time (sec): 323.88 - samples/sec: 436.77 - lr: 0.000020 - momentum: 0.000000
2023-10-12 21:14:08,610 epoch 9 - iter 1296/1445 - loss 0.01095141 - time (sec): 363.64 - samples/sec: 435.91 - lr: 0.000018 - momentum: 0.000000
2023-10-12 21:14:49,912 epoch 9 - iter 1440/1445 - loss 0.01040016 - time (sec): 404.95 - samples/sec: 433.81 - lr: 0.000017 - momentum: 0.000000
2023-10-12 21:14:51,213 ----------------------------------------------------------------------------------------------------
2023-10-12 21:14:51,213 EPOCH 9 done: loss 0.0104 - lr: 0.000017
2023-10-12 21:15:11,280 DEV : loss 0.1443011313676834 - f1-score (micro avg)  0.8538
2023-10-12 21:15:11,310 ----------------------------------------------------------------------------------------------------
2023-10-12 21:15:53,223 epoch 10 - iter 144/1445 - loss 0.00659592 - time (sec): 41.91 - samples/sec: 429.81 - lr: 0.000015 - momentum: 0.000000
2023-10-12 21:16:33,015 epoch 10 - iter 288/1445 - loss 0.00689152 - time (sec): 81.70 - samples/sec: 412.85 - lr: 0.000013 - momentum: 0.000000
2023-10-12 21:17:13,952 epoch 10 - iter 432/1445 - loss 0.00829991 - time (sec): 122.64 - samples/sec: 413.32 - lr: 0.000012 - momentum: 0.000000
2023-10-12 21:17:55,529 epoch 10 - iter 576/1445 - loss 0.00927037 - time (sec): 164.22 - samples/sec: 420.87 - lr: 0.000010 - momentum: 0.000000
2023-10-12 21:18:36,103 epoch 10 - iter 720/1445 - loss 0.00865224 - time (sec): 204.79 - samples/sec: 421.42 - lr: 0.000008 - momentum: 0.000000
2023-10-12 21:19:17,606 epoch 10 - iter 864/1445 - loss 0.00794145 - time (sec): 246.29 - samples/sec: 425.13 - lr: 0.000007 - momentum: 0.000000
2023-10-12 21:19:59,178 epoch 10 - iter 1008/1445 - loss 0.00815458 - time (sec): 287.87 - samples/sec: 428.24 - lr: 0.000005 - momentum: 0.000000
2023-10-12 21:20:39,135 epoch 10 - iter 1152/1445 - loss 0.00746884 - time (sec): 327.82 - samples/sec: 426.96 - lr: 0.000003 - momentum: 0.000000
2023-10-12 21:21:19,627 epoch 10 - iter 1296/1445 - loss 0.00801406 - time (sec): 368.31 - samples/sec: 428.13 - lr: 0.000002 - momentum: 0.000000
2023-10-12 21:22:01,376 epoch 10 - iter 1440/1445 - loss 0.00775432 - time (sec): 410.06 - samples/sec: 428.56 - lr: 0.000000 - momentum: 0.000000
2023-10-12 21:22:02,551 ----------------------------------------------------------------------------------------------------
2023-10-12 21:22:02,552 EPOCH 10 done: loss 0.0077 - lr: 0.000000
2023-10-12 21:22:24,387 DEV : loss 0.15128682553768158 - f1-score (micro avg)  0.8515
2023-10-12 21:22:25,328 ----------------------------------------------------------------------------------------------------
2023-10-12 21:22:25,330 Loading model from best epoch ...
2023-10-12 21:22:29,244 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 21:22:50,647 
Results:
- F-score (micro) 0.8113
- F-score (macro) 0.7148
- Accuracy 0.6935

By class:
              precision    recall  f1-score   support

         PER     0.8591    0.7718    0.8131       482
         LOC     0.9056    0.8166    0.8588       458
         ORG     0.5172    0.4348    0.4724        69

   micro avg     0.8584    0.7691    0.8113      1009
   macro avg     0.7606    0.6744    0.7148      1009
weighted avg     0.8568    0.7691    0.8105      1009

2023-10-12 21:22:50,647 ----------------------------------------------------------------------------------------------------