2023-10-12 20:09:28,155 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,157 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 20:09:28,157 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,157 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 20:09:28,157 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,158 Train: 5777 sentences
2023-10-12 20:09:28,158 (train_with_dev=False, train_with_test=False)
2023-10-12 20:09:28,158 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,158 Training Params:
2023-10-12 20:09:28,158 - learning_rate: "0.00015"
2023-10-12 20:09:28,158 - mini_batch_size: "4"
2023-10-12 20:09:28,158 - max_epochs: "10"
2023-10-12 20:09:28,158 - shuffle: "True"
2023-10-12 20:09:28,158 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,158 Plugins:
2023-10-12 20:09:28,158 - TensorboardLogger
2023-10-12 20:09:28,158 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 20:09:28,158 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,158 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 20:09:28,158 - metric: "('micro avg', 'f1-score')"
2023-10-12 20:09:28,159 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,159 Computation:
2023-10-12 20:09:28,159 - compute on device: cuda:0
2023-10-12 20:09:28,159 - embedding storage: none
2023-10-12 20:09:28,159 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,159 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5"
2023-10-12 20:09:28,159 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,159 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:28,159 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 20:10:10,201 epoch 1 - iter 144/1445 - loss 2.53513164 - time (sec): 42.04 - samples/sec: 429.57 - lr: 0.000015 - momentum: 0.000000
2023-10-12 20:10:51,555 epoch 1 - iter 288/1445 - loss 2.38125034 - time (sec): 83.39 - samples/sec: 432.87 - lr: 0.000030 - momentum: 0.000000
2023-10-12 20:11:32,357 epoch 1 - iter 432/1445 - loss 2.14339240 - time (sec): 124.20 - samples/sec: 421.72 - lr: 0.000045 - momentum: 0.000000
2023-10-12 20:12:14,329 epoch 1 - iter 576/1445 - loss 1.85422411 - time (sec): 166.17 - samples/sec: 421.64 - lr: 0.000060 - momentum: 0.000000
2023-10-12 20:12:56,106 epoch 1 - iter 720/1445 - loss 1.58043777 - time (sec): 207.95 - samples/sec: 422.36 - lr: 0.000075 - momentum: 0.000000
2023-10-12 20:13:37,386 epoch 1 - iter 864/1445 - loss 1.36953584 - time (sec): 249.22 - samples/sec: 420.10 - lr: 0.000090 - momentum: 0.000000
2023-10-12 20:14:19,750 epoch 1 - iter 1008/1445 - loss 1.20108235 - time (sec): 291.59 - samples/sec: 419.86 - lr: 0.000105 - momentum: 0.000000
2023-10-12 20:14:59,913 epoch 1 - iter 1152/1445 - loss 1.07934407 - time (sec): 331.75 - samples/sec: 419.62 - lr: 0.000119 - momentum: 0.000000
2023-10-12 20:15:40,857 epoch 1 - iter 1296/1445 - loss 0.97634839 - time (sec): 372.70 - samples/sec: 421.06 - lr: 0.000134 - momentum: 0.000000
2023-10-12 20:16:23,000 epoch 1 - iter 1440/1445 - loss 0.88821258 - time (sec): 414.84 - samples/sec: 422.96 - lr: 0.000149 - momentum: 0.000000
2023-10-12 20:16:24,456 ----------------------------------------------------------------------------------------------------
2023-10-12 20:16:24,456 EPOCH 1 done: loss 0.8849 - lr: 0.000149
2023-10-12 20:16:45,374 DEV : loss 0.18604247272014618 - f1-score (micro avg) 0.3195
2023-10-12 20:16:45,410 saving best model
2023-10-12 20:16:46,297 ----------------------------------------------------------------------------------------------------
2023-10-12 20:17:28,351 epoch 2 - iter 144/1445 - loss 0.13676022 - time (sec): 42.05 - samples/sec: 414.84 - lr: 0.000148 - momentum: 0.000000
2023-10-12 20:18:10,440 epoch 2 - iter 288/1445 - loss 0.12929980 - time (sec): 84.14 - samples/sec: 421.50 - lr: 0.000147 - momentum: 0.000000
2023-10-12 20:18:51,718 epoch 2 - iter 432/1445 - loss 0.12842442 - time (sec): 125.42 - samples/sec: 414.49 - lr: 0.000145 - momentum: 0.000000
2023-10-12 20:19:33,782 epoch 2 - iter 576/1445 - loss 0.12658184 - time (sec): 167.48 - samples/sec: 416.65 - lr: 0.000143 - momentum: 0.000000
2023-10-12 20:20:17,490 epoch 2 - iter 720/1445 - loss 0.12315068 - time (sec): 211.19 - samples/sec: 413.09 - lr: 0.000142 - momentum: 0.000000
2023-10-12 20:20:59,183 epoch 2 - iter 864/1445 - loss 0.12053054 - time (sec): 252.88 - samples/sec: 414.43 - lr: 0.000140 - momentum: 0.000000
2023-10-12 20:21:41,327 epoch 2 - iter 1008/1445 - loss 0.12141328 - time (sec): 295.03 - samples/sec: 413.84 - lr: 0.000138 - momentum: 0.000000
2023-10-12 20:22:23,104 epoch 2 - iter 1152/1445 - loss 0.11874436 - time (sec): 336.80 - samples/sec: 415.42 - lr: 0.000137 - momentum: 0.000000
2023-10-12 20:23:06,226 epoch 2 - iter 1296/1445 - loss 0.11649198 - time (sec): 379.93 - samples/sec: 416.37 - lr: 0.000135 - momentum: 0.000000
2023-10-12 20:23:47,727 epoch 2 - iter 1440/1445 - loss 0.11308681 - time (sec): 421.43 - samples/sec: 416.69 - lr: 0.000133 - momentum: 0.000000
2023-10-12 20:23:49,078 ----------------------------------------------------------------------------------------------------
2023-10-12 20:23:49,079 EPOCH 2 done: loss 0.1129 - lr: 0.000133
2023-10-12 20:24:10,065 DEV : loss 0.09517565369606018 - f1-score (micro avg) 0.8125
2023-10-12 20:24:10,102 saving best model
2023-10-12 20:24:12,992 ----------------------------------------------------------------------------------------------------
2023-10-12 20:24:56,006 epoch 3 - iter 144/1445 - loss 0.07327007 - time (sec): 43.01 - samples/sec: 416.27 - lr: 0.000132 - momentum: 0.000000
2023-10-12 20:25:38,617 epoch 3 - iter 288/1445 - loss 0.07041840 - time (sec): 85.62 - samples/sec: 420.04 - lr: 0.000130 - momentum: 0.000000
2023-10-12 20:26:20,786 epoch 3 - iter 432/1445 - loss 0.06999514 - time (sec): 127.79 - samples/sec: 418.84 - lr: 0.000128 - momentum: 0.000000
2023-10-12 20:27:03,456 epoch 3 - iter 576/1445 - loss 0.07112475 - time (sec): 170.46 - samples/sec: 417.00 - lr: 0.000127 - momentum: 0.000000
2023-10-12 20:27:45,281 epoch 3 - iter 720/1445 - loss 0.07021459 - time (sec): 212.29 - samples/sec: 419.44 - lr: 0.000125 - momentum: 0.000000
2023-10-12 20:28:27,466 epoch 3 - iter 864/1445 - loss 0.07000170 - time (sec): 254.47 - samples/sec: 423.51 - lr: 0.000123 - momentum: 0.000000
2023-10-12 20:29:09,522 epoch 3 - iter 1008/1445 - loss 0.06982317 - time (sec): 296.53 - samples/sec: 422.11 - lr: 0.000122 - momentum: 0.000000
2023-10-12 20:29:50,667 epoch 3 - iter 1152/1445 - loss 0.06964722 - time (sec): 337.67 - samples/sec: 421.05 - lr: 0.000120 - momentum: 0.000000
2023-10-12 20:30:31,746 epoch 3 - iter 1296/1445 - loss 0.06839580 - time (sec): 378.75 - samples/sec: 419.07 - lr: 0.000118 - momentum: 0.000000
2023-10-12 20:31:14,591 epoch 3 - iter 1440/1445 - loss 0.06745276 - time (sec): 421.60 - samples/sec: 416.31 - lr: 0.000117 - momentum: 0.000000
2023-10-12 20:31:16,021 ----------------------------------------------------------------------------------------------------
2023-10-12 20:31:16,022 EPOCH 3 done: loss 0.0675 - lr: 0.000117
2023-10-12 20:31:37,989 DEV : loss 0.08340664207935333 - f1-score (micro avg) 0.8439
2023-10-12 20:31:38,020 saving best model
2023-10-12 20:31:40,553 ----------------------------------------------------------------------------------------------------
2023-10-12 20:32:23,274 epoch 4 - iter 144/1445 - loss 0.05297012 - time (sec): 42.72 - samples/sec: 419.86 - lr: 0.000115 - momentum: 0.000000
2023-10-12 20:33:04,940 epoch 4 - iter 288/1445 - loss 0.05061011 - time (sec): 84.38 - samples/sec: 411.81 - lr: 0.000113 - momentum: 0.000000
2023-10-12 20:33:47,048 epoch 4 - iter 432/1445 - loss 0.04713746 - time (sec): 126.49 - samples/sec: 414.49 - lr: 0.000112 - momentum: 0.000000
2023-10-12 20:34:30,755 epoch 4 - iter 576/1445 - loss 0.04639052 - time (sec): 170.20 - samples/sec: 419.88 - lr: 0.000110 - momentum: 0.000000
2023-10-12 20:35:13,049 epoch 4 - iter 720/1445 - loss 0.04446589 - time (sec): 212.49 - samples/sec: 419.88 - lr: 0.000108 - momentum: 0.000000
2023-10-12 20:35:54,366 epoch 4 - iter 864/1445 - loss 0.04321469 - time (sec): 253.81 - samples/sec: 418.40 - lr: 0.000107 - momentum: 0.000000
2023-10-12 20:36:39,400 epoch 4 - iter 1008/1445 - loss 0.04351905 - time (sec): 298.84 - samples/sec: 412.20 - lr: 0.000105 - momentum: 0.000000
2023-10-12 20:37:22,743 epoch 4 - iter 1152/1445 - loss 0.04392393 - time (sec): 342.18 - samples/sec: 413.28 - lr: 0.000103 - momentum: 0.000000
2023-10-12 20:38:05,982 epoch 4 - iter 1296/1445 - loss 0.04654178 - time (sec): 385.42 - samples/sec: 412.16 - lr: 0.000102 - momentum: 0.000000
2023-10-12 20:38:47,608 epoch 4 - iter 1440/1445 - loss 0.04601732 - time (sec): 427.05 - samples/sec: 411.73 - lr: 0.000100 - momentum: 0.000000
2023-10-12 20:38:48,759 ----------------------------------------------------------------------------------------------------
2023-10-12 20:38:48,759 EPOCH 4 done: loss 0.0460 - lr: 0.000100
2023-10-12 20:39:09,821 DEV : loss 0.09578309208154678 - f1-score (micro avg) 0.854
2023-10-12 20:39:09,854 saving best model
2023-10-12 20:39:12,430 ----------------------------------------------------------------------------------------------------
2023-10-12 20:39:55,317 epoch 5 - iter 144/1445 - loss 0.03906087 - time (sec): 42.88 - samples/sec: 440.19 - lr: 0.000098 - momentum: 0.000000
2023-10-12 20:40:37,355 epoch 5 - iter 288/1445 - loss 0.03326712 - time (sec): 84.92 - samples/sec: 427.34 - lr: 0.000097 - momentum: 0.000000
2023-10-12 20:41:17,016 epoch 5 - iter 432/1445 - loss 0.03091069 - time (sec): 124.58 - samples/sec: 416.23 - lr: 0.000095 - momentum: 0.000000
2023-10-12 20:41:56,651 epoch 5 - iter 576/1445 - loss 0.03004124 - time (sec): 164.22 - samples/sec: 414.41 - lr: 0.000093 - momentum: 0.000000
2023-10-12 20:42:38,714 epoch 5 - iter 720/1445 - loss 0.03116302 - time (sec): 206.28 - samples/sec: 419.94 - lr: 0.000092 - momentum: 0.000000
2023-10-12 20:43:20,128 epoch 5 - iter 864/1445 - loss 0.03024606 - time (sec): 247.69 - samples/sec: 419.91 - lr: 0.000090 - momentum: 0.000000
2023-10-12 20:44:02,818 epoch 5 - iter 1008/1445 - loss 0.03089293 - time (sec): 290.38 - samples/sec: 421.95 - lr: 0.000088 - momentum: 0.000000
2023-10-12 20:44:44,492 epoch 5 - iter 1152/1445 - loss 0.03113353 - time (sec): 332.06 - samples/sec: 422.52 - lr: 0.000087 - momentum: 0.000000
2023-10-12 20:45:26,335 epoch 5 - iter 1296/1445 - loss 0.03083010 - time (sec): 373.90 - samples/sec: 422.31 - lr: 0.000085 - momentum: 0.000000
2023-10-12 20:46:07,798 epoch 5 - iter 1440/1445 - loss 0.03275021 - time (sec): 415.36 - samples/sec: 422.21 - lr: 0.000083 - momentum: 0.000000
2023-10-12 20:46:09,256 ----------------------------------------------------------------------------------------------------
2023-10-12 20:46:09,256 EPOCH 5 done: loss 0.0332 - lr: 0.000083
2023-10-12 20:46:30,762 DEV : loss 0.10168028622865677 - f1-score (micro avg) 0.851
2023-10-12 20:46:30,793 ----------------------------------------------------------------------------------------------------
2023-10-12 20:47:12,084 epoch 6 - iter 144/1445 - loss 0.02321712 - time (sec): 41.29 - samples/sec: 416.75 - lr: 0.000082 - momentum: 0.000000
2023-10-12 20:47:52,553 epoch 6 - iter 288/1445 - loss 0.02481388 - time (sec): 81.76 - samples/sec: 422.31 - lr: 0.000080 - momentum: 0.000000
2023-10-12 20:48:33,748 epoch 6 - iter 432/1445 - loss 0.02757483 - time (sec): 122.95 - samples/sec: 426.21 - lr: 0.000078 - momentum: 0.000000
2023-10-12 20:49:15,702 epoch 6 - iter 576/1445 - loss 0.02650027 - time (sec): 164.91 - samples/sec: 426.98 - lr: 0.000077 - momentum: 0.000000
2023-10-12 20:49:57,116 epoch 6 - iter 720/1445 - loss 0.02588266 - time (sec): 206.32 - samples/sec: 427.16 - lr: 0.000075 - momentum: 0.000000
2023-10-12 20:50:40,206 epoch 6 - iter 864/1445 - loss 0.02373812 - time (sec): 249.41 - samples/sec: 427.26 - lr: 0.000073 - momentum: 0.000000
2023-10-12 20:51:22,906 epoch 6 - iter 1008/1445 - loss 0.02602288 - time (sec): 292.11 - samples/sec: 425.85 - lr: 0.000072 - momentum: 0.000000
2023-10-12 20:52:03,431 epoch 6 - iter 1152/1445 - loss 0.02504745 - time (sec): 332.64 - samples/sec: 423.75 - lr: 0.000070 - momentum: 0.000000
2023-10-12 20:52:43,013 epoch 6 - iter 1296/1445 - loss 0.02425352 - time (sec): 372.22 - samples/sec: 423.56 - lr: 0.000068 - momentum: 0.000000
2023-10-12 20:53:24,660 epoch 6 - iter 1440/1445 - loss 0.02500831 - time (sec): 413.86 - samples/sec: 424.47 - lr: 0.000067 - momentum: 0.000000
2023-10-12 20:53:25,858 ----------------------------------------------------------------------------------------------------
2023-10-12 20:53:25,858 EPOCH 6 done: loss 0.0249 - lr: 0.000067
2023-10-12 20:53:46,613 DEV : loss 0.11829700320959091 - f1-score (micro avg) 0.8511
2023-10-12 20:53:46,644 ----------------------------------------------------------------------------------------------------
2023-10-12 20:54:27,677 epoch 7 - iter 144/1445 - loss 0.02617530 - time (sec): 41.03 - samples/sec: 429.90 - lr: 0.000065 - momentum: 0.000000
2023-10-12 20:55:08,659 epoch 7 - iter 288/1445 - loss 0.01907134 - time (sec): 82.01 - samples/sec: 433.79 - lr: 0.000063 - momentum: 0.000000
2023-10-12 20:55:48,427 epoch 7 - iter 432/1445 - loss 0.01974084 - time (sec): 121.78 - samples/sec: 427.77 - lr: 0.000062 - momentum: 0.000000
2023-10-12 20:56:27,995 epoch 7 - iter 576/1445 - loss 0.01857623 - time (sec): 161.35 - samples/sec: 425.46 - lr: 0.000060 - momentum: 0.000000
2023-10-12 20:57:09,150 epoch 7 - iter 720/1445 - loss 0.02000500 - time (sec): 202.50 - samples/sec: 428.61 - lr: 0.000058 - momentum: 0.000000
2023-10-12 20:57:50,603 epoch 7 - iter 864/1445 - loss 0.01835280 - time (sec): 243.96 - samples/sec: 427.19 - lr: 0.000057 - momentum: 0.000000
2023-10-12 20:58:32,369 epoch 7 - iter 1008/1445 - loss 0.01795777 - time (sec): 285.72 - samples/sec: 426.74 - lr: 0.000055 - momentum: 0.000000
2023-10-12 20:59:13,015 epoch 7 - iter 1152/1445 - loss 0.01805962 - time (sec): 326.37 - samples/sec: 425.71 - lr: 0.000053 - momentum: 0.000000
2023-10-12 20:59:53,241 epoch 7 - iter 1296/1445 - loss 0.01759878 - time (sec): 366.60 - samples/sec: 426.95 - lr: 0.000052 - momentum: 0.000000
2023-10-12 21:00:34,943 epoch 7 - iter 1440/1445 - loss 0.01818970 - time (sec): 408.30 - samples/sec: 429.81 - lr: 0.000050 - momentum: 0.000000
2023-10-12 21:00:36,340 ----------------------------------------------------------------------------------------------------
2023-10-12 21:00:36,340 EPOCH 7 done: loss 0.0182 - lr: 0.000050
2023-10-12 21:00:57,548 DEV : loss 0.11870528757572174 - f1-score (micro avg) 0.8525
2023-10-12 21:00:57,579 ----------------------------------------------------------------------------------------------------
2023-10-12 21:01:39,477 epoch 8 - iter 144/1445 - loss 0.01437875 - time (sec): 41.90 - samples/sec: 443.00 - lr: 0.000048 - momentum: 0.000000
2023-10-12 21:02:19,894 epoch 8 - iter 288/1445 - loss 0.01688151 - time (sec): 82.31 - samples/sec: 436.25 - lr: 0.000047 - momentum: 0.000000
2023-10-12 21:02:59,900 epoch 8 - iter 432/1445 - loss 0.01434297 - time (sec): 122.32 - samples/sec: 431.85 - lr: 0.000045 - momentum: 0.000000
2023-10-12 21:03:41,511 epoch 8 - iter 576/1445 - loss 0.01368236 - time (sec): 163.93 - samples/sec: 438.79 - lr: 0.000043 - momentum: 0.000000
2023-10-12 21:04:21,650 epoch 8 - iter 720/1445 - loss 0.01360876 - time (sec): 204.07 - samples/sec: 437.61 - lr: 0.000042 - momentum: 0.000000
2023-10-12 21:05:01,428 epoch 8 - iter 864/1445 - loss 0.01358988 - time (sec): 243.85 - samples/sec: 433.72 - lr: 0.000040 - momentum: 0.000000
2023-10-12 21:05:42,391 epoch 8 - iter 1008/1445 - loss 0.01382289 - time (sec): 284.81 - samples/sec: 432.03 - lr: 0.000038 - momentum: 0.000000
2023-10-12 21:06:21,709 epoch 8 - iter 1152/1445 - loss 0.01364830 - time (sec): 324.13 - samples/sec: 430.45 - lr: 0.000037 - momentum: 0.000000
2023-10-12 21:07:02,459 epoch 8 - iter 1296/1445 - loss 0.01538390 - time (sec): 364.88 - samples/sec: 432.75 - lr: 0.000035 - momentum: 0.000000
2023-10-12 21:07:42,880 epoch 8 - iter 1440/1445 - loss 0.01492242 - time (sec): 405.30 - samples/sec: 433.56 - lr: 0.000033 - momentum: 0.000000
2023-10-12 21:07:44,100 ----------------------------------------------------------------------------------------------------
2023-10-12 21:07:44,100 EPOCH 8 done: loss 0.0150 - lr: 0.000033
2023-10-12 21:08:04,932 DEV : loss 0.13916537165641785 - f1-score (micro avg) 0.8524
2023-10-12 21:08:04,963 ----------------------------------------------------------------------------------------------------
2023-10-12 21:08:45,633 epoch 9 - iter 144/1445 - loss 0.00328491 - time (sec): 40.67 - samples/sec: 452.52 - lr: 0.000032 - momentum: 0.000000
2023-10-12 21:09:27,458 epoch 9 - iter 288/1445 - loss 0.01469900 - time (sec): 82.49 - samples/sec: 450.93 - lr: 0.000030 - momentum: 0.000000
2023-10-12 21:10:07,918 epoch 9 - iter 432/1445 - loss 0.01343190 - time (sec): 122.95 - samples/sec: 447.75 - lr: 0.000028 - momentum: 0.000000
2023-10-12 21:10:47,029 epoch 9 - iter 576/1445 - loss 0.01174943 - time (sec): 162.06 - samples/sec: 438.96 - lr: 0.000027 - momentum: 0.000000
2023-10-12 21:11:25,827 epoch 9 - iter 720/1445 - loss 0.01124109 - time (sec): 200.86 - samples/sec: 433.13 - lr: 0.000025 - momentum: 0.000000
2023-10-12 21:12:06,252 epoch 9 - iter 864/1445 - loss 0.01129143 - time (sec): 241.29 - samples/sec: 434.82 - lr: 0.000023 - momentum: 0.000000
2023-10-12 21:12:46,994 epoch 9 - iter 1008/1445 - loss 0.01164716 - time (sec): 282.03 - samples/sec: 434.48 - lr: 0.000022 - momentum: 0.000000
2023-10-12 21:13:28,846 epoch 9 - iter 1152/1445 - loss 0.01179582 - time (sec): 323.88 - samples/sec: 436.77 - lr: 0.000020 - momentum: 0.000000
2023-10-12 21:14:08,610 epoch 9 - iter 1296/1445 - loss 0.01095141 - time (sec): 363.64 - samples/sec: 435.91 - lr: 0.000018 - momentum: 0.000000
2023-10-12 21:14:49,912 epoch 9 - iter 1440/1445 - loss 0.01040016 - time (sec): 404.95 - samples/sec: 433.81 - lr: 0.000017 - momentum: 0.000000
2023-10-12 21:14:51,213 ----------------------------------------------------------------------------------------------------
2023-10-12 21:14:51,213 EPOCH 9 done: loss 0.0104 - lr: 0.000017
2023-10-12 21:15:11,280 DEV : loss 0.1443011313676834 - f1-score (micro avg) 0.8538
2023-10-12 21:15:11,310 ----------------------------------------------------------------------------------------------------
2023-10-12 21:15:53,223 epoch 10 - iter 144/1445 - loss 0.00659592 - time (sec): 41.91 - samples/sec: 429.81 - lr: 0.000015 - momentum: 0.000000
2023-10-12 21:16:33,015 epoch 10 - iter 288/1445 - loss 0.00689152 - time (sec): 81.70 - samples/sec: 412.85 - lr: 0.000013 - momentum: 0.000000
2023-10-12 21:17:13,952 epoch 10 - iter 432/1445 - loss 0.00829991 - time (sec): 122.64 - samples/sec: 413.32 - lr: 0.000012 - momentum: 0.000000
2023-10-12 21:17:55,529 epoch 10 - iter 576/1445 - loss 0.00927037 - time (sec): 164.22 - samples/sec: 420.87 - lr: 0.000010 - momentum: 0.000000
2023-10-12 21:18:36,103 epoch 10 - iter 720/1445 - loss 0.00865224 - time (sec): 204.79 - samples/sec: 421.42 - lr: 0.000008 - momentum: 0.000000
2023-10-12 21:19:17,606 epoch 10 - iter 864/1445 - loss 0.00794145 - time (sec): 246.29 - samples/sec: 425.13 - lr: 0.000007 - momentum: 0.000000
2023-10-12 21:19:59,178 epoch 10 - iter 1008/1445 - loss 0.00815458 - time (sec): 287.87 - samples/sec: 428.24 - lr: 0.000005 - momentum: 0.000000
2023-10-12 21:20:39,135 epoch 10 - iter 1152/1445 - loss 0.00746884 - time (sec): 327.82 - samples/sec: 426.96 - lr: 0.000003 - momentum: 0.000000
2023-10-12 21:21:19,627 epoch 10 - iter 1296/1445 - loss 0.00801406 - time (sec): 368.31 - samples/sec: 428.13 - lr: 0.000002 - momentum: 0.000000
2023-10-12 21:22:01,376 epoch 10 - iter 1440/1445 - loss 0.00775432 - time (sec): 410.06 - samples/sec: 428.56 - lr: 0.000000 - momentum: 0.000000
2023-10-12 21:22:02,551 ----------------------------------------------------------------------------------------------------
2023-10-12 21:22:02,552 EPOCH 10 done: loss 0.0077 - lr: 0.000000
2023-10-12 21:22:24,387 DEV : loss 0.15128682553768158 - f1-score (micro avg) 0.8515
2023-10-12 21:22:25,328 ----------------------------------------------------------------------------------------------------
2023-10-12 21:22:25,330 Loading model from best epoch ...
2023-10-12 21:22:29,244 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 21:22:50,647 Results:
- F-score (micro) 0.8113
- F-score (macro) 0.7148
- Accuracy 0.6935

By class:
              precision    recall  f1-score   support

         PER     0.8591    0.7718    0.8131       482
         LOC     0.9056    0.8166    0.8588       458
         ORG     0.5172    0.4348    0.4724        69

   micro avg     0.8584    0.7691    0.8113      1009
   macro avg     0.7606    0.6744    0.7148      1009
weighted avg     0.8568    0.7691    0.8105      1009

2023-10-12 21:22:50,647 ----------------------------------------------------------------------------------------------------
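
The configuration recorded at the top of this log (ByT5 encoder embeddings with first-subtoken pooling and last layer only, a plain linear tag head without CRF or RNN, learning rate 0.00015, mini-batch size 4, 10 epochs, linear scheduler with warmup fraction 0.1) can be approximated with stock Flair roughly as follows. This is a minimal sketch, not the exact hmbench training script: the logged ByT5Embeddings class appears to be a ByT5-specific wrapper, so TransformerWordEmbeddings is used here as a stand-in, the model ID and output path are inferred from the logged base path, and argument names for the corpus loader are assumptions.

```python
# Sketch of a comparable Flair fine-tuning setup (see assumptions above).
from flair.datasets import NER_ICDAR_EUROPEANA
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Dutch ICDAR-Europeana NER corpus, as listed in the MultiCorpus line of the log
# (argument name assumed).
corpus = NER_ICDAR_EUROPEANA(language="nl")
label_dict = corpus.make_label_dictionary(label_type="ner")

# Stand-in for the logged ByT5Embeddings: last layer only, first-subtoken pooling.
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",  # inferred from base path
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Linear head directly on the 1472-dim embeddings (no CRF, no RNN, no reprojection),
# matching the module dump and the "crfFalse" suffix in the base path.
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# fine_tune() applies a linear schedule with warmup by default, consistent with the
# LinearScheduler plugin (warmup_fraction 0.1) shown in the log.
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-icdar/nl-hmbyt5-preliminary/example-run",  # hypothetical output path
    learning_rate=0.00015,
    mini_batch_size=4,
    max_epochs=10,
)
```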
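For inference, the best checkpoint saved during this run (best-model.pt) can be loaded with the standard Flair API; the 13-tag dictionary in the log (S-/B-/E-/I- prefixes for LOC, PER and ORG, plus O) indicates BIOES span decoding. A minimal sketch with an illustrative path and a made-up Dutch sentence:

```python
# Sketch: load the saved checkpoint and tag a sentence (path and sentence are illustrative).
from flair.data import Sentence
from flair.models import SequenceTagger

# best-model.pt is written under the training base path shown in the log.
tagger = SequenceTagger.load("best-model.pt")

sentence = Sentence("Vincent van Gogh werd geboren in Zundert .")  # hypothetical example
tagger.predict(sentence)

# Print the decoded PER/LOC/ORG spans.
for span in sentence.get_spans("ner"):
    print(span)
```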