2023-10-13 13:26:01,621 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:01,624 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 13:26:01,624 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:01,625 MultiCorpus: 6183 train + 680 dev + 2113 test sentences
 - NER_HIPE_2022 Corpus: 6183 train + 680 dev + 2113 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/topres19th/en/with_doc_seperator
2023-10-13 13:26:01,625 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:01,625 Train: 6183 sentences
2023-10-13 13:26:01,625 (train_with_dev=False, train_with_test=False)
2023-10-13 13:26:01,625 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:01,625 Training Params:
2023-10-13 13:26:01,625  - learning_rate: "0.00016"
2023-10-13 13:26:01,625  - mini_batch_size: "8"
2023-10-13 13:26:01,625  - max_epochs: "10"
2023-10-13 13:26:01,625  - shuffle: "True"
2023-10-13 13:26:01,625 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:01,626 Plugins:
2023-10-13 13:26:01,626  - TensorboardLogger
2023-10-13 13:26:01,626  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 13:26:01,626 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:01,626 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 13:26:01,626  - metric: "('micro avg', 'f1-score')"
2023-10-13 13:26:01,626 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:01,626 Computation:
2023-10-13 13:26:01,626  - compute on device: cuda:0
2023-10-13 13:26:01,626  - embedding storage: none
2023-10-13 13:26:01,626 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:01,626 Model training base path: "hmbench-topres19th/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-13 13:26:01,627 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:01,627 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:01,627 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 13:26:45,818 epoch 1 - iter 77/773 - loss 2.57062010 - time (sec): 44.19 - samples/sec: 279.83 - lr: 0.000016 - momentum: 0.000000
2023-10-13 13:27:29,191 epoch 1 - iter 154/773 - loss 2.53541742 - time (sec): 87.56 - samples/sec: 273.18 - lr: 0.000032 - momentum: 0.000000
2023-10-13 13:28:12,935 epoch 1 - iter 231/773 - loss 2.36209147 - time (sec): 131.31 - samples/sec: 279.40 - lr: 0.000048 - momentum: 0.000000
2023-10-13 13:28:54,426 epoch 1 - iter 308/773 - loss 2.11589559 - time (sec): 172.80 - samples/sec: 291.48 - lr: 0.000064 - momentum: 0.000000
2023-10-13 13:29:35,243 epoch 1 - iter 385/773 - loss 1.87498350 - time (sec): 213.61 - samples/sec: 294.41 - lr: 0.000079 - momentum: 0.000000
2023-10-13 13:30:15,300 epoch 1 - iter 462/773 - loss 1.66074017 - time (sec): 253.67 - samples/sec: 293.66 - lr: 0.000095 - momentum: 0.000000
2023-10-13 13:30:55,241 epoch 1 - iter 539/773 - loss 1.47343454 - time (sec): 293.61 - samples/sec: 293.37 - lr: 0.000111 - momentum: 0.000000
2023-10-13 13:31:36,180 epoch 1 - iter 616/773 - loss 1.31856275 - time (sec): 334.55 - samples/sec: 293.10 - lr: 0.000127 - momentum: 0.000000
2023-10-13 13:32:18,222 epoch 1 - iter 693/773 - loss 1.18884911 - time (sec): 376.59 - samples/sec: 294.74 - lr: 0.000143 - momentum: 0.000000
2023-10-13 13:32:58,624 epoch 1 - iter 770/773 - loss 1.08342681 - time (sec): 416.99 - samples/sec: 297.00 - lr: 0.000159 - momentum: 0.000000
2023-10-13 13:33:00,095 ----------------------------------------------------------------------------------------------------
2023-10-13 13:33:00,095 EPOCH 1 done: loss 1.0799 - lr: 0.000159
2023-10-13 13:33:17,290 DEV : loss 0.09480314701795578 - f1-score (micro avg) 0.5298
2023-10-13 13:33:17,321 saving best model
2023-10-13 13:33:18,262 ----------------------------------------------------------------------------------------------------
2023-10-13 13:34:00,643 epoch 2 - iter 77/773 - loss 0.13616110 - time (sec): 42.38 - samples/sec: 285.85 - lr: 0.000158 - momentum: 0.000000
2023-10-13 13:34:40,741 epoch 2 - iter 154/773 - loss 0.13603970 - time (sec): 82.48 - samples/sec: 302.30 - lr: 0.000156 - momentum: 0.000000
2023-10-13 13:35:20,081 epoch 2 - iter 231/773 - loss 0.12633571 - time (sec): 121.82 - samples/sec: 305.57 - lr: 0.000155 - momentum: 0.000000
2023-10-13 13:36:00,308 epoch 2 - iter 308/773 - loss 0.11757551 - time (sec): 162.04 - samples/sec: 308.22 - lr: 0.000153 - momentum: 0.000000
2023-10-13 13:36:40,604 epoch 2 - iter 385/773 - loss 0.11330549 - time (sec): 202.34 - samples/sec: 304.24 - lr: 0.000151 - momentum: 0.000000
2023-10-13 13:37:22,329 epoch 2 - iter 462/773 - loss 0.10747945 - time (sec): 244.07 - samples/sec: 305.52 - lr: 0.000149 - momentum: 0.000000
2023-10-13 13:38:02,731 epoch 2 - iter 539/773 - loss 0.10730665 - time (sec): 284.47 - samples/sec: 307.32 - lr: 0.000148 - momentum: 0.000000
2023-10-13 13:38:43,718 epoch 2 - iter 616/773 - loss 0.10535188 - time (sec): 325.45 - samples/sec: 305.88 - lr: 0.000146 - momentum: 0.000000
2023-10-13 13:39:28,058 epoch 2 - iter 693/773 - loss 0.10308060 - time (sec): 369.79 - samples/sec: 302.56 - lr: 0.000144 - momentum: 0.000000
2023-10-13 13:40:10,324 epoch 2 - iter 770/773 - loss 0.10039870 - time (sec): 412.06 - samples/sec: 300.66 - lr: 0.000142 - momentum: 0.000000
2023-10-13 13:40:11,973 ----------------------------------------------------------------------------------------------------
2023-10-13 13:40:11,973 EPOCH 2 done: loss 0.1002 - lr: 0.000142
2023-10-13 13:40:31,298 DEV : loss 0.057838067412376404 - f1-score (micro avg) 0.7813
2023-10-13 13:40:31,328 saving best model
2023-10-13 13:40:34,113 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:16,472 epoch 3 - iter 77/773 - loss 0.06633658 - time (sec): 42.36 - samples/sec: 301.00 - lr: 0.000140 - momentum: 0.000000
2023-10-13 13:41:58,905 epoch 3 - iter 154/773 - loss 0.06277207 - time (sec): 84.79 - samples/sec: 297.67 - lr: 0.000139 - momentum: 0.000000
2023-10-13 13:42:40,672 epoch 3 - iter 231/773 - loss 0.06457756 - time (sec): 126.55 - samples/sec: 291.46 - lr: 0.000137 - momentum: 0.000000
2023-10-13 13:43:23,276 epoch 3 - iter 308/773 - loss 0.06768297 - time (sec): 169.16 - samples/sec: 295.98 - lr: 0.000135 - momentum: 0.000000
2023-10-13 13:44:02,782 epoch 3 - iter 385/773 - loss 0.06578215 - time (sec): 208.66 - samples/sec: 297.09 - lr: 0.000133 - momentum: 0.000000
2023-10-13 13:44:44,904 epoch 3 - iter 462/773 - loss 0.06565666 - time (sec): 250.79 - samples/sec: 296.04 - lr: 0.000132 - momentum: 0.000000
2023-10-13 13:45:27,591 epoch 3 - iter 539/773 - loss 0.06483917 - time (sec): 293.47 - samples/sec: 296.59 - lr: 0.000130 - momentum: 0.000000
2023-10-13 13:46:07,958 epoch 3 - iter 616/773 - loss 0.06239288 - time (sec): 333.84 - samples/sec: 299.57 - lr: 0.000128 - momentum: 0.000000
2023-10-13 13:46:48,009 epoch 3 - iter 693/773 - loss 0.06147020 - time (sec): 373.89 - samples/sec: 300.51 - lr: 0.000126 - momentum: 0.000000
2023-10-13 13:47:27,430 epoch 3 - iter 770/773 - loss 0.06074472 - time (sec): 413.31 - samples/sec: 299.26 - lr: 0.000125 - momentum: 0.000000
2023-10-13 13:47:29,015 ----------------------------------------------------------------------------------------------------
2023-10-13 13:47:29,015 EPOCH 3 done: loss 0.0608 - lr: 0.000125
2023-10-13 13:47:46,060 DEV : loss 0.04795033112168312 - f1-score (micro avg) 0.786
2023-10-13 13:47:46,091 saving best model
2023-10-13 13:47:48,760 ----------------------------------------------------------------------------------------------------
2023-10-13 13:48:31,247 epoch 4 - iter 77/773 - loss 0.04570769 - time (sec): 42.48 - samples/sec: 266.33 - lr: 0.000123 - momentum: 0.000000
2023-10-13 13:49:11,886 epoch 4 - iter 154/773 - loss 0.04142650 - time (sec): 83.12 - samples/sec: 293.56 - lr: 0.000121 - momentum: 0.000000
2023-10-13 13:49:50,782 epoch 4 - iter 231/773 - loss 0.04222222 - time (sec): 122.02 - samples/sec: 295.97 - lr: 0.000119 - momentum: 0.000000
2023-10-13 13:50:31,804 epoch 4 - iter 308/773 - loss 0.04243309 - time (sec): 163.04 - samples/sec: 306.86 - lr: 0.000117 - momentum: 0.000000
2023-10-13 13:51:11,088 epoch 4 - iter 385/773 - loss 0.03983764 - time (sec): 202.32 - samples/sec: 306.16 - lr: 0.000116 - momentum: 0.000000
2023-10-13 13:51:51,434 epoch 4 - iter 462/773 - loss 0.03841069 - time (sec): 242.67 - samples/sec: 307.84 - lr: 0.000114 - momentum: 0.000000
2023-10-13 13:52:31,480 epoch 4 - iter 539/773 - loss 0.03825856 - time (sec): 282.72 - samples/sec: 308.09 - lr: 0.000112 - momentum: 0.000000
2023-10-13 13:53:11,592 epoch 4 - iter 616/773 - loss 0.03669827 - time (sec): 322.83 - samples/sec: 309.31 - lr: 0.000110 - momentum: 0.000000
2023-10-13 13:53:51,388 epoch 4 - iter 693/773 - loss 0.03804052 - time (sec): 362.62 - samples/sec: 308.06 - lr: 0.000109 - momentum: 0.000000
2023-10-13 13:54:31,531 epoch 4 - iter 770/773 - loss 0.03792700 - time (sec): 402.77 - samples/sec: 307.42 - lr: 0.000107 - momentum: 0.000000
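The lr column in these iteration lines traces the LinearScheduler plugin declared in the header (warmup_fraction '0.1', peak learning_rate 0.00016, 10 epochs of 773 iterations each): linear warmup for the first 10% of steps, then linear decay to zero. A minimal sketch consistent with the logged values; the function name and signature are illustrative, not Flair's API:

```python
def linear_schedule_lr(step, total_steps, peak_lr=0.00016, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to zero.

    Illustrative reconstruction of the schedule behind the "lr:" column
    in the log above (total_steps = 10 epochs * 773 iterations = 7730).
    """
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # warmup phase: lr rises linearly from 0 to peak_lr
        return peak_lr * step / warmup_steps
    # decay phase: lr falls linearly from peak_lr to 0
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# e.g. linear_schedule_lr(77, 7730) is ~1.59e-05, matching the
# "lr: 0.000016" logged at epoch 1, iter 77/773.
```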
2023-10-13 13:54:33,017 ----------------------------------------------------------------------------------------------------
2023-10-13 13:54:33,018 EPOCH 4 done: loss 0.0378 - lr: 0.000107
2023-10-13 13:54:49,877 DEV : loss 0.06375983357429504 - f1-score (micro avg) 0.8
2023-10-13 13:54:49,910 saving best model
2023-10-13 13:54:50,939 ----------------------------------------------------------------------------------------------------
2023-10-13 13:55:30,979 epoch 5 - iter 77/773 - loss 0.02552778 - time (sec): 40.04 - samples/sec: 305.94 - lr: 0.000105 - momentum: 0.000000
2023-10-13 13:56:14,288 epoch 5 - iter 154/773 - loss 0.02578947 - time (sec): 83.35 - samples/sec: 292.86 - lr: 0.000103 - momentum: 0.000000
2023-10-13 13:56:55,531 epoch 5 - iter 231/773 - loss 0.02476470 - time (sec): 124.59 - samples/sec: 303.09 - lr: 0.000101 - momentum: 0.000000
2023-10-13 13:57:36,632 epoch 5 - iter 308/773 - loss 0.02496423 - time (sec): 165.69 - samples/sec: 305.85 - lr: 0.000100 - momentum: 0.000000
2023-10-13 13:58:16,866 epoch 5 - iter 385/773 - loss 0.02406075 - time (sec): 205.92 - samples/sec: 302.56 - lr: 0.000098 - momentum: 0.000000
2023-10-13 13:58:57,095 epoch 5 - iter 462/773 - loss 0.02389150 - time (sec): 246.15 - samples/sec: 303.26 - lr: 0.000096 - momentum: 0.000000
2023-10-13 13:59:36,789 epoch 5 - iter 539/773 - loss 0.02394706 - time (sec): 285.85 - samples/sec: 302.28 - lr: 0.000094 - momentum: 0.000000
2023-10-13 14:00:17,171 epoch 5 - iter 616/773 - loss 0.02525134 - time (sec): 326.23 - samples/sec: 305.03 - lr: 0.000093 - momentum: 0.000000
2023-10-13 14:00:56,105 epoch 5 - iter 693/773 - loss 0.02473743 - time (sec): 365.16 - samples/sec: 305.88 - lr: 0.000091 - momentum: 0.000000
2023-10-13 14:01:35,740 epoch 5 - iter 770/773 - loss 0.02498417 - time (sec): 404.80 - samples/sec: 305.61 - lr: 0.000089 - momentum: 0.000000
2023-10-13 14:01:37,312 ----------------------------------------------------------------------------------------------------
2023-10-13 14:01:37,312 EPOCH 5 done: loss 0.0251 - lr: 0.000089
2023-10-13 14:01:54,361 DEV : loss 0.06829117983579636 - f1-score (micro avg) 0.818
2023-10-13 14:01:54,402 saving best model
2023-10-13 14:01:57,066 ----------------------------------------------------------------------------------------------------
2023-10-13 14:02:37,096 epoch 6 - iter 77/773 - loss 0.01718220 - time (sec): 40.02 - samples/sec: 279.69 - lr: 0.000087 - momentum: 0.000000
2023-10-13 14:03:16,814 epoch 6 - iter 154/773 - loss 0.01550972 - time (sec): 79.74 - samples/sec: 303.25 - lr: 0.000085 - momentum: 0.000000
2023-10-13 14:03:56,340 epoch 6 - iter 231/773 - loss 0.01618194 - time (sec): 119.27 - samples/sec: 304.29 - lr: 0.000084 - momentum: 0.000000
2023-10-13 14:04:35,237 epoch 6 - iter 308/773 - loss 0.01557615 - time (sec): 158.17 - samples/sec: 306.65 - lr: 0.000082 - momentum: 0.000000
2023-10-13 14:05:14,712 epoch 6 - iter 385/773 - loss 0.01677557 - time (sec): 197.64 - samples/sec: 306.89 - lr: 0.000080 - momentum: 0.000000
2023-10-13 14:05:55,045 epoch 6 - iter 462/773 - loss 0.01629595 - time (sec): 237.97 - samples/sec: 309.87 - lr: 0.000078 - momentum: 0.000000
2023-10-13 14:06:34,474 epoch 6 - iter 539/773 - loss 0.01552389 - time (sec): 277.40 - samples/sec: 310.32 - lr: 0.000077 - momentum: 0.000000
2023-10-13 14:07:15,189 epoch 6 - iter 616/773 - loss 0.01710578 - time (sec): 318.12 - samples/sec: 311.28 - lr: 0.000075 - momentum: 0.000000
2023-10-13 14:07:56,193 epoch 6 - iter 693/773 - loss 0.01669410 - time (sec): 359.12 - samples/sec: 310.20 - lr: 0.000073 - momentum: 0.000000
2023-10-13 14:08:37,693 epoch 6 - iter 770/773 - loss 0.01618705 - time (sec): 400.62 - samples/sec: 308.81 - lr: 0.000071 - momentum: 0.000000
2023-10-13 14:08:39,327 ----------------------------------------------------------------------------------------------------
2023-10-13 14:08:39,327 EPOCH 6 done: loss 0.0162 - lr: 0.000071
2023-10-13 14:08:57,695 DEV : loss 0.08230733126401901 - f1-score (micro avg) 0.8056
2023-10-13 14:08:57,725 ----------------------------------------------------------------------------------------------------
2023-10-13 14:09:38,784 epoch 7 - iter 77/773 - loss 0.01104098 - time (sec): 41.06 - samples/sec: 301.10 - lr: 0.000069 - momentum: 0.000000
2023-10-13 14:10:21,690 epoch 7 - iter 154/773 - loss 0.01084070 - time (sec): 83.96 - samples/sec: 290.25 - lr: 0.000068 - momentum: 0.000000
2023-10-13 14:11:03,996 epoch 7 - iter 231/773 - loss 0.00993163 - time (sec): 126.27 - samples/sec: 289.09 - lr: 0.000066 - momentum: 0.000000
2023-10-13 14:11:45,648 epoch 7 - iter 308/773 - loss 0.00931498 - time (sec): 167.92 - samples/sec: 297.79 - lr: 0.000064 - momentum: 0.000000
2023-10-13 14:12:25,998 epoch 7 - iter 385/773 - loss 0.00990930 - time (sec): 208.27 - samples/sec: 300.74 - lr: 0.000062 - momentum: 0.000000
2023-10-13 14:13:08,002 epoch 7 - iter 462/773 - loss 0.01091458 - time (sec): 250.27 - samples/sec: 297.45 - lr: 0.000061 - momentum: 0.000000
2023-10-13 14:13:50,331 epoch 7 - iter 539/773 - loss 0.01055709 - time (sec): 292.60 - samples/sec: 295.05 - lr: 0.000059 - momentum: 0.000000
2023-10-13 14:14:31,155 epoch 7 - iter 616/773 - loss 0.01103797 - time (sec): 333.43 - samples/sec: 296.92 - lr: 0.000057 - momentum: 0.000000
2023-10-13 14:15:12,601 epoch 7 - iter 693/773 - loss 0.01095294 - time (sec): 374.87 - samples/sec: 297.76 - lr: 0.000055 - momentum: 0.000000
2023-10-13 14:15:57,007 epoch 7 - iter 770/773 - loss 0.01084300 - time (sec): 419.28 - samples/sec: 295.63 - lr: 0.000054 - momentum: 0.000000
2023-10-13 14:15:58,471 ----------------------------------------------------------------------------------------------------
2023-10-13 14:15:58,471 EPOCH 7 done: loss 0.0112 - lr: 0.000054
2023-10-13 14:16:15,743 DEV : loss 0.08712616562843323 - f1-score (micro avg) 0.8209
2023-10-13 14:16:15,772 saving best model
2023-10-13 14:16:18,511 ----------------------------------------------------------------------------------------------------
2023-10-13 14:16:58,837 epoch 8 - iter 77/773 - loss 0.00793344 - time (sec): 40.32 - samples/sec: 299.32 - lr: 0.000052 - momentum: 0.000000
2023-10-13 14:17:41,619 epoch 8 - iter 154/773 - loss 0.00868923 - time (sec): 83.10 - samples/sec: 302.77 - lr: 0.000050 - momentum: 0.000000
2023-10-13 14:18:22,789 epoch 8 - iter 231/773 - loss 0.00820784 - time (sec): 124.27 - samples/sec: 299.14 - lr: 0.000048 - momentum: 0.000000
2023-10-13 14:19:05,552 epoch 8 - iter 308/773 - loss 0.00790837 - time (sec): 167.04 - samples/sec: 291.71 - lr: 0.000046 - momentum: 0.000000
2023-10-13 14:19:44,332 epoch 8 - iter 385/773 - loss 0.00769877 - time (sec): 205.82 - samples/sec: 291.56 - lr: 0.000045 - momentum: 0.000000
2023-10-13 14:20:24,710 epoch 8 - iter 462/773 - loss 0.00752209 - time (sec): 246.19 - samples/sec: 297.01 - lr: 0.000043 - momentum: 0.000000
2023-10-13 14:21:09,307 epoch 8 - iter 539/773 - loss 0.00819953 - time (sec): 290.79 - samples/sec: 297.36 - lr: 0.000041 - momentum: 0.000000
2023-10-13 14:21:51,152 epoch 8 - iter 616/773 - loss 0.00792088 - time (sec): 332.64 - samples/sec: 298.68 - lr: 0.000039 - momentum: 0.000000
2023-10-13 14:22:31,008 epoch 8 - iter 693/773 - loss 0.00757495 - time (sec): 372.49 - samples/sec: 299.83 - lr: 0.000038 - momentum: 0.000000
2023-10-13 14:23:10,962 epoch 8 - iter 770/773 - loss 0.00734145 - time (sec): 412.45 - samples/sec: 299.81 - lr: 0.000036 - momentum: 0.000000
2023-10-13 14:23:12,536 ----------------------------------------------------------------------------------------------------
2023-10-13 14:23:12,537 EPOCH 8 done: loss 0.0073 - lr: 0.000036
2023-10-13 14:23:29,813 DEV : loss 0.0935787558555603 - f1-score (micro avg) 0.8129
2023-10-13 14:23:29,844 ----------------------------------------------------------------------------------------------------
2023-10-13 14:24:10,868 epoch 9 - iter 77/773 - loss 0.00369340 - time (sec): 41.02 - samples/sec: 304.99 - lr: 0.000034 - momentum: 0.000000
2023-10-13 14:24:52,522 epoch 9 - iter 154/773 - loss 0.00441737 - time (sec): 82.68 - samples/sec: 306.66 - lr: 0.000032 - momentum: 0.000000
2023-10-13 14:25:34,148 epoch 9 - iter 231/773 - loss 0.00418139 - time (sec): 124.30 - samples/sec: 306.22 - lr: 0.000030 - momentum: 0.000000
2023-10-13 14:26:18,115 epoch 9 - iter 308/773 - loss 0.00465322 - time (sec): 168.27 - samples/sec: 297.99 - lr: 0.000029 - momentum: 0.000000
2023-10-13 14:26:58,721 epoch 9 - iter 385/773 - loss 0.00496921 - time (sec): 208.87 - samples/sec: 301.32 - lr: 0.000027 - momentum: 0.000000
2023-10-13 14:27:41,960 epoch 9 - iter 462/773 - loss 0.00516472 - time (sec): 252.11 - samples/sec: 298.31 - lr: 0.000025 - momentum: 0.000000
2023-10-13 14:28:24,233 epoch 9 - iter 539/773 - loss 0.00499233 - time (sec): 294.39 - samples/sec: 294.84 - lr: 0.000023 - momentum: 0.000000
2023-10-13 14:29:05,825 epoch 9 - iter 616/773 - loss 0.00572259 - time (sec): 335.98 - samples/sec: 296.74 - lr: 0.000022 - momentum: 0.000000
2023-10-13 14:29:51,119 epoch 9 - iter 693/773 - loss 0.00573440 - time (sec): 381.27 - samples/sec: 293.32 - lr: 0.000020 - momentum: 0.000000
2023-10-13 14:30:33,427 epoch 9 - iter 770/773 - loss 0.00567544 - time (sec): 423.58 - samples/sec: 292.77 - lr: 0.000018 - momentum: 0.000000
2023-10-13 14:30:34,870 ----------------------------------------------------------------------------------------------------
2023-10-13 14:30:34,871 EPOCH 9 done: loss 0.0057 - lr: 0.000018
2023-10-13 14:30:52,483 DEV : loss 0.09646561741828918 - f1-score (micro avg) 0.816
2023-10-13 14:30:52,518 ----------------------------------------------------------------------------------------------------
2023-10-13 14:31:38,164 epoch 10 - iter 77/773 - loss 0.00271756 - time (sec): 45.64 - samples/sec: 269.06 - lr: 0.000016 - momentum: 0.000000
2023-10-13 14:32:20,990 epoch 10 - iter 154/773 - loss 0.00479762 - time (sec): 88.47 - samples/sec: 284.55 - lr: 0.000014 - momentum: 0.000000
2023-10-13 14:33:05,141 epoch 10 - iter 231/773 - loss 0.00480447 - time (sec): 132.62 - samples/sec: 282.47 - lr: 0.000013 - momentum: 0.000000
2023-10-13 14:33:48,679 epoch 10 - iter 308/773 - loss 0.00412616 - time (sec): 176.16 - samples/sec: 282.82 - lr: 0.000011 - momentum: 0.000000
2023-10-13 14:34:28,762 epoch 10 - iter 385/773 - loss 0.00410371 - time (sec): 216.24 - samples/sec: 287.37 - lr: 0.000009 - momentum: 0.000000
2023-10-13 14:35:08,671 epoch 10 - iter 462/773 - loss 0.00414590 - time (sec): 256.15 - samples/sec: 288.15 - lr: 0.000007 - momentum: 0.000000
2023-10-13 14:35:47,002 epoch 10 - iter 539/773 - loss 0.00433560 - time (sec): 294.48 - samples/sec: 292.35 - lr: 0.000006 - momentum: 0.000000
2023-10-13 14:36:24,903 epoch 10 - iter 616/773 - loss 0.00453624 - time (sec): 332.38 - samples/sec: 294.15 - lr: 0.000004 - momentum: 0.000000
2023-10-13 14:37:04,210 epoch 10 - iter 693/773 - loss 0.00469727 - time (sec): 371.69 - samples/sec: 299.05 - lr: 0.000002 - momentum: 0.000000
2023-10-13 14:37:44,090 epoch 10 - iter 770/773 - loss 0.00452566 - time (sec): 411.57 - samples/sec: 300.46 - lr: 0.000000 - momentum: 0.000000
2023-10-13 14:37:45,747 ----------------------------------------------------------------------------------------------------
2023-10-13 14:37:45,747 EPOCH 10 done: loss 0.0045 - lr: 0.000000
2023-10-13 14:38:02,933 DEV : loss 0.09914136677980423 - f1-score (micro avg) 0.8129
2023-10-13 14:38:03,928 ----------------------------------------------------------------------------------------------------
2023-10-13 14:38:03,930 Loading model from best epoch ...
2023-10-13 14:38:08,507 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-BUILDING, B-BUILDING, E-BUILDING, I-BUILDING, S-STREET, B-STREET, E-STREET, I-STREET
2023-10-13 14:39:03,429 
Results:
- F-score (micro) 0.8076
- F-score (macro) 0.725
- Accuracy 0.6957

By class:
              precision    recall  f1-score   support

         LOC     0.8396    0.8742    0.8566       946
    BUILDING     0.5746    0.5622    0.5683       185
      STREET     0.7031    0.8036    0.7500        56

   micro avg     0.7935    0.8222    0.8076      1187
   macro avg     0.7058    0.7466    0.7250      1187
weighted avg     0.7919    0.8222    0.8066      1187

2023-10-13 14:39:03,429 ----------------------------------------------------------------------------------------------------
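For downstream analysis (e.g. plotting loss or lr curves across runs), the per-iteration lines above can be pulled apart with a small parser. A minimal sketch, assuming the Flair log format shown here; the function and field names are my own, not part of any library:

```python
import re

# Matches the per-iteration entries in a Flair training log, e.g.
# "... epoch 1 - iter 77/773 - loss 2.57062010 - time (sec): 44.19 -
#  samples/sec: 279.83 - lr: 0.000016 - momentum: 0.000000"
ITER_RE = re.compile(
    r"epoch (?P<epoch>\d+) - iter (?P<iter>\d+)/(?P<total>\d+) - "
    r"loss (?P<loss>[\d.]+) - time \(sec\): (?P<time>[\d.]+) - "
    r"samples/sec: (?P<samples_per_sec>[\d.]+) - lr: (?P<lr>[\d.]+)"
)

def parse_iter_line(line):
    """Return a dict of metrics for one iteration log line, or None."""
    m = ITER_RE.search(line)
    if m is None:
        return None
    return {k: (int(v) if k in ("epoch", "iter", "total") else float(v))
            for k, v in m.groupdict().items()}
```

Applied line by line over the log, this yields one record per progress entry, which drops straight into a DataFrame or a plot.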