2023-10-12 09:12:31,013 ----------------------------------------------------------------------------------------------------
2023-10-12 09:12:31,015 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 09:12:31,016 ----------------------------------------------------------------------------------------------------
2023-10-12 09:12:31,016 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 09:12:31,016 ----------------------------------------------------------------------------------------------------
2023-10-12 09:12:31,016 Train: 5777 sentences
2023-10-12 09:12:31,016 (train_with_dev=False, train_with_test=False)
2023-10-12 09:12:31,016 ----------------------------------------------------------------------------------------------------
2023-10-12 09:12:31,016 Training Params:
2023-10-12 09:12:31,016 - learning_rate: "0.00016"
2023-10-12 09:12:31,016 - mini_batch_size: "8"
2023-10-12 09:12:31,016 - max_epochs: "10"
2023-10-12 09:12:31,016 - shuffle: "True"
2023-10-12 09:12:31,017 ----------------------------------------------------------------------------------------------------
2023-10-12 09:12:31,017 Plugins:
2023-10-12 09:12:31,017 - TensorboardLogger
2023-10-12 09:12:31,017 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 09:12:31,017 ----------------------------------------------------------------------------------------------------
2023-10-12 09:12:31,017 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 09:12:31,017 - metric: "('micro avg', 'f1-score')"
2023-10-12 09:12:31,017 ----------------------------------------------------------------------------------------------------
2023-10-12 09:12:31,017 Computation:
2023-10-12 09:12:31,017 - compute on device: cuda:0
2023-10-12 09:12:31,017 - embedding storage: none
2023-10-12 09:12:31,017 ----------------------------------------------------------------------------------------------------
2023-10-12 09:12:31,017 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-12 09:12:31,017 ----------------------------------------------------------------------------------------------------
2023-10-12 09:12:31,017 ----------------------------------------------------------------------------------------------------
2023-10-12 09:12:31,018 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 09:13:10,651 epoch 1 - iter 72/723 - loss 2.54068274 - time (sec): 39.63 - samples/sec: 435.41 - lr: 0.000016 - momentum: 0.000000
2023-10-12 09:13:49,781 epoch 1 - iter 144/723 - loss 2.48896229 - time (sec): 78.76 - samples/sec: 437.46 - lr: 0.000032 - momentum: 0.000000
2023-10-12 09:14:30,289 epoch 1 - iter 216/723 - loss 2.31703108 - time (sec): 119.27 - samples/sec: 439.33 - lr: 0.000048 - momentum: 0.000000
2023-10-12 09:15:08,873 epoch 1 - iter 288/723 - loss 2.11125208 - time (sec): 157.85 - samples/sec: 438.77 - lr: 0.000064 - momentum: 0.000000
2023-10-12 09:15:47,731 epoch 1 - iter 360/723 - loss 1.87865228 - time (sec): 196.71 - samples/sec: 443.76 - lr: 0.000079 - momentum: 0.000000
2023-10-12 09:16:25,957 epoch 1 - iter 432/723 - loss 1.66306784 - time (sec): 234.94 - samples/sec: 444.23 - lr: 0.000095 - momentum: 0.000000
2023-10-12 09:17:05,925 epoch 1 - iter 504/723 - loss 1.45592361 - time (sec): 274.91 - samples/sec: 449.78 - lr: 0.000111 - momentum: 0.000000
2023-10-12 09:17:44,352 epoch 1 - iter 576/723 - loss 1.31694027 - time (sec): 313.33 - samples/sec: 446.12 - lr: 0.000127 - momentum: 0.000000
2023-10-12 09:18:22,944 epoch 1 - iter 648/723 - loss 1.19195801 - time (sec): 351.92 - samples/sec: 447.36 - lr: 0.000143 - momentum: 0.000000
2023-10-12 09:19:01,438 epoch 1 - iter 720/723 - loss 1.08849052 - time (sec): 390.42 - samples/sec: 449.67 - lr: 0.000159 - momentum: 0.000000
2023-10-12 09:19:02,618 ----------------------------------------------------------------------------------------------------
2023-10-12 09:19:02,618 EPOCH 1 done: loss 1.0851 - lr: 0.000159
2023-10-12 09:19:21,607 DEV : loss 0.19754847884178162 - f1-score (micro avg) 0.3452
2023-10-12 09:19:21,636 saving best model
2023-10-12 09:19:22,482 ----------------------------------------------------------------------------------------------------
2023-10-12 09:19:59,860 epoch 2 - iter 72/723 - loss 0.15960703 - time (sec): 37.38 - samples/sec: 474.56 - lr: 0.000158 - momentum: 0.000000
2023-10-12 09:20:37,795 epoch 2 - iter 144/723 - loss 0.15205605 - time (sec): 75.31 - samples/sec: 476.10 - lr: 0.000156 - momentum: 0.000000
2023-10-12 09:21:15,356 epoch 2 - iter 216/723 - loss 0.14254311 - time (sec): 112.87 - samples/sec: 476.58 - lr: 0.000155 - momentum: 0.000000
2023-10-12 09:21:53,960 epoch 2 - iter 288/723 - loss 0.13485595 - time (sec): 151.48 - samples/sec: 482.39 - lr: 0.000153 - momentum: 0.000000
2023-10-12 09:22:30,418 epoch 2 - iter 360/723 - loss 0.12904402 - time (sec): 187.93 - samples/sec: 476.01 - lr: 0.000151 - momentum: 0.000000
2023-10-12 09:23:07,023 epoch 2 - iter 432/723 - loss 0.12930298 - time (sec): 224.54 - samples/sec: 472.38 - lr: 0.000149 - momentum: 0.000000
2023-10-12 09:23:43,890 epoch 2 - iter 504/723 - loss 0.12740399 - time (sec): 261.41 - samples/sec: 472.37 - lr: 0.000148 - momentum: 0.000000
2023-10-12 09:24:20,296 epoch 2 - iter 576/723 - loss 0.12436397 - time (sec): 297.81 - samples/sec: 471.01 - lr: 0.000146 - momentum: 0.000000
2023-10-12 09:24:58,056 epoch 2 - iter 648/723 - loss 0.12313707 - time (sec): 335.57 - samples/sec: 471.28 - lr: 0.000144 - momentum: 0.000000
2023-10-12 09:25:34,923 epoch 2 - iter 720/723 - loss 0.12108946 - time (sec): 372.44 - samples/sec: 471.44 - lr: 0.000142 - momentum: 0.000000
2023-10-12 09:25:36,090 ----------------------------------------------------------------------------------------------------
2023-10-12 09:25:36,091 EPOCH 2 done: loss 0.1208 - lr: 0.000142
2023-10-12 09:25:55,610 DEV : loss 0.10809068381786346 - f1-score (micro avg) 0.7546
2023-10-12 09:25:55,639 saving best model
2023-10-12 09:25:58,153 ----------------------------------------------------------------------------------------------------
2023-10-12 09:26:35,794 epoch 3 - iter 72/723 - loss 0.09524519 - time (sec): 37.64 - samples/sec: 465.61 - lr: 0.000140 - momentum: 0.000000
2023-10-12 09:27:12,312 epoch 3 - iter 144/723 - loss 0.08802830 - time (sec): 74.15 - samples/sec: 463.82 - lr: 0.000139 - momentum: 0.000000
2023-10-12 09:27:50,862 epoch 3 - iter 216/723 - loss 0.07792497 - time (sec): 112.70 - samples/sec: 462.77 - lr: 0.000137 - momentum: 0.000000
2023-10-12 09:28:27,774 epoch 3 - iter 288/723 - loss 0.08033417 - time (sec): 149.62 - samples/sec: 455.30 - lr: 0.000135 - momentum: 0.000000
2023-10-12 09:29:07,289 epoch 3 - iter 360/723 - loss 0.07718260 - time (sec): 189.13 - samples/sec: 457.93 - lr: 0.000133 - momentum: 0.000000
2023-10-12 09:29:45,291 epoch 3 - iter 432/723 - loss 0.07571314 - time (sec): 227.13 - samples/sec: 458.11 - lr: 0.000132 - momentum: 0.000000
2023-10-12 09:30:25,618 epoch 3 - iter 504/723 - loss 0.07655001 - time (sec): 267.46 - samples/sec: 457.54 - lr: 0.000130 - momentum: 0.000000
2023-10-12 09:31:03,813 epoch 3 - iter 576/723 - loss 0.07485966 - time (sec): 305.66 - samples/sec: 455.69 - lr: 0.000128 - momentum: 0.000000
2023-10-12 09:31:42,269 epoch 3 - iter 648/723 - loss 0.07373807 - time (sec): 344.11 - samples/sec: 456.31 - lr: 0.000126 - momentum: 0.000000
2023-10-12 09:32:22,158 epoch 3 - iter 720/723 - loss 0.07223758 - time (sec): 384.00 - samples/sec: 457.61 - lr: 0.000125 - momentum: 0.000000
2023-10-12 09:32:23,294 ----------------------------------------------------------------------------------------------------
2023-10-12 09:32:23,294 EPOCH 3 done: loss 0.0723 - lr: 0.000125
2023-10-12 09:32:44,912 DEV : loss 0.06969450414180756 - f1-score (micro avg) 0.8636
2023-10-12 09:32:44,945 saving best model
2023-10-12 09:32:47,473 ----------------------------------------------------------------------------------------------------
2023-10-12 09:33:25,242 epoch 4 - iter 72/723 - loss 0.04406291 - time (sec): 37.76 - samples/sec: 481.38 - lr: 0.000123 - momentum: 0.000000
2023-10-12 09:34:02,574 epoch 4 - iter 144/723 - loss 0.04218829 - time (sec): 75.10 - samples/sec: 472.91 - lr: 0.000121 - momentum: 0.000000
2023-10-12 09:34:41,277 epoch 4 - iter 216/723 - loss 0.05044030 - time (sec): 113.80 - samples/sec: 467.34 - lr: 0.000119 - momentum: 0.000000
2023-10-12 09:35:20,827 epoch 4 - iter 288/723 - loss 0.04888086 - time (sec): 153.35 - samples/sec: 465.31 - lr: 0.000117 - momentum: 0.000000
2023-10-12 09:35:58,524 epoch 4 - iter 360/723 - loss 0.04697769 - time (sec): 191.05 - samples/sec: 460.95 - lr: 0.000116 - momentum: 0.000000
2023-10-12 09:36:36,190 epoch 4 - iter 432/723 - loss 0.04817827 - time (sec): 228.71 - samples/sec: 461.05 - lr: 0.000114 - momentum: 0.000000
2023-10-12 09:37:14,593 epoch 4 - iter 504/723 - loss 0.04752541 - time (sec): 267.12 - samples/sec: 464.52 - lr: 0.000112 - momentum: 0.000000
2023-10-12 09:37:50,742 epoch 4 - iter 576/723 - loss 0.04722134 - time (sec): 303.26 - samples/sec: 465.08 - lr: 0.000110 - momentum: 0.000000
2023-10-12 09:38:27,389 epoch 4 - iter 648/723 - loss 0.04692030 - time (sec): 339.91 - samples/sec: 466.31 - lr: 0.000109 - momentum: 0.000000
2023-10-12 09:39:03,653 epoch 4 - iter 720/723 - loss 0.04727199 - time (sec): 376.18 - samples/sec: 466.91 - lr: 0.000107 - momentum: 0.000000
2023-10-12 09:39:04,778 ----------------------------------------------------------------------------------------------------
2023-10-12 09:39:04,779 EPOCH 4 done: loss 0.0472 - lr: 0.000107
2023-10-12 09:39:25,632 DEV : loss 0.07068605720996857 - f1-score (micro avg) 0.8603
2023-10-12 09:39:25,665 ----------------------------------------------------------------------------------------------------
2023-10-12 09:40:02,917 epoch 5 - iter 72/723 - loss 0.02748124 - time (sec): 37.25 - samples/sec: 455.60 - lr: 0.000105 - momentum: 0.000000
2023-10-12 09:40:40,248 epoch 5 - iter 144/723 - loss 0.03666304 - time (sec): 74.58 - samples/sec: 457.23 - lr: 0.000103 - momentum: 0.000000
2023-10-12 09:41:17,534 epoch 5 - iter 216/723 - loss 0.03231394 - time (sec): 111.87 - samples/sec: 467.35 - lr: 0.000101 - momentum: 0.000000
2023-10-12 09:41:54,565 epoch 5 - iter 288/723 - loss 0.03031252 - time (sec): 148.90 - samples/sec: 471.60 - lr: 0.000100 - momentum: 0.000000
2023-10-12 09:42:31,819 epoch 5 - iter 360/723 - loss 0.03210903 - time (sec): 186.15 - samples/sec: 476.16 - lr: 0.000098 - momentum: 0.000000
2023-10-12 09:43:08,920 epoch 5 - iter 432/723 - loss 0.03173792 - time (sec): 223.25 - samples/sec: 474.45 - lr: 0.000096 - momentum: 0.000000
2023-10-12 09:43:46,282 epoch 5 - iter 504/723 - loss 0.03240803 - time (sec): 260.61 - samples/sec: 470.41 - lr: 0.000094 - momentum: 0.000000
2023-10-12 09:44:24,185 epoch 5 - iter 576/723 - loss 0.03345992 - time (sec): 298.52 - samples/sec: 469.45 - lr: 0.000093 - momentum: 0.000000
2023-10-12 09:45:01,744 epoch 5 - iter 648/723 - loss 0.03356952 - time (sec): 336.08 - samples/sec: 467.27 - lr: 0.000091 - momentum: 0.000000
2023-10-12 09:45:40,826 epoch 5 - iter 720/723 - loss 0.03345778 - time (sec): 375.16 - samples/sec: 468.24 - lr: 0.000089 - momentum: 0.000000
2023-10-12 09:45:42,036 ----------------------------------------------------------------------------------------------------
2023-10-12 09:45:42,036 EPOCH 5 done: loss 0.0334 - lr: 0.000089
2023-10-12 09:46:03,200 DEV : loss 0.08003567904233932 - f1-score (micro avg) 0.8598
2023-10-12 09:46:03,230 ----------------------------------------------------------------------------------------------------
2023-10-12 09:46:41,057 epoch 6 - iter 72/723 - loss 0.01469662 - time (sec): 37.83 - samples/sec: 476.88 - lr: 0.000087 - momentum: 0.000000
2023-10-12 09:47:17,385 epoch 6 - iter 144/723 - loss 0.02012899 - time (sec): 74.15 - samples/sec: 458.62 - lr: 0.000085 - momentum: 0.000000
2023-10-12 09:47:54,113 epoch 6 - iter 216/723 - loss 0.02320686 - time (sec): 110.88 - samples/sec: 462.61 - lr: 0.000084 - momentum: 0.000000
2023-10-12 09:48:30,521 epoch 6 - iter 288/723 - loss 0.02223906 - time (sec): 147.29 - samples/sec: 460.08 - lr: 0.000082 - momentum: 0.000000
2023-10-12 09:49:07,665 epoch 6 - iter 360/723 - loss 0.02179582 - time (sec): 184.43 - samples/sec: 466.52 - lr: 0.000080 - momentum: 0.000000
2023-10-12 09:49:44,594 epoch 6 - iter 432/723 - loss 0.02302519 - time (sec): 221.36 - samples/sec: 467.19 - lr: 0.000078 - momentum: 0.000000
2023-10-12 09:50:23,505 epoch 6 - iter 504/723 - loss 0.02173843 - time (sec): 260.27 - samples/sec: 464.70 - lr: 0.000077 - momentum: 0.000000
2023-10-12 09:51:04,453 epoch 6 - iter 576/723 - loss 0.02233847 - time (sec): 301.22 - samples/sec: 460.32 - lr: 0.000075 - momentum: 0.000000
2023-10-12 09:51:48,517 epoch 6 - iter 648/723 - loss 0.02374348 - time (sec): 345.28 - samples/sec: 454.70 - lr: 0.000073 - momentum: 0.000000
2023-10-12 09:52:32,950 epoch 6 - iter 720/723 - loss 0.02595781 - time (sec): 389.72 - samples/sec: 450.50 - lr: 0.000071 - momentum: 0.000000
2023-10-12 09:52:34,302 ----------------------------------------------------------------------------------------------------
2023-10-12 09:52:34,303 EPOCH 6 done: loss 0.0262 - lr: 0.000071
2023-10-12 09:52:56,496 DEV : loss 0.09371042996644974 - f1-score (micro avg) 0.8562
2023-10-12 09:52:56,538 ----------------------------------------------------------------------------------------------------
2023-10-12 09:53:40,163 epoch 7 - iter 72/723 - loss 0.02096644 - time (sec): 43.62 - samples/sec: 427.85 - lr: 0.000069 - momentum: 0.000000
2023-10-12 09:54:20,242 epoch 7 - iter 144/723 - loss 0.01613838 - time (sec): 83.70 - samples/sec: 417.02 - lr: 0.000068 - momentum: 0.000000
2023-10-12 09:55:01,918 epoch 7 - iter 216/723 - loss 0.02027854 - time (sec): 125.38 - samples/sec: 423.27 - lr: 0.000066 - momentum: 0.000000
2023-10-12 09:55:46,254 epoch 7 - iter 288/723 - loss 0.02019598 - time (sec): 169.71 - samples/sec: 421.59 - lr: 0.000064 - momentum: 0.000000
2023-10-12 09:56:30,235 epoch 7 - iter 360/723 - loss 0.02145432 - time (sec): 213.69 - samples/sec: 418.12 - lr: 0.000062 - momentum: 0.000000
2023-10-12 09:57:11,668 epoch 7 - iter 432/723 - loss 0.02070345 - time (sec): 255.13 - samples/sec: 414.64 - lr: 0.000061 - momentum: 0.000000
2023-10-12 09:57:54,874 epoch 7 - iter 504/723 - loss 0.02079100 - time (sec): 298.33 - samples/sec: 413.30 - lr: 0.000059 - momentum: 0.000000
2023-10-12 09:58:39,294 epoch 7 - iter 576/723 - loss 0.02219735 - time (sec): 342.75 - samples/sec: 410.99 - lr: 0.000057 - momentum: 0.000000
2023-10-12 09:59:21,613 epoch 7 - iter 648/723 - loss 0.02096818 - time (sec): 385.07 - samples/sec: 410.00 - lr: 0.000055 - momentum: 0.000000
2023-10-12 10:00:01,414 epoch 7 - iter 720/723 - loss 0.02042740 - time (sec): 424.87 - samples/sec: 413.43 - lr: 0.000053 - momentum: 0.000000
2023-10-12 10:00:02,702 ----------------------------------------------------------------------------------------------------
2023-10-12 10:00:02,702 EPOCH 7 done: loss 0.0204 - lr: 0.000053
2023-10-12 10:00:23,279 DEV : loss 0.11184699088335037 - f1-score (micro avg) 0.8644
2023-10-12 10:00:23,310 saving best model
2023-10-12 10:00:26,348 ----------------------------------------------------------------------------------------------------
2023-10-12 10:01:04,206 epoch 8 - iter 72/723 - loss 0.00941353 - time (sec): 37.85 - samples/sec: 439.88 - lr: 0.000052 - momentum: 0.000000
2023-10-12 10:01:42,366 epoch 8 - iter 144/723 - loss 0.01683680 - time (sec): 76.01 - samples/sec: 453.17 - lr: 0.000050 - momentum: 0.000000
2023-10-12 10:02:20,195 epoch 8 - iter 216/723 - loss 0.01621885 - time (sec): 113.84 - samples/sec: 455.09 - lr: 0.000048 - momentum: 0.000000
2023-10-12 10:02:58,389 epoch 8 - iter 288/723 - loss 0.01774953 - time (sec): 152.04 - samples/sec: 459.51 - lr: 0.000046 - momentum: 0.000000
2023-10-12 10:03:36,210 epoch 8 - iter 360/723 - loss 0.01673376 - time (sec): 189.86 - samples/sec: 464.11 - lr: 0.000045 - momentum: 0.000000
2023-10-12 10:04:13,871 epoch 8 - iter 432/723 - loss 0.01870128 - time (sec): 227.52 - samples/sec: 464.71 - lr: 0.000043 - momentum: 0.000000
2023-10-12 10:04:52,284 epoch 8 - iter 504/723 - loss 0.01758556 - time (sec): 265.93 - samples/sec: 466.61 - lr: 0.000041 - momentum: 0.000000
2023-10-12 10:05:30,404 epoch 8 - iter 576/723 - loss 0.01710052 - time (sec): 304.05 - samples/sec: 467.78 - lr: 0.000039 - momentum: 0.000000
2023-10-12 10:06:08,098 epoch 8 - iter 648/723 - loss 0.01689011 - time (sec): 341.75 - samples/sec: 466.57 - lr: 0.000037 - momentum: 0.000000
2023-10-12 10:06:45,084 epoch 8 - iter 720/723 - loss 0.01690370 - time (sec): 378.73 - samples/sec: 463.95 - lr: 0.000036 - momentum: 0.000000
2023-10-12 10:06:46,164 ----------------------------------------------------------------------------------------------------
2023-10-12 10:06:46,164 EPOCH 8 done: loss 0.0169 - lr: 0.000036
2023-10-12 10:07:06,450 DEV : loss 0.11143004894256592 - f1-score (micro avg) 0.8698
2023-10-12 10:07:06,482 saving best model
2023-10-12 10:07:09,971 ----------------------------------------------------------------------------------------------------
2023-10-12 10:07:46,794 epoch 9 - iter 72/723 - loss 0.01791029 - time (sec): 36.82 - samples/sec: 467.39 - lr: 0.000034 - momentum: 0.000000
2023-10-12 10:08:23,570 epoch 9 - iter 144/723 - loss 0.01713879 - time (sec): 73.59 - samples/sec: 458.06 - lr: 0.000032 - momentum: 0.000000
2023-10-12 10:09:02,166 epoch 9 - iter 216/723 - loss 0.01306516 - time (sec): 112.19 - samples/sec: 469.03 - lr: 0.000030 - momentum: 0.000000
2023-10-12 10:09:39,707 epoch 9 - iter 288/723 - loss 0.01216452 - time (sec): 149.73 - samples/sec: 467.63 - lr: 0.000028 - momentum: 0.000000
2023-10-12 10:10:17,712 epoch 9 - iter 360/723 - loss 0.01176075 - time (sec): 187.74 - samples/sec: 464.93 - lr: 0.000027 - momentum: 0.000000
2023-10-12 10:10:54,836 epoch 9 - iter 432/723 - loss 0.01188469 - time (sec): 224.86 - samples/sec: 466.23 - lr: 0.000025 - momentum: 0.000000
2023-10-12 10:11:34,174 epoch 9 - iter 504/723 - loss 0.01118095 - time (sec): 264.20 - samples/sec: 464.20 - lr: 0.000023 - momentum: 0.000000
2023-10-12 10:12:12,050 epoch 9 - iter 576/723 - loss 0.01081907 - time (sec): 302.07 - samples/sec: 462.46 - lr: 0.000021 - momentum: 0.000000
2023-10-12 10:12:50,928 epoch 9 - iter 648/723 - loss 0.01218741 - time (sec): 340.95 - samples/sec: 462.49 - lr: 0.000020 - momentum: 0.000000
2023-10-12 10:13:29,451 epoch 9 - iter 720/723 - loss 0.01308210 - time (sec): 379.48 - samples/sec: 462.30 - lr: 0.000018 - momentum: 0.000000
2023-10-12 10:13:30,853 ----------------------------------------------------------------------------------------------------
2023-10-12 10:13:30,854 EPOCH 9 done: loss 0.0130 - lr: 0.000018
2023-10-12 10:13:51,633 DEV : loss 0.12346570193767548 - f1-score (micro avg) 0.8661
2023-10-12 10:13:51,663 ----------------------------------------------------------------------------------------------------
2023-10-12 10:14:29,981 epoch 10 - iter 72/723 - loss 0.01183784 - time (sec): 38.32 - samples/sec: 458.40 - lr: 0.000016 - momentum: 0.000000
2023-10-12 10:15:08,278 epoch 10 - iter 144/723 - loss 0.00969580 - time (sec): 76.61 - samples/sec: 454.26 - lr: 0.000014 - momentum: 0.000000
2023-10-12 10:15:46,823 epoch 10 - iter 216/723 - loss 0.00857501 - time (sec): 115.16 - samples/sec: 460.10 - lr: 0.000012 - momentum: 0.000000
2023-10-12 10:16:24,331 epoch 10 - iter 288/723 - loss 0.00966868 - time (sec): 152.67 - samples/sec: 456.38 - lr: 0.000011 - momentum: 0.000000
2023-10-12 10:17:01,431 epoch 10 - iter 360/723 - loss 0.00912886 - time (sec): 189.77 - samples/sec: 453.30 - lr: 0.000009 - momentum: 0.000000
2023-10-12 10:17:40,125 epoch 10 - iter 432/723 - loss 0.00936631 - time (sec): 228.46 - samples/sec: 455.61 - lr: 0.000007 - momentum: 0.000000
2023-10-12 10:18:19,178 epoch 10 - iter 504/723 - loss 0.00919697 - time (sec): 267.51 - samples/sec: 457.77 - lr: 0.000005 - momentum: 0.000000
2023-10-12 10:18:57,221 epoch 10 - iter 576/723 - loss 0.00981610 - time (sec): 305.56 - samples/sec: 457.01 - lr: 0.000004 - momentum: 0.000000
2023-10-12 10:19:35,541 epoch 10 - iter 648/723 - loss 0.01029312 - time (sec): 343.88 - samples/sec: 457.54 - lr: 0.000002 - momentum: 0.000000
2023-10-12 10:20:14,421 epoch 10 - iter 720/723 - loss 0.01158386 - time (sec): 382.76 - samples/sec: 459.37 - lr: 0.000000 - momentum: 0.000000
2023-10-12 10:20:15,483 ----------------------------------------------------------------------------------------------------
2023-10-12 10:20:15,484 EPOCH 10 done: loss 0.0116 - lr: 0.000000
2023-10-12 10:20:35,988 DEV : loss 0.12695999443531036 - f1-score (micro avg) 0.866
2023-10-12 10:20:36,903 ----------------------------------------------------------------------------------------------------
2023-10-12 10:20:36,905 Loading model from best epoch ...
2023-10-12 10:20:41,160 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 10:21:01,016 
Results:
- F-score (micro) 0.8354
- F-score (macro) 0.7615
- Accuracy 0.7266

By class:
              precision    recall  f1-score   support

         PER     0.8275    0.8755    0.8508       482
         LOC     0.9082    0.8210    0.8624       458
         ORG     0.5385    0.6087    0.5714        69

   micro avg     0.8383    0.8325    0.8354      1009
   macro avg     0.7580    0.7684    0.7615      1009
weighted avg     0.8443    0.8325    0.8370      1009

2023-10-12 10:21:01,016 ----------------------------------------------------------------------------------------------------