2023-10-13 08:19:08,899 ----------------------------------------------------------------------------------------------------
2023-10-13 08:19:08,902 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 08:19:08,902 ----------------------------------------------------------------------------------------------------
2023-10-13 08:19:08,902 MultiCorpus: 6183 train + 680 dev + 2113 test sentences
 - NER_HIPE_2022 Corpus: 6183 train + 680 dev + 2113 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/topres19th/en/with_doc_seperator
2023-10-13 08:19:08,902 ----------------------------------------------------------------------------------------------------
2023-10-13 08:19:08,902 Train: 6183 sentences
2023-10-13 08:19:08,902 (train_with_dev=False, train_with_test=False)
2023-10-13 08:19:08,902 ----------------------------------------------------------------------------------------------------
2023-10-13 08:19:08,902 Training Params:
2023-10-13 08:19:08,903  - learning_rate: "0.00016"
2023-10-13 08:19:08,903  - mini_batch_size: "8"
2023-10-13 08:19:08,903  - max_epochs: "10"
2023-10-13 08:19:08,903  - shuffle: "True"
2023-10-13 08:19:08,903 ----------------------------------------------------------------------------------------------------
2023-10-13 08:19:08,903 Plugins:
2023-10-13 08:19:08,903  - TensorboardLogger
2023-10-13 08:19:08,903  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 08:19:08,903 ----------------------------------------------------------------------------------------------------
2023-10-13 08:19:08,903 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 08:19:08,903  - metric: "('micro avg', 'f1-score')"
2023-10-13 08:19:08,903 ----------------------------------------------------------------------------------------------------
2023-10-13 08:19:08,904 Computation:
2023-10-13 08:19:08,904  - compute on device: cuda:0
2023-10-13 08:19:08,904  - embedding storage: none
2023-10-13 08:19:08,904 ----------------------------------------------------------------------------------------------------
2023-10-13 08:19:08,904 Model training base path: "hmbench-topres19th/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1"
2023-10-13 08:19:08,904 ----------------------------------------------------------------------------------------------------
2023-10-13 08:19:08,904 ----------------------------------------------------------------------------------------------------
2023-10-13 08:19:08,904 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 08:19:50,141 epoch 1 - iter 77/773 - loss 2.58730266 - time (sec): 41.23 - samples/sec: 301.42 - lr: 0.000016 - momentum: 0.000000
2023-10-13 08:20:30,710 epoch 1 - iter 154/773 - loss 2.55059890 - time (sec): 81.80 - samples/sec: 299.71 - lr: 0.000032 - momentum: 0.000000
2023-10-13 08:21:10,668 epoch 1 - iter 231/773 - loss 2.39904412 - time (sec): 121.76 - samples/sec: 297.29 - lr: 0.000048 - momentum: 0.000000
2023-10-13 08:21:51,323 epoch 1 - iter 308/773 - loss 2.15465609 - time (sec): 162.42 - samples/sec: 304.57 - lr: 0.000064 - momentum: 0.000000
2023-10-13 08:22:32,528 epoch 1 - iter 385/773 - loss 1.91746015 - time (sec): 203.62 - samples/sec: 302.07 - lr: 0.000079 - momentum: 0.000000
2023-10-13 08:23:13,165 epoch 1 - iter 462/773 - loss 1.68296157 - time (sec): 244.26 - samples/sec: 301.35 - lr: 0.000095 - momentum: 0.000000
2023-10-13 08:23:54,263 epoch 1 - iter 539/773 - loss 1.47564652 - time (sec): 285.36 - samples/sec: 302.30 - lr: 0.000111 - momentum: 0.000000
2023-10-13 08:24:33,555 epoch 1 - iter 616/773 - loss 1.33003809 - time (sec): 324.65 - samples/sec: 300.69 - lr: 0.000127 - momentum: 0.000000
2023-10-13 08:25:14,174 epoch 1 - iter 693/773 - loss 1.19198349 - time (sec): 365.27 - samples/sec: 304.37 - lr: 0.000143 - momentum: 0.000000
2023-10-13 08:25:54,562 epoch 1 - iter 770/773 - loss 1.08563981 - time (sec): 405.66 - samples/sec: 305.32 - lr: 0.000159 - momentum: 0.000000
2023-10-13 08:25:56,056 ----------------------------------------------------------------------------------------------------
2023-10-13 08:25:56,056 EPOCH 1 done: loss 1.0826 - lr: 0.000159
2023-10-13 08:26:13,153 DEV : loss 0.11099149286746979 - f1-score (micro avg) 0.1016
2023-10-13 08:26:13,184 saving best model
2023-10-13 08:26:14,082 ----------------------------------------------------------------------------------------------------
2023-10-13 08:26:54,066 epoch 2 - iter 77/773 - loss 0.14305981 - time (sec): 39.98 - samples/sec: 292.54 - lr: 0.000158 - momentum: 0.000000
2023-10-13 08:27:35,342 epoch 2 - iter 154/773 - loss 0.13795277 - time (sec): 81.26 - samples/sec: 290.12 - lr: 0.000156 - momentum: 0.000000
2023-10-13 08:28:17,325 epoch 2 - iter 231/773 - loss 0.13042122 - time (sec): 123.24 - samples/sec: 296.86 - lr: 0.000155 - momentum: 0.000000
2023-10-13 08:28:59,076 epoch 2 - iter 308/773 - loss 0.12544485 - time (sec): 164.99 - samples/sec: 299.44 - lr: 0.000153 - momentum: 0.000000
2023-10-13 08:29:40,403 epoch 2 - iter 385/773 - loss 0.11796362 - time (sec): 206.32 - samples/sec: 301.05 - lr: 0.000151 - momentum: 0.000000
2023-10-13 08:30:20,228 epoch 2 - iter 462/773 - loss 0.11608104 - time (sec): 246.14 - samples/sec: 298.83 - lr: 0.000149 - momentum: 0.000000
2023-10-13 08:31:00,760 epoch 2 - iter 539/773 - loss 0.11492213 - time (sec): 286.68 - samples/sec: 298.69 - lr: 0.000148 - momentum: 0.000000
2023-10-13 08:31:41,887 epoch 2 - iter 616/773 - loss 0.11409853 - time (sec): 327.80 - samples/sec: 298.31 - lr: 0.000146 - momentum: 0.000000
2023-10-13 08:32:24,110 epoch 2 - iter 693/773 - loss 0.10924535 - time (sec): 370.03 - samples/sec: 300.79 - lr: 0.000144 - momentum: 0.000000
2023-10-13 08:33:04,730 epoch 2 - iter 770/773 - loss 0.10522174 - time (sec): 410.65 - samples/sec: 301.53 - lr: 0.000142 - momentum: 0.000000
2023-10-13 08:33:06,213 ----------------------------------------------------------------------------------------------------
2023-10-13 08:33:06,214 EPOCH 2 done: loss 0.1051 - lr: 0.000142
2023-10-13 08:33:23,548 DEV : loss 0.05811596289277077 - f1-score (micro avg) 0.7475
2023-10-13 08:33:23,577 saving best model
2023-10-13 08:33:26,281 ----------------------------------------------------------------------------------------------------
2023-10-13 08:34:06,347 epoch 3 - iter 77/773 - loss 0.07785087 - time (sec): 40.06 - samples/sec: 304.00 - lr: 0.000140 - momentum: 0.000000
2023-10-13 08:34:47,144 epoch 3 - iter 154/773 - loss 0.07400192 - time (sec): 80.86 - samples/sec: 301.26 - lr: 0.000139 - momentum: 0.000000
2023-10-13 08:35:27,746 epoch 3 - iter 231/773 - loss 0.06533236 - time (sec): 121.46 - samples/sec: 307.32 - lr: 0.000137 - momentum: 0.000000
2023-10-13 08:36:06,754 epoch 3 - iter 308/773 - loss 0.06656562 - time (sec): 160.47 - samples/sec: 301.45 - lr: 0.000135 - momentum: 0.000000
2023-10-13 08:36:46,986 epoch 3 - iter 385/773 - loss 0.06409639 - time (sec): 200.70 - samples/sec: 303.24 - lr: 0.000133 - momentum: 0.000000
2023-10-13 08:37:28,249 epoch 3 - iter 462/773 - loss 0.06382119 - time (sec): 241.96 - samples/sec: 303.68 - lr: 0.000132 - momentum: 0.000000
2023-10-13 08:38:10,144 epoch 3 - iter 539/773 - loss 0.06259696 - time (sec): 283.86 - samples/sec: 303.91 - lr: 0.000130 - momentum: 0.000000
2023-10-13 08:38:52,032 epoch 3 - iter 616/773 - loss 0.06115232 - time (sec): 325.75 - samples/sec: 303.96 - lr: 0.000128 - momentum: 0.000000
2023-10-13 08:39:34,247 epoch 3 - iter 693/773 - loss 0.06103512 - time (sec): 367.96 - samples/sec: 304.31 - lr: 0.000126 - momentum: 0.000000
2023-10-13 08:40:14,706 epoch 3 - iter 770/773 - loss 0.06073176 - time (sec): 408.42 - samples/sec: 303.28 - lr: 0.000125 - momentum: 0.000000
2023-10-13 08:40:16,218 ----------------------------------------------------------------------------------------------------
2023-10-13 08:40:16,218 EPOCH 3 done: loss 0.0606 - lr: 0.000125
2023-10-13 08:40:33,594 DEV : loss 0.0552801787853241 - f1-score (micro avg) 0.7881
2023-10-13 08:40:33,623 saving best model
2023-10-13 08:40:36,276 ----------------------------------------------------------------------------------------------------
2023-10-13 08:41:18,425 epoch 4 - iter 77/773 - loss 0.04476949 - time (sec): 42.15 - samples/sec: 300.04 - lr: 0.000123 - momentum: 0.000000
2023-10-13 08:42:00,086 epoch 4 - iter 154/773 - loss 0.03746333 - time (sec): 83.81 - samples/sec: 295.93 - lr: 0.000121 - momentum: 0.000000
2023-10-13 08:42:42,080 epoch 4 - iter 231/773 - loss 0.03607636 - time (sec): 125.80 - samples/sec: 298.86 - lr: 0.000119 - momentum: 0.000000
2023-10-13 08:43:22,064 epoch 4 - iter 308/773 - loss 0.03867782 - time (sec): 165.78 - samples/sec: 293.02 - lr: 0.000117 - momentum: 0.000000
2023-10-13 08:44:02,559 epoch 4 - iter 385/773 - loss 0.03895762 - time (sec): 206.28 - samples/sec: 296.12 - lr: 0.000116 - momentum: 0.000000
2023-10-13 08:44:43,495 epoch 4 - iter 462/773 - loss 0.03983620 - time (sec): 247.22 - samples/sec: 298.78 - lr: 0.000114 - momentum: 0.000000
2023-10-13 08:45:25,523 epoch 4 - iter 539/773 - loss 0.03920856 - time (sec): 289.24 - samples/sec: 299.42 - lr: 0.000112 - momentum: 0.000000
2023-10-13 08:46:08,084 epoch 4 - iter 616/773 - loss 0.03888995 - time (sec): 331.80 - samples/sec: 298.84 - lr: 0.000110 - momentum: 0.000000
2023-10-13 08:46:50,278 epoch 4 - iter 693/773 - loss 0.03923841 - time (sec): 374.00 - samples/sec: 298.91 - lr: 0.000109 - momentum: 0.000000
2023-10-13 08:47:32,327 epoch 4 - iter 770/773 - loss 0.03856068 - time (sec): 416.05 - samples/sec: 297.69 - lr: 0.000107 - momentum: 0.000000
2023-10-13 08:47:33,887 ----------------------------------------------------------------------------------------------------
2023-10-13 08:47:33,887 EPOCH 4 done: loss 0.0385 - lr: 0.000107
2023-10-13 08:47:52,334 DEV : loss 0.06545507162809372 - f1-score (micro avg) 0.8016
2023-10-13 08:47:52,364 saving best model
2023-10-13 08:47:55,049 ----------------------------------------------------------------------------------------------------
2023-10-13 08:48:38,854 epoch 5 - iter 77/773 - loss 0.01846706 - time (sec): 43.80 - samples/sec: 288.79 - lr: 0.000105 - momentum: 0.000000
2023-10-13 08:49:21,723 epoch 5 - iter 154/773 - loss 0.02454332 - time (sec): 86.67 - samples/sec: 296.98 - lr: 0.000103 - momentum: 0.000000
2023-10-13 08:50:02,363 epoch 5 - iter 231/773 - loss 0.02461244 - time (sec): 127.31 - samples/sec: 293.55 - lr: 0.000101 - momentum: 0.000000
2023-10-13 08:50:44,834 epoch 5 - iter 308/773 - loss 0.02573056 - time (sec): 169.78 - samples/sec: 294.29 - lr: 0.000100 - momentum: 0.000000
2023-10-13 08:51:26,399 epoch 5 - iter 385/773 - loss 0.02503993 - time (sec): 211.34 - samples/sec: 293.14 - lr: 0.000098 - momentum: 0.000000
2023-10-13 08:52:08,335 epoch 5 - iter 462/773 - loss 0.02523957 - time (sec): 253.28 - samples/sec: 293.25 - lr: 0.000096 - momentum: 0.000000
2023-10-13 08:52:50,137 epoch 5 - iter 539/773 - loss 0.02476814 - time (sec): 295.08 - samples/sec: 292.50 - lr: 0.000094 - momentum: 0.000000
2023-10-13 08:53:32,478 epoch 5 - iter 616/773 - loss 0.02532580 - time (sec): 337.42 - samples/sec: 291.70 - lr: 0.000093 - momentum: 0.000000
2023-10-13 08:54:14,968 epoch 5 - iter 693/773 - loss 0.02453011 - time (sec): 379.91 - samples/sec: 292.62 - lr: 0.000091 - momentum: 0.000000
2023-10-13 08:54:57,343 epoch 5 - iter 770/773 - loss 0.02434157 - time (sec): 422.29 - samples/sec: 292.92 - lr: 0.000089 - momentum: 0.000000
2023-10-13 08:54:58,958 ----------------------------------------------------------------------------------------------------
2023-10-13 08:54:58,958 EPOCH 5 done: loss 0.0244 - lr: 0.000089
2023-10-13 08:55:16,703 DEV : loss 0.0751594752073288 - f1-score (micro avg) 0.8144
2023-10-13 08:55:16,734 saving best model
2023-10-13 08:55:19,789 ----------------------------------------------------------------------------------------------------
2023-10-13 08:56:01,233 epoch 6 - iter 77/773 - loss 0.01673791 - time (sec): 41.44 - samples/sec: 317.64 - lr: 0.000087 - momentum: 0.000000
2023-10-13 08:56:42,436 epoch 6 - iter 154/773 - loss 0.01680699 - time (sec): 82.64 - samples/sec: 312.25 - lr: 0.000085 - momentum: 0.000000
2023-10-13 08:57:23,912 epoch 6 - iter 231/773 - loss 0.01587428 - time (sec): 124.12 - samples/sec: 312.83 - lr: 0.000084 - momentum: 0.000000
2023-10-13 08:58:04,159 epoch 6 - iter 308/773 - loss 0.01703803 - time (sec): 164.37 - samples/sec: 309.35 - lr: 0.000082 - momentum: 0.000000
2023-10-13 08:58:45,074 epoch 6 - iter 385/773 - loss 0.01583363 - time (sec): 205.28 - samples/sec: 308.11 - lr: 0.000080 - momentum: 0.000000
2023-10-13 08:59:24,621 epoch 6 - iter 462/773 - loss 0.01716180 - time (sec): 244.83 - samples/sec: 302.92 - lr: 0.000078 - momentum: 0.000000
2023-10-13 09:00:06,607 epoch 6 - iter 539/773 - loss 0.01706255 - time (sec): 286.81 - samples/sec: 302.79 - lr: 0.000077 - momentum: 0.000000
2023-10-13 09:00:46,986 epoch 6 - iter 616/773 - loss 0.01715802 - time (sec): 327.19 - samples/sec: 302.13 - lr: 0.000075 - momentum: 0.000000
2023-10-13 09:01:27,957 epoch 6 - iter 693/773 - loss 0.01750027 - time (sec): 368.16 - samples/sec: 300.91 - lr: 0.000073 - momentum: 0.000000
2023-10-13 09:02:08,865 epoch 6 - iter 770/773 - loss 0.01766724 - time (sec): 409.07 - samples/sec: 302.36 - lr: 0.000071 - momentum: 0.000000
2023-10-13 09:02:10,479 ----------------------------------------------------------------------------------------------------
2023-10-13 09:02:10,479 EPOCH 6 done: loss 0.0176 - lr: 0.000071
2023-10-13 09:02:27,976 DEV : loss 0.08374767750501633 - f1-score (micro avg) 0.7984
2023-10-13 09:02:28,010 ----------------------------------------------------------------------------------------------------
2023-10-13 09:03:07,761 epoch 7 - iter 77/773 - loss 0.01033091 - time (sec): 39.75 - samples/sec: 295.81 - lr: 0.000069 - momentum: 0.000000
2023-10-13 09:03:48,071 epoch 7 - iter 154/773 - loss 0.01115286 - time (sec): 80.06 - samples/sec: 312.36 - lr: 0.000068 - momentum: 0.000000
2023-10-13 09:04:27,324 epoch 7 - iter 231/773 - loss 0.01249494 - time (sec): 119.31 - samples/sec: 308.50 - lr: 0.000066 - momentum: 0.000000
2023-10-13 09:05:07,581 epoch 7 - iter 308/773 - loss 0.01131802 - time (sec): 159.57 - samples/sec: 304.43 - lr: 0.000064 - momentum: 0.000000
2023-10-13 09:05:47,131 epoch 7 - iter 385/773 - loss 0.01181494 - time (sec): 199.12 - samples/sec: 300.09 - lr: 0.000062 - momentum: 0.000000
2023-10-13 09:06:28,020 epoch 7 - iter 462/773 - loss 0.01230035 - time (sec): 240.01 - samples/sec: 303.00 - lr: 0.000061 - momentum: 0.000000
2023-10-13 09:07:08,824 epoch 7 - iter 539/773 - loss 0.01249036 - time (sec): 280.81 - samples/sec: 306.60 - lr: 0.000059 - momentum: 0.000000
2023-10-13 09:07:49,019 epoch 7 - iter 616/773 - loss 0.01225369 - time (sec): 321.01 - samples/sec: 307.52 - lr: 0.000057 - momentum: 0.000000
2023-10-13 09:08:29,608 epoch 7 - iter 693/773 - loss 0.01221858 - time (sec): 361.60 - samples/sec: 311.11 - lr: 0.000055 - momentum: 0.000000
2023-10-13 09:09:08,846 epoch 7 - iter 770/773 - loss 0.01224012 - time (sec): 400.83 - samples/sec: 308.94 - lr: 0.000054 - momentum: 0.000000
2023-10-13 09:09:10,350 ----------------------------------------------------------------------------------------------------
2023-10-13 09:09:10,350 EPOCH 7 done: loss 0.0122 - lr: 0.000054
2023-10-13 09:09:27,602 DEV : loss 0.09720773994922638 - f1-score (micro avg) 0.7937
2023-10-13 09:09:27,634 ----------------------------------------------------------------------------------------------------
2023-10-13 09:10:07,791 epoch 8 - iter 77/773 - loss 0.01708732 - time (sec): 40.16 - samples/sec: 291.97 - lr: 0.000052 - momentum: 0.000000
2023-10-13 09:10:49,357 epoch 8 - iter 154/773 - loss 0.01343491 - time (sec): 81.72 - samples/sec: 296.01 - lr: 0.000050 - momentum: 0.000000
2023-10-13 09:11:30,343 epoch 8 - iter 231/773 - loss 0.01154821 - time (sec): 122.71 - samples/sec: 297.74 - lr: 0.000048 - momentum: 0.000000
2023-10-13 09:12:12,352 epoch 8 - iter 308/773 - loss 0.01089407 - time (sec): 164.72 - samples/sec: 295.88 - lr: 0.000046 - momentum: 0.000000
2023-10-13 09:12:53,254 epoch 8 - iter 385/773 - loss 0.01019327 - time (sec): 205.62 - samples/sec: 298.89 - lr: 0.000045 - momentum: 0.000000
2023-10-13 09:13:33,143 epoch 8 - iter 462/773 - loss 0.01018692 - time (sec): 245.51 - samples/sec: 299.10 - lr: 0.000043 - momentum: 0.000000
2023-10-13 09:14:13,585 epoch 8 - iter 539/773 - loss 0.00949815 - time (sec): 285.95 - samples/sec: 300.80 - lr: 0.000041 - momentum: 0.000000
2023-10-13 09:14:53,793 epoch 8 - iter 616/773 - loss 0.00908648 - time (sec): 326.16 - samples/sec: 302.26 - lr: 0.000039 - momentum: 0.000000
2023-10-13 09:15:35,446 epoch 8 - iter 693/773 - loss 0.00876514 - time (sec): 367.81 - samples/sec: 303.70 - lr: 0.000038 - momentum: 0.000000
2023-10-13 09:16:15,879 epoch 8 - iter 770/773 - loss 0.00886285 - time (sec): 408.24 - samples/sec: 303.07 - lr: 0.000036 - momentum: 0.000000
2023-10-13 09:16:17,537 ----------------------------------------------------------------------------------------------------
2023-10-13 09:16:17,538 EPOCH 8 done: loss 0.0088 - lr: 0.000036
2023-10-13 09:16:35,753 DEV : loss 0.09890314191579819 - f1-score (micro avg) 0.7911
2023-10-13 09:16:35,783 ----------------------------------------------------------------------------------------------------
2023-10-13 09:17:17,503 epoch 9 - iter 77/773 - loss 0.00646634 - time (sec): 41.72 - samples/sec: 304.55 - lr: 0.000034 - momentum: 0.000000
2023-10-13 09:17:58,563 epoch 9 - iter 154/773 - loss 0.00554555 - time (sec): 82.78 - samples/sec: 314.97 - lr: 0.000032 - momentum: 0.000000
2023-10-13 09:18:37,848 epoch 9 - iter 231/773 - loss 0.00533044 - time (sec): 122.06 - samples/sec: 310.68 - lr: 0.000030 - momentum: 0.000000
2023-10-13 09:19:18,324 epoch 9 - iter 308/773 - loss 0.00587687 - time (sec): 162.54 - samples/sec: 312.71 - lr: 0.000029 - momentum: 0.000000
2023-10-13 09:19:58,350 epoch 9 - iter 385/773 - loss 0.00628664 - time (sec): 202.56 - samples/sec: 306.65 - lr: 0.000027 - momentum: 0.000000
2023-10-13 09:20:38,748 epoch 9 - iter 462/773 - loss 0.00615498 - time (sec): 242.96 - samples/sec: 306.57 - lr: 0.000025 - momentum: 0.000000
2023-10-13 09:21:19,766 epoch 9 - iter 539/773 - loss 0.00604577 - time (sec): 283.98 - samples/sec: 305.10 - lr: 0.000023 - momentum: 0.000000
2023-10-13 09:22:01,519 epoch 9 - iter 616/773 - loss 0.00611423 - time (sec): 325.73 - samples/sec: 304.07 - lr: 0.000022 - momentum: 0.000000
2023-10-13 09:22:43,185 epoch 9 - iter 693/773 - loss 0.00630368 - time (sec): 367.40 - samples/sec: 303.01 - lr: 0.000020 - momentum: 0.000000
2023-10-13 09:23:23,592 epoch 9 - iter 770/773 - loss 0.00633550 - time (sec): 407.81 - samples/sec: 303.33 - lr: 0.000018 - momentum: 0.000000
2023-10-13 09:23:25,195 ----------------------------------------------------------------------------------------------------
2023-10-13 09:23:25,196 EPOCH 9 done: loss 0.0063 - lr: 0.000018
2023-10-13 09:23:42,559 DEV : loss 0.10465419292449951 - f1-score (micro avg) 0.804
2023-10-13 09:23:42,588 ----------------------------------------------------------------------------------------------------
2023-10-13 09:24:23,348 epoch 10 - iter 77/773 - loss 0.00813140 - time (sec): 40.76 - samples/sec: 294.03 - lr: 0.000016 - momentum: 0.000000
2023-10-13 09:25:03,909 epoch 10 - iter 154/773 - loss 0.00628416 - time (sec): 81.32 - samples/sec: 295.21 - lr: 0.000014 - momentum: 0.000000
2023-10-13 09:25:45,244 epoch 10 - iter 231/773 - loss 0.00587552 - time (sec): 122.65 - samples/sec: 307.28 - lr: 0.000013 - momentum: 0.000000
2023-10-13 09:26:24,689 epoch 10 - iter 308/773 - loss 0.00556725 - time (sec): 162.10 - samples/sec: 307.03 - lr: 0.000011 - momentum: 0.000000
2023-10-13 09:27:05,760 epoch 10 - iter 385/773 - loss 0.00507982 - time (sec): 203.17 - samples/sec: 311.80 - lr: 0.000009 - momentum: 0.000000
2023-10-13 09:27:45,942 epoch 10 - iter 462/773 - loss 0.00493721 - time (sec): 243.35 - samples/sec: 307.84 - lr: 0.000007 - momentum: 0.000000
2023-10-13 09:28:25,346 epoch 10 - iter 539/773 - loss 0.00467343 - time (sec): 282.76 - samples/sec: 306.12 - lr: 0.000006 - momentum: 0.000000
2023-10-13 09:29:05,476 epoch 10 - iter 616/773 - loss 0.00474369 - time (sec): 322.88 - samples/sec: 306.60 - lr: 0.000004 - momentum: 0.000000
2023-10-13 09:29:45,952 epoch 10 - iter 693/773 - loss 0.00460932 - time (sec): 363.36 - samples/sec: 307.07 - lr: 0.000002 - momentum: 0.000000
2023-10-13 09:30:25,663 epoch 10 - iter 770/773 - loss 0.00490796 - time (sec): 403.07 - samples/sec: 307.22 - lr: 0.000000 - momentum: 0.000000
2023-10-13 09:30:27,137 ----------------------------------------------------------------------------------------------------
2023-10-13 09:30:27,137 EPOCH 10 done: loss 0.0049 - lr: 0.000000
2023-10-13 09:30:44,002 DEV : loss 0.10406773537397385 - f1-score (micro avg) 0.7952
2023-10-13 09:30:44,940 ----------------------------------------------------------------------------------------------------
2023-10-13 09:30:44,942 Loading model from best epoch ...
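The lr column in the epoch logs above is consistent with the LinearScheduler plugin (warmup_fraction '0.1'): a linear ramp to the peak learning rate over the first 10% of steps, then a linear decay to zero. A minimal sketch, assuming 773 iterations per epoch over 10 epochs (7730 total steps) and the configured peak lr of 0.00016; the function name is illustrative and not part of Flair's API:

```python
def linear_schedule_lr(step: int,
                       total_steps: int = 7730,
                       peak_lr: float = 0.00016,
                       warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 (assumed schedule)."""
    warmup_steps = int(total_steps * warmup_fraction)  # 773 here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps           # warmup ramp
    # linear decay from peak_lr at warmup_steps down to 0 at total_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

With these assumptions the sketch reproduces the logged values: roughly 0.000016 at epoch 1 / iter 77 (step 77), roughly 0.000159 at iter 770, and 0 at the final step.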
2023-10-13 09:30:49,804 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-BUILDING, B-BUILDING, E-BUILDING, I-BUILDING, S-STREET, B-STREET, E-STREET, I-STREET
2023-10-13 09:31:46,437 
Results:
- F-score (micro) 0.7939
- F-score (macro) 0.7132
- Accuracy 0.6792

By class:
              precision    recall  f1-score   support

         LOC     0.8423    0.8467    0.8445       946
    BUILDING     0.5489    0.5459    0.5474       185
      STREET     0.7843    0.7143    0.7477        56

   micro avg     0.7943    0.7936    0.7939      1187
   macro avg     0.7252    0.7023    0.7132      1187
weighted avg     0.7938    0.7936    0.7936      1187

2023-10-13 09:31:46,437 ----------------------------------------------------------------------------------------------------
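The summary F-scores in the final results can be re-derived from the per-class table: the macro F1 is the unweighted mean of the three class F1 scores, and the micro F1 is the harmonic mean of the micro-averaged precision and recall. A short check using the values printed above:

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Per-class f1-scores from the evaluation table
class_f1 = {"LOC": 0.8445, "BUILDING": 0.5474, "STREET": 0.7477}

macro_f1 = sum(class_f1.values()) / len(class_f1)  # unweighted mean over classes
micro_f1 = f1(0.7943, 0.7936)                      # micro avg precision / recall

print(round(macro_f1, 4), round(micro_f1, 4))      # 0.7132 0.7939
```

Both match the logged "F-score (macro) 0.7132" and "F-score (micro) 0.7939".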