2023-10-13 17:15:31,806 ----------------------------------------------------------------------------------------------------
2023-10-13 17:15:31,809 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 17:15:31,809 ----------------------------------------------------------------------------------------------------
2023-10-13 17:15:31,810 MultiCorpus: 6183 train + 680 dev + 2113 test sentences
 - NER_HIPE_2022 Corpus: 6183 train + 680 dev + 2113 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/topres19th/en/with_doc_seperator
2023-10-13 17:15:31,810 ----------------------------------------------------------------------------------------------------
2023-10-13 17:15:31,810 Train: 6183 sentences
2023-10-13 17:15:31,810 (train_with_dev=False, train_with_test=False)
2023-10-13 17:15:31,810 ----------------------------------------------------------------------------------------------------
2023-10-13 17:15:31,810 Training Params:
2023-10-13 17:15:31,810  - learning_rate: "0.00015"
2023-10-13 17:15:31,810  - mini_batch_size: "8"
2023-10-13 17:15:31,810  - max_epochs: "10"
2023-10-13 17:15:31,810  - shuffle: "True"
2023-10-13 17:15:31,811 ----------------------------------------------------------------------------------------------------
2023-10-13 17:15:31,811 Plugins:
2023-10-13 17:15:31,811  - TensorboardLogger
2023-10-13 17:15:31,811  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 17:15:31,811 ----------------------------------------------------------------------------------------------------
2023-10-13 17:15:31,811 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 17:15:31,811  - metric: "('micro avg', 'f1-score')"
2023-10-13 17:15:31,811 ----------------------------------------------------------------------------------------------------
2023-10-13 17:15:31,811 Computation:
2023-10-13 17:15:31,811  - compute on device: cuda:0
2023-10-13 17:15:31,811  - embedding storage: none
2023-10-13 17:15:31,811 ----------------------------------------------------------------------------------------------------
2023-10-13 17:15:31,811 Model training base path: "hmbench-topres19th/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-13 17:15:31,812 ----------------------------------------------------------------------------------------------------
2023-10-13 17:15:31,812 ----------------------------------------------------------------------------------------------------
2023-10-13 17:15:31,812 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 17:16:11,330 epoch 1 - iter 77/773 - loss 2.53777300 - time (sec): 39.52 - samples/sec: 292.16 - lr: 0.000015 - momentum: 0.000000
2023-10-13 17:16:51,228 epoch 1 - iter 154/773 - loss 2.49369541 - time (sec): 79.41 - samples/sec: 303.18 - lr: 0.000030 - momentum: 0.000000
2023-10-13 17:17:31,929 epoch 1 - iter 231/773 - loss 2.32948188 - time (sec): 120.11 - samples/sec: 304.80 - lr: 0.000045 - momentum: 0.000000
2023-10-13 17:18:13,110 epoch 1 - iter 308/773 - loss 2.11470827 - time (sec): 161.30 - samples/sec: 304.89 - lr: 0.000060 - momentum: 0.000000
2023-10-13 17:18:54,504 epoch 1 - iter 385/773 - loss 1.90288187 - time (sec): 202.69 - samples/sec: 301.50 - lr: 0.000075 - momentum: 0.000000
2023-10-13 17:19:34,785 epoch 1 - iter 462/773 - loss 1.67712086 - time (sec): 242.97 - samples/sec: 301.73 - lr: 0.000089 - momentum: 0.000000
2023-10-13 17:20:14,388 epoch 1 - iter 539/773 - loss 1.47746186 - time (sec): 282.57 - samples/sec: 303.06 - lr: 0.000104 - momentum: 0.000000
2023-10-13 17:20:54,452 epoch 1 - iter 616/773 - loss 1.31731180 - time (sec): 322.64 - samples/sec: 304.67 - lr: 0.000119 - momentum: 0.000000
2023-10-13 17:21:34,169 epoch 1 - iter 693/773 - loss 1.19495894 - time (sec): 362.35 - samples/sec: 305.47 - lr: 0.000134 - momentum: 0.000000
2023-10-13 17:22:14,919 epoch 1 - iter 770/773 - loss 1.08518740 - time (sec): 403.10 - samples/sec: 307.41 - lr: 0.000149 - momentum: 0.000000
2023-10-13 17:22:16,338 ----------------------------------------------------------------------------------------------------
2023-10-13 17:22:16,338 EPOCH 1 done: loss 1.0824 - lr: 0.000149
2023-10-13 17:22:32,706 DEV : loss 0.09791397303342819 - f1-score (micro avg)  0.0
2023-10-13 17:22:32,733 ----------------------------------------------------------------------------------------------------
2023-10-13 17:23:12,872 epoch 2 - iter 77/773 - loss 0.13435438 - time (sec): 40.14 - samples/sec: 279.67 - lr: 0.000148 - momentum: 0.000000
2023-10-13 17:23:53,937 epoch 2 - iter 154/773 - loss 0.12660142 - time (sec): 81.20 - samples/sec: 285.64 - lr: 0.000147 - momentum: 0.000000
2023-10-13 17:24:34,663 epoch 2 - iter 231/773 - loss 0.12315260 - time (sec): 121.93 - samples/sec: 297.53 - lr: 0.000145 - momentum: 0.000000
2023-10-13 17:25:14,614 epoch 2 - iter 308/773 - loss 0.12161588 - time (sec): 161.88 - samples/sec: 302.05 - lr: 0.000143 - momentum: 0.000000
2023-10-13 17:25:54,874 epoch 2 - iter 385/773 - loss 0.11903730 - time (sec): 202.14 - samples/sec: 304.42 - lr: 0.000142 - momentum: 0.000000
2023-10-13 17:26:34,375 epoch 2 - iter 462/773 - loss 0.11583604 - time (sec): 241.64 - samples/sec: 302.69 - lr: 0.000140 - momentum: 0.000000
2023-10-13 17:27:13,810 epoch 2 - iter 539/773 - loss 0.11302760 - time (sec): 281.07 - samples/sec: 301.81 - lr: 0.000138 - momentum: 0.000000
2023-10-13 17:27:54,601 epoch 2 - iter 616/773 - loss 0.10939658 - time (sec): 321.87 - samples/sec: 306.16 - lr: 0.000137 - momentum: 0.000000
2023-10-13 17:28:34,289 epoch 2 - iter 693/773 - loss 0.10642948 - time (sec): 361.55 - samples/sec: 305.63 - lr: 0.000135 - momentum: 0.000000
2023-10-13 17:29:15,133 epoch 2 - iter 770/773 - loss 0.10414025 - time (sec): 402.40 - samples/sec: 308.04 - lr: 0.000133 - momentum: 0.000000
2023-10-13 17:29:16,532 ----------------------------------------------------------------------------------------------------
2023-10-13 17:29:16,533 EPOCH 2 done: loss 0.1042 - lr: 0.000133
2023-10-13 17:29:34,169 DEV : loss 0.06052974984049797 - f1-score (micro avg)  0.7483
2023-10-13 17:29:34,204 saving best model
2023-10-13 17:29:35,171 ----------------------------------------------------------------------------------------------------
2023-10-13 17:30:15,916 epoch 3 - iter 77/773 - loss 0.06637866 - time (sec): 40.74 - samples/sec: 315.46 - lr: 0.000132 - momentum: 0.000000
2023-10-13 17:30:57,103 epoch 3 - iter 154/773 - loss 0.07367969 - time (sec): 81.93 - samples/sec: 307.59 - lr: 0.000130 - momentum: 0.000000
2023-10-13 17:31:38,243 epoch 3 - iter 231/773 - loss 0.06770262 - time (sec): 123.07 - samples/sec: 303.97 - lr: 0.000128 - momentum: 0.000000
2023-10-13 17:32:19,573 epoch 3 - iter 308/773 - loss 0.06804529 - time (sec): 164.40 - samples/sec: 301.03 - lr: 0.000127 - momentum: 0.000000
2023-10-13 17:33:01,236 epoch 3 - iter 385/773 - loss 0.06731084 - time (sec): 206.06 - samples/sec: 297.59 - lr: 0.000125 - momentum: 0.000000
2023-10-13 17:33:42,967 epoch 3 - iter 462/773 - loss 0.06679348 - time (sec): 247.79 - samples/sec: 298.22 - lr: 0.000123 - momentum: 0.000000
2023-10-13 17:34:22,539 epoch 3 - iter 539/773 - loss 0.06486323 - time (sec): 287.37 - samples/sec: 300.80 - lr: 0.000122 - momentum: 0.000000
2023-10-13 17:35:02,752 epoch 3 - iter 616/773 - loss 0.06256280 - time (sec): 327.58 - samples/sec: 301.75 - lr: 0.000120 - momentum: 0.000000
2023-10-13 17:35:43,326 epoch 3 - iter 693/773 - loss 0.06248414 - time (sec): 368.15 - samples/sec: 302.07 - lr: 0.000118 - momentum: 0.000000
2023-10-13 17:36:23,839 epoch 3 - iter 770/773 - loss 0.06337434 - time (sec): 408.67 - samples/sec: 302.55 - lr: 0.000117 - momentum: 0.000000
2023-10-13 17:36:25,542 ----------------------------------------------------------------------------------------------------
2023-10-13 17:36:25,543 EPOCH 3 done: loss 0.0635 - lr: 0.000117
2023-10-13 17:36:43,553 DEV : loss 0.05895433574914932 - f1-score (micro avg)  0.7747
2023-10-13 17:36:43,582 saving best model
2023-10-13 17:36:46,243 ----------------------------------------------------------------------------------------------------
2023-10-13 17:37:26,456 epoch 4 - iter 77/773 - loss 0.03920921 - time (sec): 40.21 - samples/sec: 297.47 - lr: 0.000115 - momentum: 0.000000
2023-10-13 17:38:05,917 epoch 4 - iter 154/773 - loss 0.04471010 - time (sec): 79.67 - samples/sec: 302.53 - lr: 0.000113 - momentum: 0.000000
2023-10-13 17:38:46,036 epoch 4 - iter 231/773 - loss 0.04357768 - time (sec): 119.79 - samples/sec: 306.88 - lr: 0.000112 - momentum: 0.000000
2023-10-13 17:39:25,576 epoch 4 - iter 308/773 - loss 0.04110929 - time (sec): 159.33 - samples/sec: 303.72 - lr: 0.000110 - momentum: 0.000000
2023-10-13 17:40:06,355 epoch 4 - iter 385/773 - loss 0.04203085 - time (sec): 200.11 - samples/sec: 303.61 - lr: 0.000108 - momentum: 0.000000
2023-10-13 17:40:47,876 epoch 4 - iter 462/773 - loss 0.04164789 - time (sec): 241.63 - samples/sec: 305.88 - lr: 0.000107 - momentum: 0.000000
2023-10-13 17:41:27,686 epoch 4 - iter 539/773 - loss 0.04178221 - time (sec): 281.44 - samples/sec: 305.45 - lr: 0.000105 - momentum: 0.000000
2023-10-13 17:42:08,800 epoch 4 - iter 616/773 - loss 0.04237077 - time (sec): 322.55 - samples/sec: 306.22 - lr: 0.000103 - momentum: 0.000000
2023-10-13 17:42:49,799 epoch 4 - iter 693/773 - loss 0.04162761 - time (sec): 363.55 - samples/sec: 307.00 - lr: 0.000102 - momentum: 0.000000
2023-10-13 17:43:30,072 epoch 4 - iter 770/773 - loss 0.04094877 - time (sec): 403.82 - samples/sec: 306.52 - lr: 0.000100 - momentum: 0.000000
2023-10-13 17:43:31,577 ----------------------------------------------------------------------------------------------------
2023-10-13 17:43:31,577 EPOCH 4 done: loss 0.0409 - lr: 0.000100
2023-10-13 17:43:48,933 DEV : loss 0.061349667608737946 - f1-score (micro avg)  0.8024
2023-10-13 17:43:48,960 saving best model
2023-10-13 17:43:51,557 ----------------------------------------------------------------------------------------------------
2023-10-13 17:44:32,555 epoch 5 - iter 77/773 - loss 0.02670187 - time (sec): 40.99 - samples/sec: 319.39 - lr: 0.000098 - momentum: 0.000000
2023-10-13 17:45:12,578 epoch 5 - iter 154/773 - loss 0.02458544 - time (sec): 81.02 - samples/sec: 301.83 - lr: 0.000097 - momentum: 0.000000
2023-10-13 17:45:52,902 epoch 5 - iter 231/773 - loss 0.02523199 - time (sec): 121.34 - samples/sec: 307.11 - lr: 0.000095 - momentum: 0.000000
2023-10-13 17:46:33,070 epoch 5 - iter 308/773 - loss 0.02445345 - time (sec): 161.51 - samples/sec: 308.37 - lr: 0.000093 - momentum: 0.000000
2023-10-13 17:47:14,170 epoch 5 - iter 385/773 - loss 0.02715882 - time (sec): 202.61 - samples/sec: 309.92 - lr: 0.000092 - momentum: 0.000000
2023-10-13 17:47:54,217 epoch 5 - iter 462/773 - loss 0.02748993 - time (sec): 242.66 - samples/sec: 311.16 - lr: 0.000090 - momentum: 0.000000
2023-10-13 17:48:34,167 epoch 5 - iter 539/773 - loss 0.02757364 - time (sec): 282.61 - samples/sec: 310.41 - lr: 0.000088 - momentum: 0.000000
2023-10-13 17:49:13,961 epoch 5 - iter 616/773 - loss 0.02745903 - time (sec): 322.40 - samples/sec: 311.14 - lr: 0.000087 - momentum: 0.000000
2023-10-13 17:49:53,667 epoch 5 - iter 693/773 - loss 0.02735164 - time (sec): 362.11 - samples/sec: 310.47 - lr: 0.000085 - momentum: 0.000000
2023-10-13 17:50:32,927 epoch 5 - iter 770/773 - loss 0.02703869 - time (sec): 401.37 - samples/sec: 308.69 - lr: 0.000083 - momentum: 0.000000
2023-10-13 17:50:34,344 ----------------------------------------------------------------------------------------------------
2023-10-13 17:50:34,345 EPOCH 5 done: loss 0.0271 - lr: 0.000083
2023-10-13 17:50:51,329 DEV : loss 0.07207323610782623 - f1-score (micro avg)  0.7876
2023-10-13 17:50:51,359 ----------------------------------------------------------------------------------------------------
2023-10-13 17:51:32,093 epoch 6 - iter 77/773 - loss 0.02037186 - time (sec): 40.73 - samples/sec: 324.37 - lr: 0.000082 - momentum: 0.000000
2023-10-13 17:52:11,821 epoch 6 - iter 154/773 - loss 0.02091576 - time (sec): 80.46 - samples/sec: 306.02 - lr: 0.000080 - momentum: 0.000000
2023-10-13 17:52:53,494 epoch 6 - iter 231/773 - loss 0.02128225 - time (sec): 122.13 - samples/sec: 310.25 - lr: 0.000078 - momentum: 0.000000
2023-10-13 17:53:34,630 epoch 6 - iter 308/773 - loss 0.02091718 - time (sec): 163.27 - samples/sec: 307.22 - lr: 0.000077 - momentum: 0.000000
2023-10-13 17:54:14,102 epoch 6 - iter 385/773 - loss 0.01902102 - time (sec): 202.74 - samples/sec: 304.66 - lr: 0.000075 - momentum: 0.000000
2023-10-13 17:54:54,541 epoch 6 - iter 462/773 - loss 0.01975430 - time (sec): 243.18 - samples/sec: 306.46 - lr: 0.000073 - momentum: 0.000000
2023-10-13 17:55:33,990 epoch 6 - iter 539/773 - loss 0.01975584 - time (sec): 282.63 - samples/sec: 305.55 - lr: 0.000072 - momentum: 0.000000
2023-10-13 17:56:13,256 epoch 6 - iter 616/773 - loss 0.01933327 - time (sec): 321.89 - samples/sec: 304.57 - lr: 0.000070 - momentum: 0.000000
2023-10-13 17:56:53,888 epoch 6 - iter 693/773 - loss 0.01883937 - time (sec): 362.53 - samples/sec: 304.22 - lr: 0.000068 - momentum: 0.000000
2023-10-13 17:57:34,581 epoch 6 - iter 770/773 - loss 0.01913272 - time (sec): 403.22 - samples/sec: 307.28 - lr: 0.000067 - momentum: 0.000000
2023-10-13 17:57:36,011 ----------------------------------------------------------------------------------------------------
2023-10-13 17:57:36,012 EPOCH 6 done: loss 0.0193 - lr: 0.000067
2023-10-13 17:57:52,925 DEV : loss 0.08004289120435715 - f1-score (micro avg)  0.7896
2023-10-13 17:57:52,953 ----------------------------------------------------------------------------------------------------
2023-10-13 17:58:33,473 epoch 7 - iter 77/773 - loss 0.01052829 - time (sec): 40.52 - samples/sec: 315.07 - lr: 0.000065 - momentum: 0.000000
2023-10-13 17:59:12,688 epoch 7 - iter 154/773 - loss 0.01217112 - time (sec): 79.73 - samples/sec: 311.68 - lr: 0.000063 - momentum: 0.000000
2023-10-13 17:59:52,552 epoch 7 - iter 231/773 - loss 0.01258506 - time (sec): 119.60 - samples/sec: 311.55 - lr: 0.000062 - momentum: 0.000000
2023-10-13 18:00:33,331 epoch 7 - iter 308/773 - loss 0.01270538 - time (sec): 160.38 - samples/sec: 311.57 - lr: 0.000060 - momentum: 0.000000
2023-10-13 18:01:13,969 epoch 7 - iter 385/773 - loss 0.01276005 - time (sec): 201.01 - samples/sec: 309.45 - lr: 0.000058 - momentum: 0.000000
2023-10-13 18:01:53,522 epoch 7 - iter 462/773 - loss 0.01229893 - time (sec): 240.57 - samples/sec: 308.18 - lr: 0.000057 - momentum: 0.000000
2023-10-13 18:02:33,945 epoch 7 - iter 539/773 - loss 0.01220461 - time (sec): 280.99 - samples/sec: 307.69 - lr: 0.000055 - momentum: 0.000000
2023-10-13 18:03:14,733 epoch 7 - iter 616/773 - loss 0.01161193 - time (sec): 321.78 - samples/sec: 308.13 - lr: 0.000054 - momentum: 0.000000
2023-10-13 18:03:55,166 epoch 7 - iter 693/773 - loss 0.01260338 - time (sec): 362.21 - samples/sec: 307.12 - lr: 0.000052 - momentum: 0.000000
2023-10-13 18:04:34,914 epoch 7 - iter 770/773 - loss 0.01235583 - time (sec): 401.96 - samples/sec: 307.96 - lr: 0.000050 - momentum: 0.000000
2023-10-13 18:04:36,433 ----------------------------------------------------------------------------------------------------
2023-10-13 18:04:36,433 EPOCH 7 done: loss 0.0124 - lr: 0.000050
2023-10-13 18:04:53,194 DEV : loss 0.09459361433982849 - f1-score (micro avg)  0.792
2023-10-13 18:04:53,223 ----------------------------------------------------------------------------------------------------
2023-10-13 18:05:33,830 epoch 8 - iter 77/773 - loss 0.01144622 - time (sec): 40.60 - samples/sec: 328.26 - lr: 0.000048 - momentum: 0.000000
2023-10-13 18:06:15,374 epoch 8 - iter 154/773 - loss 0.00970948 - time (sec): 82.15 - samples/sec: 316.58 - lr: 0.000047 - momentum: 0.000000
2023-10-13 18:06:56,115 epoch 8 - iter 231/773 - loss 0.00944804 - time (sec): 122.89 - samples/sec: 308.35 - lr: 0.000045 - momentum: 0.000000
2023-10-13 18:07:37,287 epoch 8 - iter 308/773 - loss 0.00923867 - time (sec): 164.06 - samples/sec: 306.77 - lr: 0.000043 - momentum: 0.000000
2023-10-13 18:08:18,386 epoch 8 - iter 385/773 - loss 0.00978032 - time (sec): 205.16 - samples/sec: 310.98 - lr: 0.000042 - momentum: 0.000000
2023-10-13 18:08:59,496 epoch 8 - iter 462/773 - loss 0.01173064 - time (sec): 246.27 - samples/sec: 308.52 - lr: 0.000040 - momentum: 0.000000
2023-10-13 18:09:39,687 epoch 8 - iter 539/773 - loss 0.01090035 - time (sec): 286.46 - samples/sec: 307.75 - lr: 0.000039 - momentum: 0.000000
2023-10-13 18:10:18,679 epoch 8 - iter 616/773 - loss 0.01064762 - time (sec): 325.45 - samples/sec: 304.49 - lr: 0.000037 - momentum: 0.000000
2023-10-13 18:10:58,847 epoch 8 - iter 693/773 - loss 0.01013823 - time (sec): 365.62 - samples/sec: 303.87 - lr: 0.000035 - momentum: 0.000000
2023-10-13 18:11:38,667 epoch 8 - iter 770/773 - loss 0.00975702 - time (sec): 405.44 - samples/sec: 305.54 - lr: 0.000034 - momentum: 0.000000
2023-10-13 18:11:40,105 ----------------------------------------------------------------------------------------------------
2023-10-13 18:11:40,106 EPOCH 8 done: loss 0.0098 - lr: 0.000034
2023-10-13 18:11:57,039 DEV : loss 0.09352090209722519 - f1-score (micro avg)  0.7842
2023-10-13 18:11:57,070 ----------------------------------------------------------------------------------------------------
2023-10-13 18:12:37,400 epoch 9 - iter 77/773 - loss 0.00652820 - time (sec): 40.33 - samples/sec: 318.29 - lr: 0.000032 - momentum: 0.000000
2023-10-13 18:13:18,257 epoch 9 - iter 154/773 - loss 0.00570655 - time (sec): 81.18 - samples/sec: 317.61 - lr: 0.000030 - momentum: 0.000000
2023-10-13 18:13:57,276 epoch 9 - iter 231/773 - loss 0.00585034 - time (sec): 120.20 - samples/sec: 310.14 - lr: 0.000028 - momentum: 0.000000
2023-10-13 18:14:37,023 epoch 9 - iter 308/773 - loss 0.00612878 - time (sec): 159.95 - samples/sec: 308.35 - lr: 0.000027 - momentum: 0.000000
2023-10-13 18:15:16,631 epoch 9 - iter 385/773 - loss 0.00671168 - time (sec): 199.56 - samples/sec: 307.49 - lr: 0.000025 - momentum: 0.000000
2023-10-13 18:15:56,285 epoch 9 - iter 462/773 - loss 0.00718371 - time (sec): 239.21 - samples/sec: 304.49 - lr: 0.000024 - momentum: 0.000000
2023-10-13 18:16:37,807 epoch 9 - iter 539/773 - loss 0.00797348 - time (sec): 280.74 - samples/sec: 305.58 - lr: 0.000022 - momentum: 0.000000
2023-10-13 18:17:18,932 epoch 9 - iter 616/773 - loss 0.00801701 - time (sec): 321.86 - samples/sec: 304.66 - lr: 0.000020 - momentum: 0.000000
2023-10-13 18:17:59,375 epoch 9 - iter 693/773 - loss 0.00786144 - time (sec): 362.30 - samples/sec: 304.33 - lr: 0.000019 - momentum: 0.000000
2023-10-13 18:18:40,458 epoch 9 - iter 770/773 - loss 0.00750063 - time (sec): 403.39 - samples/sec: 307.03 - lr: 0.000017 - momentum: 0.000000
2023-10-13 18:18:41,956 ----------------------------------------------------------------------------------------------------
2023-10-13 18:18:41,956 EPOCH 9 done: loss 0.0075 - lr: 0.000017
2023-10-13 18:18:59,133 DEV : loss 0.09976237267255783 - f1-score (micro avg)  0.7751
2023-10-13 18:18:59,164 ----------------------------------------------------------------------------------------------------
2023-10-13 18:19:39,491 epoch 10 - iter 77/773 - loss 0.00581833 - time (sec): 40.33 - samples/sec: 301.35 - lr: 0.000015 - momentum: 0.000000
2023-10-13 18:20:18,915 epoch 10 - iter 154/773 - loss 0.00435431 - time (sec): 79.75 - samples/sec: 296.06 - lr: 0.000014 - momentum: 0.000000
2023-10-13 18:20:59,083 epoch 10 - iter 231/773 - loss 0.00547099 - time (sec): 119.92 - samples/sec: 294.88 - lr: 0.000012 - momentum: 0.000000
2023-10-13 18:21:38,863 epoch 10 - iter 308/773 - loss 0.00540846 - time (sec): 159.70 - samples/sec: 301.38 - lr: 0.000010 - momentum: 0.000000
2023-10-13 18:22:18,093 epoch 10 - iter 385/773 - loss 0.00499552 - time (sec): 198.93 - samples/sec: 304.11 - lr: 0.000009 - momentum: 0.000000
2023-10-13 18:22:58,689 epoch 10 - iter 462/773 - loss 0.00533351 - time (sec): 239.52 - samples/sec: 307.37 - lr: 0.000007 - momentum: 0.000000
2023-10-13 18:23:39,202 epoch 10 - iter 539/773 - loss 0.00590520 - time (sec): 280.04 - samples/sec: 307.79 - lr: 0.000005 - momentum: 0.000000
2023-10-13 18:24:20,024 epoch 10 - iter 616/773 - loss 0.00588295 - time (sec): 320.86 - samples/sec: 309.60 - lr: 0.000004 - momentum: 0.000000
2023-10-13 18:25:00,654 epoch 10 - iter 693/773 - loss 0.00607221 - time (sec): 361.49 - samples/sec: 308.83 - lr: 0.000002 - momentum: 0.000000
2023-10-13 18:25:40,608 epoch 10 - iter 770/773 - loss 0.00573109 - time (sec): 401.44 - samples/sec: 308.29 - lr: 0.000000 - momentum: 0.000000
2023-10-13 18:25:42,149 ----------------------------------------------------------------------------------------------------
2023-10-13 18:25:42,149 EPOCH 10 done: loss 0.0057 - lr: 0.000000
2023-10-13 18:25:59,998 DEV : loss 0.10486873239278793 - f1-score (micro avg)  0.7791
2023-10-13 18:26:01,376 ----------------------------------------------------------------------------------------------------
2023-10-13 18:26:01,377 Loading model from best epoch ...
2023-10-13 18:26:05,333 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-BUILDING, B-BUILDING, E-BUILDING, I-BUILDING, S-STREET, B-STREET, E-STREET, I-STREET
2023-10-13 18:27:01,370
Results:
- F-score (micro) 0.7957
- F-score (macro) 0.7121
- Accuracy 0.6857

By class:
              precision    recall  f1-score   support

         LOC     0.8471    0.8436    0.8453       946
    BUILDING     0.5362    0.6811    0.6000       185
      STREET     0.7037    0.6786    0.6909        56

   micro avg     0.7815    0.8104    0.7957      1187
   macro avg     0.6957    0.7344    0.7121      1187
weighted avg     0.7919    0.8104    0.7998      1187

2023-10-13 18:27:01,370 ----------------------------------------------------------------------------------------------------
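The summary F-scores can be re-derived from the per-class results: micro-F1 is the harmonic mean of the pooled precision and recall, while macro-F1 is the unweighted mean of the per-class F1 values. A quick sanity check, with the numbers copied from the table above:

```python
# Micro avg: harmonic mean of the pooled (micro) precision and recall.
p_micro, r_micro = 0.7815, 0.8104
f1_micro = 2 * p_micro * r_micro / (p_micro + r_micro)

# Macro avg: unweighted mean of the per-class F1 scores (LOC, BUILDING, STREET).
f1_macro = (0.8453 + 0.6000 + 0.6909) / 3

print(round(f1_micro, 4), round(f1_macro, 4))  # -> 0.7957 0.7121
```

Both values match the logged "F-score (micro) 0.7957" and "F-score (macro) 0.7121"; the gap between them reflects the weaker BUILDING and STREET classes, which macro averaging weights equally with the much larger LOC class.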