2023-10-13 09:32:31,304 ----------------------------------------------------------------------------------------------------
2023-10-13 09:32:31,306 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 09:32:31,306 ----------------------------------------------------------------------------------------------------
2023-10-13 09:32:31,307 MultiCorpus: 6183 train + 680 dev + 2113 test sentences
 - NER_HIPE_2022 Corpus: 6183 train + 680 dev + 2113 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/topres19th/en/with_doc_seperator
2023-10-13 09:32:31,307 ----------------------------------------------------------------------------------------------------
2023-10-13 09:32:31,307 Train: 6183 sentences
2023-10-13 09:32:31,307 (train_with_dev=False, train_with_test=False)
2023-10-13 09:32:31,307 ----------------------------------------------------------------------------------------------------
2023-10-13 09:32:31,307 Training Params:
2023-10-13 09:32:31,307 - learning_rate: "0.00015"
2023-10-13 09:32:31,307 - mini_batch_size: "4"
2023-10-13 09:32:31,307 - max_epochs: "10"
2023-10-13 09:32:31,307 - shuffle: "True"
2023-10-13 09:32:31,308 ----------------------------------------------------------------------------------------------------
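For readers who want to reproduce a run with the training parameters listed above, the setup corresponds roughly to the following Flair code. This is a minimal sketch, not the script that produced this log: the embedding model identifier, the output path placeholder, and the use of TransformerWordEmbeddings (the log shows a custom ByT5Embeddings wrapper) are assumptions, and the corpus/trainer argument names follow recent Flair releases and may differ between versions.

    from flair.datasets import NER_HIPE_2022
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # TopRes19th (English) corpus from the HIPE-2022 shared task, as reported above.
    corpus = NER_HIPE_2022(dataset_name="topres19th", language="en")

    # Assumed checkpoint name for the hmByT5 model; "first" subtoken pooling and
    # last layer only, matching "poolingfirst-layers-1" in the base path below.
    embeddings = TransformerWordEmbeddings(
        model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
    )

    tag_dictionary = corpus.make_label_dictionary(label_type="ner")
    tagger = SequenceTagger(
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="ner",
        use_crf=False,            # "crfFalse" in the base path below
        use_rnn=False,
        reproject_embeddings=False,
    )

    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(
        "hmbench-topres19th/...",   # training base path (placeholder; full path below)
        learning_rate=0.00015,
        mini_batch_size=4,
        max_epochs=10,
    )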
2023-10-13 09:32:31,308 Plugins:
2023-10-13 09:32:31,308 - TensorboardLogger
2023-10-13 09:32:31,308 - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 09:32:31,308 ----------------------------------------------------------------------------------------------------
2023-10-13 09:32:31,308 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 09:32:31,308 - metric: "('micro avg', 'f1-score')"
2023-10-13 09:32:31,308 ----------------------------------------------------------------------------------------------------
2023-10-13 09:32:31,308 Computation:
2023-10-13 09:32:31,308 - compute on device: cuda:0
2023-10-13 09:32:31,308 - embedding storage: none
2023-10-13 09:32:31,308 ----------------------------------------------------------------------------------------------------
2023-10-13 09:32:31,308 Model training base path: "hmbench-topres19th/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
2023-10-13 09:32:31,309 ----------------------------------------------------------------------------------------------------
2023-10-13 09:32:31,309 ----------------------------------------------------------------------------------------------------
2023-10-13 09:32:31,309 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 09:33:15,220 epoch 1 - iter 154/1546 - loss 2.58089173 - time (sec): 43.91 - samples/sec: 283.06 - lr: 0.000015 - momentum: 0.000000
2023-10-13 09:33:57,641 epoch 1 - iter 308/1546 - loss 2.49475826 - time (sec): 86.33 - samples/sec: 283.99 - lr: 0.000030 - momentum: 0.000000
2023-10-13 09:34:40,223 epoch 1 - iter 462/1546 - loss 2.24605833 - time (sec): 128.91 - samples/sec: 280.80 - lr: 0.000045 - momentum: 0.000000
2023-10-13 09:35:23,021 epoch 1 - iter 616/1546 - loss 1.91937604 - time (sec): 171.71 - samples/sec: 288.08 - lr: 0.000060 - momentum: 0.000000
2023-10-13 09:36:05,203 epoch 1 - iter 770/1546 - loss 1.63450044 - time (sec): 213.89 - samples/sec: 287.57 - lr: 0.000075 - momentum: 0.000000
2023-10-13 09:36:48,056 epoch 1 - iter 924/1546 - loss 1.40442989 - time (sec): 256.74 - samples/sec: 286.70 - lr: 0.000090 - momentum: 0.000000
2023-10-13 09:37:31,857 epoch 1 - iter 1078/1546 - loss 1.23099071 - time (sec): 300.55 - samples/sec: 287.02 - lr: 0.000104 - momentum: 0.000000
2023-10-13 09:38:14,433 epoch 1 - iter 1232/1546 - loss 1.10986676 - time (sec): 343.12 - samples/sec: 284.50 - lr: 0.000119 - momentum: 0.000000
2023-10-13 09:38:58,506 epoch 1 - iter 1386/1546 - loss 0.99545808 - time (sec): 387.20 - samples/sec: 287.13 - lr: 0.000134 - momentum: 0.000000
2023-10-13 09:39:41,758 epoch 1 - iter 1540/1546 - loss 0.90547295 - time (sec): 430.45 - samples/sec: 287.74 - lr: 0.000149 - momentum: 0.000000
2023-10-13 09:39:43,282 ----------------------------------------------------------------------------------------------------
2023-10-13 09:39:43,282 EPOCH 1 done: loss 0.9029 - lr: 0.000149
2023-10-13 09:39:59,716 DEV : loss 0.08553236722946167 - f1-score (micro avg) 0.5611
2023-10-13 09:39:59,746 saving best model
2023-10-13 09:40:00,659 ----------------------------------------------------------------------------------------------------
2023-10-13 09:40:42,617 epoch 2 - iter 154/1546 - loss 0.12200903 - time (sec): 41.96 - samples/sec: 278.77 - lr: 0.000148 - momentum: 0.000000
2023-10-13 09:41:25,024 epoch 2 - iter 308/1546 - loss 0.11893143 - time (sec): 84.36 - samples/sec: 279.44 - lr: 0.000147 - momentum: 0.000000
2023-10-13 09:42:09,579 epoch 2 - iter 462/1546 - loss 0.11299889 - time (sec): 128.92 - samples/sec: 283.78 - lr: 0.000145 - momentum: 0.000000
2023-10-13 09:42:52,516 epoch 2 - iter 616/1546 - loss 0.10688952 - time (sec): 171.85 - samples/sec: 287.48 - lr: 0.000143 - momentum: 0.000000
2023-10-13 09:43:35,911 epoch 2 - iter 770/1546 - loss 0.10165840 - time (sec): 215.25 - samples/sec: 288.56 - lr: 0.000142 - momentum: 0.000000
2023-10-13 09:44:18,065 epoch 2 - iter 924/1546 - loss 0.09938803 - time (sec): 257.40 - samples/sec: 285.76 - lr: 0.000140 - momentum: 0.000000
2023-10-13 09:45:01,291 epoch 2 - iter 1078/1546 - loss 0.09941699 - time (sec): 300.63 - samples/sec: 284.82 - lr: 0.000138 - momentum: 0.000000
2023-10-13 09:45:44,201 epoch 2 - iter 1232/1546 - loss 0.09981071 - time (sec): 343.54 - samples/sec: 284.64 - lr: 0.000137 - momentum: 0.000000
2023-10-13 09:46:28,346 epoch 2 - iter 1386/1546 - loss 0.09642655 - time (sec): 387.69 - samples/sec: 287.09 - lr: 0.000135 - momentum: 0.000000
2023-10-13 09:47:11,211 epoch 2 - iter 1540/1546 - loss 0.09319186 - time (sec): 430.55 - samples/sec: 287.59 - lr: 0.000133 - momentum: 0.000000
2023-10-13 09:47:12,875 ----------------------------------------------------------------------------------------------------
2023-10-13 09:47:12,876 EPOCH 2 done: loss 0.0930 - lr: 0.000133
2023-10-13 09:47:30,832 DEV : loss 0.05797132849693298 - f1-score (micro avg) 0.7951
2023-10-13 09:47:30,866 saving best model
2023-10-13 09:47:33,588 ----------------------------------------------------------------------------------------------------
2023-10-13 09:48:17,153 epoch 3 - iter 154/1546 - loss 0.07139060 - time (sec): 43.56 - samples/sec: 279.60 - lr: 0.000132 - momentum: 0.000000
2023-10-13 09:49:00,659 epoch 3 - iter 308/1546 - loss 0.06425129 - time (sec): 87.06 - samples/sec: 279.79 - lr: 0.000130 - momentum: 0.000000
2023-10-13 09:49:43,640 epoch 3 - iter 462/1546 - loss 0.05602698 - time (sec): 130.05 - samples/sec: 287.03 - lr: 0.000128 - momentum: 0.000000
2023-10-13 09:50:25,253 epoch 3 - iter 616/1546 - loss 0.05576436 - time (sec): 171.66 - samples/sec: 281.80 - lr: 0.000127 - momentum: 0.000000
2023-10-13 09:51:08,706 epoch 3 - iter 770/1546 - loss 0.05520624 - time (sec): 215.11 - samples/sec: 282.93 - lr: 0.000125 - momentum: 0.000000
2023-10-13 09:51:51,867 epoch 3 - iter 924/1546 - loss 0.05690583 - time (sec): 258.27 - samples/sec: 284.51 - lr: 0.000123 - momentum: 0.000000
2023-10-13 09:52:34,906 epoch 3 - iter 1078/1546 - loss 0.05578666 - time (sec): 301.31 - samples/sec: 286.31 - lr: 0.000122 - momentum: 0.000000
2023-10-13 09:53:18,728 epoch 3 - iter 1232/1546 - loss 0.05491332 - time (sec): 345.13 - samples/sec: 286.89 - lr: 0.000120 - momentum: 0.000000
2023-10-13 09:54:02,189 epoch 3 - iter 1386/1546 - loss 0.05522112 - time (sec): 388.59 - samples/sec: 288.15 - lr: 0.000118 - momentum: 0.000000
2023-10-13 09:54:44,026 epoch 3 - iter 1540/1546 - loss 0.05495390 - time (sec): 430.43 - samples/sec: 287.77 - lr: 0.000117 - momentum: 0.000000
2023-10-13 09:54:45,676 ----------------------------------------------------------------------------------------------------
2023-10-13 09:54:45,677 EPOCH 3 done: loss 0.0548 - lr: 0.000117
2023-10-13 09:55:03,127 DEV : loss 0.059545643627643585 - f1-score (micro avg) 0.7921
2023-10-13 09:55:03,161 ----------------------------------------------------------------------------------------------------
2023-10-13 09:55:47,609 epoch 4 - iter 154/1546 - loss 0.03703647 - time (sec): 44.45 - samples/sec: 284.50 - lr: 0.000115 - momentum: 0.000000
2023-10-13 09:56:30,722 epoch 4 - iter 308/1546 - loss 0.03106991 - time (sec): 87.56 - samples/sec: 283.25 - lr: 0.000113 - momentum: 0.000000
2023-10-13 09:57:14,974 epoch 4 - iter 462/1546 - loss 0.02898605 - time (sec): 131.81 - samples/sec: 285.23 - lr: 0.000112 - momentum: 0.000000
2023-10-13 09:57:58,608 epoch 4 - iter 616/1546 - loss 0.03074237 - time (sec): 175.44 - samples/sec: 276.89 - lr: 0.000110 - momentum: 0.000000
2023-10-13 09:58:43,273 epoch 4 - iter 770/1546 - loss 0.03301947 - time (sec): 220.11 - samples/sec: 277.51 - lr: 0.000108 - momentum: 0.000000
2023-10-13 09:59:30,913 epoch 4 - iter 924/1546 - loss 0.03399197 - time (sec): 267.75 - samples/sec: 275.87 - lr: 0.000107 - momentum: 0.000000
2023-10-13 10:00:17,635 epoch 4 - iter 1078/1546 - loss 0.03359479 - time (sec): 314.47 - samples/sec: 275.40 - lr: 0.000105 - momentum: 0.000000
2023-10-13 10:01:04,322 epoch 4 - iter 1232/1546 - loss 0.03330922 - time (sec): 361.16 - samples/sec: 274.56 - lr: 0.000103 - momentum: 0.000000
2023-10-13 10:01:50,860 epoch 4 - iter 1386/1546 - loss 0.03362633 - time (sec): 407.70 - samples/sec: 274.21 - lr: 0.000102 - momentum: 0.000000
2023-10-13 10:02:37,520 epoch 4 - iter 1540/1546 - loss 0.03347558 - time (sec): 454.36 - samples/sec: 272.59 - lr: 0.000100 - momentum: 0.000000
2023-10-13 10:02:39,239 ----------------------------------------------------------------------------------------------------
2023-10-13 10:02:39,239 EPOCH 4 done: loss 0.0334 - lr: 0.000100
2023-10-13 10:02:57,010 DEV : loss 0.07305894047021866 - f1-score (micro avg) 0.7929
2023-10-13 10:02:57,040 ----------------------------------------------------------------------------------------------------
2023-10-13 10:03:44,621 epoch 5 - iter 154/1546 - loss 0.01538529 - time (sec): 47.58 - samples/sec: 265.85 - lr: 0.000098 - momentum: 0.000000
2023-10-13 10:04:30,440 epoch 5 - iter 308/1546 - loss 0.02039276 - time (sec): 93.40 - samples/sec: 275.59 - lr: 0.000097 - momentum: 0.000000
2023-10-13 10:05:13,779 epoch 5 - iter 462/1546 - loss 0.02114600 - time (sec): 136.74 - samples/sec: 273.31 - lr: 0.000095 - momentum: 0.000000
2023-10-13 10:05:55,130 epoch 5 - iter 616/1546 - loss 0.02172052 - time (sec): 178.09 - samples/sec: 280.56 - lr: 0.000093 - momentum: 0.000000
2023-10-13 10:06:35,667 epoch 5 - iter 770/1546 - loss 0.02047965 - time (sec): 218.62 - samples/sec: 283.38 - lr: 0.000092 - momentum: 0.000000
2023-10-13 10:07:16,245 epoch 5 - iter 924/1546 - loss 0.02169642 - time (sec): 259.20 - samples/sec: 286.55 - lr: 0.000090 - momentum: 0.000000
2023-10-13 10:07:59,279 epoch 5 - iter 1078/1546 - loss 0.02168417 - time (sec): 302.24 - samples/sec: 285.58 - lr: 0.000088 - momentum: 0.000000
2023-10-13 10:08:42,699 epoch 5 - iter 1232/1546 - loss 0.02271991 - time (sec): 345.66 - samples/sec: 284.75 - lr: 0.000087 - momentum: 0.000000
2023-10-13 10:09:25,956 epoch 5 - iter 1386/1546 - loss 0.02322415 - time (sec): 388.91 - samples/sec: 285.85 - lr: 0.000085 - momentum: 0.000000
2023-10-13 10:10:09,101 epoch 5 - iter 1540/1546 - loss 0.02247912 - time (sec): 432.06 - samples/sec: 286.30 - lr: 0.000083 - momentum: 0.000000
2023-10-13 10:10:10,819 ----------------------------------------------------------------------------------------------------
2023-10-13 10:10:10,820 EPOCH 5 done: loss 0.0225 - lr: 0.000083
2023-10-13 10:10:27,483 DEV : loss 0.08502887934446335 - f1-score (micro avg) 0.7992
2023-10-13 10:10:27,511 saving best model
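A note on the lr column in the iteration lines above: it follows the "LinearScheduler | warmup_fraction: '0.1'" plugin listed in the header, i.e. a linear warm-up over the first 10% of all optimizer steps (1,546 of the 15,460 steps here, roughly epoch 1) followed by a linear decay to zero. The sketch below is an illustration of that implied schedule, not Flair's own scheduler code, and the constants are taken from this run.

    # Minimal sketch of a linear warm-up + linear decay schedule (assumption: this is
    # what the LinearScheduler plugin with warmup_fraction 0.1 produces in this run).
    def scheduled_lr(step: int, peak_lr: float = 0.00015,
                     total_steps: int = 10 * 1546, warmup_fraction: float = 0.1) -> float:
        warmup_steps = int(total_steps * warmup_fraction)      # 1,546 steps here
        if step < warmup_steps:
            return peak_lr * step / warmup_steps               # linear warm-up
        # linear decay from peak_lr down to 0 over the remaining steps
        return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

    # Spot checks against the log: step 154 -> ~0.000015 (epoch 1, iter 154),
    # step 1540 -> ~0.000149 (end of epoch 1), step 3086 -> ~0.000133 (end of epoch 2).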
2023-10-13 10:10:30,134 ----------------------------------------------------------------------------------------------------
2023-10-13 10:11:15,341 epoch 6 - iter 154/1546 - loss 0.01540326 - time (sec): 45.20 - samples/sec: 291.20 - lr: 0.000082 - momentum: 0.000000
2023-10-13 10:11:59,889 epoch 6 - iter 308/1546 - loss 0.01726855 - time (sec): 89.75 - samples/sec: 287.52 - lr: 0.000080 - momentum: 0.000000
2023-10-13 10:12:45,995 epoch 6 - iter 462/1546 - loss 0.01478350 - time (sec): 135.86 - samples/sec: 285.80 - lr: 0.000078 - momentum: 0.000000
2023-10-13 10:13:28,634 epoch 6 - iter 616/1546 - loss 0.01486102 - time (sec): 178.50 - samples/sec: 284.86 - lr: 0.000077 - momentum: 0.000000
2023-10-13 10:14:13,366 epoch 6 - iter 770/1546 - loss 0.01271067 - time (sec): 223.23 - samples/sec: 283.34 - lr: 0.000075 - momentum: 0.000000
2023-10-13 10:14:56,982 epoch 6 - iter 924/1546 - loss 0.01374191 - time (sec): 266.84 - samples/sec: 277.93 - lr: 0.000073 - momentum: 0.000000
2023-10-13 10:15:40,790 epoch 6 - iter 1078/1546 - loss 0.01309584 - time (sec): 310.65 - samples/sec: 279.56 - lr: 0.000072 - momentum: 0.000000
2023-10-13 10:16:24,499 epoch 6 - iter 1232/1546 - loss 0.01254562 - time (sec): 354.36 - samples/sec: 278.96 - lr: 0.000070 - momentum: 0.000000
2023-10-13 10:17:08,743 epoch 6 - iter 1386/1546 - loss 0.01378174 - time (sec): 398.60 - samples/sec: 277.93 - lr: 0.000068 - momentum: 0.000000
2023-10-13 10:17:53,163 epoch 6 - iter 1540/1546 - loss 0.01418016 - time (sec): 443.02 - samples/sec: 279.19 - lr: 0.000067 - momentum: 0.000000
2023-10-13 10:17:54,953 ----------------------------------------------------------------------------------------------------
2023-10-13 10:17:54,953 EPOCH 6 done: loss 0.0141 - lr: 0.000067
2023-10-13 10:18:12,876 DEV : loss 0.09621559828519821 - f1-score (micro avg) 0.7873
2023-10-13 10:18:12,919 ----------------------------------------------------------------------------------------------------
2023-10-13 10:18:57,224 epoch 7 - iter 154/1546 - loss 0.00910352 - time (sec): 44.30 - samples/sec: 265.40 - lr: 0.000065 - momentum: 0.000000
2023-10-13 10:19:42,915 epoch 7 - iter 308/1546 - loss 0.01161357 - time (sec): 89.99 - samples/sec: 277.88 - lr: 0.000063 - momentum: 0.000000
2023-10-13 10:20:28,122 epoch 7 - iter 462/1546 - loss 0.01227135 - time (sec): 135.20 - samples/sec: 272.24 - lr: 0.000062 - momentum: 0.000000
2023-10-13 10:21:12,787 epoch 7 - iter 616/1546 - loss 0.01136743 - time (sec): 179.87 - samples/sec: 270.08 - lr: 0.000060 - momentum: 0.000000
2023-10-13 10:21:56,983 epoch 7 - iter 770/1546 - loss 0.01071208 - time (sec): 224.06 - samples/sec: 266.68 - lr: 0.000058 - momentum: 0.000000
2023-10-13 10:22:42,738 epoch 7 - iter 924/1546 - loss 0.01094035 - time (sec): 269.82 - samples/sec: 269.52 - lr: 0.000057 - momentum: 0.000000
2023-10-13 10:23:28,286 epoch 7 - iter 1078/1546 - loss 0.01073439 - time (sec): 315.36 - samples/sec: 273.01 - lr: 0.000055 - momentum: 0.000000
2023-10-13 10:24:12,816 epoch 7 - iter 1232/1546 - loss 0.01035952 - time (sec): 359.89 - samples/sec: 274.29 - lr: 0.000053 - momentum: 0.000000
2023-10-13 10:24:58,624 epoch 7 - iter 1386/1546 - loss 0.00984261 - time (sec): 405.70 - samples/sec: 277.29 - lr: 0.000052 - momentum: 0.000000
2023-10-13 10:25:42,975 epoch 7 - iter 1540/1546 - loss 0.00997254 - time (sec): 450.05 - samples/sec: 275.15 - lr: 0.000050 - momentum: 0.000000
2023-10-13 10:25:44,642 ----------------------------------------------------------------------------------------------------
2023-10-13 10:25:44,642 EPOCH 7 done: loss 0.0100 - lr: 0.000050
2023-10-13 10:26:02,375 DEV : loss 0.10020222514867783 - f1-score (micro avg) 0.7886
2023-10-13 10:26:02,405 ----------------------------------------------------------------------------------------------------
2023-10-13 10:26:47,796 epoch 8 - iter 154/1546 - loss 0.01244001 - time (sec): 45.39 - samples/sec: 258.30 - lr: 0.000048 - momentum: 0.000000
2023-10-13 10:27:31,481 epoch 8 - iter 308/1546 - loss 0.00842998 - time (sec): 89.07 - samples/sec: 271.57 - lr: 0.000047 - momentum: 0.000000
2023-10-13 10:28:17,362 epoch 8 - iter 462/1546 - loss 0.00627451 - time (sec): 134.95 - samples/sec: 270.72 - lr: 0.000045 - momentum: 0.000000
2023-10-13 10:29:01,594 epoch 8 - iter 616/1546 - loss 0.00697601 - time (sec): 179.19 - samples/sec: 271.98 - lr: 0.000043 - momentum: 0.000000
2023-10-13 10:29:46,681 epoch 8 - iter 770/1546 - loss 0.00681597 - time (sec): 224.27 - samples/sec: 274.03 - lr: 0.000042 - momentum: 0.000000
2023-10-13 10:30:31,390 epoch 8 - iter 924/1546 - loss 0.00609977 - time (sec): 268.98 - samples/sec: 273.00 - lr: 0.000040 - momentum: 0.000000
2023-10-13 10:31:17,039 epoch 8 - iter 1078/1546 - loss 0.00559074 - time (sec): 314.63 - samples/sec: 273.38 - lr: 0.000038 - momentum: 0.000000
2023-10-13 10:32:01,870 epoch 8 - iter 1232/1546 - loss 0.00526804 - time (sec): 359.46 - samples/sec: 274.25 - lr: 0.000037 - momentum: 0.000000
2023-10-13 10:32:47,288 epoch 8 - iter 1386/1546 - loss 0.00531994 - time (sec): 404.88 - samples/sec: 275.89 - lr: 0.000035 - momentum: 0.000000
2023-10-13 10:33:31,162 epoch 8 - iter 1540/1546 - loss 0.00552273 - time (sec): 448.75 - samples/sec: 275.71 - lr: 0.000033 - momentum: 0.000000
2023-10-13 10:33:32,970 ----------------------------------------------------------------------------------------------------
2023-10-13 10:33:32,970 EPOCH 8 done: loss 0.0055 - lr: 0.000033
2023-10-13 10:33:50,993 DEV : loss 0.11565513908863068 - f1-score (micro avg) 0.7714
2023-10-13 10:33:51,025 ----------------------------------------------------------------------------------------------------
2023-10-13 10:34:35,062 epoch 9 - iter 154/1546 - loss 0.00353921 - time (sec): 44.03 - samples/sec: 288.53 - lr: 0.000032 - momentum: 0.000000
2023-10-13 10:35:19,747 epoch 9 - iter 308/1546 - loss 0.00481950 - time (sec): 88.72 - samples/sec: 293.87 - lr: 0.000030 - momentum: 0.000000
2023-10-13 10:36:02,894 epoch 9 - iter 462/1546 - loss 0.00429851 - time (sec): 131.87 - samples/sec: 287.58 - lr: 0.000028 - momentum: 0.000000
2023-10-13 10:36:47,612 epoch 9 - iter 616/1546 - loss 0.00413043 - time (sec): 176.58 - samples/sec: 287.83 - lr: 0.000027 - momentum: 0.000000
2023-10-13 10:37:30,752 epoch 9 - iter 770/1546 - loss 0.00537386 - time (sec): 219.72 - samples/sec: 282.70 - lr: 0.000025 - momentum: 0.000000
2023-10-13 10:38:15,023 epoch 9 - iter 924/1546 - loss 0.00487878 - time (sec): 264.00 - samples/sec: 282.14 - lr: 0.000023 - momentum: 0.000000
2023-10-13 10:38:59,330 epoch 9 - iter 1078/1546 - loss 0.00467215 - time (sec): 308.30 - samples/sec: 281.03 - lr: 0.000022 - momentum: 0.000000
2023-10-13 10:39:43,130 epoch 9 - iter 1232/1546 - loss 0.00481155 - time (sec): 352.10 - samples/sec: 281.30 - lr: 0.000020 - momentum: 0.000000
2023-10-13 10:40:27,690 epoch 9 - iter 1386/1546 - loss 0.00475832 - time (sec): 396.66 - samples/sec: 280.66 - lr: 0.000018 - momentum: 0.000000
2023-10-13 10:41:12,637 epoch 9 - iter 1540/1546 - loss 0.00468018 - time (sec): 441.61 - samples/sec: 280.11 - lr: 0.000017 - momentum: 0.000000
2023-10-13 10:41:14,448 ----------------------------------------------------------------------------------------------------
2023-10-13 10:41:14,449 EPOCH 9 done: loss 0.0047 - lr: 0.000017
2023-10-13 10:41:31,579 DEV : loss 0.11798277497291565 - f1-score (micro avg) 0.7903
2023-10-13 10:41:31,609 ----------------------------------------------------------------------------------------------------
2023-10-13 10:42:16,010 epoch 10 - iter 154/1546 - loss 0.00481621 - time (sec): 44.40 - samples/sec: 269.92 - lr: 0.000015 - momentum: 0.000000
2023-10-13 10:43:02,187 epoch 10 - iter 308/1546 - loss 0.00357679 - time (sec): 90.58 - samples/sec: 265.04 - lr: 0.000013 - momentum: 0.000000
2023-10-13 10:43:51,217 epoch 10 - iter 462/1546 - loss 0.00374515 - time (sec): 139.61 - samples/sec: 269.97 - lr: 0.000012 - momentum: 0.000000
2023-10-13 10:44:37,368 epoch 10 - iter 616/1546 - loss 0.00379115 - time (sec): 185.76 - samples/sec: 267.93 - lr: 0.000010 - momentum: 0.000000
2023-10-13 10:45:22,324 epoch 10 - iter 770/1546 - loss 0.00346057 - time (sec): 230.71 - samples/sec: 274.58 - lr: 0.000008 - momentum: 0.000000
2023-10-13 10:46:06,661 epoch 10 - iter 924/1546 - loss 0.00318152 - time (sec): 275.05 - samples/sec: 272.36 - lr: 0.000007 - momentum: 0.000000
2023-10-13 10:46:51,187 epoch 10 - iter 1078/1546 - loss 0.00287153 - time (sec): 319.58 - samples/sec: 270.85 - lr: 0.000005 - momentum: 0.000000
2023-10-13 10:47:35,782 epoch 10 - iter 1232/1546 - loss 0.00283244 - time (sec): 364.17 - samples/sec: 271.84 - lr: 0.000003 - momentum: 0.000000
2023-10-13 10:48:20,263 epoch 10 - iter 1386/1546 - loss 0.00268580 - time (sec): 408.65 - samples/sec: 273.03 - lr: 0.000002 - momentum: 0.000000
2023-10-13 10:49:04,820 epoch 10 - iter 1540/1546 - loss 0.00279265 - time (sec): 453.21 - samples/sec: 273.24 - lr: 0.000000 - momentum: 0.000000
2023-10-13 10:49:06,424 ----------------------------------------------------------------------------------------------------
2023-10-13 10:49:06,424 EPOCH 10 done: loss 0.0028 - lr: 0.000000
2023-10-13 10:49:24,834 DEV : loss 0.11898898333311081 - f1-score (micro avg) 0.7871
2023-10-13 10:49:25,906 ----------------------------------------------------------------------------------------------------
2023-10-13 10:49:25,908 Loading model from best epoch ...
2023-10-13 10:49:30,245 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-BUILDING, B-BUILDING, E-BUILDING, I-BUILDING, S-STREET, B-STREET, E-STREET, I-STREET
2023-10-13 10:50:25,764
Results:
- F-score (micro) 0.8119
- F-score (macro) 0.7384
- Accuracy 0.7023

By class:
              precision    recall  f1-score   support

         LOC     0.8253    0.8742    0.8491       946
    BUILDING     0.6571    0.6216    0.6389       185
      STREET     0.6769    0.7857    0.7273        56

   micro avg     0.7939    0.8307    0.8119      1187
   macro avg     0.7198    0.7605    0.7384      1187
weighted avg     0.7921    0.8307    0.8106      1187

2023-10-13 10:50:25,764 ----------------------------------------------------------------------------------------------------
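The 13-tag dictionary printed above is the BIOES encoding of the three entity types in this corpus (LOC, BUILDING, STREET): 3 types x 4 positional prefixes (B/I/E/S) + the O tag = 13. The reported micro-average F1 is the harmonic mean of micro precision and recall, 2 * 0.7939 * 0.8307 / (0.7939 + 0.8307) ~ 0.8119. Below is a minimal sketch of loading the saved checkpoint for inference; the path is copied from the "Model training base path" header (best-model.pt is the checkpoint saved after the best dev epoch, epoch 5 in this run), the example sentence is invented, and the label type "ner" is an assumption.

    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Load the best checkpoint produced by this run.
    tagger = SequenceTagger.load(
        "hmbench-topres19th/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1/best-model.pt"
    )

    # Tag a sentence and print the predicted LOC/BUILDING/STREET spans.
    sentence = Sentence("He lived on Fleet Street in London .")
    tagger.predict(sentence)
    for span in sentence.get_spans("ner"):
        print(span.text, span.tag, span.score)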