2023-10-10 14:21:26,577 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,580 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-10 14:21:26,580 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,580 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-10 14:21:26,580 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,580 Train: 20847 sentences
2023-10-10 14:21:26,581 (train_with_dev=False, train_with_test=False)
2023-10-10 14:21:26,581 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,581 Training Params:
2023-10-10 14:21:26,581 - learning_rate: "0.00016"
2023-10-10 14:21:26,581 - mini_batch_size: "8"
2023-10-10 14:21:26,581 - max_epochs: "10"
2023-10-10 14:21:26,581 - shuffle: "True"
2023-10-10 14:21:26,581 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,581 Plugins:
2023-10-10 14:21:26,581 - TensorboardLogger
2023-10-10 14:21:26,581 - LinearScheduler | warmup_fraction: '0.1'
2023-10-10 14:21:26,581 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,581 Final evaluation on model from best epoch (best-model.pt)
2023-10-10 14:21:26,582 - metric: "('micro avg', 'f1-score')"
2023-10-10 14:21:26,582 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,582 Computation:
2023-10-10 14:21:26,582 - compute on device: cuda:0
2023-10-10 14:21:26,582 - embedding storage: none
2023-10-10 14:21:26,582 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,582 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-10 14:21:26,582 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,582 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,582 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-10 14:23:45,008 epoch 1 - iter 260/2606 - loss 2.82650621 - time (sec): 138.42 - samples/sec: 258.00 - lr: 0.000016 - momentum: 0.000000
2023-10-10 14:26:05,289 epoch 1 - iter 520/2606 - loss 2.56314673 - time (sec): 278.70 - samples/sec: 258.20 - lr: 0.000032 - momentum: 0.000000
2023-10-10 14:28:31,209 epoch 1 - iter 780/2606 - loss 2.14030820 - time (sec): 424.62 - samples/sec: 257.28 - lr: 0.000048 - momentum: 0.000000
2023-10-10 14:30:56,830 epoch 1 - iter 1040/2606 - loss 1.73439842 - time (sec): 570.25 - samples/sec: 260.41 - lr: 0.000064 - momentum: 0.000000
2023-10-10 14:33:17,157 epoch 1 - iter 1300/2606 - loss 1.47736320 - time (sec): 710.57 - samples/sec: 260.61 - lr: 0.000080 - momentum: 0.000000
2023-10-10 14:35:44,796 epoch 1 - iter 1560/2606 - loss 1.30609247 - time (sec): 858.21 - samples/sec: 260.09 - lr: 0.000096 - momentum: 0.000000
2023-10-10 14:38:04,201 epoch 1 - iter 1820/2606 - loss 1.17487372 - time (sec): 997.62 - samples/sec: 260.31 - lr: 0.000112 - momentum: 0.000000
2023-10-10 14:40:25,567 epoch 1 - iter 2080/2606 - loss 1.06691289 - time (sec): 1138.98 - samples/sec: 259.14 - lr: 0.000128 - momentum: 0.000000
2023-10-10 14:42:44,023 epoch 1 - iter 2340/2606 - loss 0.98460797 - time (sec): 1277.44 - samples/sec: 257.56 - lr: 0.000144 - momentum: 0.000000
2023-10-10 14:45:09,215 epoch 1 - iter 2600/2606 - loss 0.90554459 - time (sec): 1422.63 - samples/sec: 257.48 - lr: 0.000160 - momentum: 0.000000
2023-10-10 14:45:12,600 ----------------------------------------------------------------------------------------------------
2023-10-10 14:45:12,601 EPOCH 1 done: loss 0.9036 - lr: 0.000160
2023-10-10 14:45:51,723 DEV : loss 0.15153414011001587 - f1-score (micro avg) 0.2331
2023-10-10 14:45:51,776 saving best model
2023-10-10 14:45:52,788 ----------------------------------------------------------------------------------------------------
2023-10-10 14:48:11,787 epoch 2 - iter 260/2606 - loss 0.18842284 - time (sec): 139.00 - samples/sec: 251.93 - lr: 0.000158 - momentum: 0.000000
2023-10-10 14:50:30,954 epoch 2 - iter 520/2606 - loss 0.18792004 - time (sec): 278.16 - samples/sec: 254.84 - lr: 0.000156 - momentum: 0.000000
2023-10-10 14:52:50,250 epoch 2 - iter 780/2606 - loss 0.17880044 - time (sec): 417.46 - samples/sec: 257.67 - lr: 0.000155 - momentum: 0.000000
2023-10-10 14:55:12,932 epoch 2 - iter 1040/2606 - loss 0.17379042 - time (sec): 560.14 - samples/sec: 254.57 - lr: 0.000153 - momentum: 0.000000
2023-10-10 14:57:36,261 epoch 2 - iter 1300/2606 - loss 0.16829776 - time (sec): 703.47 - samples/sec: 255.54 - lr: 0.000151 - momentum: 0.000000
2023-10-10 14:59:58,439 epoch 2 - iter 1560/2606 - loss 0.16028935 - time (sec): 845.65 - samples/sec: 257.10 - lr: 0.000149 - momentum: 0.000000
2023-10-10 15:02:21,827 epoch 2 - iter 1820/2606 - loss 0.15744925 - time (sec): 989.04 - samples/sec: 257.84 - lr: 0.000148 - momentum: 0.000000
2023-10-10 15:04:42,614 epoch 2 - iter 2080/2606 - loss 0.15613165 - time (sec): 1129.82 - samples/sec: 257.27 - lr: 0.000146 - momentum: 0.000000
2023-10-10 15:07:03,205 epoch 2 - iter 2340/2606 - loss 0.15275373 - time (sec): 1270.42 - samples/sec: 256.73 - lr: 0.000144 - momentum: 0.000000
2023-10-10 15:09:24,020 epoch 2 - iter 2600/2606 - loss 0.14918224 - time (sec): 1411.23 - samples/sec: 259.85 - lr: 0.000142 - momentum: 0.000000
2023-10-10 15:09:26,918 ----------------------------------------------------------------------------------------------------
2023-10-10 15:09:26,919 EPOCH 2 done: loss 0.1490 - lr: 0.000142
2023-10-10 15:10:10,127 DEV : loss 0.11950229853391647 - f1-score (micro avg) 0.3234
2023-10-10 15:10:10,187 saving best model
2023-10-10 15:10:12,879 ----------------------------------------------------------------------------------------------------
2023-10-10 15:12:28,827 epoch 3 - iter 260/2606 - loss 0.08047863 - time (sec): 135.94 - samples/sec: 261.72 - lr: 0.000140 - momentum: 0.000000
2023-10-10 15:14:47,287 epoch 3 - iter 520/2606 - loss 0.08774055 - time (sec): 274.40 - samples/sec: 264.43 - lr: 0.000139 - momentum: 0.000000
2023-10-10 15:17:07,350 epoch 3 - iter 780/2606 - loss 0.08904143 - time (sec): 414.47 - samples/sec: 264.51 - lr: 0.000137 - momentum: 0.000000
2023-10-10 15:19:28,398 epoch 3 - iter 1040/2606 - loss 0.09225516 - time (sec): 555.51 - samples/sec: 261.92 - lr: 0.000135 - momentum: 0.000000
2023-10-10 15:21:56,150 epoch 3 - iter 1300/2606 - loss 0.09497422 - time (sec): 703.27 - samples/sec: 263.70 - lr: 0.000133 - momentum: 0.000000
2023-10-10 15:24:21,771 epoch 3 - iter 1560/2606 - loss 0.09353639 - time (sec): 848.89 - samples/sec: 262.80 - lr: 0.000132 - momentum: 0.000000
2023-10-10 15:26:45,547 epoch 3 - iter 1820/2606 - loss 0.09255565 - time (sec): 992.66 - samples/sec: 261.67 - lr: 0.000130 - momentum: 0.000000
2023-10-10 15:29:00,939 epoch 3 - iter 2080/2606 - loss 0.09167665 - time (sec): 1128.06 - samples/sec: 262.47 - lr: 0.000128 - momentum: 0.000000
2023-10-10 15:31:19,016 epoch 3 - iter 2340/2606 - loss 0.09077967 - time (sec): 1266.13 - samples/sec: 259.91 - lr: 0.000126 - momentum: 0.000000
2023-10-10 15:33:46,544 epoch 3 - iter 2600/2606 - loss 0.09046288 - time (sec): 1413.66 - samples/sec: 259.30 - lr: 0.000125 - momentum: 0.000000
2023-10-10 15:33:49,679 ----------------------------------------------------------------------------------------------------
2023-10-10 15:33:49,679 EPOCH 3 done: loss 0.0905 - lr: 0.000125
2023-10-10 15:34:30,739 DEV : loss 0.18015694618225098 - f1-score (micro avg) 0.3558
2023-10-10 15:34:30,790 saving best model
2023-10-10 15:34:33,476 ----------------------------------------------------------------------------------------------------
2023-10-10 15:36:51,151 epoch 4 - iter 260/2606 - loss 0.05157111 - time (sec): 137.67 - samples/sec: 267.59 - lr: 0.000123 - momentum: 0.000000
2023-10-10 15:39:11,379 epoch 4 - iter 520/2606 - loss 0.06164379 - time (sec): 277.90 - samples/sec: 274.53 - lr: 0.000121 - momentum: 0.000000
2023-10-10 15:41:28,765 epoch 4 - iter 780/2606 - loss 0.06041382 - time (sec): 415.28 - samples/sec: 271.33 - lr: 0.000119 - momentum: 0.000000
2023-10-10 15:43:44,655 epoch 4 - iter 1040/2606 - loss 0.06191568 - time (sec): 551.17 - samples/sec: 269.10 - lr: 0.000117 - momentum: 0.000000
2023-10-10 15:46:05,001 epoch 4 - iter 1300/2606 - loss 0.06056013 - time (sec): 691.52 - samples/sec: 268.16 - lr: 0.000116 - momentum: 0.000000
2023-10-10 15:48:16,573 epoch 4 - iter 1560/2606 - loss 0.06351392 - time (sec): 823.09 - samples/sec: 269.95 - lr: 0.000114 - momentum: 0.000000
2023-10-10 15:50:29,084 epoch 4 - iter 1820/2606 - loss 0.06300262 - time (sec): 955.60 - samples/sec: 272.07 - lr: 0.000112 - momentum: 0.000000
2023-10-10 15:52:38,568 epoch 4 - iter 2080/2606 - loss 0.06312479 - time (sec): 1085.09 - samples/sec: 271.77 - lr: 0.000110 - momentum: 0.000000
2023-10-10 15:54:50,406 epoch 4 - iter 2340/2606 - loss 0.06500771 - time (sec): 1216.93 - samples/sec: 272.23 - lr: 0.000109 - momentum: 0.000000
2023-10-10 15:56:59,946 epoch 4 - iter 2600/2606 - loss 0.06498073 - time (sec): 1346.47 - samples/sec: 272.46 - lr: 0.000107 - momentum: 0.000000
2023-10-10 15:57:02,689 ----------------------------------------------------------------------------------------------------
2023-10-10 15:57:02,690 EPOCH 4 done: loss 0.0650 - lr: 0.000107
2023-10-10 15:57:41,615 DEV : loss 0.22460463643074036 - f1-score (micro avg) 0.3585
2023-10-10 15:57:41,667 saving best model
2023-10-10 15:57:44,347 ----------------------------------------------------------------------------------------------------
2023-10-10 15:59:52,262 epoch 5 - iter 260/2606 - loss 0.03592166 - time (sec): 127.91 - samples/sec: 268.55 - lr: 0.000105 - momentum: 0.000000
2023-10-10 16:02:02,554 epoch 5 - iter 520/2606 - loss 0.04034587 - time (sec): 258.20 - samples/sec: 272.50 - lr: 0.000103 - momentum: 0.000000
2023-10-10 16:04:14,414 epoch 5 - iter 780/2606 - loss 0.03986586 - time (sec): 390.06 - samples/sec: 278.42 - lr: 0.000101 - momentum: 0.000000
2023-10-10 16:06:26,506 epoch 5 - iter 1040/2606 - loss 0.04078098 - time (sec): 522.15 - samples/sec: 277.65 - lr: 0.000100 - momentum: 0.000000
2023-10-10 16:08:39,982 epoch 5 - iter 1300/2606 - loss 0.04342342 - time (sec): 655.63 - samples/sec: 278.05 - lr: 0.000098 - momentum: 0.000000
2023-10-10 16:10:56,870 epoch 5 - iter 1560/2606 - loss 0.04386466 - time (sec): 792.52 - samples/sec: 276.41 - lr: 0.000096 - momentum: 0.000000
2023-10-10 16:13:22,636 epoch 5 - iter 1820/2606 - loss 0.04404192 - time (sec): 938.28 - samples/sec: 271.26 - lr: 0.000094 - momentum: 0.000000
2023-10-10 16:15:50,560 epoch 5 - iter 2080/2606 - loss 0.04511901 - time (sec): 1086.21 - samples/sec: 269.90 - lr: 0.000093 - momentum: 0.000000
2023-10-10 16:18:17,914 epoch 5 - iter 2340/2606 - loss 0.04514489 - time (sec): 1233.56 - samples/sec: 268.76 - lr: 0.000091 - momentum: 0.000000
2023-10-10 16:20:38,307 epoch 5 - iter 2600/2606 - loss 0.04534151 - time (sec): 1373.95 - samples/sec: 266.91 - lr: 0.000089 - momentum: 0.000000
2023-10-10 16:20:41,341 ----------------------------------------------------------------------------------------------------
2023-10-10 16:20:41,341 EPOCH 5 done: loss 0.0453 - lr: 0.000089
2023-10-10 16:21:26,960 DEV : loss 0.3552384078502655 - f1-score (micro avg) 0.3391
2023-10-10 16:21:27,020 ----------------------------------------------------------------------------------------------------
2023-10-10 16:23:46,309 epoch 6 - iter 260/2606 - loss 0.03073255 - time (sec): 139.29 - samples/sec: 251.20 - lr: 0.000087 - momentum: 0.000000
2023-10-10 16:26:06,409 epoch 6 - iter 520/2606 - loss 0.03336276 - time (sec): 279.39 - samples/sec: 250.59 - lr: 0.000085 - momentum: 0.000000
2023-10-10 16:28:31,218 epoch 6 - iter 780/2606 - loss 0.03342619 - time (sec): 424.20 - samples/sec: 255.01 - lr: 0.000084 - momentum: 0.000000
2023-10-10 16:30:56,809 epoch 6 - iter 1040/2606 - loss 0.03305085 - time (sec): 569.79 - samples/sec: 257.41 - lr: 0.000082 - momentum: 0.000000
2023-10-10 16:33:15,628 epoch 6 - iter 1300/2606 - loss 0.03234751 - time (sec): 708.61 - samples/sec: 261.61 - lr: 0.000080 - momentum: 0.000000
2023-10-10 16:35:27,531 epoch 6 - iter 1560/2606 - loss 0.03383010 - time (sec): 840.51 - samples/sec: 263.10 - lr: 0.000078 - momentum: 0.000000
2023-10-10 16:37:38,652 epoch 6 - iter 1820/2606 - loss 0.03360923 - time (sec): 971.63 - samples/sec: 265.18 - lr: 0.000077 - momentum: 0.000000
2023-10-10 16:39:50,837 epoch 6 - iter 2080/2606 - loss 0.03354235 - time (sec): 1103.82 - samples/sec: 267.05 - lr: 0.000075 - momentum: 0.000000
2023-10-10 16:42:00,685 epoch 6 - iter 2340/2606 - loss 0.03417729 - time (sec): 1233.66 - samples/sec: 266.96 - lr: 0.000073 - momentum: 0.000000
2023-10-10 16:44:19,012 epoch 6 - iter 2600/2606 - loss 0.03364077 - time (sec): 1371.99 - samples/sec: 267.28 - lr: 0.000071 - momentum: 0.000000
2023-10-10 16:44:21,887 ----------------------------------------------------------------------------------------------------
2023-10-10 16:44:21,888 EPOCH 6 done: loss 0.0336 - lr: 0.000071
2023-10-10 16:45:00,806 DEV : loss 0.34769406914711 - f1-score (micro avg) 0.3806
2023-10-10 16:45:00,863 saving best model
2023-10-10 16:45:03,552 ----------------------------------------------------------------------------------------------------
2023-10-10 16:47:10,382 epoch 7 - iter 260/2606 - loss 0.02507245 - time (sec): 126.82 - samples/sec: 279.71 - lr: 0.000069 - momentum: 0.000000
2023-10-10 16:49:17,745 epoch 7 - iter 520/2606 - loss 0.02198476 - time (sec): 254.19 - samples/sec: 280.93 - lr: 0.000068 - momentum: 0.000000
2023-10-10 16:51:24,652 epoch 7 - iter 780/2606 - loss 0.02230588 - time (sec): 381.09 - samples/sec: 283.27 - lr: 0.000066 - momentum: 0.000000
2023-10-10 16:53:32,924 epoch 7 - iter 1040/2606 - loss 0.02469517 - time (sec): 509.37 - samples/sec: 283.41 - lr: 0.000064 - momentum: 0.000000
2023-10-10 16:55:41,775 epoch 7 - iter 1300/2606 - loss 0.02514914 - time (sec): 638.22 - samples/sec: 286.11 - lr: 0.000062 - momentum: 0.000000
2023-10-10 16:57:48,643 epoch 7 - iter 1560/2606 - loss 0.02586490 - time (sec): 765.09 - samples/sec: 286.23 - lr: 0.000061 - momentum: 0.000000
2023-10-10 16:59:55,786 epoch 7 - iter 1820/2606 - loss 0.02506821 - time (sec): 892.23 - samples/sec: 286.32 - lr: 0.000059 - momentum: 0.000000
2023-10-10 17:02:01,919 epoch 7 - iter 2080/2606 - loss 0.02485056 - time (sec): 1018.36 - samples/sec: 284.07 - lr: 0.000057 - momentum: 0.000000
2023-10-10 17:04:11,098 epoch 7 - iter 2340/2606 - loss 0.02484702 - time (sec): 1147.54 - samples/sec: 284.94 - lr: 0.000055 - momentum: 0.000000
2023-10-10 17:06:22,781 epoch 7 - iter 2600/2606 - loss 0.02416949 - time (sec): 1279.22 - samples/sec: 286.55 - lr: 0.000053 - momentum: 0.000000
2023-10-10 17:06:25,669 ----------------------------------------------------------------------------------------------------
2023-10-10 17:06:25,669 EPOCH 7 done: loss 0.0242 - lr: 0.000053
2023-10-10 17:07:04,419 DEV : loss 0.38912469148635864 - f1-score (micro avg) 0.3886
2023-10-10 17:07:04,492 saving best model
2023-10-10 17:07:08,095 ----------------------------------------------------------------------------------------------------
2023-10-10 17:09:19,922 epoch 8 - iter 260/2606 - loss 0.01590417 - time (sec): 131.82 - samples/sec: 302.15 - lr: 0.000052 - momentum: 0.000000
2023-10-10 17:11:31,107 epoch 8 - iter 520/2606 - loss 0.01657268 - time (sec): 263.01 - samples/sec: 292.08 - lr: 0.000050 - momentum: 0.000000
2023-10-10 17:13:38,438 epoch 8 - iter 780/2606 - loss 0.01716544 - time (sec): 390.34 - samples/sec: 287.05 - lr: 0.000048 - momentum: 0.000000
2023-10-10 17:15:47,199 epoch 8 - iter 1040/2606 - loss 0.01681174 - time (sec): 519.10 - samples/sec: 288.07 - lr: 0.000046 - momentum: 0.000000
2023-10-10 17:17:54,730 epoch 8 - iter 1300/2606 - loss 0.01634833 - time (sec): 646.63 - samples/sec: 286.35 - lr: 0.000045 - momentum: 0.000000
2023-10-10 17:20:02,473 epoch 8 - iter 1560/2606 - loss 0.01711598 - time (sec): 774.37 - samples/sec: 287.31 - lr: 0.000043 - momentum: 0.000000
2023-10-10 17:22:11,287 epoch 8 - iter 1820/2606 - loss 0.01677195 - time (sec): 903.19 - samples/sec: 286.75 - lr: 0.000041 - momentum: 0.000000
2023-10-10 17:24:17,214 epoch 8 - iter 2080/2606 - loss 0.01710973 - time (sec): 1029.11 - samples/sec: 285.16 - lr: 0.000039 - momentum: 0.000000
2023-10-10 17:26:22,215 epoch 8 - iter 2340/2606 - loss 0.01765144 - time (sec): 1154.12 - samples/sec: 283.46 - lr: 0.000037 - momentum: 0.000000
2023-10-10 17:28:34,006 epoch 8 - iter 2600/2606 - loss 0.01846439 - time (sec): 1285.91 - samples/sec: 285.24 - lr: 0.000036 - momentum: 0.000000
2023-10-10 17:28:36,682 ----------------------------------------------------------------------------------------------------
2023-10-10 17:28:36,682 EPOCH 8 done: loss 0.0185 - lr: 0.000036
2023-10-10 17:29:14,536 DEV : loss 0.41515445709228516 - f1-score (micro avg) 0.3921
2023-10-10 17:29:14,587 saving best model
2023-10-10 17:29:17,244 ----------------------------------------------------------------------------------------------------
2023-10-10 17:31:26,353 epoch 9 - iter 260/2606 - loss 0.01719462 - time (sec): 129.10 - samples/sec: 286.82 - lr: 0.000034 - momentum: 0.000000
2023-10-10 17:33:37,943 epoch 9 - iter 520/2606 - loss 0.01306206 - time (sec): 260.69 - samples/sec: 296.75 - lr: 0.000032 - momentum: 0.000000
2023-10-10 17:35:45,408 epoch 9 - iter 780/2606 - loss 0.01249072 - time (sec): 388.15 - samples/sec: 291.69 - lr: 0.000030 - momentum: 0.000000
2023-10-10 17:37:50,026 epoch 9 - iter 1040/2606 - loss 0.01153243 - time (sec): 512.77 - samples/sec: 286.00 - lr: 0.000029 - momentum: 0.000000
2023-10-10 17:40:00,767 epoch 9 - iter 1300/2606 - loss 0.01176972 - time (sec): 643.51 - samples/sec: 288.25 - lr: 0.000027 - momentum: 0.000000
2023-10-10 17:42:06,166 epoch 9 - iter 1560/2606 - loss 0.01199337 - time (sec): 768.91 - samples/sec: 286.24 - lr: 0.000025 - momentum: 0.000000
2023-10-10 17:44:13,171 epoch 9 - iter 1820/2606 - loss 0.01190054 - time (sec): 895.91 - samples/sec: 284.89 - lr: 0.000023 - momentum: 0.000000
2023-10-10 17:46:21,808 epoch 9 - iter 2080/2606 - loss 0.01247403 - time (sec): 1024.55 - samples/sec: 284.48 - lr: 0.000021 - momentum: 0.000000
2023-10-10 17:48:30,521 epoch 9 - iter 2340/2606 - loss 0.01213440 - time (sec): 1153.26 - samples/sec: 284.01 - lr: 0.000020 - momentum: 0.000000
2023-10-10 17:50:42,947 epoch 9 - iter 2600/2606 - loss 0.01250624 - time (sec): 1285.69 - samples/sec: 285.36 - lr: 0.000018 - momentum: 0.000000
2023-10-10 17:50:45,623 ----------------------------------------------------------------------------------------------------
2023-10-10 17:50:45,624 EPOCH 9 done: loss 0.0125 - lr: 0.000018
2023-10-10 17:51:27,746 DEV : loss 0.4364360272884369 - f1-score (micro avg) 0.3877
2023-10-10 17:51:27,812 ----------------------------------------------------------------------------------------------------
2023-10-10 17:53:43,933 epoch 10 - iter 260/2606 - loss 0.01035287 - time (sec): 136.12 - samples/sec: 280.82 - lr: 0.000016 - momentum: 0.000000
2023-10-10 17:55:59,905 epoch 10 - iter 520/2606 - loss 0.00933623 - time (sec): 272.09 - samples/sec: 281.96 - lr: 0.000014 - momentum: 0.000000
2023-10-10 17:58:14,199 epoch 10 - iter 780/2606 - loss 0.00911268 - time (sec): 406.38 - samples/sec: 276.91 - lr: 0.000013 - momentum: 0.000000
2023-10-10 18:00:25,139 epoch 10 - iter 1040/2606 - loss 0.00886118 - time (sec): 537.32 - samples/sec: 270.22 - lr: 0.000011 - momentum: 0.000000
2023-10-10 18:02:36,223 epoch 10 - iter 1300/2606 - loss 0.00858393 - time (sec): 668.41 - samples/sec: 273.11 - lr: 0.000009 - momentum: 0.000000
2023-10-10 18:04:45,180 epoch 10 - iter 1560/2606 - loss 0.00837399 - time (sec): 797.37 - samples/sec: 274.08 - lr: 0.000007 - momentum: 0.000000
2023-10-10 18:06:54,526 epoch 10 - iter 1820/2606 - loss 0.00838034 - time (sec): 926.71 - samples/sec: 275.86 - lr: 0.000005 - momentum: 0.000000
2023-10-10 18:09:05,520 epoch 10 - iter 2080/2606 - loss 0.00862970 - time (sec): 1057.71 - samples/sec: 278.71 - lr: 0.000004 - momentum: 0.000000
2023-10-10 18:11:13,645 epoch 10 - iter 2340/2606 - loss 0.00858216 - time (sec): 1185.83 - samples/sec: 278.84 - lr: 0.000002 - momentum: 0.000000
2023-10-10 18:13:21,604 epoch 10 - iter 2600/2606 - loss 0.00867931 - time (sec): 1313.79 - samples/sec: 278.96 - lr: 0.000000 - momentum: 0.000000
2023-10-10 18:13:24,617 ----------------------------------------------------------------------------------------------------
2023-10-10 18:13:24,618 EPOCH 10 done: loss 0.0087 - lr: 0.000000
2023-10-10 18:14:05,757 DEV : loss 0.48594337701797485 - f1-score (micro avg) 0.3814
2023-10-10 18:14:06,788 ----------------------------------------------------------------------------------------------------
2023-10-10 18:14:06,791 Loading model from best epoch ...
2023-10-10 18:14:10,833 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-10 18:15:46,590 Results:
- F-score (micro) 0.4343
- F-score (macro) 0.3054
- Accuracy 0.2807

By class:
              precision    recall  f1-score   support

         LOC     0.4638    0.4901    0.4766      1214
         PER     0.4027    0.4480    0.4241       808
         ORG     0.3274    0.3144    0.3208       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4225    0.4469    0.4343      2390
   macro avg     0.2985    0.3131    0.3054      2390
weighted avg     0.4201    0.4469    0.4328      2390

2023-10-10 18:15:46,590 ----------------------------------------------------------------------------------------------------