2023-10-10 14:21:26,577 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,580 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-10 14:21:26,580 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,580 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
- NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-10 14:21:26,580 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,580 Train: 20847 sentences
2023-10-10 14:21:26,581 (train_with_dev=False, train_with_test=False)
2023-10-10 14:21:26,581 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,581 Training Params:
2023-10-10 14:21:26,581 - learning_rate: "0.00016"
2023-10-10 14:21:26,581 - mini_batch_size: "8"
2023-10-10 14:21:26,581 - max_epochs: "10"
2023-10-10 14:21:26,581 - shuffle: "True"
2023-10-10 14:21:26,581 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,581 Plugins:
2023-10-10 14:21:26,581 - TensorboardLogger
2023-10-10 14:21:26,581 - LinearScheduler | warmup_fraction: '0.1'
2023-10-10 14:21:26,581 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,581 Final evaluation on model from best epoch (best-model.pt)
2023-10-10 14:21:26,582 - metric: "('micro avg', 'f1-score')"
2023-10-10 14:21:26,582 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,582 Computation:
2023-10-10 14:21:26,582 - compute on device: cuda:0
2023-10-10 14:21:26,582 - embedding storage: none
2023-10-10 14:21:26,582 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,582 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-10 14:21:26,582 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,582 ----------------------------------------------------------------------------------------------------
2023-10-10 14:21:26,582 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-10 14:23:45,008 epoch 1 - iter 260/2606 - loss 2.82650621 - time (sec): 138.42 - samples/sec: 258.00 - lr: 0.000016 - momentum: 0.000000
2023-10-10 14:26:05,289 epoch 1 - iter 520/2606 - loss 2.56314673 - time (sec): 278.70 - samples/sec: 258.20 - lr: 0.000032 - momentum: 0.000000
2023-10-10 14:28:31,209 epoch 1 - iter 780/2606 - loss 2.14030820 - time (sec): 424.62 - samples/sec: 257.28 - lr: 0.000048 - momentum: 0.000000
2023-10-10 14:30:56,830 epoch 1 - iter 1040/2606 - loss 1.73439842 - time (sec): 570.25 - samples/sec: 260.41 - lr: 0.000064 - momentum: 0.000000
2023-10-10 14:33:17,157 epoch 1 - iter 1300/2606 - loss 1.47736320 - time (sec): 710.57 - samples/sec: 260.61 - lr: 0.000080 - momentum: 0.000000
2023-10-10 14:35:44,796 epoch 1 - iter 1560/2606 - loss 1.30609247 - time (sec): 858.21 - samples/sec: 260.09 - lr: 0.000096 - momentum: 0.000000
2023-10-10 14:38:04,201 epoch 1 - iter 1820/2606 - loss 1.17487372 - time (sec): 997.62 - samples/sec: 260.31 - lr: 0.000112 - momentum: 0.000000
2023-10-10 14:40:25,567 epoch 1 - iter 2080/2606 - loss 1.06691289 - time (sec): 1138.98 - samples/sec: 259.14 - lr: 0.000128 - momentum: 0.000000
2023-10-10 14:42:44,023 epoch 1 - iter 2340/2606 - loss 0.98460797 - time (sec): 1277.44 - samples/sec: 257.56 - lr: 0.000144 - momentum: 0.000000
2023-10-10 14:45:09,215 epoch 1 - iter 2600/2606 - loss 0.90554459 - time (sec): 1422.63 - samples/sec: 257.48 - lr: 0.000160 - momentum: 0.000000
2023-10-10 14:45:12,600 ----------------------------------------------------------------------------------------------------
2023-10-10 14:45:12,601 EPOCH 1 done: loss 0.9036 - lr: 0.000160
2023-10-10 14:45:51,723 DEV : loss 0.15153414011001587 - f1-score (micro avg) 0.2331
2023-10-10 14:45:51,776 saving best model
2023-10-10 14:45:52,788 ----------------------------------------------------------------------------------------------------
2023-10-10 14:48:11,787 epoch 2 - iter 260/2606 - loss 0.18842284 - time (sec): 139.00 - samples/sec: 251.93 - lr: 0.000158 - momentum: 0.000000
2023-10-10 14:50:30,954 epoch 2 - iter 520/2606 - loss 0.18792004 - time (sec): 278.16 - samples/sec: 254.84 - lr: 0.000156 - momentum: 0.000000
2023-10-10 14:52:50,250 epoch 2 - iter 780/2606 - loss 0.17880044 - time (sec): 417.46 - samples/sec: 257.67 - lr: 0.000155 - momentum: 0.000000
2023-10-10 14:55:12,932 epoch 2 - iter 1040/2606 - loss 0.17379042 - time (sec): 560.14 - samples/sec: 254.57 - lr: 0.000153 - momentum: 0.000000
2023-10-10 14:57:36,261 epoch 2 - iter 1300/2606 - loss 0.16829776 - time (sec): 703.47 - samples/sec: 255.54 - lr: 0.000151 - momentum: 0.000000
2023-10-10 14:59:58,439 epoch 2 - iter 1560/2606 - loss 0.16028935 - time (sec): 845.65 - samples/sec: 257.10 - lr: 0.000149 - momentum: 0.000000
2023-10-10 15:02:21,827 epoch 2 - iter 1820/2606 - loss 0.15744925 - time (sec): 989.04 - samples/sec: 257.84 - lr: 0.000148 - momentum: 0.000000
2023-10-10 15:04:42,614 epoch 2 - iter 2080/2606 - loss 0.15613165 - time (sec): 1129.82 - samples/sec: 257.27 - lr: 0.000146 - momentum: 0.000000
2023-10-10 15:07:03,205 epoch 2 - iter 2340/2606 - loss 0.15275373 - time (sec): 1270.42 - samples/sec: 256.73 - lr: 0.000144 - momentum: 0.000000
2023-10-10 15:09:24,020 epoch 2 - iter 2600/2606 - loss 0.14918224 - time (sec): 1411.23 - samples/sec: 259.85 - lr: 0.000142 - momentum: 0.000000
2023-10-10 15:09:26,918 ----------------------------------------------------------------------------------------------------
2023-10-10 15:09:26,919 EPOCH 2 done: loss 0.1490 - lr: 0.000142
2023-10-10 15:10:10,127 DEV : loss 0.11950229853391647 - f1-score (micro avg) 0.3234
2023-10-10 15:10:10,187 saving best model
2023-10-10 15:10:12,879 ----------------------------------------------------------------------------------------------------
2023-10-10 15:12:28,827 epoch 3 - iter 260/2606 - loss 0.08047863 - time (sec): 135.94 - samples/sec: 261.72 - lr: 0.000140 - momentum: 0.000000
2023-10-10 15:14:47,287 epoch 3 - iter 520/2606 - loss 0.08774055 - time (sec): 274.40 - samples/sec: 264.43 - lr: 0.000139 - momentum: 0.000000
2023-10-10 15:17:07,350 epoch 3 - iter 780/2606 - loss 0.08904143 - time (sec): 414.47 - samples/sec: 264.51 - lr: 0.000137 - momentum: 0.000000
2023-10-10 15:19:28,398 epoch 3 - iter 1040/2606 - loss 0.09225516 - time (sec): 555.51 - samples/sec: 261.92 - lr: 0.000135 - momentum: 0.000000
2023-10-10 15:21:56,150 epoch 3 - iter 1300/2606 - loss 0.09497422 - time (sec): 703.27 - samples/sec: 263.70 - lr: 0.000133 - momentum: 0.000000
2023-10-10 15:24:21,771 epoch 3 - iter 1560/2606 - loss 0.09353639 - time (sec): 848.89 - samples/sec: 262.80 - lr: 0.000132 - momentum: 0.000000
2023-10-10 15:26:45,547 epoch 3 - iter 1820/2606 - loss 0.09255565 - time (sec): 992.66 - samples/sec: 261.67 - lr: 0.000130 - momentum: 0.000000
2023-10-10 15:29:00,939 epoch 3 - iter 2080/2606 - loss 0.09167665 - time (sec): 1128.06 - samples/sec: 262.47 - lr: 0.000128 - momentum: 0.000000
2023-10-10 15:31:19,016 epoch 3 - iter 2340/2606 - loss 0.09077967 - time (sec): 1266.13 - samples/sec: 259.91 - lr: 0.000126 - momentum: 0.000000
2023-10-10 15:33:46,544 epoch 3 - iter 2600/2606 - loss 0.09046288 - time (sec): 1413.66 - samples/sec: 259.30 - lr: 0.000125 - momentum: 0.000000
2023-10-10 15:33:49,679 ----------------------------------------------------------------------------------------------------
2023-10-10 15:33:49,679 EPOCH 3 done: loss 0.0905 - lr: 0.000125
2023-10-10 15:34:30,739 DEV : loss 0.18015694618225098 - f1-score (micro avg) 0.3558
2023-10-10 15:34:30,790 saving best model
2023-10-10 15:34:33,476 ----------------------------------------------------------------------------------------------------
2023-10-10 15:36:51,151 epoch 4 - iter 260/2606 - loss 0.05157111 - time (sec): 137.67 - samples/sec: 267.59 - lr: 0.000123 - momentum: 0.000000
2023-10-10 15:39:11,379 epoch 4 - iter 520/2606 - loss 0.06164379 - time (sec): 277.90 - samples/sec: 274.53 - lr: 0.000121 - momentum: 0.000000
2023-10-10 15:41:28,765 epoch 4 - iter 780/2606 - loss 0.06041382 - time (sec): 415.28 - samples/sec: 271.33 - lr: 0.000119 - momentum: 0.000000
2023-10-10 15:43:44,655 epoch 4 - iter 1040/2606 - loss 0.06191568 - time (sec): 551.17 - samples/sec: 269.10 - lr: 0.000117 - momentum: 0.000000
2023-10-10 15:46:05,001 epoch 4 - iter 1300/2606 - loss 0.06056013 - time (sec): 691.52 - samples/sec: 268.16 - lr: 0.000116 - momentum: 0.000000
2023-10-10 15:48:16,573 epoch 4 - iter 1560/2606 - loss 0.06351392 - time (sec): 823.09 - samples/sec: 269.95 - lr: 0.000114 - momentum: 0.000000
2023-10-10 15:50:29,084 epoch 4 - iter 1820/2606 - loss 0.06300262 - time (sec): 955.60 - samples/sec: 272.07 - lr: 0.000112 - momentum: 0.000000
2023-10-10 15:52:38,568 epoch 4 - iter 2080/2606 - loss 0.06312479 - time (sec): 1085.09 - samples/sec: 271.77 - lr: 0.000110 - momentum: 0.000000
2023-10-10 15:54:50,406 epoch 4 - iter 2340/2606 - loss 0.06500771 - time (sec): 1216.93 - samples/sec: 272.23 - lr: 0.000109 - momentum: 0.000000
2023-10-10 15:56:59,946 epoch 4 - iter 2600/2606 - loss 0.06498073 - time (sec): 1346.47 - samples/sec: 272.46 - lr: 0.000107 - momentum: 0.000000
2023-10-10 15:57:02,689 ----------------------------------------------------------------------------------------------------
2023-10-10 15:57:02,690 EPOCH 4 done: loss 0.0650 - lr: 0.000107
2023-10-10 15:57:41,615 DEV : loss 0.22460463643074036 - f1-score (micro avg) 0.3585
2023-10-10 15:57:41,667 saving best model
2023-10-10 15:57:44,347 ----------------------------------------------------------------------------------------------------
2023-10-10 15:59:52,262 epoch 5 - iter 260/2606 - loss 0.03592166 - time (sec): 127.91 - samples/sec: 268.55 - lr: 0.000105 - momentum: 0.000000
2023-10-10 16:02:02,554 epoch 5 - iter 520/2606 - loss 0.04034587 - time (sec): 258.20 - samples/sec: 272.50 - lr: 0.000103 - momentum: 0.000000
2023-10-10 16:04:14,414 epoch 5 - iter 780/2606 - loss 0.03986586 - time (sec): 390.06 - samples/sec: 278.42 - lr: 0.000101 - momentum: 0.000000
2023-10-10 16:06:26,506 epoch 5 - iter 1040/2606 - loss 0.04078098 - time (sec): 522.15 - samples/sec: 277.65 - lr: 0.000100 - momentum: 0.000000
2023-10-10 16:08:39,982 epoch 5 - iter 1300/2606 - loss 0.04342342 - time (sec): 655.63 - samples/sec: 278.05 - lr: 0.000098 - momentum: 0.000000
2023-10-10 16:10:56,870 epoch 5 - iter 1560/2606 - loss 0.04386466 - time (sec): 792.52 - samples/sec: 276.41 - lr: 0.000096 - momentum: 0.000000
2023-10-10 16:13:22,636 epoch 5 - iter 1820/2606 - loss 0.04404192 - time (sec): 938.28 - samples/sec: 271.26 - lr: 0.000094 - momentum: 0.000000
2023-10-10 16:15:50,560 epoch 5 - iter 2080/2606 - loss 0.04511901 - time (sec): 1086.21 - samples/sec: 269.90 - lr: 0.000093 - momentum: 0.000000
2023-10-10 16:18:17,914 epoch 5 - iter 2340/2606 - loss 0.04514489 - time (sec): 1233.56 - samples/sec: 268.76 - lr: 0.000091 - momentum: 0.000000
2023-10-10 16:20:38,307 epoch 5 - iter 2600/2606 - loss 0.04534151 - time (sec): 1373.95 - samples/sec: 266.91 - lr: 0.000089 - momentum: 0.000000
2023-10-10 16:20:41,341 ----------------------------------------------------------------------------------------------------
2023-10-10 16:20:41,341 EPOCH 5 done: loss 0.0453 - lr: 0.000089
2023-10-10 16:21:26,960 DEV : loss 0.3552384078502655 - f1-score (micro avg) 0.3391
2023-10-10 16:21:27,020 ----------------------------------------------------------------------------------------------------
2023-10-10 16:23:46,309 epoch 6 - iter 260/2606 - loss 0.03073255 - time (sec): 139.29 - samples/sec: 251.20 - lr: 0.000087 - momentum: 0.000000
2023-10-10 16:26:06,409 epoch 6 - iter 520/2606 - loss 0.03336276 - time (sec): 279.39 - samples/sec: 250.59 - lr: 0.000085 - momentum: 0.000000
2023-10-10 16:28:31,218 epoch 6 - iter 780/2606 - loss 0.03342619 - time (sec): 424.20 - samples/sec: 255.01 - lr: 0.000084 - momentum: 0.000000
2023-10-10 16:30:56,809 epoch 6 - iter 1040/2606 - loss 0.03305085 - time (sec): 569.79 - samples/sec: 257.41 - lr: 0.000082 - momentum: 0.000000
2023-10-10 16:33:15,628 epoch 6 - iter 1300/2606 - loss 0.03234751 - time (sec): 708.61 - samples/sec: 261.61 - lr: 0.000080 - momentum: 0.000000
2023-10-10 16:35:27,531 epoch 6 - iter 1560/2606 - loss 0.03383010 - time (sec): 840.51 - samples/sec: 263.10 - lr: 0.000078 - momentum: 0.000000
2023-10-10 16:37:38,652 epoch 6 - iter 1820/2606 - loss 0.03360923 - time (sec): 971.63 - samples/sec: 265.18 - lr: 0.000077 - momentum: 0.000000
2023-10-10 16:39:50,837 epoch 6 - iter 2080/2606 - loss 0.03354235 - time (sec): 1103.82 - samples/sec: 267.05 - lr: 0.000075 - momentum: 0.000000
2023-10-10 16:42:00,685 epoch 6 - iter 2340/2606 - loss 0.03417729 - time (sec): 1233.66 - samples/sec: 266.96 - lr: 0.000073 - momentum: 0.000000
2023-10-10 16:44:19,012 epoch 6 - iter 2600/2606 - loss 0.03364077 - time (sec): 1371.99 - samples/sec: 267.28 - lr: 0.000071 - momentum: 0.000000
2023-10-10 16:44:21,887 ----------------------------------------------------------------------------------------------------
2023-10-10 16:44:21,888 EPOCH 6 done: loss 0.0336 - lr: 0.000071
2023-10-10 16:45:00,806 DEV : loss 0.34769406914711 - f1-score (micro avg) 0.3806
2023-10-10 16:45:00,863 saving best model
2023-10-10 16:45:03,552 ----------------------------------------------------------------------------------------------------
2023-10-10 16:47:10,382 epoch 7 - iter 260/2606 - loss 0.02507245 - time (sec): 126.82 - samples/sec: 279.71 - lr: 0.000069 - momentum: 0.000000
2023-10-10 16:49:17,745 epoch 7 - iter 520/2606 - loss 0.02198476 - time (sec): 254.19 - samples/sec: 280.93 - lr: 0.000068 - momentum: 0.000000
2023-10-10 16:51:24,652 epoch 7 - iter 780/2606 - loss 0.02230588 - time (sec): 381.09 - samples/sec: 283.27 - lr: 0.000066 - momentum: 0.000000
2023-10-10 16:53:32,924 epoch 7 - iter 1040/2606 - loss 0.02469517 - time (sec): 509.37 - samples/sec: 283.41 - lr: 0.000064 - momentum: 0.000000
2023-10-10 16:55:41,775 epoch 7 - iter 1300/2606 - loss 0.02514914 - time (sec): 638.22 - samples/sec: 286.11 - lr: 0.000062 - momentum: 0.000000
2023-10-10 16:57:48,643 epoch 7 - iter 1560/2606 - loss 0.02586490 - time (sec): 765.09 - samples/sec: 286.23 - lr: 0.000061 - momentum: 0.000000
2023-10-10 16:59:55,786 epoch 7 - iter 1820/2606 - loss 0.02506821 - time (sec): 892.23 - samples/sec: 286.32 - lr: 0.000059 - momentum: 0.000000
2023-10-10 17:02:01,919 epoch 7 - iter 2080/2606 - loss 0.02485056 - time (sec): 1018.36 - samples/sec: 284.07 - lr: 0.000057 - momentum: 0.000000
2023-10-10 17:04:11,098 epoch 7 - iter 2340/2606 - loss 0.02484702 - time (sec): 1147.54 - samples/sec: 284.94 - lr: 0.000055 - momentum: 0.000000
2023-10-10 17:06:22,781 epoch 7 - iter 2600/2606 - loss 0.02416949 - time (sec): 1279.22 - samples/sec: 286.55 - lr: 0.000053 - momentum: 0.000000
2023-10-10 17:06:25,669 ----------------------------------------------------------------------------------------------------
2023-10-10 17:06:25,669 EPOCH 7 done: loss 0.0242 - lr: 0.000053
2023-10-10 17:07:04,419 DEV : loss 0.38912469148635864 - f1-score (micro avg) 0.3886
2023-10-10 17:07:04,492 saving best model
2023-10-10 17:07:08,095 ----------------------------------------------------------------------------------------------------
2023-10-10 17:09:19,922 epoch 8 - iter 260/2606 - loss 0.01590417 - time (sec): 131.82 - samples/sec: 302.15 - lr: 0.000052 - momentum: 0.000000
2023-10-10 17:11:31,107 epoch 8 - iter 520/2606 - loss 0.01657268 - time (sec): 263.01 - samples/sec: 292.08 - lr: 0.000050 - momentum: 0.000000
2023-10-10 17:13:38,438 epoch 8 - iter 780/2606 - loss 0.01716544 - time (sec): 390.34 - samples/sec: 287.05 - lr: 0.000048 - momentum: 0.000000
2023-10-10 17:15:47,199 epoch 8 - iter 1040/2606 - loss 0.01681174 - time (sec): 519.10 - samples/sec: 288.07 - lr: 0.000046 - momentum: 0.000000
2023-10-10 17:17:54,730 epoch 8 - iter 1300/2606 - loss 0.01634833 - time (sec): 646.63 - samples/sec: 286.35 - lr: 0.000045 - momentum: 0.000000
2023-10-10 17:20:02,473 epoch 8 - iter 1560/2606 - loss 0.01711598 - time (sec): 774.37 - samples/sec: 287.31 - lr: 0.000043 - momentum: 0.000000
2023-10-10 17:22:11,287 epoch 8 - iter 1820/2606 - loss 0.01677195 - time (sec): 903.19 - samples/sec: 286.75 - lr: 0.000041 - momentum: 0.000000
2023-10-10 17:24:17,214 epoch 8 - iter 2080/2606 - loss 0.01710973 - time (sec): 1029.11 - samples/sec: 285.16 - lr: 0.000039 - momentum: 0.000000
2023-10-10 17:26:22,215 epoch 8 - iter 2340/2606 - loss 0.01765144 - time (sec): 1154.12 - samples/sec: 283.46 - lr: 0.000037 - momentum: 0.000000
2023-10-10 17:28:34,006 epoch 8 - iter 2600/2606 - loss 0.01846439 - time (sec): 1285.91 - samples/sec: 285.24 - lr: 0.000036 - momentum: 0.000000
2023-10-10 17:28:36,682 ----------------------------------------------------------------------------------------------------
2023-10-10 17:28:36,682 EPOCH 8 done: loss 0.0185 - lr: 0.000036
2023-10-10 17:29:14,536 DEV : loss 0.41515445709228516 - f1-score (micro avg) 0.3921
2023-10-10 17:29:14,587 saving best model
2023-10-10 17:29:17,244 ----------------------------------------------------------------------------------------------------
2023-10-10 17:31:26,353 epoch 9 - iter 260/2606 - loss 0.01719462 - time (sec): 129.10 - samples/sec: 286.82 - lr: 0.000034 - momentum: 0.000000
2023-10-10 17:33:37,943 epoch 9 - iter 520/2606 - loss 0.01306206 - time (sec): 260.69 - samples/sec: 296.75 - lr: 0.000032 - momentum: 0.000000
2023-10-10 17:35:45,408 epoch 9 - iter 780/2606 - loss 0.01249072 - time (sec): 388.15 - samples/sec: 291.69 - lr: 0.000030 - momentum: 0.000000
2023-10-10 17:37:50,026 epoch 9 - iter 1040/2606 - loss 0.01153243 - time (sec): 512.77 - samples/sec: 286.00 - lr: 0.000029 - momentum: 0.000000
2023-10-10 17:40:00,767 epoch 9 - iter 1300/2606 - loss 0.01176972 - time (sec): 643.51 - samples/sec: 288.25 - lr: 0.000027 - momentum: 0.000000
2023-10-10 17:42:06,166 epoch 9 - iter 1560/2606 - loss 0.01199337 - time (sec): 768.91 - samples/sec: 286.24 - lr: 0.000025 - momentum: 0.000000
2023-10-10 17:44:13,171 epoch 9 - iter 1820/2606 - loss 0.01190054 - time (sec): 895.91 - samples/sec: 284.89 - lr: 0.000023 - momentum: 0.000000
2023-10-10 17:46:21,808 epoch 9 - iter 2080/2606 - loss 0.01247403 - time (sec): 1024.55 - samples/sec: 284.48 - lr: 0.000021 - momentum: 0.000000
2023-10-10 17:48:30,521 epoch 9 - iter 2340/2606 - loss 0.01213440 - time (sec): 1153.26 - samples/sec: 284.01 - lr: 0.000020 - momentum: 0.000000
2023-10-10 17:50:42,947 epoch 9 - iter 2600/2606 - loss 0.01250624 - time (sec): 1285.69 - samples/sec: 285.36 - lr: 0.000018 - momentum: 0.000000
2023-10-10 17:50:45,623 ----------------------------------------------------------------------------------------------------
2023-10-10 17:50:45,624 EPOCH 9 done: loss 0.0125 - lr: 0.000018
2023-10-10 17:51:27,746 DEV : loss 0.4364360272884369 - f1-score (micro avg) 0.3877
2023-10-10 17:51:27,812 ----------------------------------------------------------------------------------------------------
2023-10-10 17:53:43,933 epoch 10 - iter 260/2606 - loss 0.01035287 - time (sec): 136.12 - samples/sec: 280.82 - lr: 0.000016 - momentum: 0.000000
2023-10-10 17:55:59,905 epoch 10 - iter 520/2606 - loss 0.00933623 - time (sec): 272.09 - samples/sec: 281.96 - lr: 0.000014 - momentum: 0.000000
2023-10-10 17:58:14,199 epoch 10 - iter 780/2606 - loss 0.00911268 - time (sec): 406.38 - samples/sec: 276.91 - lr: 0.000013 - momentum: 0.000000
2023-10-10 18:00:25,139 epoch 10 - iter 1040/2606 - loss 0.00886118 - time (sec): 537.32 - samples/sec: 270.22 - lr: 0.000011 - momentum: 0.000000
2023-10-10 18:02:36,223 epoch 10 - iter 1300/2606 - loss 0.00858393 - time (sec): 668.41 - samples/sec: 273.11 - lr: 0.000009 - momentum: 0.000000
2023-10-10 18:04:45,180 epoch 10 - iter 1560/2606 - loss 0.00837399 - time (sec): 797.37 - samples/sec: 274.08 - lr: 0.000007 - momentum: 0.000000
2023-10-10 18:06:54,526 epoch 10 - iter 1820/2606 - loss 0.00838034 - time (sec): 926.71 - samples/sec: 275.86 - lr: 0.000005 - momentum: 0.000000
2023-10-10 18:09:05,520 epoch 10 - iter 2080/2606 - loss 0.00862970 - time (sec): 1057.71 - samples/sec: 278.71 - lr: 0.000004 - momentum: 0.000000
2023-10-10 18:11:13,645 epoch 10 - iter 2340/2606 - loss 0.00858216 - time (sec): 1185.83 - samples/sec: 278.84 - lr: 0.000002 - momentum: 0.000000
2023-10-10 18:13:21,604 epoch 10 - iter 2600/2606 - loss 0.00867931 - time (sec): 1313.79 - samples/sec: 278.96 - lr: 0.000000 - momentum: 0.000000
2023-10-10 18:13:24,617 ----------------------------------------------------------------------------------------------------
2023-10-10 18:13:24,618 EPOCH 10 done: loss 0.0087 - lr: 0.000000
2023-10-10 18:14:05,757 DEV : loss 0.48594337701797485 - f1-score (micro avg) 0.3814
2023-10-10 18:14:06,788 ----------------------------------------------------------------------------------------------------
2023-10-10 18:14:06,791 Loading model from best epoch ...
2023-10-10 18:14:10,833 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-10 18:15:46,590
Results:
- F-score (micro) 0.4343
- F-score (macro) 0.3054
- Accuracy 0.2807
By class:
              precision    recall  f1-score   support

         LOC     0.4638    0.4901    0.4766      1214
         PER     0.4027    0.4480    0.4241       808
         ORG     0.3274    0.3144    0.3208       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4225    0.4469    0.4343      2390
   macro avg     0.2985    0.3131    0.3054      2390
weighted avg     0.4201    0.4469    0.4328      2390
2023-10-10 18:15:46,590 ----------------------------------------------------------------------------------------------------