2023-10-11 17:36:29,073 ----------------------------------------------------------------------------------------------------
2023-10-11 17:36:29,075 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 17:36:29,075 ----------------------------------------------------------------------------------------------------
2023-10-11 17:36:29,075 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 17:36:29,075 ----------------------------------------------------------------------------------------------------
2023-10-11 17:36:29,075 Train: 7142 sentences
2023-10-11 17:36:29,075 (train_with_dev=False, train_with_test=False)
2023-10-11 17:36:29,075 ----------------------------------------------------------------------------------------------------
2023-10-11 17:36:29,075 Training Params:
2023-10-11 17:36:29,076 - learning_rate: "0.00015"
2023-10-11 17:36:29,076 - mini_batch_size: "4"
2023-10-11 17:36:29,076 - max_epochs: "10"
2023-10-11 17:36:29,076 - shuffle: "True"
2023-10-11 17:36:29,076 ----------------------------------------------------------------------------------------------------
2023-10-11 17:36:29,076 Plugins:
2023-10-11 17:36:29,076 - TensorboardLogger
2023-10-11 17:36:29,076 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 17:36:29,076 ----------------------------------------------------------------------------------------------------
2023-10-11 17:36:29,076 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 17:36:29,076 - metric: "('micro avg', 'f1-score')"
2023-10-11 17:36:29,076 ----------------------------------------------------------------------------------------------------
2023-10-11 17:36:29,076 Computation:
2023-10-11 17:36:29,076 - compute on device: cuda:0
2023-10-11 17:36:29,076 - embedding storage: none
2023-10-11 17:36:29,076 ----------------------------------------------------------------------------------------------------
2023-10-11 17:36:29,076 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-11 17:36:29,077 ----------------------------------------------------------------------------------------------------
2023-10-11 17:36:29,077 ----------------------------------------------------------------------------------------------------
2023-10-11 17:36:29,077 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 17:37:20,456 epoch 1 - iter 178/1786 - loss 2.81924723 - time (sec): 51.38 - samples/sec: 454.30 - lr: 0.000015 - momentum: 0.000000
2023-10-11 17:38:12,331 epoch 1 - iter 356/1786 - loss 2.64980608 - time (sec): 103.25 - samples/sec: 464.05 - lr: 0.000030 - momentum: 0.000000
2023-10-11 17:39:03,793 epoch 1 - iter 534/1786 - loss 2.37963726 - time (sec): 154.71 - samples/sec: 465.85 - lr: 0.000045 - momentum: 0.000000
2023-10-11 17:39:56,229 epoch 1 - iter 712/1786 - loss 2.08450845 - time (sec): 207.15 - samples/sec: 467.58 - lr: 0.000060 - momentum: 0.000000
2023-10-11 17:40:47,993 epoch 1 - iter 890/1786 - loss 1.81534257 - time (sec): 258.91 - samples/sec: 464.69 - lr: 0.000075 - momentum: 0.000000
2023-10-11 17:41:41,378 epoch 1 - iter 1068/1786 - loss 1.59589063 - time (sec): 312.30 - samples/sec: 468.60 - lr: 0.000090 - momentum: 0.000000
2023-10-11 17:42:35,379 epoch 1 - iter 1246/1786 - loss 1.41200758 - time (sec): 366.30 - samples/sec: 472.40 - lr: 0.000105 - momentum: 0.000000
2023-10-11 17:43:27,539 epoch 1 - iter 1424/1786 - loss 1.28012419 - time (sec): 418.46 - samples/sec: 473.51 - lr: 0.000120 - momentum: 0.000000
2023-10-11 17:44:19,457 epoch 1 - iter 1602/1786 - loss 1.17694462 - time (sec): 470.38 - samples/sec: 472.74 - lr: 0.000134 - momentum: 0.000000
2023-10-11 17:45:12,644 epoch 1 - iter 1780/1786 - loss 1.08357721 - time (sec): 523.57 - samples/sec: 473.74 - lr: 0.000149 - momentum: 0.000000
2023-10-11 17:45:14,235 ----------------------------------------------------------------------------------------------------
2023-10-11 17:45:14,235 EPOCH 1 done: loss 1.0810 - lr: 0.000149
2023-10-11 17:45:34,466 DEV : loss 0.20361904799938202 - f1-score (micro avg) 0.5388
2023-10-11 17:45:34,497 saving best model
2023-10-11 17:45:35,356 ----------------------------------------------------------------------------------------------------
2023-10-11 17:46:28,481 epoch 2 - iter 178/1786 - loss 0.21290432 - time (sec): 53.12 - samples/sec: 484.33 - lr: 0.000148 - momentum: 0.000000
2023-10-11 17:47:21,521 epoch 2 - iter 356/1786 - loss 0.20649008 - time (sec): 106.16 - samples/sec: 479.48 - lr: 0.000147 - momentum: 0.000000
2023-10-11 17:48:16,086 epoch 2 - iter 534/1786 - loss 0.19057159 - time (sec): 160.73 - samples/sec: 482.34 - lr: 0.000145 - momentum: 0.000000
2023-10-11 17:49:08,697 epoch 2 - iter 712/1786 - loss 0.17993568 - time (sec): 213.34 - samples/sec: 472.54 - lr: 0.000143 - momentum: 0.000000
2023-10-11 17:50:01,167 epoch 2 - iter 890/1786 - loss 0.17058695 - time (sec): 265.81 - samples/sec: 471.80 - lr: 0.000142 - momentum: 0.000000
2023-10-11 17:50:53,743 epoch 2 - iter 1068/1786 - loss 0.16082396 - time (sec): 318.38 - samples/sec: 470.38 - lr: 0.000140 - momentum: 0.000000
2023-10-11 17:51:45,421 epoch 2 - iter 1246/1786 - loss 0.15569068 - time (sec): 370.06 - samples/sec: 469.79 - lr: 0.000138 - momentum: 0.000000
2023-10-11 17:52:38,828 epoch 2 - iter 1424/1786 - loss 0.14996741 - time (sec): 423.47 - samples/sec: 468.27 - lr: 0.000137 - momentum: 0.000000
2023-10-11 17:53:34,073 epoch 2 - iter 1602/1786 - loss 0.14514985 - time (sec): 478.71 - samples/sec: 463.26 - lr: 0.000135 - momentum: 0.000000
2023-10-11 17:54:30,018 epoch 2 - iter 1780/1786 - loss 0.14157379 - time (sec): 534.66 - samples/sec: 463.16 - lr: 0.000133 - momentum: 0.000000
2023-10-11 17:54:32,031 ----------------------------------------------------------------------------------------------------
2023-10-11 17:54:32,032 EPOCH 2 done: loss 0.1412 - lr: 0.000133
2023-10-11 17:54:54,099 DEV : loss 0.10268282890319824 - f1-score (micro avg) 0.7698
2023-10-11 17:54:54,135 saving best model
2023-10-11 17:54:56,735 ----------------------------------------------------------------------------------------------------
2023-10-11 17:55:52,053 epoch 3 - iter 178/1786 - loss 0.07581422 - time (sec): 55.31 - samples/sec: 448.39 - lr: 0.000132 - momentum: 0.000000
2023-10-11 17:56:46,482 epoch 3 - iter 356/1786 - loss 0.08043397 - time (sec): 109.74 - samples/sec: 455.00 - lr: 0.000130 - momentum: 0.000000
2023-10-11 17:57:40,496 epoch 3 - iter 534/1786 - loss 0.07520227 - time (sec): 163.76 - samples/sec: 452.76 - lr: 0.000128 - momentum: 0.000000
2023-10-11 17:58:34,143 epoch 3 - iter 712/1786 - loss 0.07247742 - time (sec): 217.40 - samples/sec: 451.46 - lr: 0.000127 - momentum: 0.000000
2023-10-11 17:59:27,936 epoch 3 - iter 890/1786 - loss 0.07466025 - time (sec): 271.20 - samples/sec: 452.66 - lr: 0.000125 - momentum: 0.000000
2023-10-11 18:00:21,462 epoch 3 - iter 1068/1786 - loss 0.07516997 - time (sec): 324.72 - samples/sec: 454.15 - lr: 0.000123 - momentum: 0.000000
2023-10-11 18:01:15,676 epoch 3 - iter 1246/1786 - loss 0.07379608 - time (sec): 378.94 - samples/sec: 453.76 - lr: 0.000122 - momentum: 0.000000
2023-10-11 18:02:09,837 epoch 3 - iter 1424/1786 - loss 0.07445074 - time (sec): 433.10 - samples/sec: 455.26 - lr: 0.000120 - momentum: 0.000000
2023-10-11 18:03:03,839 epoch 3 - iter 1602/1786 - loss 0.07358598 - time (sec): 487.10 - samples/sec: 456.93 - lr: 0.000118 - momentum: 0.000000
2023-10-11 18:03:58,389 epoch 3 - iter 1780/1786 - loss 0.07423850 - time (sec): 541.65 - samples/sec: 457.42 - lr: 0.000117 - momentum: 0.000000
2023-10-11 18:04:00,209 ----------------------------------------------------------------------------------------------------
2023-10-11 18:04:00,209 EPOCH 3 done: loss 0.0743 - lr: 0.000117
2023-10-11 18:04:21,642 DEV : loss 0.13196961581707 - f1-score (micro avg) 0.7713
2023-10-11 18:04:21,673 saving best model
2023-10-11 18:04:24,252 ----------------------------------------------------------------------------------------------------
2023-10-11 18:05:16,538 epoch 4 - iter 178/1786 - loss 0.05605407 - time (sec): 52.28 - samples/sec: 471.55 - lr: 0.000115 - momentum: 0.000000
2023-10-11 18:06:08,745 epoch 4 - iter 356/1786 - loss 0.05000837 - time (sec): 104.49 - samples/sec: 475.50 - lr: 0.000113 - momentum: 0.000000
2023-10-11 18:07:01,886 epoch 4 - iter 534/1786 - loss 0.05222854 - time (sec): 157.63 - samples/sec: 481.86 - lr: 0.000112 - momentum: 0.000000
2023-10-11 18:07:54,105 epoch 4 - iter 712/1786 - loss 0.05172677 - time (sec): 209.85 - samples/sec: 479.63 - lr: 0.000110 - momentum: 0.000000
2023-10-11 18:08:45,957 epoch 4 - iter 890/1786 - loss 0.05101322 - time (sec): 261.70 - samples/sec: 477.03 - lr: 0.000108 - momentum: 0.000000
2023-10-11 18:09:40,097 epoch 4 - iter 1068/1786 - loss 0.05070108 - time (sec): 315.84 - samples/sec: 474.53 - lr: 0.000107 - momentum: 0.000000
2023-10-11 18:10:36,928 epoch 4 - iter 1246/1786 - loss 0.05208864 - time (sec): 372.67 - samples/sec: 473.16 - lr: 0.000105 - momentum: 0.000000
2023-10-11 18:11:31,138 epoch 4 - iter 1424/1786 - loss 0.05315052 - time (sec): 426.88 - samples/sec: 467.69 - lr: 0.000103 - momentum: 0.000000
2023-10-11 18:12:24,160 epoch 4 - iter 1602/1786 - loss 0.05314471 - time (sec): 479.91 - samples/sec: 466.16 - lr: 0.000102 - momentum: 0.000000
2023-10-11 18:13:18,131 epoch 4 - iter 1780/1786 - loss 0.05220709 - time (sec): 533.88 - samples/sec: 464.57 - lr: 0.000100 - momentum: 0.000000
2023-10-11 18:13:19,774 ----------------------------------------------------------------------------------------------------
2023-10-11 18:13:19,775 EPOCH 4 done: loss 0.0522 - lr: 0.000100
2023-10-11 18:13:42,131 DEV : loss 0.15180714428424835 - f1-score (micro avg) 0.7815
2023-10-11 18:13:42,166 saving best model
2023-10-11 18:13:46,961 ----------------------------------------------------------------------------------------------------
2023-10-11 18:14:40,307 epoch 5 - iter 178/1786 - loss 0.04155092 - time (sec): 53.34 - samples/sec: 445.72 - lr: 0.000098 - momentum: 0.000000
2023-10-11 18:15:33,137 epoch 5 - iter 356/1786 - loss 0.03973759 - time (sec): 106.17 - samples/sec: 441.09 - lr: 0.000097 - momentum: 0.000000
2023-10-11 18:16:28,231 epoch 5 - iter 534/1786 - loss 0.03869343 - time (sec): 161.27 - samples/sec: 455.20 - lr: 0.000095 - momentum: 0.000000
2023-10-11 18:17:22,718 epoch 5 - iter 712/1786 - loss 0.03913412 - time (sec): 215.75 - samples/sec: 457.97 - lr: 0.000093 - momentum: 0.000000
2023-10-11 18:18:15,097 epoch 5 - iter 890/1786 - loss 0.03875654 - time (sec): 268.13 - samples/sec: 458.77 - lr: 0.000092 - momentum: 0.000000
2023-10-11 18:19:07,851 epoch 5 - iter 1068/1786 - loss 0.03759774 - time (sec): 320.89 - samples/sec: 456.30 - lr: 0.000090 - momentum: 0.000000
2023-10-11 18:20:03,982 epoch 5 - iter 1246/1786 - loss 0.03756197 - time (sec): 377.02 - samples/sec: 458.06 - lr: 0.000088 - momentum: 0.000000
2023-10-11 18:20:58,516 epoch 5 - iter 1424/1786 - loss 0.03737673 - time (sec): 431.55 - samples/sec: 455.81 - lr: 0.000087 - momentum: 0.000000
2023-10-11 18:21:52,617 epoch 5 - iter 1602/1786 - loss 0.03776158 - time (sec): 485.65 - samples/sec: 457.43 - lr: 0.000085 - momentum: 0.000000
2023-10-11 18:22:49,426 epoch 5 - iter 1780/1786 - loss 0.03807107 - time (sec): 542.46 - samples/sec: 456.73 - lr: 0.000083 - momentum: 0.000000
2023-10-11 18:22:51,427 ----------------------------------------------------------------------------------------------------
2023-10-11 18:22:51,427 EPOCH 5 done: loss 0.0380 - lr: 0.000083
2023-10-11 18:23:14,760 DEV : loss 0.16010259091854095 - f1-score (micro avg) 0.8083
2023-10-11 18:23:14,792 saving best model
2023-10-11 18:23:17,472 ----------------------------------------------------------------------------------------------------
2023-10-11 18:24:12,145 epoch 6 - iter 178/1786 - loss 0.03238792 - time (sec): 54.67 - samples/sec: 459.55 - lr: 0.000082 - momentum: 0.000000
2023-10-11 18:25:05,682 epoch 6 - iter 356/1786 - loss 0.02941863 - time (sec): 108.20 - samples/sec: 458.47 - lr: 0.000080 - momentum: 0.000000
2023-10-11 18:25:58,332 epoch 6 - iter 534/1786 - loss 0.02729745 - time (sec): 160.85 - samples/sec: 457.92 - lr: 0.000078 - momentum: 0.000000
2023-10-11 18:26:52,267 epoch 6 - iter 712/1786 - loss 0.02776539 - time (sec): 214.79 - samples/sec: 459.33 - lr: 0.000077 - momentum: 0.000000
2023-10-11 18:27:47,349 epoch 6 - iter 890/1786 - loss 0.02725920 - time (sec): 269.87 - samples/sec: 455.75 - lr: 0.000075 - momentum: 0.000000
2023-10-11 18:28:44,263 epoch 6 - iter 1068/1786 - loss 0.02841412 - time (sec): 326.79 - samples/sec: 454.06 - lr: 0.000073 - momentum: 0.000000
2023-10-11 18:29:41,292 epoch 6 - iter 1246/1786 - loss 0.02811007 - time (sec): 383.82 - samples/sec: 451.55 - lr: 0.000072 - momentum: 0.000000
2023-10-11 18:30:36,642 epoch 6 - iter 1424/1786 - loss 0.02811630 - time (sec): 439.16 - samples/sec: 451.16 - lr: 0.000070 - momentum: 0.000000
2023-10-11 18:31:31,328 epoch 6 - iter 1602/1786 - loss 0.02770252 - time (sec): 493.85 - samples/sec: 451.21 - lr: 0.000068 - momentum: 0.000000
2023-10-11 18:32:25,480 epoch 6 - iter 1780/1786 - loss 0.02744603 - time (sec): 548.00 - samples/sec: 452.48 - lr: 0.000067 - momentum: 0.000000
2023-10-11 18:32:27,134 ----------------------------------------------------------------------------------------------------
2023-10-11 18:32:27,135 EPOCH 6 done: loss 0.0275 - lr: 0.000067
2023-10-11 18:32:49,419 DEV : loss 0.1975564956665039 - f1-score (micro avg) 0.7922
2023-10-11 18:32:49,452 ----------------------------------------------------------------------------------------------------
2023-10-11 18:33:42,384 epoch 7 - iter 178/1786 - loss 0.02364687 - time (sec): 52.93 - samples/sec: 456.89 - lr: 0.000065 - momentum: 0.000000
2023-10-11 18:34:36,361 epoch 7 - iter 356/1786 - loss 0.02566367 - time (sec): 106.91 - samples/sec: 456.18 - lr: 0.000063 - momentum: 0.000000
2023-10-11 18:35:30,916 epoch 7 - iter 534/1786 - loss 0.02607659 - time (sec): 161.46 - samples/sec: 451.34 - lr: 0.000062 - momentum: 0.000000
2023-10-11 18:36:25,221 epoch 7 - iter 712/1786 - loss 0.02345986 - time (sec): 215.77 - samples/sec: 456.83 - lr: 0.000060 - momentum: 0.000000
2023-10-11 18:37:20,581 epoch 7 - iter 890/1786 - loss 0.02360187 - time (sec): 271.13 - samples/sec: 456.96 - lr: 0.000058 - momentum: 0.000000
2023-10-11 18:38:15,580 epoch 7 - iter 1068/1786 - loss 0.02238154 - time (sec): 326.13 - samples/sec: 455.70 - lr: 0.000057 - momentum: 0.000000
2023-10-11 18:39:10,005 epoch 7 - iter 1246/1786 - loss 0.02133457 - time (sec): 380.55 - samples/sec: 454.89 - lr: 0.000055 - momentum: 0.000000
2023-10-11 18:40:05,662 epoch 7 - iter 1424/1786 - loss 0.02172766 - time (sec): 436.21 - samples/sec: 455.44 - lr: 0.000053 - momentum: 0.000000
2023-10-11 18:41:01,549 epoch 7 - iter 1602/1786 - loss 0.02138396 - time (sec): 492.10 - samples/sec: 454.50 - lr: 0.000052 - momentum: 0.000000
2023-10-11 18:41:54,273 epoch 7 - iter 1780/1786 - loss 0.02081679 - time (sec): 544.82 - samples/sec: 454.51 - lr: 0.000050 - momentum: 0.000000
2023-10-11 18:41:56,288 ----------------------------------------------------------------------------------------------------
2023-10-11 18:41:56,288 EPOCH 7 done: loss 0.0207 - lr: 0.000050
2023-10-11 18:42:18,651 DEV : loss 0.2033122181892395 - f1-score (micro avg) 0.7906
2023-10-11 18:42:18,684 ----------------------------------------------------------------------------------------------------
2023-10-11 18:43:14,073 epoch 8 - iter 178/1786 - loss 0.01534023 - time (sec): 55.39 - samples/sec: 446.17 - lr: 0.000048 - momentum: 0.000000
2023-10-11 18:44:09,481 epoch 8 - iter 356/1786 - loss 0.01688005 - time (sec): 110.79 - samples/sec: 455.36 - lr: 0.000047 - momentum: 0.000000
2023-10-11 18:45:04,364 epoch 8 - iter 534/1786 - loss 0.01455467 - time (sec): 165.68 - samples/sec: 454.79 - lr: 0.000045 - momentum: 0.000000
2023-10-11 18:46:00,082 epoch 8 - iter 712/1786 - loss 0.01641305 - time (sec): 221.40 - samples/sec: 453.36 - lr: 0.000043 - momentum: 0.000000
2023-10-11 18:46:56,812 epoch 8 - iter 890/1786 - loss 0.01612635 - time (sec): 278.13 - samples/sec: 446.87 - lr: 0.000042 - momentum: 0.000000
2023-10-11 18:47:53,561 epoch 8 - iter 1068/1786 - loss 0.01499776 - time (sec): 334.87 - samples/sec: 443.86 - lr: 0.000040 - momentum: 0.000000
2023-10-11 18:48:50,595 epoch 8 - iter 1246/1786 - loss 0.01544156 - time (sec): 391.91 - samples/sec: 444.25 - lr: 0.000038 - momentum: 0.000000
2023-10-11 18:49:45,028 epoch 8 - iter 1424/1786 - loss 0.01554454 - time (sec): 446.34 - samples/sec: 440.67 - lr: 0.000037 - momentum: 0.000000
2023-10-11 18:50:39,377 epoch 8 - iter 1602/1786 - loss 0.01519250 - time (sec): 500.69 - samples/sec: 444.59 - lr: 0.000035 - momentum: 0.000000
2023-10-11 18:51:34,428 epoch 8 - iter 1780/1786 - loss 0.01551751 - time (sec): 555.74 - samples/sec: 446.37 - lr: 0.000033 - momentum: 0.000000
2023-10-11 18:51:36,048 ----------------------------------------------------------------------------------------------------
2023-10-11 18:51:36,048 EPOCH 8 done: loss 0.0155 - lr: 0.000033
2023-10-11 18:51:59,372 DEV : loss 0.2236306071281433 - f1-score (micro avg) 0.7925
2023-10-11 18:51:59,405 ----------------------------------------------------------------------------------------------------
2023-10-11 18:52:53,303 epoch 9 - iter 178/1786 - loss 0.00534514 - time (sec): 53.90 - samples/sec: 438.90 - lr: 0.000032 - momentum: 0.000000
2023-10-11 18:53:48,375 epoch 9 - iter 356/1786 - loss 0.01015032 - time (sec): 108.97 - samples/sec: 453.68 - lr: 0.000030 - momentum: 0.000000
2023-10-11 18:54:44,141 epoch 9 - iter 534/1786 - loss 0.01306851 - time (sec): 164.73 - samples/sec: 459.92 - lr: 0.000028 - momentum: 0.000000
2023-10-11 18:55:39,984 epoch 9 - iter 712/1786 - loss 0.01202527 - time (sec): 220.58 - samples/sec: 456.11 - lr: 0.000027 - momentum: 0.000000
2023-10-11 18:56:34,979 epoch 9 - iter 890/1786 - loss 0.01142279 - time (sec): 275.57 - samples/sec: 452.74 - lr: 0.000025 - momentum: 0.000000
2023-10-11 18:57:31,235 epoch 9 - iter 1068/1786 - loss 0.01115001 - time (sec): 331.83 - samples/sec: 451.22 - lr: 0.000023 - momentum: 0.000000
2023-10-11 18:58:26,472 epoch 9 - iter 1246/1786 - loss 0.01140760 - time (sec): 387.06 - samples/sec: 449.42 - lr: 0.000022 - momentum: 0.000000
2023-10-11 18:59:22,703 epoch 9 - iter 1424/1786 - loss 0.01133329 - time (sec): 443.30 - samples/sec: 447.16 - lr: 0.000020 - momentum: 0.000000
2023-10-11 19:00:18,219 epoch 9 - iter 1602/1786 - loss 0.01110225 - time (sec): 498.81 - samples/sec: 446.00 - lr: 0.000018 - momentum: 0.000000
2023-10-11 19:01:15,055 epoch 9 - iter 1780/1786 - loss 0.01103208 - time (sec): 555.65 - samples/sec: 446.12 - lr: 0.000017 - momentum: 0.000000
2023-10-11 19:01:16,897 ----------------------------------------------------------------------------------------------------
2023-10-11 19:01:16,897 EPOCH 9 done: loss 0.0110 - lr: 0.000017
2023-10-11 19:01:39,717 DEV : loss 0.23193296790122986 - f1-score (micro avg) 0.7974
2023-10-11 19:01:39,750 ----------------------------------------------------------------------------------------------------
2023-10-11 19:02:37,006 epoch 10 - iter 178/1786 - loss 0.00698458 - time (sec): 57.25 - samples/sec: 439.62 - lr: 0.000015 - momentum: 0.000000
2023-10-11 19:03:34,910 epoch 10 - iter 356/1786 - loss 0.00817366 - time (sec): 115.16 - samples/sec: 445.19 - lr: 0.000013 - momentum: 0.000000
2023-10-11 19:04:32,347 epoch 10 - iter 534/1786 - loss 0.00869250 - time (sec): 172.59 - samples/sec: 432.02 - lr: 0.000012 - momentum: 0.000000
2023-10-11 19:05:30,527 epoch 10 - iter 712/1786 - loss 0.00881744 - time (sec): 230.77 - samples/sec: 430.16 - lr: 0.000010 - momentum: 0.000000
2023-10-11 19:06:24,815 epoch 10 - iter 890/1786 - loss 0.00835789 - time (sec): 285.06 - samples/sec: 429.36 - lr: 0.000008 - momentum: 0.000000
2023-10-11 19:07:21,724 epoch 10 - iter 1068/1786 - loss 0.00867803 - time (sec): 341.97 - samples/sec: 436.06 - lr: 0.000007 - momentum: 0.000000
2023-10-11 19:08:17,370 epoch 10 - iter 1246/1786 - loss 0.00883856 - time (sec): 397.62 - samples/sec: 434.74 - lr: 0.000005 - momentum: 0.000000
2023-10-11 19:09:14,246 epoch 10 - iter 1424/1786 - loss 0.00953875 - time (sec): 454.49 - samples/sec: 435.60 - lr: 0.000003 - momentum: 0.000000
2023-10-11 19:10:12,234 epoch 10 - iter 1602/1786 - loss 0.00908507 - time (sec): 512.48 - samples/sec: 435.95 - lr: 0.000002 - momentum: 0.000000
2023-10-11 19:11:08,898 epoch 10 - iter 1780/1786 - loss 0.00899664 - time (sec): 569.15 - samples/sec: 436.09 - lr: 0.000000 - momentum: 0.000000
2023-10-11 19:11:10,476 ----------------------------------------------------------------------------------------------------
2023-10-11 19:11:10,477 EPOCH 10 done: loss 0.0090 - lr: 0.000000
2023-10-11 19:11:34,051 DEV : loss 0.2363290935754776 - f1-score (micro avg) 0.7909
2023-10-11 19:11:35,015 ----------------------------------------------------------------------------------------------------
2023-10-11 19:11:35,017 Loading model from best epoch ...
2023-10-11 19:11:41,051 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 19:12:53,145
Results:
- F-score (micro) 0.6861
- F-score (macro) 0.6058
- Accuracy 0.5407

By class:
              precision    recall  f1-score   support

         LOC     0.7063    0.7050    0.7057      1095
         PER     0.7809    0.7500    0.7651      1012
         ORG     0.4030    0.5994    0.4820       357
   HumanProd     0.3846    0.6061    0.4706        33

   micro avg     0.6665    0.7068    0.6861      2497
   macro avg     0.5687    0.6651    0.6058      2497
weighted avg     0.6889    0.7068    0.6947      2497

2023-10-11 19:12:53,145 ----------------------------------------------------------------------------------------------------