2023-10-11 23:49:43,888 ----------------------------------------------------------------------------------------------------
2023-10-11 23:49:43,890 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 23:49:43,890 ----------------------------------------------------------------------------------------------------
2023-10-11 23:49:43,890 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 23:49:43,890 ----------------------------------------------------------------------------------------------------
2023-10-11 23:49:43,891 Train: 7142 sentences
2023-10-11 23:49:43,891 (train_with_dev=False, train_with_test=False)
2023-10-11 23:49:43,891 ----------------------------------------------------------------------------------------------------
2023-10-11 23:49:43,891 Training Params:
2023-10-11 23:49:43,891 - learning_rate: "0.00015"
2023-10-11 23:49:43,891 - mini_batch_size: "4"
2023-10-11 23:49:43,891 - max_epochs: "10"
2023-10-11 23:49:43,891 - shuffle: "True"
2023-10-11 23:49:43,891 ----------------------------------------------------------------------------------------------------
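For orientation, the setup logged above (NewsEye French HIPE-2022 corpus, ByT5 encoder embeddings, a CRF-free SequenceTagger, and the training parameters just listed) corresponds roughly to a Flair fine-tuning script along the following lines. This is a minimal sketch, not the script that produced this log: the Hugging Face model id is inferred from the base path further below, the dataset and constructor parameter names are assumptions about the standard Flair API, and hidden_size is a placeholder.

    # Minimal sketch of a Flair fine-tuning setup matching this log
    # (assumptions: standard Flair API; model id inferred from the base path).
    from flair.datasets import NER_HIPE_2022
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # NewsEye French corpus (7142 train / 698 dev / 2570 test sentences).
    corpus = NER_HIPE_2022(dataset_name="newseye", language="fr")
    label_type = "ner"
    label_dict = corpus.make_label_dictionary(label_type=label_type)

    # ByT5 encoder; "poolingfirst-layers-1" in the base path suggests
    # subtoken_pooling="first" and layers="-1".
    embeddings = TransformerWordEmbeddings(
        model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
    )

    # No RNN and no CRF, matching the dumped architecture
    # (LockedDropout + Linear head only); hidden_size is unused without an RNN.
    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=embeddings,
        tag_dictionary=label_dict,
        tag_type=label_type,
        use_crf=False,
        use_rnn=False,
        reproject_embeddings=False,
    )

    # fine_tune uses a linear schedule with warmup; the run logged here also
    # attaches the TensorboardLogger and LinearScheduler plugins listed below.
    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(
        "hmbench-newseye/fr-...",  # base path as logged below (truncated here)
        learning_rate=0.00015,
        mini_batch_size=4,
        max_epochs=10,
    )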
2023-10-11 23:49:43,891 Plugins:
2023-10-11 23:49:43,891 - TensorboardLogger
2023-10-11 23:49:43,891 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 23:49:43,891 ----------------------------------------------------------------------------------------------------
2023-10-11 23:49:43,891 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 23:49:43,891 - metric: "('micro avg', 'f1-score')"
2023-10-11 23:49:43,892 ----------------------------------------------------------------------------------------------------
2023-10-11 23:49:43,892 Computation:
2023-10-11 23:49:43,892 - compute on device: cuda:0
2023-10-11 23:49:43,892 - embedding storage: none
2023-10-11 23:49:43,892 ----------------------------------------------------------------------------------------------------
2023-10-11 23:49:43,892 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5"
2023-10-11 23:49:43,892 ----------------------------------------------------------------------------------------------------
2023-10-11 23:49:43,892 ----------------------------------------------------------------------------------------------------
2023-10-11 23:49:43,892 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 23:50:42,826 epoch 1 - iter 178/1786 - loss 2.80679544 - time (sec): 58.93 - samples/sec: 459.62 - lr: 0.000015 - momentum: 0.000000
2023-10-11 23:51:38,946 epoch 1 - iter 356/1786 - loss 2.65034125 - time (sec): 115.05 - samples/sec: 459.46 - lr: 0.000030 - momentum: 0.000000
2023-10-11 23:52:36,349 epoch 1 - iter 534/1786 - loss 2.36581802 - time (sec): 172.45 - samples/sec: 462.09 - lr: 0.000045 - momentum: 0.000000
2023-10-11 23:53:31,134 epoch 1 - iter 712/1786 - loss 2.08798123 - time (sec): 227.24 - samples/sec: 461.30 - lr: 0.000060 - momentum: 0.000000
2023-10-11 23:54:26,815 epoch 1 - iter 890/1786 - loss 1.83005658 - time (sec): 282.92 - samples/sec: 457.29 - lr: 0.000075 - momentum: 0.000000
2023-10-11 23:55:20,922 epoch 1 - iter 1068/1786 - loss 1.63577264 - time (sec): 337.03 - samples/sec: 454.06 - lr: 0.000090 - momentum: 0.000000
2023-10-11 23:56:15,319 epoch 1 - iter 1246/1786 - loss 1.47180576 - time (sec): 391.43 - samples/sec: 452.29 - lr: 0.000105 - momentum: 0.000000
2023-10-11 23:57:09,254 epoch 1 - iter 1424/1786 - loss 1.34808408 - time (sec): 445.36 - samples/sec: 447.88 - lr: 0.000120 - momentum: 0.000000
2023-10-11 23:58:01,774 epoch 1 - iter 1602/1786 - loss 1.23634249 - time (sec): 497.88 - samples/sec: 449.14 - lr: 0.000134 - momentum: 0.000000
2023-10-11 23:58:54,318 epoch 1 - iter 1780/1786 - loss 1.14122061 - time (sec): 550.42 - samples/sec: 450.73 - lr: 0.000149 - momentum: 0.000000
2023-10-11 23:58:55,874 ----------------------------------------------------------------------------------------------------
2023-10-11 23:58:55,875 EPOCH 1 done: loss 1.1386 - lr: 0.000149
2023-10-11 23:59:15,310 DEV : loss 0.18069760501384735 - f1-score (micro avg) 0.5999
2023-10-11 23:59:15,342 saving best model
2023-10-11 23:59:16,229 ----------------------------------------------------------------------------------------------------
2023-10-12 00:00:10,169 epoch 2 - iter 178/1786 - loss 0.20711620 - time (sec): 53.94 - samples/sec: 462.66 - lr: 0.000148 - momentum: 0.000000
2023-10-12 00:01:04,653 epoch 2 - iter 356/1786 - loss 0.19247005 - time (sec): 108.42 - samples/sec: 464.22 - lr: 0.000147 - momentum: 0.000000
2023-10-12 00:02:00,536 epoch 2 - iter 534/1786 - loss 0.17794878 - time (sec): 164.30 - samples/sec: 458.93 - lr: 0.000145 - momentum: 0.000000
2023-10-12 00:02:55,523 epoch 2 - iter 712/1786 - loss 0.16748993 - time (sec): 219.29 - samples/sec: 455.31 - lr: 0.000143 - momentum: 0.000000
2023-10-12 00:03:52,876 epoch 2 - iter 890/1786 - loss 0.15565609 - time (sec): 276.64 - samples/sec: 456.14 - lr: 0.000142 - momentum: 0.000000
2023-10-12 00:04:47,012 epoch 2 - iter 1068/1786 - loss 0.15126686 - time (sec): 330.78 - samples/sec: 451.88 - lr: 0.000140 - momentum: 0.000000
2023-10-12 00:05:41,211 epoch 2 - iter 1246/1786 - loss 0.14667868 - time (sec): 384.98 - samples/sec: 450.19 - lr: 0.000138 - momentum: 0.000000
2023-10-12 00:06:37,571 epoch 2 - iter 1424/1786 - loss 0.14213535 - time (sec): 441.34 - samples/sec: 450.42 - lr: 0.000137 - momentum: 0.000000
2023-10-12 00:07:31,406 epoch 2 - iter 1602/1786 - loss 0.13922305 - time (sec): 495.17 - samples/sec: 450.14 - lr: 0.000135 - momentum: 0.000000
2023-10-12 00:08:24,847 epoch 2 - iter 1780/1786 - loss 0.13416379 - time (sec): 548.62 - samples/sec: 451.32 - lr: 0.000133 - momentum: 0.000000
2023-10-12 00:08:26,740 ----------------------------------------------------------------------------------------------------
2023-10-12 00:08:26,741 EPOCH 2 done: loss 0.1342 - lr: 0.000133
2023-10-12 00:08:48,065 DEV : loss 0.11438284069299698 - f1-score (micro avg) 0.7637
2023-10-12 00:08:48,096 saving best model
2023-10-12 00:08:51,147 ----------------------------------------------------------------------------------------------------
2023-10-12 00:09:44,510 epoch 3 - iter 178/1786 - loss 0.06683015 - time (sec): 53.36 - samples/sec: 459.96 - lr: 0.000132 - momentum: 0.000000
2023-10-12 00:10:39,022 epoch 3 - iter 356/1786 - loss 0.06750141 - time (sec): 107.87 - samples/sec: 464.38 - lr: 0.000130 - momentum: 0.000000
2023-10-12 00:11:33,861 epoch 3 - iter 534/1786 - loss 0.07127963 - time (sec): 162.71 - samples/sec: 453.13 - lr: 0.000128 - momentum: 0.000000
2023-10-12 00:12:28,893 epoch 3 - iter 712/1786 - loss 0.07159232 - time (sec): 217.74 - samples/sec: 449.53 - lr: 0.000127 - momentum: 0.000000
2023-10-12 00:13:25,201 epoch 3 - iter 890/1786 - loss 0.06993220 - time (sec): 274.05 - samples/sec: 451.16 - lr: 0.000125 - momentum: 0.000000
2023-10-12 00:14:19,775 epoch 3 - iter 1068/1786 - loss 0.07164423 - time (sec): 328.62 - samples/sec: 454.34 - lr: 0.000123 - momentum: 0.000000
2023-10-12 00:15:14,214 epoch 3 - iter 1246/1786 - loss 0.07103782 - time (sec): 383.06 - samples/sec: 454.45 - lr: 0.000122 - momentum: 0.000000
2023-10-12 00:16:08,217 epoch 3 - iter 1424/1786 - loss 0.07200320 - time (sec): 437.07 - samples/sec: 451.39 - lr: 0.000120 - momentum: 0.000000
2023-10-12 00:17:03,415 epoch 3 - iter 1602/1786 - loss 0.07374063 - time (sec): 492.26 - samples/sec: 450.32 - lr: 0.000118 - momentum: 0.000000
2023-10-12 00:17:59,334 epoch 3 - iter 1780/1786 - loss 0.07209817 - time (sec): 548.18 - samples/sec: 452.30 - lr: 0.000117 - momentum: 0.000000
2023-10-12 00:18:00,989 ----------------------------------------------------------------------------------------------------
2023-10-12 00:18:00,990 EPOCH 3 done: loss 0.0724 - lr: 0.000117
2023-10-12 00:18:24,115 DEV : loss 0.1281142383813858 - f1-score (micro avg) 0.7826
2023-10-12 00:18:24,147 saving best model
2023-10-12 00:18:40,438 ----------------------------------------------------------------------------------------------------
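For orientation, the "lr:" column in these iteration lines follows the LinearScheduler with warmup_fraction '0.1' listed in the plugin section: roughly one epoch of linear warmup up to the peak learning rate 0.00015, then linear decay to zero over the remaining steps. The following small sketch is illustrative only (it is not the actual Flair scheduler code) and reproduces the logged values, e.g. 0.000149 at the end of epoch 1 and 0.000133 at the end of epoch 2.

    # Illustrative linear warmup/decay schedule (assumption: warmup over a
    # fraction of all optimizer steps, then linear decay to zero).
    def linear_schedule_lr(step, total_steps, peak_lr=0.00015, warmup_fraction=0.1):
        warmup_steps = int(total_steps * warmup_fraction)
        if step < warmup_steps:
            return peak_lr * step / max(1, warmup_steps)
        return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    total = 1786 * 10  # 1786 mini-batches per epoch, 10 epochs
    print(linear_schedule_lr(1780, total))       # ~0.000149 (late epoch 1, still warming up)
    print(linear_schedule_lr(1786 * 2, total))   # ~0.000133 (end of epoch 2, decaying)
    print(linear_schedule_lr(1786 * 10, total))  # 0.0 (end of training)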
2023-10-12 00:19:35,647 epoch 4 - iter 178/1786 - loss 0.05657524 - time (sec): 55.21 - samples/sec: 485.86 - lr: 0.000115 - momentum: 0.000000
2023-10-12 00:20:30,635 epoch 4 - iter 356/1786 - loss 0.05425663 - time (sec): 110.19 - samples/sec: 461.11 - lr: 0.000113 - momentum: 0.000000
2023-10-12 00:21:25,821 epoch 4 - iter 534/1786 - loss 0.05035837 - time (sec): 165.38 - samples/sec: 458.97 - lr: 0.000112 - momentum: 0.000000
2023-10-12 00:22:20,078 epoch 4 - iter 712/1786 - loss 0.05157633 - time (sec): 219.64 - samples/sec: 457.43 - lr: 0.000110 - momentum: 0.000000
2023-10-12 00:23:14,198 epoch 4 - iter 890/1786 - loss 0.05225442 - time (sec): 273.76 - samples/sec: 453.18 - lr: 0.000108 - momentum: 0.000000
2023-10-12 00:24:09,428 epoch 4 - iter 1068/1786 - loss 0.05116186 - time (sec): 328.99 - samples/sec: 453.97 - lr: 0.000107 - momentum: 0.000000
2023-10-12 00:25:03,579 epoch 4 - iter 1246/1786 - loss 0.05109624 - time (sec): 383.14 - samples/sec: 451.90 - lr: 0.000105 - momentum: 0.000000
2023-10-12 00:25:57,486 epoch 4 - iter 1424/1786 - loss 0.05046609 - time (sec): 437.04 - samples/sec: 451.18 - lr: 0.000103 - momentum: 0.000000
2023-10-12 00:26:54,137 epoch 4 - iter 1602/1786 - loss 0.05099487 - time (sec): 493.70 - samples/sec: 453.53 - lr: 0.000102 - momentum: 0.000000
2023-10-12 00:27:48,105 epoch 4 - iter 1780/1786 - loss 0.05104060 - time (sec): 547.66 - samples/sec: 453.01 - lr: 0.000100 - momentum: 0.000000
2023-10-12 00:27:49,733 ----------------------------------------------------------------------------------------------------
2023-10-12 00:27:49,733 EPOCH 4 done: loss 0.0511 - lr: 0.000100
2023-10-12 00:28:11,834 DEV : loss 0.14586448669433594 - f1-score (micro avg) 0.8022
2023-10-12 00:28:11,868 saving best model
2023-10-12 00:28:22,595 ----------------------------------------------------------------------------------------------------
2023-10-12 00:29:16,940 epoch 5 - iter 178/1786 - loss 0.03100195 - time (sec): 54.34 - samples/sec: 451.46 - lr: 0.000098 - momentum: 0.000000
2023-10-12 00:30:10,437 epoch 5 - iter 356/1786 - loss 0.03243173 - time (sec): 107.84 - samples/sec: 454.11 - lr: 0.000097 - momentum: 0.000000
2023-10-12 00:31:04,353 epoch 5 - iter 534/1786 - loss 0.03471857 - time (sec): 161.75 - samples/sec: 459.07 - lr: 0.000095 - momentum: 0.000000
2023-10-12 00:31:57,749 epoch 5 - iter 712/1786 - loss 0.03271395 - time (sec): 215.15 - samples/sec: 454.35 - lr: 0.000093 - momentum: 0.000000
2023-10-12 00:32:54,440 epoch 5 - iter 890/1786 - loss 0.03439854 - time (sec): 271.84 - samples/sec: 447.74 - lr: 0.000092 - momentum: 0.000000
2023-10-12 00:33:50,980 epoch 5 - iter 1068/1786 - loss 0.03404286 - time (sec): 328.38 - samples/sec: 445.72 - lr: 0.000090 - momentum: 0.000000
2023-10-12 00:34:49,518 epoch 5 - iter 1246/1786 - loss 0.03481655 - time (sec): 386.92 - samples/sec: 448.76 - lr: 0.000088 - momentum: 0.000000
2023-10-12 00:35:44,748 epoch 5 - iter 1424/1786 - loss 0.03575715 - time (sec): 442.15 - samples/sec: 448.84 - lr: 0.000087 - momentum: 0.000000
2023-10-12 00:36:41,662 epoch 5 - iter 1602/1786 - loss 0.03690779 - time (sec): 499.06 - samples/sec: 447.33 - lr: 0.000085 - momentum: 0.000000
2023-10-12 00:37:37,592 epoch 5 - iter 1780/1786 - loss 0.03763514 - time (sec): 554.99 - samples/sec: 447.04 - lr: 0.000083 - momentum: 0.000000
2023-10-12 00:37:39,239 ----------------------------------------------------------------------------------------------------
2023-10-12 00:37:39,239 EPOCH 5 done: loss 0.0377 - lr: 0.000083
2023-10-12 00:38:02,466 DEV : loss 0.16297270357608795 - f1-score (micro avg) 0.8016
2023-10-12 00:38:02,499 ----------------------------------------------------------------------------------------------------
2023-10-12 00:38:57,768 epoch 6 - iter 178/1786 - loss 0.02668569 - time (sec): 55.27 - samples/sec: 464.94 - lr: 0.000082 - momentum: 0.000000
2023-10-12 00:39:51,559 epoch 6 - iter 356/1786 - loss 0.02601327 - time (sec): 109.06 - samples/sec: 456.49 - lr: 0.000080 - momentum: 0.000000
2023-10-12 00:40:48,681 epoch 6 - iter 534/1786 - loss 0.02650022 - time (sec): 166.18 - samples/sec: 464.21 - lr: 0.000078 - momentum: 0.000000
2023-10-12 00:41:43,107 epoch 6 - iter 712/1786 - loss 0.02802396 - time (sec): 220.61 - samples/sec: 459.29 - lr: 0.000077 - momentum: 0.000000
2023-10-12 00:42:38,563 epoch 6 - iter 890/1786 - loss 0.02839531 - time (sec): 276.06 - samples/sec: 461.31 - lr: 0.000075 - momentum: 0.000000
2023-10-12 00:43:36,355 epoch 6 - iter 1068/1786 - loss 0.02844378 - time (sec): 333.85 - samples/sec: 454.37 - lr: 0.000073 - momentum: 0.000000
2023-10-12 00:44:32,658 epoch 6 - iter 1246/1786 - loss 0.02807222 - time (sec): 390.16 - samples/sec: 450.56 - lr: 0.000072 - momentum: 0.000000
2023-10-12 00:45:29,116 epoch 6 - iter 1424/1786 - loss 0.02797952 - time (sec): 446.62 - samples/sec: 449.93 - lr: 0.000070 - momentum: 0.000000
2023-10-12 00:46:23,766 epoch 6 - iter 1602/1786 - loss 0.02796450 - time (sec): 501.27 - samples/sec: 447.34 - lr: 0.000068 - momentum: 0.000000
2023-10-12 00:47:17,742 epoch 6 - iter 1780/1786 - loss 0.02870438 - time (sec): 555.24 - samples/sec: 446.07 - lr: 0.000067 - momentum: 0.000000
2023-10-12 00:47:19,636 ----------------------------------------------------------------------------------------------------
2023-10-12 00:47:19,636 EPOCH 6 done: loss 0.0287 - lr: 0.000067
2023-10-12 00:47:42,108 DEV : loss 0.18033917248249054 - f1-score (micro avg) 0.7928
2023-10-12 00:47:42,140 ----------------------------------------------------------------------------------------------------
2023-10-12 00:48:37,819 epoch 7 - iter 178/1786 - loss 0.02839486 - time (sec): 55.68 - samples/sec: 432.28 - lr: 0.000065 - momentum: 0.000000
2023-10-12 00:49:33,964 epoch 7 - iter 356/1786 - loss 0.02271661 - time (sec): 111.82 - samples/sec: 445.36 - lr: 0.000063 - momentum: 0.000000
2023-10-12 00:50:28,746 epoch 7 - iter 534/1786 - loss 0.02274955 - time (sec): 166.60 - samples/sec: 441.96 - lr: 0.000062 - momentum: 0.000000
2023-10-12 00:51:23,278 epoch 7 - iter 712/1786 - loss 0.02044904 - time (sec): 221.14 - samples/sec: 449.84 - lr: 0.000060 - momentum: 0.000000
2023-10-12 00:52:15,708 epoch 7 - iter 890/1786 - loss 0.02037246 - time (sec): 273.57 - samples/sec: 455.10 - lr: 0.000058 - momentum: 0.000000
2023-10-12 00:53:08,412 epoch 7 - iter 1068/1786 - loss 0.01965538 - time (sec): 326.27 - samples/sec: 457.62 - lr: 0.000057 - momentum: 0.000000
2023-10-12 00:54:02,855 epoch 7 - iter 1246/1786 - loss 0.01962515 - time (sec): 380.71 - samples/sec: 456.00 - lr: 0.000055 - momentum: 0.000000
2023-10-12 00:54:57,358 epoch 7 - iter 1424/1786 - loss 0.02031428 - time (sec): 435.22 - samples/sec: 454.87 - lr: 0.000053 - momentum: 0.000000
2023-10-12 00:55:52,160 epoch 7 - iter 1602/1786 - loss 0.02032900 - time (sec): 490.02 - samples/sec: 455.60 - lr: 0.000052 - momentum: 0.000000
2023-10-12 00:56:45,131 epoch 7 - iter 1780/1786 - loss 0.02028515 - time (sec): 542.99 - samples/sec: 456.49 - lr: 0.000050 - momentum: 0.000000
2023-10-12 00:56:46,800 ----------------------------------------------------------------------------------------------------
2023-10-12 00:56:46,801 EPOCH 7 done: loss 0.0202 - lr: 0.000050
2023-10-12 00:57:08,705 DEV : loss 0.19071684777736664 - f1-score (micro avg) 0.7876
2023-10-12 00:57:08,738 ----------------------------------------------------------------------------------------------------
2023-10-12 00:58:03,715 epoch 8 - iter 178/1786 - loss 0.01789598 - time (sec): 54.97 - samples/sec: 455.59 - lr: 0.000048 - momentum: 0.000000
2023-10-12 00:58:59,441 epoch 8 - iter 356/1786 - loss 0.01800163 - time (sec): 110.70 - samples/sec: 454.42 - lr: 0.000047 - momentum: 0.000000
2023-10-12 00:59:55,590 epoch 8 - iter 534/1786 - loss 0.01707568 - time (sec): 166.85 - samples/sec: 450.43 - lr: 0.000045 - momentum: 0.000000
2023-10-12 01:00:50,158 epoch 8 - iter 712/1786 - loss 0.01690096 - time (sec): 221.42 - samples/sec: 444.03 - lr: 0.000043 - momentum: 0.000000
2023-10-12 01:01:42,049 epoch 8 - iter 890/1786 - loss 0.01620502 - time (sec): 273.31 - samples/sec: 445.86 - lr: 0.000042 - momentum: 0.000000
2023-10-12 01:02:35,862 epoch 8 - iter 1068/1786 - loss 0.01592801 - time (sec): 327.12 - samples/sec: 452.71 - lr: 0.000040 - momentum: 0.000000
2023-10-12 01:03:29,081 epoch 8 - iter 1246/1786 - loss 0.01640440 - time (sec): 380.34 - samples/sec: 449.84 - lr: 0.000038 - momentum: 0.000000
2023-10-12 01:04:24,552 epoch 8 - iter 1424/1786 - loss 0.01591161 - time (sec): 435.81 - samples/sec: 452.42 - lr: 0.000037 - momentum: 0.000000
2023-10-12 01:05:20,992 epoch 8 - iter 1602/1786 - loss 0.01561111 - time (sec): 492.25 - samples/sec: 453.96 - lr: 0.000035 - momentum: 0.000000
2023-10-12 01:06:14,936 epoch 8 - iter 1780/1786 - loss 0.01528229 - time (sec): 546.20 - samples/sec: 453.87 - lr: 0.000033 - momentum: 0.000000
2023-10-12 01:06:16,726 ----------------------------------------------------------------------------------------------------
2023-10-12 01:06:16,726 EPOCH 8 done: loss 0.0152 - lr: 0.000033
2023-10-12 01:06:38,521 DEV : loss 0.2063598781824112 - f1-score (micro avg) 0.7803
2023-10-12 01:06:38,552 ----------------------------------------------------------------------------------------------------
2023-10-12 01:07:33,634 epoch 9 - iter 178/1786 - loss 0.01097164 - time (sec): 55.08 - samples/sec: 470.62 - lr: 0.000032 - momentum: 0.000000
2023-10-12 01:08:26,848 epoch 9 - iter 356/1786 - loss 0.00871512 - time (sec): 108.29 - samples/sec: 461.16 - lr: 0.000030 - momentum: 0.000000
2023-10-12 01:09:20,346 epoch 9 - iter 534/1786 - loss 0.01138957 - time (sec): 161.79 - samples/sec: 459.33 - lr: 0.000028 - momentum: 0.000000
2023-10-12 01:10:14,450 epoch 9 - iter 712/1786 - loss 0.01055240 - time (sec): 215.90 - samples/sec: 455.40 - lr: 0.000027 - momentum: 0.000000
2023-10-12 01:11:10,367 epoch 9 - iter 890/1786 - loss 0.00994944 - time (sec): 271.81 - samples/sec: 451.87 - lr: 0.000025 - momentum: 0.000000
2023-10-12 01:12:06,227 epoch 9 - iter 1068/1786 - loss 0.00965138 - time (sec): 327.67 - samples/sec: 455.01 - lr: 0.000023 - momentum: 0.000000
2023-10-12 01:13:00,155 epoch 9 - iter 1246/1786 - loss 0.01059332 - time (sec): 381.60 - samples/sec: 459.44 - lr: 0.000022 - momentum: 0.000000
2023-10-12 01:13:53,715 epoch 9 - iter 1424/1786 - loss 0.01083122 - time (sec): 435.16 - samples/sec: 460.45 - lr: 0.000020 - momentum: 0.000000
2023-10-12 01:14:47,729 epoch 9 - iter 1602/1786 - loss 0.01115893 - time (sec): 489.17 - samples/sec: 458.87 - lr: 0.000018 - momentum: 0.000000
2023-10-12 01:15:40,190 epoch 9 - iter 1780/1786 - loss 0.01106046 - time (sec): 541.64 - samples/sec: 458.00 - lr: 0.000017 - momentum: 0.000000
2023-10-12 01:15:41,822 ----------------------------------------------------------------------------------------------------
2023-10-12 01:15:41,822 EPOCH 9 done: loss 0.0110 - lr: 0.000017
2023-10-12 01:16:03,451 DEV : loss 0.21163716912269592 - f1-score (micro avg) 0.791
2023-10-12 01:16:03,484 ----------------------------------------------------------------------------------------------------
2023-10-12 01:16:56,422 epoch 10 - iter 178/1786 - loss 0.00564481 - time (sec): 52.94 - samples/sec: 476.78 - lr: 0.000015 - momentum: 0.000000
2023-10-12 01:17:50,049 epoch 10 - iter 356/1786 - loss 0.00861168 - time (sec): 106.56 - samples/sec: 473.86 - lr: 0.000013 - momentum: 0.000000
2023-10-12 01:18:41,937 epoch 10 - iter 534/1786 - loss 0.00834080 - time (sec): 158.45 - samples/sec: 478.79 - lr: 0.000012 - momentum: 0.000000
2023-10-12 01:19:35,823 epoch 10 - iter 712/1786 - loss 0.00920845 - time (sec): 212.34 - samples/sec: 476.27 - lr: 0.000010 - momentum: 0.000000
2023-10-12 01:20:28,835 epoch 10 - iter 890/1786 - loss 0.00915627 - time (sec): 265.35 - samples/sec: 476.63 - lr: 0.000008 - momentum: 0.000000
2023-10-12 01:21:19,749 epoch 10 - iter 1068/1786 - loss 0.00855923 - time (sec): 316.26 - samples/sec: 475.74 - lr: 0.000007 - momentum: 0.000000
2023-10-12 01:22:11,915 epoch 10 - iter 1246/1786 - loss 0.00883037 - time (sec): 368.43 - samples/sec: 478.28 - lr: 0.000005 - momentum: 0.000000
2023-10-12 01:23:02,937 epoch 10 - iter 1424/1786 - loss 0.00832188 - time (sec): 419.45 - samples/sec: 477.38 - lr: 0.000003 - momentum: 0.000000
2023-10-12 01:23:54,187 epoch 10 - iter 1602/1786 - loss 0.00826011 - time (sec): 470.70 - samples/sec: 476.61 - lr: 0.000002 - momentum: 0.000000
2023-10-12 01:24:45,370 epoch 10 - iter 1780/1786 - loss 0.00851589 - time (sec): 521.88 - samples/sec: 475.55 - lr: 0.000000 - momentum: 0.000000
2023-10-12 01:24:46,804 ----------------------------------------------------------------------------------------------------
2023-10-12 01:24:46,804 EPOCH 10 done: loss 0.0085 - lr: 0.000000
2023-10-12 01:25:08,325 DEV : loss 0.20861276984214783 - f1-score (micro avg) 0.7838
2023-10-12 01:25:09,217 ----------------------------------------------------------------------------------------------------
2023-10-12 01:25:09,219 Loading model from best epoch ...
2023-10-12 01:25:13,155 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-12 01:26:23,943 Results:
- F-score (micro) 0.6972
- F-score (macro) 0.5865
- Accuracy 0.5472

By class:
              precision    recall  f1-score   support

         LOC     0.7162    0.6959    0.7059      1095
         PER     0.7693    0.7678    0.7685      1012
         ORG     0.5026    0.5490    0.5248       357
   HumanProd     0.2615    0.5152    0.3469        33

   micro avg     0.6928    0.7016    0.6972      2497
   macro avg     0.5624    0.6320    0.5865      2497
weighted avg     0.7012    0.7016    0.7006      2497

2023-10-12 01:26:23,943 ----------------------------------------------------------------------------------------------------
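For completeness, a minimal sketch of how the best-model.pt written during this run (the last "saving best model" entry above is after epoch 4, dev micro-F1 0.8022) could be loaded for inference with the standard Flair API. The example sentence is invented; only the checkpoint path is taken from the base path logged above.

    # Minimal usage sketch (assumptions: standard Flair API; sentence is illustrative).
    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Load the best checkpoint from the training base path logged above.
    tagger = SequenceTagger.load(
        "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5/best-model.pt"
    )

    sentence = Sentence("Victor Hugo est né à Besançon.")
    tagger.predict(sentence)

    # Entity spans are decoded from the BIOES tag set (S-/B-/E-/I- prefixes) listed above.
    for span in sentence.get_spans("ner"):
        print(span.text, span.tag, span.score)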