2023-10-06 14:11:25,611 ----------------------------------------------------------------------------------------------------
2023-10-06 14:11:25,612 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 14:11:25,612 ----------------------------------------------------------------------------------------------------
2023-10-06 14:11:25,612 MultiCorpus: 1214 train + 266 dev + 251 test sentences
 - NER_HIPE_2022 Corpus: 1214 train + 266 dev + 251 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/en/with_doc_seperator
2023-10-06 14:11:25,612 ----------------------------------------------------------------------------------------------------
2023-10-06 14:11:25,612 Train: 1214 sentences
2023-10-06 14:11:25,612 (train_with_dev=False, train_with_test=False)
2023-10-06 14:11:25,612 ----------------------------------------------------------------------------------------------------
2023-10-06 14:11:25,613 Training Params:
2023-10-06 14:11:25,613 - learning_rate: "0.00016"
2023-10-06 14:11:25,613 - mini_batch_size: "4"
2023-10-06 14:11:25,613 - max_epochs: "10"
2023-10-06 14:11:25,613 - shuffle: "True"
2023-10-06 14:11:25,613 ----------------------------------------------------------------------------------------------------
2023-10-06 14:11:25,613 Plugins:
2023-10-06 14:11:25,613 - TensorboardLogger
2023-10-06 14:11:25,613 - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 14:11:25,613 ----------------------------------------------------------------------------------------------------
2023-10-06 14:11:25,613 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 14:11:25,613 - metric: "('micro avg', 'f1-score')"
2023-10-06 14:11:25,613 ----------------------------------------------------------------------------------------------------
2023-10-06 14:11:25,613 Computation:
2023-10-06 14:11:25,613 - compute on device: cuda:0
2023-10-06 14:11:25,613 - embedding storage: none
2023-10-06 14:11:25,613 ----------------------------------------------------------------------------------------------------
2023-10-06 14:11:25,613 Model training base path: "hmbench-ajmc/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-06 14:11:25,613 ----------------------------------------------------------------------------------------------------
2023-10-06 14:11:25,613 ----------------------------------------------------------------------------------------------------
2023-10-06 14:11:25,614 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 14:11:36,883 epoch 1 - iter 30/304 - loss 3.24794717 - time (sec): 11.27 - samples/sec: 271.92 - lr: 0.000015 - momentum: 0.000000
2023-10-06 14:11:48,388 epoch 1 - iter 60/304 - loss 3.23649970 - time (sec): 22.77 - samples/sec: 271.03 - lr: 0.000031 - momentum: 0.000000
2023-10-06 14:11:59,480 epoch 1 - iter 90/304 - loss 3.21188821 - time (sec): 33.86 - samples/sec: 270.37 - lr: 0.000047 - momentum: 0.000000
2023-10-06 14:12:10,887 epoch 1 - iter 120/304 - loss 3.13757647 - time (sec): 45.27 - samples/sec: 270.54 - lr: 0.000063 - momentum: 0.000000
2023-10-06 14:12:22,654 epoch 1 - iter 150/304 - loss 3.02078335 - time (sec): 57.04 - samples/sec: 272.02 - lr: 0.000078 - momentum: 0.000000
2023-10-06 14:12:34,042 epoch 1 - iter 180/304 - loss 2.90736795 - time (sec): 68.43 - samples/sec: 271.43 - lr: 0.000094 - momentum: 0.000000
2023-10-06 14:12:45,083 epoch 1 - iter 210/304 - loss 2.78130427 - time (sec): 79.47 - samples/sec: 270.54 - lr: 0.000110 - momentum: 0.000000
2023-10-06 14:12:56,244 epoch 1 - iter 240/304 - loss 2.64825607 - time (sec): 90.63 - samples/sec: 269.30 - lr: 0.000126 - momentum: 0.000000
2023-10-06 14:13:08,193 epoch 1 - iter 270/304 - loss 2.48837414 - time (sec): 102.58 - samples/sec: 270.25 - lr: 0.000142 - momentum: 0.000000
2023-10-06 14:13:18,910 epoch 1 - iter 300/304 - loss 2.34919296 - time (sec): 113.29 - samples/sec: 269.78 - lr: 0.000157 - momentum: 0.000000
2023-10-06 14:13:20,396 ----------------------------------------------------------------------------------------------------
2023-10-06 14:13:20,397 EPOCH 1 done: loss 2.3323 - lr: 0.000157
2023-10-06 14:13:27,220 DEV : loss 0.8810060620307922 - f1-score (micro avg) 0.0
2023-10-06 14:13:27,227 ----------------------------------------------------------------------------------------------------
2023-10-06 14:13:38,873 epoch 2 - iter 30/304 - loss 0.74122896 - time (sec): 11.65 - samples/sec: 276.51 - lr: 0.000158 - momentum: 0.000000
2023-10-06 14:13:50,109 epoch 2 - iter 60/304 - loss 0.68607239 - time (sec): 22.88 - samples/sec: 272.36 - lr: 0.000157 - momentum: 0.000000
2023-10-06 14:14:01,460 epoch 2 - iter 90/304 - loss 0.66330841 - time (sec): 34.23 - samples/sec: 271.12 - lr: 0.000155 - momentum: 0.000000
2023-10-06 14:14:13,021 epoch 2 - iter 120/304 - loss 0.62361761 - time (sec): 45.79 - samples/sec: 274.36 - lr: 0.000153 - momentum: 0.000000
2023-10-06 14:14:23,948 epoch 2 - iter 150/304 - loss 0.60082612 - time (sec): 56.72 - samples/sec: 273.75 - lr: 0.000151 - momentum: 0.000000
2023-10-06 14:14:35,282 epoch 2 - iter 180/304 - loss 0.56186732 - time (sec): 68.05 - samples/sec: 272.54 - lr: 0.000150 - momentum: 0.000000
2023-10-06 14:14:46,549 epoch 2 - iter 210/304 - loss 0.54139779 - time (sec): 79.32 - samples/sec: 272.38 - lr: 0.000148 - momentum: 0.000000
2023-10-06 14:14:57,495 epoch 2 - iter 240/304 - loss 0.51007846 - time (sec): 90.27 - samples/sec: 272.36 - lr: 0.000146 - momentum: 0.000000
2023-10-06 14:15:09,184 epoch 2 - iter 270/304 - loss 0.49170617 - time (sec): 101.96 - samples/sec: 273.59 - lr: 0.000144 - momentum: 0.000000
2023-10-06 14:15:20,178 epoch 2 - iter 300/304 - loss 0.47259793 - time (sec): 112.95 - samples/sec: 272.11 - lr: 0.000143 - momentum: 0.000000
2023-10-06 14:15:21,353 ----------------------------------------------------------------------------------------------------
2023-10-06 14:15:21,353 EPOCH 2 done: loss 0.4708 - lr: 0.000143
2023-10-06 14:15:28,534 DEV : loss 0.3191007077693939 - f1-score (micro avg) 0.4907
2023-10-06 14:15:28,543 saving best model
2023-10-06 14:15:29,413 ----------------------------------------------------------------------------------------------------
2023-10-06 14:15:40,687 epoch 3 - iter 30/304 - loss 0.22409010 - time (sec): 11.27 - samples/sec: 264.63 - lr: 0.000141 - momentum: 0.000000
2023-10-06 14:15:52,371 epoch 3 - iter 60/304 - loss 0.22786621 - time (sec): 22.96 - samples/sec: 268.94 - lr: 0.000139 - momentum: 0.000000
2023-10-06 14:16:03,169 epoch 3 - iter 90/304 - loss 0.23089277 - time (sec): 33.75 - samples/sec: 263.73 - lr: 0.000137 - momentum: 0.000000
2023-10-06 14:16:14,285 epoch 3 - iter 120/304 - loss 0.23250409 - time (sec): 44.87 - samples/sec: 262.94 - lr: 0.000135 - momentum: 0.000000
2023-10-06 14:16:26,187 epoch 3 - iter 150/304 - loss 0.22008502 - time (sec): 56.77 - samples/sec: 266.63 - lr: 0.000134 - momentum: 0.000000
2023-10-06 14:16:37,381 epoch 3 - iter 180/304 - loss 0.21586374 - time (sec): 67.97 - samples/sec: 265.25 - lr: 0.000132 - momentum: 0.000000
2023-10-06 14:16:49,144 epoch 3 - iter 210/304 - loss 0.20951350 - time (sec): 79.73 - samples/sec: 267.41 - lr: 0.000130 - momentum: 0.000000
2023-10-06 14:17:00,811 epoch 3 - iter 240/304 - loss 0.20749487 - time (sec): 91.40 - samples/sec: 267.39 - lr: 0.000128 - momentum: 0.000000
2023-10-06 14:17:12,634 epoch 3 - iter 270/304 - loss 0.20252725 - time (sec): 103.22 - samples/sec: 267.93 - lr: 0.000127 - momentum: 0.000000
2023-10-06 14:17:23,905 epoch 3 - iter 300/304 - loss 0.19933893 - time (sec): 114.49 - samples/sec: 266.80 - lr: 0.000125 - momentum: 0.000000
2023-10-06 14:17:25,518 ----------------------------------------------------------------------------------------------------
2023-10-06 14:17:25,518 EPOCH 3 done: loss 0.1990 - lr: 0.000125
2023-10-06 14:17:33,399 DEV : loss 0.18546858429908752 - f1-score (micro avg) 0.6987
2023-10-06 14:17:33,407 saving best model
2023-10-06 14:17:37,766 ----------------------------------------------------------------------------------------------------
2023-10-06 14:17:49,248 epoch 4 - iter 30/304 - loss 0.13091282 - time (sec): 11.48 - samples/sec: 256.89 - lr: 0.000123 - momentum: 0.000000
2023-10-06 14:18:01,293 epoch 4 - iter 60/304 - loss 0.12913109 - time (sec): 23.53 - samples/sec: 261.17 - lr: 0.000121 - momentum: 0.000000
2023-10-06 14:18:12,651 epoch 4 - iter 90/304 - loss 0.12301054 - time (sec): 34.88 - samples/sec: 256.29 - lr: 0.000119 - momentum: 0.000000
2023-10-06 14:18:24,503 epoch 4 - iter 120/304 - loss 0.11594677 - time (sec): 46.73 - samples/sec: 254.76 - lr: 0.000118 - momentum: 0.000000
2023-10-06 14:18:36,414 epoch 4 - iter 150/304 - loss 0.11223834 - time (sec): 58.65 - samples/sec: 253.88 - lr: 0.000116 - momentum: 0.000000
2023-10-06 14:18:48,710 epoch 4 - iter 180/304 - loss 0.11360638 - time (sec): 70.94 - samples/sec: 254.05 - lr: 0.000114 - momentum: 0.000000
2023-10-06 14:19:01,112 epoch 4 - iter 210/304 - loss 0.11598109 - time (sec): 83.34 - samples/sec: 255.32 - lr: 0.000112 - momentum: 0.000000
2023-10-06 14:19:13,227 epoch 4 - iter 240/304 - loss 0.11480521 - time (sec): 95.46 - samples/sec: 255.52 - lr: 0.000111 - momentum: 0.000000
2023-10-06 14:19:25,625 epoch 4 - iter 270/304 - loss 0.11435502 - time (sec): 107.86 - samples/sec: 256.28 - lr: 0.000109 - momentum: 0.000000
2023-10-06 14:19:37,641 epoch 4 - iter 300/304 - loss 0.11002677 - time (sec): 119.87 - samples/sec: 256.46 - lr: 0.000107 - momentum: 0.000000
2023-10-06 14:19:38,872 ----------------------------------------------------------------------------------------------------
2023-10-06 14:19:38,873 EPOCH 4 done: loss 0.1096 - lr: 0.000107
2023-10-06 14:19:46,800 DEV : loss 0.1540684998035431 - f1-score (micro avg) 0.8169
2023-10-06 14:19:46,809 saving best model
2023-10-06 14:19:51,173 ----------------------------------------------------------------------------------------------------
2023-10-06 14:20:02,875 epoch 5 - iter 30/304 - loss 0.06569301 - time (sec): 11.70 - samples/sec: 253.60 - lr: 0.000105 - momentum: 0.000000
2023-10-06 14:20:15,213 epoch 5 - iter 60/304 - loss 0.06515493 - time (sec): 24.04 - samples/sec: 260.50 - lr: 0.000103 - momentum: 0.000000
2023-10-06 14:20:27,606 epoch 5 - iter 90/304 - loss 0.06435294 - time (sec): 36.43 - samples/sec: 257.99 - lr: 0.000102 - momentum: 0.000000
2023-10-06 14:20:39,487 epoch 5 - iter 120/304 - loss 0.06345297 - time (sec): 48.31 - samples/sec: 255.22 - lr: 0.000100 - momentum: 0.000000
2023-10-06 14:20:51,426 epoch 5 - iter 150/304 - loss 0.06293608 - time (sec): 60.25 - samples/sec: 254.97 - lr: 0.000098 - momentum: 0.000000
2023-10-06 14:21:04,316 epoch 5 - iter 180/304 - loss 0.06304714 - time (sec): 73.14 - samples/sec: 256.54 - lr: 0.000096 - momentum: 0.000000
2023-10-06 14:21:16,681 epoch 5 - iter 210/304 - loss 0.07005881 - time (sec): 85.51 - samples/sec: 257.33 - lr: 0.000094 - momentum: 0.000000
2023-10-06 14:21:28,657 epoch 5 - iter 240/304 - loss 0.07079656 - time (sec): 97.48 - samples/sec: 255.02 - lr: 0.000093 - momentum: 0.000000
2023-10-06 14:21:40,448 epoch 5 - iter 270/304 - loss 0.06829423 - time (sec): 109.27 - samples/sec: 254.92 - lr: 0.000091 - momentum: 0.000000
2023-10-06 14:21:51,676 epoch 5 - iter 300/304 - loss 0.06782035 - time (sec): 120.50 - samples/sec: 253.91 - lr: 0.000089 - momentum: 0.000000
2023-10-06 14:21:53,203 ----------------------------------------------------------------------------------------------------
2023-10-06 14:21:53,204 EPOCH 5 done: loss 0.0681 - lr: 0.000089
2023-10-06 14:22:01,230 DEV : loss 0.14909522235393524 - f1-score (micro avg) 0.8284
2023-10-06 14:22:01,238 saving best model
2023-10-06 14:22:05,592 ----------------------------------------------------------------------------------------------------
2023-10-06 14:22:17,219 epoch 6 - iter 30/304 - loss 0.02579684 - time (sec): 11.63 - samples/sec: 253.40 - lr: 0.000087 - momentum: 0.000000
2023-10-06 14:22:28,885 epoch 6 - iter 60/304 - loss 0.03871116 - time (sec): 23.29 - samples/sec: 250.65 - lr: 0.000085 - momentum: 0.000000
2023-10-06 14:22:40,696 epoch 6 - iter 90/304 - loss 0.03597576 - time (sec): 35.10 - samples/sec: 249.02 - lr: 0.000084 - momentum: 0.000000
2023-10-06 14:22:52,987 epoch 6 - iter 120/304 - loss 0.04134854 - time (sec): 47.39 - samples/sec: 251.11 - lr: 0.000082 - momentum: 0.000000
2023-10-06 14:23:05,126 epoch 6 - iter 150/304 - loss 0.03971115 - time (sec): 59.53 - samples/sec: 253.00 - lr: 0.000080 - momentum: 0.000000
2023-10-06 14:23:17,267 epoch 6 - iter 180/304 - loss 0.04314250 - time (sec): 71.67 - samples/sec: 253.75 - lr: 0.000078 - momentum: 0.000000
2023-10-06 14:23:29,160 epoch 6 - iter 210/304 - loss 0.04971771 - time (sec): 83.57 - samples/sec: 254.08 - lr: 0.000077 - momentum: 0.000000
2023-10-06 14:23:41,053 epoch 6 - iter 240/304 - loss 0.05075503 - time (sec): 95.46 - samples/sec: 253.45 - lr: 0.000075 - momentum: 0.000000
2023-10-06 14:23:52,975 epoch 6 - iter 270/304 - loss 0.05062134 - time (sec): 107.38 - samples/sec: 253.36 - lr: 0.000073 - momentum: 0.000000
2023-10-06 14:24:05,473 epoch 6 - iter 300/304 - loss 0.04896354 - time (sec): 119.88 - samples/sec: 254.14 - lr: 0.000071 - momentum: 0.000000
2023-10-06 14:24:07,244 ----------------------------------------------------------------------------------------------------
2023-10-06 14:24:07,244 EPOCH 6 done: loss 0.0503 - lr: 0.000071
2023-10-06 14:24:15,275 DEV : loss 0.1544189751148224 - f1-score (micro avg) 0.8257
2023-10-06 14:24:15,282 ----------------------------------------------------------------------------------------------------
2023-10-06 14:24:28,110 epoch 7 - iter 30/304 - loss 0.06052569 - time (sec): 12.83 - samples/sec: 269.91 - lr: 0.000069 - momentum: 0.000000
2023-10-06 14:24:40,058 epoch 7 - iter 60/304 - loss 0.04475238 - time (sec): 24.77 - samples/sec: 259.51 - lr: 0.000068 - momentum: 0.000000
2023-10-06 14:24:51,827 epoch 7 - iter 90/304 - loss 0.04355562 - time (sec): 36.54 - samples/sec: 259.31 - lr: 0.000066 - momentum: 0.000000
2023-10-06 14:25:03,072 epoch 7 - iter 120/304 - loss 0.04393992 - time (sec): 47.79 - samples/sec: 253.95 - lr: 0.000064 - momentum: 0.000000
2023-10-06 14:25:15,495 epoch 7 - iter 150/304 - loss 0.04278608 - time (sec): 60.21 - samples/sec: 254.55 - lr: 0.000062 - momentum: 0.000000
2023-10-06 14:25:27,238 epoch 7 - iter 180/304 - loss 0.03979482 - time (sec): 71.95 - samples/sec: 253.66 - lr: 0.000061 - momentum: 0.000000
2023-10-06 14:25:39,338 epoch 7 - iter 210/304 - loss 0.03687164 - time (sec): 84.05 - samples/sec: 253.43 - lr: 0.000059 - momentum: 0.000000
2023-10-06 14:25:51,070 epoch 7 - iter 240/304 - loss 0.03885836 - time (sec): 95.79 - samples/sec: 253.28 - lr: 0.000057 - momentum: 0.000000
2023-10-06 14:26:03,398 epoch 7 - iter 270/304 - loss 0.03622058 - time (sec): 108.11 - samples/sec: 254.27 - lr: 0.000055 - momentum: 0.000000
2023-10-06 14:26:15,655 epoch 7 - iter 300/304 - loss 0.03868131 - time (sec): 120.37 - samples/sec: 254.71 - lr: 0.000054 - momentum: 0.000000
2023-10-06 14:26:17,064 ----------------------------------------------------------------------------------------------------
2023-10-06 14:26:17,065 EPOCH 7 done: loss 0.0383 - lr: 0.000054
2023-10-06 14:26:25,110 DEV : loss 0.15451328456401825 - f1-score (micro avg) 0.8367
2023-10-06 14:26:25,118 saving best model
2023-10-06 14:26:29,468 ----------------------------------------------------------------------------------------------------
2023-10-06 14:26:41,475 epoch 8 - iter 30/304 - loss 0.02165609 - time (sec): 12.01 - samples/sec: 254.30 - lr: 0.000052 - momentum: 0.000000
2023-10-06 14:26:53,772 epoch 8 - iter 60/304 - loss 0.04272985 - time (sec): 24.30 - samples/sec: 256.60 - lr: 0.000050 - momentum: 0.000000
2023-10-06 14:27:05,779 epoch 8 - iter 90/304 - loss 0.03773130 - time (sec): 36.31 - samples/sec: 255.44 - lr: 0.000048 - momentum: 0.000000
2023-10-06 14:27:18,542 epoch 8 - iter 120/304 - loss 0.03506583 - time (sec): 49.07 - samples/sec: 257.66 - lr: 0.000046 - momentum: 0.000000
2023-10-06 14:27:30,602 epoch 8 - iter 150/304 - loss 0.03570404 - time (sec): 61.13 - samples/sec: 256.42 - lr: 0.000045 - momentum: 0.000000
2023-10-06 14:27:41,936 epoch 8 - iter 180/304 - loss 0.03432633 - time (sec): 72.47 - samples/sec: 254.27 - lr: 0.000043 - momentum: 0.000000
2023-10-06 14:27:53,914 epoch 8 - iter 210/304 - loss 0.03452223 - time (sec): 84.44 - samples/sec: 253.22 - lr: 0.000041 - momentum: 0.000000
2023-10-06 14:28:05,627 epoch 8 - iter 240/304 - loss 0.03310915 - time (sec): 96.16 - samples/sec: 252.85 - lr: 0.000039 - momentum: 0.000000
2023-10-06 14:28:17,751 epoch 8 - iter 270/304 - loss 0.03337918 - time (sec): 108.28 - samples/sec: 253.27 - lr: 0.000038 - momentum: 0.000000
2023-10-06 14:28:30,045 epoch 8 - iter 300/304 - loss 0.03190035 - time (sec): 120.58 - samples/sec: 253.70 - lr: 0.000036 - momentum: 0.000000
2023-10-06 14:28:31,609 ----------------------------------------------------------------------------------------------------
2023-10-06 14:28:31,609 EPOCH 8 done: loss 0.0315 - lr: 0.000036
2023-10-06 14:28:39,657 DEV : loss 0.1636699140071869 - f1-score (micro avg) 0.837
2023-10-06 14:28:39,666 saving best model
2023-10-06 14:28:44,025 ----------------------------------------------------------------------------------------------------
2023-10-06 14:28:56,685 epoch 9 - iter 30/304 - loss 0.01426336 - time (sec): 12.66 - samples/sec: 262.21 - lr: 0.000034 - momentum: 0.000000
2023-10-06 14:29:09,194 epoch 9 - iter 60/304 - loss 0.01439267 - time (sec): 25.17 - samples/sec: 259.14 - lr: 0.000032 - momentum: 0.000000
2023-10-06 14:29:20,744 epoch 9 - iter 90/304 - loss 0.01983298 - time (sec): 36.72 - samples/sec: 253.67 - lr: 0.000030 - momentum: 0.000000
2023-10-06 14:29:33,082 epoch 9 - iter 120/304 - loss 0.01866103 - time (sec): 49.06 - samples/sec: 254.65 - lr: 0.000029 - momentum: 0.000000
2023-10-06 14:29:45,307 epoch 9 - iter 150/304 - loss 0.02460981 - time (sec): 61.28 - samples/sec: 255.99 - lr: 0.000027 - momentum: 0.000000
2023-10-06 14:29:57,380 epoch 9 - iter 180/304 - loss 0.02524037 - time (sec): 73.35 - samples/sec: 255.68 - lr: 0.000025 - momentum: 0.000000
2023-10-06 14:30:09,567 epoch 9 - iter 210/304 - loss 0.02667408 - time (sec): 85.54 - samples/sec: 256.38 - lr: 0.000023 - momentum: 0.000000
2023-10-06 14:30:20,750 epoch 9 - iter 240/304 - loss 0.02742361 - time (sec): 96.72 - samples/sec: 255.42 - lr: 0.000022 - momentum: 0.000000
2023-10-06 14:30:32,314 epoch 9 - iter 270/304 - loss 0.02776823 - time (sec): 108.29 - samples/sec: 256.06 - lr: 0.000020 - momentum: 0.000000
2023-10-06 14:30:43,325 epoch 9 - iter 300/304 - loss 0.02735437 - time (sec): 119.30 - samples/sec: 256.23 - lr: 0.000018 - momentum: 0.000000
2023-10-06 14:30:44,855 ----------------------------------------------------------------------------------------------------
2023-10-06 14:30:44,856 EPOCH 9 done: loss 0.0270 - lr: 0.000018
2023-10-06 14:30:52,100 DEV : loss 0.16385288536548615 - f1-score (micro avg) 0.8431
2023-10-06 14:30:52,108 saving best model
2023-10-06 14:30:53,032 ----------------------------------------------------------------------------------------------------
2023-10-06 14:31:04,431 epoch 10 - iter 30/304 - loss 0.02745366 - time (sec): 11.40 - samples/sec: 262.51 - lr: 0.000016 - momentum: 0.000000
2023-10-06 14:31:15,791 epoch 10 - iter 60/304 - loss 0.02565244 - time (sec): 22.76 - samples/sec: 265.62 - lr: 0.000014 - momentum: 0.000000
2023-10-06 14:31:27,018 epoch 10 - iter 90/304 - loss 0.02200870 - time (sec): 33.98 - samples/sec: 266.83 - lr: 0.000013 - momentum: 0.000000
2023-10-06 14:31:38,453 epoch 10 - iter 120/304 - loss 0.02154119 - time (sec): 45.42 - samples/sec: 264.38 - lr: 0.000011 - momentum: 0.000000
2023-10-06 14:31:49,351 epoch 10 - iter 150/304 - loss 0.01862176 - time (sec): 56.32 - samples/sec: 261.87 - lr: 0.000009 - momentum: 0.000000
2023-10-06 14:32:01,026 epoch 10 - iter 180/304 - loss 0.02338622 - time (sec): 67.99 - samples/sec: 263.48 - lr: 0.000007 - momentum: 0.000000
2023-10-06 14:32:12,148 epoch 10 - iter 210/304 - loss 0.02194682 - time (sec): 79.11 - samples/sec: 264.10 - lr: 0.000006 - momentum: 0.000000
2023-10-06 14:32:23,856 epoch 10 - iter 240/304 - loss 0.02236526 - time (sec): 90.82 - samples/sec: 265.65 - lr: 0.000004 - momentum: 0.000000
2023-10-06 14:32:35,435 epoch 10 - iter 270/304 - loss 0.02102846 - time (sec): 102.40 - samples/sec: 267.80 - lr: 0.000002 - momentum: 0.000000
2023-10-06 14:32:47,064 epoch 10 - iter 300/304 - loss 0.02123298 - time (sec): 114.03 - samples/sec: 268.67 - lr: 0.000000 - momentum: 0.000000
2023-10-06 14:32:48,418 ----------------------------------------------------------------------------------------------------
2023-10-06 14:32:48,418 EPOCH 10 done: loss 0.0224 - lr: 0.000000
2023-10-06 14:32:55,632 DEV : loss 0.16673216223716736 - f1-score (micro avg) 0.841
2023-10-06 14:32:56,519 ----------------------------------------------------------------------------------------------------
2023-10-06 14:32:56,520 Loading model from best epoch ...
2023-10-06 14:33:00,143 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-date, B-date, E-date, I-date, S-object, B-object, E-object, I-object
2023-10-06 14:33:06,776 Results:
- F-score (micro) 0.8082
- F-score (macro) 0.6229
- Accuracy 0.6843

By class:
              precision    recall  f1-score   support

       scope     0.7722    0.8079    0.7896       151
        pers     0.7778    0.9479    0.8545        96
        work     0.7523    0.8632    0.8039        95
         loc     0.6667    0.6667    0.6667         3
        date     0.0000    0.0000    0.0000         3

   micro avg     0.7674    0.8534    0.8082       348
   macro avg     0.5938    0.6571    0.6229       348
weighted avg     0.7607    0.8534    0.8036       348

2023-10-06 14:33:06,776 ----------------------------------------------------------------------------------------------------