2023-10-06 21:01:36,944 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:01:36,946 Model: "SequenceTagger( (embeddings): ByT5Embeddings( (model): T5EncoderModel( (shared): Embedding(384, 1472) (encoder): T5Stack( (embed_tokens): Embedding(384, 1472) (block): ModuleList( (0): T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) (relative_attention_bias): Embedding(32, 6) ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) (1-11): 11 x T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) 
(final_layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) (locked_dropout): LockedDropout(p=0.5) (linear): Linear(in_features=1472, out_features=25, bias=True) (loss_function): CrossEntropyLoss() )"
2023-10-06 21:01:36,946 ----------------------------------------------------------------------------------------------------
2023-10-06 21:01:36,946 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-06 21:01:36,946 ----------------------------------------------------------------------------------------------------
2023-10-06 21:01:36,946 Train: 1100 sentences
2023-10-06 21:01:36,946 (train_with_dev=False, train_with_test=False)
2023-10-06 21:01:36,946 ----------------------------------------------------------------------------------------------------
2023-10-06 21:01:36,946 Training Params:
2023-10-06 21:01:36,946 - learning_rate: "0.00016"
2023-10-06 21:01:36,946 - mini_batch_size: "4"
2023-10-06 21:01:36,946 - max_epochs: "10"
2023-10-06 21:01:36,946 - shuffle: "True"
2023-10-06 21:01:36,946 ----------------------------------------------------------------------------------------------------
2023-10-06 21:01:36,946 Plugins:
2023-10-06 21:01:36,946 - TensorboardLogger
2023-10-06 21:01:36,946 - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 21:01:36,946 ----------------------------------------------------------------------------------------------------
2023-10-06 21:01:36,947 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 21:01:36,947 - metric: "('micro avg', 'f1-score')"
2023-10-06 21:01:36,947 ----------------------------------------------------------------------------------------------------
2023-10-06 21:01:36,947 Computation:
2023-10-06 21:01:36,947 - compute on device: cuda:0
2023-10-06 21:01:36,947 - embedding storage: none
2023-10-06 21:01:36,947
---------------------------------------------------------------------------------------------------- 2023-10-06 21:01:36,947 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1" 2023-10-06 21:01:36,947 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:01:36,947 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:01:36,947 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-06 21:01:47,402 epoch 1 - iter 27/275 - loss 3.22833207 - time (sec): 10.45 - samples/sec: 211.40 - lr: 0.000015 - momentum: 0.000000 2023-10-06 21:01:58,525 epoch 1 - iter 54/275 - loss 3.21740774 - time (sec): 21.58 - samples/sec: 203.64 - lr: 0.000031 - momentum: 0.000000 2023-10-06 21:02:09,265 epoch 1 - iter 81/275 - loss 3.19662009 - time (sec): 32.32 - samples/sec: 202.99 - lr: 0.000047 - momentum: 0.000000 2023-10-06 21:02:19,769 epoch 1 - iter 108/275 - loss 3.14572703 - time (sec): 42.82 - samples/sec: 202.17 - lr: 0.000062 - momentum: 0.000000 2023-10-06 21:02:31,381 epoch 1 - iter 135/275 - loss 3.04591352 - time (sec): 54.43 - samples/sec: 204.64 - lr: 0.000078 - momentum: 0.000000 2023-10-06 21:02:42,609 epoch 1 - iter 162/275 - loss 2.93792612 - time (sec): 65.66 - samples/sec: 205.18 - lr: 0.000094 - momentum: 0.000000 2023-10-06 21:02:53,822 epoch 1 - iter 189/275 - loss 2.82739079 - time (sec): 76.87 - samples/sec: 205.84 - lr: 0.000109 - momentum: 0.000000 2023-10-06 21:03:04,383 epoch 1 - iter 216/275 - loss 2.71282571 - time (sec): 87.44 - samples/sec: 206.16 - lr: 0.000125 - momentum: 0.000000 2023-10-06 21:03:15,131 epoch 1 - iter 243/275 - loss 2.59134021 - time (sec): 98.18 - samples/sec: 206.34 - lr: 0.000141 - momentum: 0.000000 2023-10-06 21:03:25,168 epoch 1 - iter 270/275 - loss 2.48706332 - 
time (sec): 108.22 - samples/sec: 205.82 - lr: 0.000157 - momentum: 0.000000 2023-10-06 21:03:27,515 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:03:27,515 EPOCH 1 done: loss 2.4613 - lr: 0.000157 2023-10-06 21:03:34,060 DEV : loss 1.1231565475463867 - f1-score (micro avg) 0.0 2023-10-06 21:03:34,066 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:03:44,652 epoch 2 - iter 27/275 - loss 1.02744112 - time (sec): 10.59 - samples/sec: 207.93 - lr: 0.000158 - momentum: 0.000000 2023-10-06 21:03:55,710 epoch 2 - iter 54/275 - loss 0.91058745 - time (sec): 21.64 - samples/sec: 206.49 - lr: 0.000157 - momentum: 0.000000 2023-10-06 21:04:06,143 epoch 2 - iter 81/275 - loss 0.91519657 - time (sec): 32.08 - samples/sec: 205.36 - lr: 0.000155 - momentum: 0.000000 2023-10-06 21:04:16,151 epoch 2 - iter 108/275 - loss 0.86737118 - time (sec): 42.08 - samples/sec: 202.45 - lr: 0.000153 - momentum: 0.000000 2023-10-06 21:04:26,497 epoch 2 - iter 135/275 - loss 0.84523607 - time (sec): 52.43 - samples/sec: 201.60 - lr: 0.000151 - momentum: 0.000000 2023-10-06 21:04:37,903 epoch 2 - iter 162/275 - loss 0.80302043 - time (sec): 63.84 - samples/sec: 203.27 - lr: 0.000150 - momentum: 0.000000 2023-10-06 21:04:49,320 epoch 2 - iter 189/275 - loss 0.76259034 - time (sec): 75.25 - samples/sec: 204.92 - lr: 0.000148 - momentum: 0.000000 2023-10-06 21:05:00,257 epoch 2 - iter 216/275 - loss 0.72720466 - time (sec): 86.19 - samples/sec: 205.44 - lr: 0.000146 - momentum: 0.000000 2023-10-06 21:05:11,177 epoch 2 - iter 243/275 - loss 0.69492358 - time (sec): 97.11 - samples/sec: 206.05 - lr: 0.000144 - momentum: 0.000000 2023-10-06 21:05:22,007 epoch 2 - iter 270/275 - loss 0.67160554 - time (sec): 107.94 - samples/sec: 206.52 - lr: 0.000143 - momentum: 0.000000 2023-10-06 21:05:24,008 
---------------------------------------------------------------------------------------------------- 2023-10-06 21:05:24,008 EPOCH 2 done: loss 0.6684 - lr: 0.000143 2023-10-06 21:05:30,641 DEV : loss 0.39727556705474854 - f1-score (micro avg) 0.3715 2023-10-06 21:05:30,646 saving best model 2023-10-06 21:05:31,694 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:05:42,313 epoch 3 - iter 27/275 - loss 0.37832761 - time (sec): 10.62 - samples/sec: 208.72 - lr: 0.000141 - momentum: 0.000000 2023-10-06 21:05:53,709 epoch 3 - iter 54/275 - loss 0.35655984 - time (sec): 22.01 - samples/sec: 212.06 - lr: 0.000139 - momentum: 0.000000 2023-10-06 21:06:04,671 epoch 3 - iter 81/275 - loss 0.35093005 - time (sec): 32.98 - samples/sec: 212.83 - lr: 0.000137 - momentum: 0.000000 2023-10-06 21:06:15,850 epoch 3 - iter 108/275 - loss 0.34679625 - time (sec): 44.15 - samples/sec: 212.59 - lr: 0.000135 - momentum: 0.000000 2023-10-06 21:06:26,372 epoch 3 - iter 135/275 - loss 0.33706333 - time (sec): 54.68 - samples/sec: 211.41 - lr: 0.000134 - momentum: 0.000000 2023-10-06 21:06:37,264 epoch 3 - iter 162/275 - loss 0.33137490 - time (sec): 65.57 - samples/sec: 210.90 - lr: 0.000132 - momentum: 0.000000 2023-10-06 21:06:47,375 epoch 3 - iter 189/275 - loss 0.31978246 - time (sec): 75.68 - samples/sec: 208.92 - lr: 0.000130 - momentum: 0.000000 2023-10-06 21:06:57,740 epoch 3 - iter 216/275 - loss 0.30780358 - time (sec): 86.04 - samples/sec: 207.17 - lr: 0.000128 - momentum: 0.000000 2023-10-06 21:07:09,451 epoch 3 - iter 243/275 - loss 0.29879871 - time (sec): 97.75 - samples/sec: 207.83 - lr: 0.000127 - momentum: 0.000000 2023-10-06 21:07:19,646 epoch 3 - iter 270/275 - loss 0.28697615 - time (sec): 107.95 - samples/sec: 206.64 - lr: 0.000125 - momentum: 0.000000 2023-10-06 21:07:21,888 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:07:21,888 
EPOCH 3 done: loss 0.2850 - lr: 0.000125 2023-10-06 21:07:28,577 DEV : loss 0.2045798897743225 - f1-score (micro avg) 0.7546 2023-10-06 21:07:28,583 saving best model 2023-10-06 21:07:32,937 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:07:44,862 epoch 4 - iter 27/275 - loss 0.16488031 - time (sec): 11.92 - samples/sec: 218.64 - lr: 0.000123 - momentum: 0.000000 2023-10-06 21:07:55,513 epoch 4 - iter 54/275 - loss 0.14918728 - time (sec): 22.58 - samples/sec: 211.07 - lr: 0.000121 - momentum: 0.000000 2023-10-06 21:08:05,888 epoch 4 - iter 81/275 - loss 0.15702343 - time (sec): 32.95 - samples/sec: 205.64 - lr: 0.000119 - momentum: 0.000000 2023-10-06 21:08:16,549 epoch 4 - iter 108/275 - loss 0.16021996 - time (sec): 43.61 - samples/sec: 206.87 - lr: 0.000118 - momentum: 0.000000 2023-10-06 21:08:27,748 epoch 4 - iter 135/275 - loss 0.16143446 - time (sec): 54.81 - samples/sec: 208.91 - lr: 0.000116 - momentum: 0.000000 2023-10-06 21:08:38,232 epoch 4 - iter 162/275 - loss 0.15574263 - time (sec): 65.29 - samples/sec: 207.57 - lr: 0.000114 - momentum: 0.000000 2023-10-06 21:08:49,169 epoch 4 - iter 189/275 - loss 0.15197121 - time (sec): 76.23 - samples/sec: 207.38 - lr: 0.000112 - momentum: 0.000000 2023-10-06 21:08:59,538 epoch 4 - iter 216/275 - loss 0.15516941 - time (sec): 86.60 - samples/sec: 207.22 - lr: 0.000111 - momentum: 0.000000 2023-10-06 21:09:10,047 epoch 4 - iter 243/275 - loss 0.15034681 - time (sec): 97.11 - samples/sec: 207.33 - lr: 0.000109 - momentum: 0.000000 2023-10-06 21:09:20,913 epoch 4 - iter 270/275 - loss 0.14712355 - time (sec): 107.98 - samples/sec: 206.53 - lr: 0.000107 - momentum: 0.000000 2023-10-06 21:09:23,020 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:09:23,020 EPOCH 4 done: loss 0.1461 - lr: 0.000107 2023-10-06 21:09:29,701 DEV : loss 0.14199717342853546 - f1-score (micro avg) 0.8412 
2023-10-06 21:09:29,706 saving best model 2023-10-06 21:09:34,058 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:09:44,862 epoch 5 - iter 27/275 - loss 0.11122000 - time (sec): 10.80 - samples/sec: 202.36 - lr: 0.000105 - momentum: 0.000000 2023-10-06 21:09:55,555 epoch 5 - iter 54/275 - loss 0.09992236 - time (sec): 21.50 - samples/sec: 203.11 - lr: 0.000103 - momentum: 0.000000 2023-10-06 21:10:05,993 epoch 5 - iter 81/275 - loss 0.10665216 - time (sec): 31.93 - samples/sec: 204.89 - lr: 0.000102 - momentum: 0.000000 2023-10-06 21:10:17,851 epoch 5 - iter 108/275 - loss 0.09964466 - time (sec): 43.79 - samples/sec: 208.40 - lr: 0.000100 - momentum: 0.000000 2023-10-06 21:10:29,819 epoch 5 - iter 135/275 - loss 0.09494942 - time (sec): 55.76 - samples/sec: 209.35 - lr: 0.000098 - momentum: 0.000000 2023-10-06 21:10:40,729 epoch 5 - iter 162/275 - loss 0.09497234 - time (sec): 66.67 - samples/sec: 207.68 - lr: 0.000096 - momentum: 0.000000 2023-10-06 21:10:51,626 epoch 5 - iter 189/275 - loss 0.09115680 - time (sec): 77.57 - samples/sec: 207.60 - lr: 0.000095 - momentum: 0.000000 2023-10-06 21:11:02,172 epoch 5 - iter 216/275 - loss 0.09154943 - time (sec): 88.11 - samples/sec: 208.87 - lr: 0.000093 - momentum: 0.000000 2023-10-06 21:11:12,251 epoch 5 - iter 243/275 - loss 0.09193531 - time (sec): 98.19 - samples/sec: 207.31 - lr: 0.000091 - momentum: 0.000000 2023-10-06 21:11:22,446 epoch 5 - iter 270/275 - loss 0.08842455 - time (sec): 108.39 - samples/sec: 206.77 - lr: 0.000089 - momentum: 0.000000 2023-10-06 21:11:24,284 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:11:24,284 EPOCH 5 done: loss 0.0891 - lr: 0.000089 2023-10-06 21:11:30,964 DEV : loss 0.12431611120700836 - f1-score (micro avg) 0.8558 2023-10-06 21:11:30,970 saving best model 2023-10-06 21:11:35,373 
---------------------------------------------------------------------------------------------------- 2023-10-06 21:11:46,300 epoch 6 - iter 27/275 - loss 0.07057993 - time (sec): 10.93 - samples/sec: 214.82 - lr: 0.000087 - momentum: 0.000000 2023-10-06 21:11:56,693 epoch 6 - iter 54/275 - loss 0.06558269 - time (sec): 21.32 - samples/sec: 210.43 - lr: 0.000086 - momentum: 0.000000 2023-10-06 21:12:07,445 epoch 6 - iter 81/275 - loss 0.06122069 - time (sec): 32.07 - samples/sec: 209.82 - lr: 0.000084 - momentum: 0.000000 2023-10-06 21:12:18,621 epoch 6 - iter 108/275 - loss 0.05940627 - time (sec): 43.25 - samples/sec: 208.48 - lr: 0.000082 - momentum: 0.000000 2023-10-06 21:12:29,634 epoch 6 - iter 135/275 - loss 0.05905171 - time (sec): 54.26 - samples/sec: 207.76 - lr: 0.000080 - momentum: 0.000000 2023-10-06 21:12:40,370 epoch 6 - iter 162/275 - loss 0.05632511 - time (sec): 65.00 - samples/sec: 207.26 - lr: 0.000079 - momentum: 0.000000 2023-10-06 21:12:51,563 epoch 6 - iter 189/275 - loss 0.06501524 - time (sec): 76.19 - samples/sec: 209.02 - lr: 0.000077 - momentum: 0.000000 2023-10-06 21:13:02,225 epoch 6 - iter 216/275 - loss 0.06236298 - time (sec): 86.85 - samples/sec: 208.33 - lr: 0.000075 - momentum: 0.000000 2023-10-06 21:13:13,005 epoch 6 - iter 243/275 - loss 0.06493312 - time (sec): 97.63 - samples/sec: 207.90 - lr: 0.000073 - momentum: 0.000000 2023-10-06 21:13:23,126 epoch 6 - iter 270/275 - loss 0.06505844 - time (sec): 107.75 - samples/sec: 207.01 - lr: 0.000072 - momentum: 0.000000 2023-10-06 21:13:25,396 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:13:25,396 EPOCH 6 done: loss 0.0648 - lr: 0.000072 2023-10-06 21:13:32,058 DEV : loss 0.12224514782428741 - f1-score (micro avg) 0.8709 2023-10-06 21:13:32,064 saving best model 2023-10-06 21:13:36,415 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:13:46,779 
epoch 7 - iter 27/275 - loss 0.04187397 - time (sec): 10.36 - samples/sec: 197.64 - lr: 0.000070 - momentum: 0.000000 2023-10-06 21:13:57,052 epoch 7 - iter 54/275 - loss 0.04070792 - time (sec): 20.63 - samples/sec: 198.02 - lr: 0.000068 - momentum: 0.000000 2023-10-06 21:14:08,314 epoch 7 - iter 81/275 - loss 0.04907858 - time (sec): 31.90 - samples/sec: 202.68 - lr: 0.000066 - momentum: 0.000000 2023-10-06 21:14:19,438 epoch 7 - iter 108/275 - loss 0.03973019 - time (sec): 43.02 - samples/sec: 207.16 - lr: 0.000064 - momentum: 0.000000 2023-10-06 21:14:30,716 epoch 7 - iter 135/275 - loss 0.04264876 - time (sec): 54.30 - samples/sec: 207.55 - lr: 0.000063 - momentum: 0.000000 2023-10-06 21:14:41,116 epoch 7 - iter 162/275 - loss 0.04839100 - time (sec): 64.70 - samples/sec: 205.43 - lr: 0.000061 - momentum: 0.000000 2023-10-06 21:14:52,120 epoch 7 - iter 189/275 - loss 0.04500606 - time (sec): 75.70 - samples/sec: 205.72 - lr: 0.000059 - momentum: 0.000000 2023-10-06 21:15:03,203 epoch 7 - iter 216/275 - loss 0.04635685 - time (sec): 86.79 - samples/sec: 206.21 - lr: 0.000058 - momentum: 0.000000 2023-10-06 21:15:14,451 epoch 7 - iter 243/275 - loss 0.04828398 - time (sec): 98.03 - samples/sec: 207.62 - lr: 0.000056 - momentum: 0.000000 2023-10-06 21:15:25,007 epoch 7 - iter 270/275 - loss 0.05190075 - time (sec): 108.59 - samples/sec: 206.93 - lr: 0.000054 - momentum: 0.000000 2023-10-06 21:15:26,743 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:15:26,743 EPOCH 7 done: loss 0.0518 - lr: 0.000054 2023-10-06 21:15:33,413 DEV : loss 0.12933534383773804 - f1-score (micro avg) 0.8627 2023-10-06 21:15:33,419 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:15:43,432 epoch 8 - iter 27/275 - loss 0.04380187 - time (sec): 10.01 - samples/sec: 193.98 - lr: 0.000052 - momentum: 0.000000 2023-10-06 21:15:53,820 epoch 8 - iter 54/275 - loss 
0.06057894 - time (sec): 20.40 - samples/sec: 203.09 - lr: 0.000050 - momentum: 0.000000 2023-10-06 21:16:04,746 epoch 8 - iter 81/275 - loss 0.05231526 - time (sec): 31.33 - samples/sec: 206.54 - lr: 0.000048 - momentum: 0.000000 2023-10-06 21:16:15,647 epoch 8 - iter 108/275 - loss 0.04889467 - time (sec): 42.23 - samples/sec: 207.48 - lr: 0.000047 - momentum: 0.000000 2023-10-06 21:16:25,985 epoch 8 - iter 135/275 - loss 0.04684457 - time (sec): 52.56 - samples/sec: 204.32 - lr: 0.000045 - momentum: 0.000000 2023-10-06 21:16:37,470 epoch 8 - iter 162/275 - loss 0.04855515 - time (sec): 64.05 - samples/sec: 206.68 - lr: 0.000043 - momentum: 0.000000 2023-10-06 21:16:47,560 epoch 8 - iter 189/275 - loss 0.04673321 - time (sec): 74.14 - samples/sec: 206.10 - lr: 0.000042 - momentum: 0.000000 2023-10-06 21:16:58,987 epoch 8 - iter 216/275 - loss 0.04384603 - time (sec): 85.57 - samples/sec: 206.97 - lr: 0.000040 - momentum: 0.000000 2023-10-06 21:17:10,417 epoch 8 - iter 243/275 - loss 0.04459432 - time (sec): 97.00 - samples/sec: 207.64 - lr: 0.000038 - momentum: 0.000000 2023-10-06 21:17:21,170 epoch 8 - iter 270/275 - loss 0.04257561 - time (sec): 107.75 - samples/sec: 207.30 - lr: 0.000036 - momentum: 0.000000 2023-10-06 21:17:23,241 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:17:23,241 EPOCH 8 done: loss 0.0418 - lr: 0.000036 2023-10-06 21:17:29,906 DEV : loss 0.13218103349208832 - f1-score (micro avg) 0.8892 2023-10-06 21:17:29,912 saving best model 2023-10-06 21:17:34,235 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:17:44,834 epoch 9 - iter 27/275 - loss 0.02652253 - time (sec): 10.60 - samples/sec: 202.68 - lr: 0.000034 - momentum: 0.000000 2023-10-06 21:17:55,396 epoch 9 - iter 54/275 - loss 0.04854387 - time (sec): 21.16 - samples/sec: 204.49 - lr: 0.000032 - momentum: 0.000000 2023-10-06 21:18:06,066 epoch 9 - iter 
81/275 - loss 0.05078626 - time (sec): 31.83 - samples/sec: 204.74 - lr: 0.000031 - momentum: 0.000000 2023-10-06 21:18:17,352 epoch 9 - iter 108/275 - loss 0.04987359 - time (sec): 43.12 - samples/sec: 205.19 - lr: 0.000029 - momentum: 0.000000 2023-10-06 21:18:28,170 epoch 9 - iter 135/275 - loss 0.05203896 - time (sec): 53.93 - samples/sec: 205.62 - lr: 0.000027 - momentum: 0.000000 2023-10-06 21:18:38,499 epoch 9 - iter 162/275 - loss 0.05289938 - time (sec): 64.26 - samples/sec: 206.11 - lr: 0.000026 - momentum: 0.000000 2023-10-06 21:18:50,354 epoch 9 - iter 189/275 - loss 0.04529210 - time (sec): 76.12 - samples/sec: 207.59 - lr: 0.000024 - momentum: 0.000000 2023-10-06 21:19:01,010 epoch 9 - iter 216/275 - loss 0.04172254 - time (sec): 86.77 - samples/sec: 207.63 - lr: 0.000022 - momentum: 0.000000 2023-10-06 21:19:11,223 epoch 9 - iter 243/275 - loss 0.03840025 - time (sec): 96.99 - samples/sec: 206.42 - lr: 0.000020 - momentum: 0.000000 2023-10-06 21:19:22,087 epoch 9 - iter 270/275 - loss 0.03840849 - time (sec): 107.85 - samples/sec: 207.09 - lr: 0.000019 - momentum: 0.000000 2023-10-06 21:19:24,089 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:19:24,089 EPOCH 9 done: loss 0.0379 - lr: 0.000019 2023-10-06 21:19:30,702 DEV : loss 0.1376478224992752 - f1-score (micro avg) 0.8664 2023-10-06 21:19:30,708 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:19:41,630 epoch 10 - iter 27/275 - loss 0.05058081 - time (sec): 10.92 - samples/sec: 208.06 - lr: 0.000017 - momentum: 0.000000 2023-10-06 21:19:52,250 epoch 10 - iter 54/275 - loss 0.04581674 - time (sec): 21.54 - samples/sec: 205.80 - lr: 0.000015 - momentum: 0.000000 2023-10-06 21:20:02,820 epoch 10 - iter 81/275 - loss 0.03678298 - time (sec): 32.11 - samples/sec: 206.76 - lr: 0.000013 - momentum: 0.000000 2023-10-06 21:20:13,857 epoch 10 - iter 108/275 - loss 0.03737114 
- time (sec): 43.15 - samples/sec: 207.80 - lr: 0.000011 - momentum: 0.000000 2023-10-06 21:20:24,327 epoch 10 - iter 135/275 - loss 0.03215051 - time (sec): 53.62 - samples/sec: 205.81 - lr: 0.000010 - momentum: 0.000000 2023-10-06 21:20:34,938 epoch 10 - iter 162/275 - loss 0.03481797 - time (sec): 64.23 - samples/sec: 205.31 - lr: 0.000008 - momentum: 0.000000 2023-10-06 21:20:45,933 epoch 10 - iter 189/275 - loss 0.03404409 - time (sec): 75.22 - samples/sec: 205.40 - lr: 0.000006 - momentum: 0.000000 2023-10-06 21:20:56,901 epoch 10 - iter 216/275 - loss 0.03300710 - time (sec): 86.19 - samples/sec: 206.14 - lr: 0.000004 - momentum: 0.000000 2023-10-06 21:21:07,747 epoch 10 - iter 243/275 - loss 0.03216155 - time (sec): 97.04 - samples/sec: 206.83 - lr: 0.000003 - momentum: 0.000000 2023-10-06 21:21:18,408 epoch 10 - iter 270/275 - loss 0.03352601 - time (sec): 107.70 - samples/sec: 207.18 - lr: 0.000001 - momentum: 0.000000 2023-10-06 21:21:20,592 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:21:20,592 EPOCH 10 done: loss 0.0334 - lr: 0.000001 2023-10-06 21:21:27,269 DEV : loss 0.1404709368944168 - f1-score (micro avg) 0.8685 2023-10-06 21:21:28,181 ---------------------------------------------------------------------------------------------------- 2023-10-06 21:21:28,182 Loading model from best epoch ... 
2023-10-06 21:21:31,522 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-06 21:21:38,605
Results:
- F-score (micro) 0.8845
- F-score (macro) 0.5336
- Accuracy 0.8101

By class:
              precision    recall  f1-score   support

       scope     0.8857    0.8807    0.8832       176
        pers     0.9370    0.9297    0.9333       128
        work     0.8514    0.8514    0.8514        74
      object     0.0000    0.0000    0.0000         2
         loc     0.0000    0.0000    0.0000         2

   micro avg     0.8868    0.8822    0.8845       382
   macro avg     0.5348    0.5323    0.5336       382
weighted avg     0.8870    0.8822    0.8846       382

2023-10-06 21:21:38,605 ----------------------------------------------------------------------------------------------------