2023-10-06 21:59:38,905 ----------------------------------------------------------------------------------------------------
2023-10-06 21:59:38,906 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 21:59:38,906 ----------------------------------------------------------------------------------------------------
2023-10-06 21:59:38,906 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-06 21:59:38,906 ----------------------------------------------------------------------------------------------------
2023-10-06 21:59:38,907 Train:  1100 sentences
2023-10-06 21:59:38,907         (train_with_dev=False, train_with_test=False)
2023-10-06 21:59:38,907 ----------------------------------------------------------------------------------------------------
2023-10-06 21:59:38,907 Training Params:
2023-10-06 21:59:38,907  - learning_rate: "0.00015"
2023-10-06 21:59:38,907  - mini_batch_size: "4"
2023-10-06 21:59:38,907  - max_epochs: "10"
2023-10-06 21:59:38,907  - shuffle: "True"
2023-10-06 21:59:38,907 ----------------------------------------------------------------------------------------------------
2023-10-06 21:59:38,907 Plugins:
2023-10-06 21:59:38,907  - TensorboardLogger
2023-10-06 21:59:38,907  - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 21:59:38,907 ----------------------------------------------------------------------------------------------------
2023-10-06 21:59:38,907 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 21:59:38,907  - metric: "('micro avg', 'f1-score')"
2023-10-06 21:59:38,907 ----------------------------------------------------------------------------------------------------
2023-10-06 21:59:38,908 Computation:
2023-10-06 21:59:38,908  - compute on device: cuda:0
2023-10-06 21:59:38,908  - embedding storage: none
2023-10-06 21:59:38,908 ----------------------------------------------------------------------------------------------------
2023-10-06 21:59:38,908 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-06 21:59:38,908 ----------------------------------------------------------------------------------------------------
2023-10-06 21:59:38,908 ----------------------------------------------------------------------------------------------------
2023-10-06 21:59:38,908 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 21:59:49,614 epoch 1 - iter 27/275 - loss 3.23065152 - time (sec): 10.70 - samples/sec: 201.69 - lr: 0.000014 - momentum: 0.000000
2023-10-06 22:00:00,769 epoch 1 - iter 54/275 - loss 3.22127954 - time (sec): 21.86 - samples/sec: 205.31 - lr: 0.000029 - momentum: 0.000000
2023-10-06 22:00:12,135 epoch 1 - iter 81/275 - loss 3.20520988 - time (sec): 33.23 - samples/sec: 207.64 - lr: 0.000044 - momentum: 0.000000
2023-10-06 22:00:22,481 epoch 1 - iter 108/275 - loss 3.17072676 - time (sec): 43.57 - samples/sec: 203.64 - lr: 0.000058 - momentum: 0.000000
2023-10-06 22:00:33,674 epoch 1 - iter 135/275 - loss 3.08892752 - time (sec): 54.76 - samples/sec: 204.40 - lr: 0.000073 - momentum: 0.000000
2023-10-06 22:00:43,604 epoch 1 - iter 162/275 - loss 3.00329462 - time (sec): 64.69 - samples/sec: 202.55 - lr: 0.000088 - momentum: 0.000000
2023-10-06 22:00:54,851 epoch 1 - iter 189/275 - loss 2.89206526 - time (sec): 75.94 - samples/sec: 203.87 - lr: 0.000103 - momentum: 0.000000
2023-10-06 22:01:05,477 epoch 1 - iter 216/275 - loss 2.78247290 - time (sec): 86.57 - samples/sec: 204.08 - lr: 0.000117 - momentum: 0.000000
2023-10-06 22:01:16,951 epoch 1 - iter 243/275 - loss 2.64749133 - time (sec): 98.04 - samples/sec: 205.25 - lr: 0.000132 - momentum: 0.000000
2023-10-06 22:01:27,547 epoch 1 - iter 270/275 - loss 2.52992171 - time (sec): 108.64 - samples/sec: 205.44 - lr: 0.000147 - momentum: 0.000000
2023-10-06 22:01:29,709 ----------------------------------------------------------------------------------------------------
2023-10-06 22:01:29,709 EPOCH 1 done: loss 2.5047 - lr: 0.000147
2023-10-06 22:01:36,239 DEV : loss 1.192710518836975 - f1-score (micro avg)  0.0
2023-10-06 22:01:36,246 ----------------------------------------------------------------------------------------------------
2023-10-06 22:01:46,781 epoch 2 - iter 27/275 - loss 1.12953816 - time (sec): 10.53 - samples/sec: 208.58 - lr: 0.000148 - momentum: 0.000000
2023-10-06 22:01:57,711 epoch 2 - iter 54/275 - loss 1.05509480 - time (sec): 21.46 - samples/sec: 209.11 - lr: 0.000147 - momentum: 0.000000
2023-10-06 22:02:07,907 epoch 2 - iter 81/275 - loss 1.00083671 - time (sec): 31.66 - samples/sec: 205.66 - lr: 0.000145 - momentum: 0.000000
2023-10-06 22:02:18,718 epoch 2 - iter 108/275 - loss 0.97558491 - time (sec): 42.47 - samples/sec: 206.43 - lr: 0.000144 - momentum: 0.000000
2023-10-06 22:02:29,877 epoch 2 - iter 135/275 - loss 0.91847143 - time (sec): 53.63 - samples/sec: 207.69 - lr: 0.000142 - momentum: 0.000000
2023-10-06 22:02:40,795 epoch 2 - iter 162/275 - loss 0.87536494 - time (sec): 64.55 - samples/sec: 206.47 - lr: 0.000140 - momentum: 0.000000
2023-10-06 22:02:51,149 epoch 2 - iter 189/275 - loss 0.83516789 - time (sec): 74.90 - samples/sec: 204.72 - lr: 0.000139 - momentum: 0.000000
2023-10-06 22:03:02,317 epoch 2 - iter 216/275 - loss 0.78444891 - time (sec): 86.07 - samples/sec: 204.59 - lr: 0.000137 - momentum: 0.000000
2023-10-06 22:03:13,447 epoch 2 - iter 243/275 - loss 0.74993418 - time (sec): 97.20 - samples/sec: 205.59 - lr: 0.000135 - momentum: 0.000000
2023-10-06 22:03:24,133 epoch 2 - iter 270/275 - loss 0.72456254 - time (sec): 107.88 - samples/sec: 206.42 - lr: 0.000134 - momentum: 0.000000
2023-10-06 22:03:26,332 ----------------------------------------------------------------------------------------------------
2023-10-06 22:03:26,332 EPOCH 2 done: loss 0.7185 - lr: 0.000134
2023-10-06 22:03:33,148 DEV : loss 0.44027742743492126 - f1-score (micro avg)  0.4034
2023-10-06 22:03:33,153 saving best model
2023-10-06 22:03:34,026 ----------------------------------------------------------------------------------------------------
2023-10-06 22:03:44,280 epoch 3 - iter 27/275 - loss 0.41458116 - time (sec): 10.25 - samples/sec: 208.73 - lr: 0.000132 - momentum: 0.000000
2023-10-06 22:03:55,709 epoch 3 - iter 54/275 - loss 0.39220872 - time (sec): 21.68 - samples/sec: 211.52 - lr: 0.000130 - momentum: 0.000000
2023-10-06 22:04:06,966 epoch 3 - iter 81/275 - loss 0.38979042 - time (sec): 32.94 - samples/sec: 210.45 - lr: 0.000129 - momentum: 0.000000
2023-10-06 22:04:17,390 epoch 3 - iter 108/275 - loss 0.37175723 - time (sec): 43.36 - samples/sec: 207.23 - lr: 0.000127 - momentum: 0.000000
2023-10-06 22:04:27,712 epoch 3 - iter 135/275 - loss 0.35146327 - time (sec): 53.68 - samples/sec: 205.14 - lr: 0.000125 - momentum: 0.000000
2023-10-06 22:04:38,992 epoch 3 - iter 162/275 - loss 0.34842281 - time (sec): 64.96 - samples/sec: 207.01 - lr: 0.000124 - momentum: 0.000000
2023-10-06 22:04:49,923 epoch 3 - iter 189/275 - loss 0.34323045 - time (sec): 75.90 - samples/sec: 207.43 - lr: 0.000122 - momentum: 0.000000
2023-10-06 22:05:00,437 epoch 3 - iter 216/275 - loss 0.33193939 - time (sec): 86.41 - samples/sec: 206.10 - lr: 0.000120 - momentum: 0.000000
2023-10-06 22:05:11,908 epoch 3 - iter 243/275 - loss 0.31883371 - time (sec): 97.88 - samples/sec: 207.14 - lr: 0.000119 - momentum: 0.000000
2023-10-06 22:05:22,296 epoch 3 - iter 270/275 - loss 0.31479827 - time (sec): 108.27 - samples/sec: 207.20 - lr: 0.000117 - momentum: 0.000000
2023-10-06 22:05:24,069 ----------------------------------------------------------------------------------------------------
2023-10-06 22:05:24,069 EPOCH 3 done: loss 0.3129 - lr: 0.000117
2023-10-06 22:05:30,685 DEV : loss 0.22887194156646729 - f1-score (micro avg)  0.752
2023-10-06 22:05:30,691 saving best model
2023-10-06 22:05:31,644 ----------------------------------------------------------------------------------------------------
2023-10-06 22:05:42,734 epoch 4 - iter 27/275 - loss 0.21864021 - time (sec): 11.09 - samples/sec: 213.74 - lr: 0.000115 - momentum: 0.000000
2023-10-06 22:05:53,503 epoch 4 - iter 54/275 - loss 0.20048883 - time (sec): 21.86 - samples/sec: 212.29 - lr: 0.000114 - momentum: 0.000000
2023-10-06 22:06:04,269 epoch 4 - iter 81/275 - loss 0.20798152 - time (sec): 32.62 - samples/sec: 211.69 - lr: 0.000112 - momentum: 0.000000
2023-10-06 22:06:15,238 epoch 4 - iter 108/275 - loss 0.19814588 - time (sec): 43.59 - samples/sec: 212.29 - lr: 0.000110 - momentum: 0.000000
2023-10-06 22:06:25,542 epoch 4 - iter 135/275 - loss 0.19570901 - time (sec): 53.90 - samples/sec: 208.68 - lr: 0.000109 - momentum: 0.000000
2023-10-06 22:06:36,266 epoch 4 - iter 162/275 - loss 0.19285571 - time (sec): 64.62 - samples/sec: 207.77 - lr: 0.000107 - momentum: 0.000000
2023-10-06 22:06:47,345 epoch 4 - iter 189/275 - loss 0.18445052 - time (sec): 75.70 - samples/sec: 207.18 - lr: 0.000105 - momentum: 0.000000
2023-10-06 22:06:58,120 epoch 4 - iter 216/275 - loss 0.17837144 - time (sec): 86.47 - samples/sec: 207.16 - lr: 0.000104 - momentum: 0.000000
2023-10-06 22:07:08,629 epoch 4 - iter 243/275 - loss 0.17707333 - time (sec): 96.98 - samples/sec: 207.13 - lr: 0.000102 - momentum: 0.000000
2023-10-06 22:07:19,850 epoch 4 - iter 270/275 - loss 0.17394904 - time (sec): 108.20 - samples/sec: 206.79 - lr: 0.000101 - momentum: 0.000000
2023-10-06 22:07:21,801 ----------------------------------------------------------------------------------------------------
2023-10-06 22:07:21,801 EPOCH 4 done: loss 0.1721 - lr: 0.000101
2023-10-06 22:07:28,443 DEV : loss 0.14735616743564606 - f1-score (micro avg)  0.8069
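[Editor's note: the projection shapes in the SequenceTagger module dump above are internally consistent. ByT5-small uses an attention inner dimension smaller than its hidden size: q/k/v project 1472 → 384 because the model has 6 heads (the second dimension of `relative_attention_bias: Embedding(32, 6)`) of width 64 each. A quick sanity check, with all values read directly from the dump:]

```python
# Dimensions read off the SequenceTagger module dump in this log (hmByT5 / ByT5-small encoder).
d_model = 1472      # (shared): Embedding(384, 1472) -> hidden size
vocab_size = 384    # byte-level vocabulary of ByT5
num_heads = 6       # (relative_attention_bias): Embedding(32, 6) -> one bias per head
qkv_out = 384       # (q)/(k)/(v): Linear(in_features=1472, out_features=384)
num_tags = 25       # (linear): Linear(in_features=1472, out_features=25)

d_kv = qkv_out // num_heads           # per-head key/value width
assert d_kv == 64                     # inner dim (6 * 64 = 384) < d_model (1472)
assert num_heads * d_kv == qkv_out
```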
2023-10-06 22:07:28,449 saving best model
2023-10-06 22:07:29,386 ----------------------------------------------------------------------------------------------------
2023-10-06 22:07:40,690 epoch 5 - iter 27/275 - loss 0.14355788 - time (sec): 11.30 - samples/sec: 223.23 - lr: 0.000099 - momentum: 0.000000
2023-10-06 22:07:51,146 epoch 5 - iter 54/275 - loss 0.12894441 - time (sec): 21.76 - samples/sec: 212.89 - lr: 0.000097 - momentum: 0.000000
2023-10-06 22:08:02,138 epoch 5 - iter 81/275 - loss 0.11945573 - time (sec): 32.75 - samples/sec: 212.06 - lr: 0.000095 - momentum: 0.000000
2023-10-06 22:08:12,785 epoch 5 - iter 108/275 - loss 0.11314833 - time (sec): 43.40 - samples/sec: 211.08 - lr: 0.000094 - momentum: 0.000000
2023-10-06 22:08:22,690 epoch 5 - iter 135/275 - loss 0.10788541 - time (sec): 53.30 - samples/sec: 207.48 - lr: 0.000092 - momentum: 0.000000
2023-10-06 22:08:33,920 epoch 5 - iter 162/275 - loss 0.10084582 - time (sec): 64.53 - samples/sec: 208.47 - lr: 0.000090 - momentum: 0.000000
2023-10-06 22:08:45,321 epoch 5 - iter 189/275 - loss 0.10620767 - time (sec): 75.93 - samples/sec: 209.18 - lr: 0.000089 - momentum: 0.000000
2023-10-06 22:08:55,765 epoch 5 - iter 216/275 - loss 0.10384327 - time (sec): 86.38 - samples/sec: 207.68 - lr: 0.000087 - momentum: 0.000000
2023-10-06 22:09:06,436 epoch 5 - iter 243/275 - loss 0.10439555 - time (sec): 97.05 - samples/sec: 207.05 - lr: 0.000086 - momentum: 0.000000
2023-10-06 22:09:17,187 epoch 5 - iter 270/275 - loss 0.10629713 - time (sec): 107.80 - samples/sec: 207.21 - lr: 0.000084 - momentum: 0.000000
2023-10-06 22:09:19,308 ----------------------------------------------------------------------------------------------------
2023-10-06 22:09:19,308 EPOCH 5 done: loss 0.1059 - lr: 0.000084
2023-10-06 22:09:25,942 DEV : loss 0.1252857744693756 - f1-score (micro avg)  0.8643
2023-10-06 22:09:25,948 saving best model
2023-10-06 22:09:26,871 ----------------------------------------------------------------------------------------------------
2023-10-06 22:09:38,139 epoch 6 - iter 27/275 - loss 0.06535205 - time (sec): 11.27 - samples/sec: 204.24 - lr: 0.000082 - momentum: 0.000000
2023-10-06 22:09:49,580 epoch 6 - iter 54/275 - loss 0.06970825 - time (sec): 22.71 - samples/sec: 208.04 - lr: 0.000080 - momentum: 0.000000
2023-10-06 22:10:00,735 epoch 6 - iter 81/275 - loss 0.06597967 - time (sec): 33.86 - samples/sec: 209.79 - lr: 0.000079 - momentum: 0.000000
2023-10-06 22:10:11,768 epoch 6 - iter 108/275 - loss 0.07295507 - time (sec): 44.90 - samples/sec: 209.53 - lr: 0.000077 - momentum: 0.000000
2023-10-06 22:10:22,123 epoch 6 - iter 135/275 - loss 0.08097559 - time (sec): 55.25 - samples/sec: 208.83 - lr: 0.000075 - momentum: 0.000000
2023-10-06 22:10:32,005 epoch 6 - iter 162/275 - loss 0.08137998 - time (sec): 65.13 - samples/sec: 206.77 - lr: 0.000074 - momentum: 0.000000
2023-10-06 22:10:42,862 epoch 6 - iter 189/275 - loss 0.07889055 - time (sec): 75.99 - samples/sec: 206.52 - lr: 0.000072 - momentum: 0.000000
2023-10-06 22:10:53,507 epoch 6 - iter 216/275 - loss 0.08031741 - time (sec): 86.63 - samples/sec: 206.51 - lr: 0.000071 - momentum: 0.000000
2023-10-06 22:11:03,985 epoch 6 - iter 243/275 - loss 0.08136227 - time (sec): 97.11 - samples/sec: 206.48 - lr: 0.000069 - momentum: 0.000000
2023-10-06 22:11:15,145 epoch 6 - iter 270/275 - loss 0.07938297 - time (sec): 108.27 - samples/sec: 206.30 - lr: 0.000067 - momentum: 0.000000
2023-10-06 22:11:17,252 ----------------------------------------------------------------------------------------------------
2023-10-06 22:11:17,252 EPOCH 6 done: loss 0.0789 - lr: 0.000067
2023-10-06 22:11:23,914 DEV : loss 0.12225247174501419 - f1-score (micro avg)  0.8708
2023-10-06 22:11:23,920 saving best model
2023-10-06 22:11:24,835 ----------------------------------------------------------------------------------------------------
2023-10-06 22:11:35,987 epoch 7 - iter 27/275 - loss 0.04320475 - time (sec): 11.15 - samples/sec: 217.31 - lr: 0.000065 - momentum: 0.000000
2023-10-06 22:11:46,908 epoch 7 - iter 54/275 - loss 0.05718567 - time (sec): 22.07 - samples/sec: 212.09 - lr: 0.000064 - momentum: 0.000000
2023-10-06 22:11:57,278 epoch 7 - iter 81/275 - loss 0.04812136 - time (sec): 32.44 - samples/sec: 206.80 - lr: 0.000062 - momentum: 0.000000
2023-10-06 22:12:07,796 epoch 7 - iter 108/275 - loss 0.04707596 - time (sec): 42.96 - samples/sec: 205.50 - lr: 0.000060 - momentum: 0.000000
2023-10-06 22:12:18,163 epoch 7 - iter 135/275 - loss 0.04907389 - time (sec): 53.33 - samples/sec: 205.21 - lr: 0.000059 - momentum: 0.000000
2023-10-06 22:12:29,730 epoch 7 - iter 162/275 - loss 0.05223885 - time (sec): 64.89 - samples/sec: 206.26 - lr: 0.000057 - momentum: 0.000000
2023-10-06 22:12:40,149 epoch 7 - iter 189/275 - loss 0.05686615 - time (sec): 75.31 - samples/sec: 204.81 - lr: 0.000056 - momentum: 0.000000
2023-10-06 22:12:50,742 epoch 7 - iter 216/275 - loss 0.05794970 - time (sec): 85.91 - samples/sec: 204.99 - lr: 0.000054 - momentum: 0.000000
2023-10-06 22:13:01,346 epoch 7 - iter 243/275 - loss 0.06521357 - time (sec): 96.51 - samples/sec: 205.33 - lr: 0.000052 - momentum: 0.000000
2023-10-06 22:13:12,795 epoch 7 - iter 270/275 - loss 0.06305060 - time (sec): 107.96 - samples/sec: 206.59 - lr: 0.000051 - momentum: 0.000000
2023-10-06 22:13:14,989 ----------------------------------------------------------------------------------------------------
2023-10-06 22:13:14,989 EPOCH 7 done: loss 0.0634 - lr: 0.000051
2023-10-06 22:13:21,638 DEV : loss 0.12818391621112823 - f1-score (micro avg)  0.8728
2023-10-06 22:13:21,644 saving best model
2023-10-06 22:13:22,559 ----------------------------------------------------------------------------------------------------
2023-10-06 22:13:33,282 epoch 8 - iter 27/275 - loss 0.04960943 - time (sec): 10.72 - samples/sec: 201.10 - lr: 0.000049 - momentum: 0.000000
2023-10-06 22:13:43,728 epoch 8 - iter 54/275 - loss 0.05456906 - time (sec): 21.17 - samples/sec: 203.95 - lr: 0.000047 - momentum: 0.000000
2023-10-06 22:13:54,984 epoch 8 - iter 81/275 - loss 0.06442454 - time (sec): 32.42 - samples/sec: 207.26 - lr: 0.000045 - momentum: 0.000000
2023-10-06 22:14:06,261 epoch 8 - iter 108/275 - loss 0.06335364 - time (sec): 43.70 - samples/sec: 208.54 - lr: 0.000044 - momentum: 0.000000
2023-10-06 22:14:17,182 epoch 8 - iter 135/275 - loss 0.05724056 - time (sec): 54.62 - samples/sec: 208.88 - lr: 0.000042 - momentum: 0.000000
2023-10-06 22:14:27,655 epoch 8 - iter 162/275 - loss 0.05568522 - time (sec): 65.09 - samples/sec: 208.21 - lr: 0.000041 - momentum: 0.000000
2023-10-06 22:14:38,214 epoch 8 - iter 189/275 - loss 0.05327032 - time (sec): 75.65 - samples/sec: 207.26 - lr: 0.000039 - momentum: 0.000000
2023-10-06 22:14:49,563 epoch 8 - iter 216/275 - loss 0.05427243 - time (sec): 87.00 - samples/sec: 208.34 - lr: 0.000037 - momentum: 0.000000
2023-10-06 22:14:59,790 epoch 8 - iter 243/275 - loss 0.05083171 - time (sec): 97.23 - samples/sec: 206.37 - lr: 0.000036 - momentum: 0.000000
2023-10-06 22:15:10,707 epoch 8 - iter 270/275 - loss 0.04869488 - time (sec): 108.15 - samples/sec: 206.64 - lr: 0.000034 - momentum: 0.000000
2023-10-06 22:15:12,809 ----------------------------------------------------------------------------------------------------
2023-10-06 22:15:12,809 EPOCH 8 done: loss 0.0498 - lr: 0.000034
2023-10-06 22:15:19,448 DEV : loss 0.1288895606994629 - f1-score (micro avg)  0.8793
2023-10-06 22:15:19,454 saving best model
2023-10-06 22:15:20,378 ----------------------------------------------------------------------------------------------------
2023-10-06 22:15:31,033 epoch 9 - iter 27/275 - loss 0.06180422 - time (sec): 10.65 - samples/sec: 216.00 - lr: 0.000032 - momentum: 0.000000
2023-10-06 22:15:42,722 epoch 9 - iter 54/275 - loss 0.04169288 - time (sec): 22.34 - samples/sec: 215.74 - lr: 0.000030 - momentum: 0.000000
2023-10-06 22:15:52,860 epoch 9 - iter 81/275 - loss 0.04467575 - time (sec): 32.48 - samples/sec: 208.29 - lr: 0.000029 - momentum: 0.000000
2023-10-06 22:16:03,473 epoch 9 - iter 108/275 - loss 0.04195775 - time (sec): 43.09 - samples/sec: 206.37 - lr: 0.000027 - momentum: 0.000000
2023-10-06 22:16:14,651 epoch 9 - iter 135/275 - loss 0.03934601 - time (sec): 54.27 - samples/sec: 207.29 - lr: 0.000026 - momentum: 0.000000
2023-10-06 22:16:25,891 epoch 9 - iter 162/275 - loss 0.03557879 - time (sec): 65.51 - samples/sec: 208.77 - lr: 0.000024 - momentum: 0.000000
2023-10-06 22:16:36,270 epoch 9 - iter 189/275 - loss 0.03314889 - time (sec): 75.89 - samples/sec: 207.33 - lr: 0.000022 - momentum: 0.000000
2023-10-06 22:16:46,545 epoch 9 - iter 216/275 - loss 0.03682123 - time (sec): 86.16 - samples/sec: 206.32 - lr: 0.000021 - momentum: 0.000000
2023-10-06 22:16:56,721 epoch 9 - iter 243/275 - loss 0.04040504 - time (sec): 96.34 - samples/sec: 206.17 - lr: 0.000019 - momentum: 0.000000
2023-10-06 22:17:08,179 epoch 9 - iter 270/275 - loss 0.04053340 - time (sec): 107.80 - samples/sec: 206.71 - lr: 0.000017 - momentum: 0.000000
2023-10-06 22:17:10,402 ----------------------------------------------------------------------------------------------------
2023-10-06 22:17:10,402 EPOCH 9 done: loss 0.0448 - lr: 0.000017
2023-10-06 22:17:17,249 DEV : loss 0.13075672090053558 - f1-score (micro avg)  0.8778
2023-10-06 22:17:17,255 ----------------------------------------------------------------------------------------------------
2023-10-06 22:17:27,681 epoch 10 - iter 27/275 - loss 0.03177845 - time (sec): 10.42 - samples/sec: 210.18 - lr: 0.000015 - momentum: 0.000000
2023-10-06 22:17:38,414 epoch 10 - iter 54/275 - loss 0.03897675 - time (sec): 21.16 - samples/sec: 209.90 - lr: 0.000014 - momentum: 0.000000
2023-10-06 22:17:48,702 epoch 10 - iter 81/275 - loss 0.03842644 - time (sec): 31.45 - samples/sec: 205.50 - lr: 0.000012 - momentum: 0.000000
2023-10-06 22:17:59,391 epoch 10 - iter 108/275 - loss 0.04243627 - time (sec): 42.13 - samples/sec: 203.87 - lr: 0.000011 - momentum: 0.000000
2023-10-06 22:18:11,812 epoch 10 - iter 135/275 - loss 0.04422130 - time (sec): 54.56 - samples/sec: 205.90 - lr: 0.000009 - momentum: 0.000000
2023-10-06 22:18:21,900 epoch 10 - iter 162/275 - loss 0.04540603 - time (sec): 64.64 - samples/sec: 206.26 - lr: 0.000007 - momentum: 0.000000
2023-10-06 22:18:32,914 epoch 10 - iter 189/275 - loss 0.04279804 - time (sec): 75.66 - samples/sec: 206.84 - lr: 0.000006 - momentum: 0.000000
2023-10-06 22:18:44,089 epoch 10 - iter 216/275 - loss 0.03953642 - time (sec): 86.83 - samples/sec: 207.73 - lr: 0.000004 - momentum: 0.000000
2023-10-06 22:18:55,067 epoch 10 - iter 243/275 - loss 0.03951628 - time (sec): 97.81 - samples/sec: 207.77 - lr: 0.000002 - momentum: 0.000000
2023-10-06 22:19:05,421 epoch 10 - iter 270/275 - loss 0.03821943 - time (sec): 108.16 - samples/sec: 207.04 - lr: 0.000001 - momentum: 0.000000
2023-10-06 22:19:07,202 ----------------------------------------------------------------------------------------------------
2023-10-06 22:19:07,202 EPOCH 10 done: loss 0.0386 - lr: 0.000001
2023-10-06 22:19:13,854 DEV : loss 0.13056319952011108 - f1-score (micro avg)  0.8751
2023-10-06 22:19:14,749 ----------------------------------------------------------------------------------------------------
2023-10-06 22:19:14,751 Loading model from best epoch ...
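[Editor's note: the lr column above traces the LinearScheduler plugin's one-cycle shape: linear warmup from 0 to the peak learning_rate 0.00015 over the first 10% of the 2750 optimizer steps (275 iterations × 10 epochs, warmup_fraction '0.1' — i.e. warmup spans exactly epoch 1), then linear decay to 0. A minimal sketch of that schedule (a hypothetical helper, not Flair's actual implementation) closely tracks the logged values:]

```python
def linear_schedule_lr(step, total_steps=2750, warmup_fraction=0.1, peak_lr=0.00015):
    """One-cycle linear warmup/decay, matching the lr column logged above."""
    warmup_steps = int(total_steps * warmup_fraction)  # 275 steps = all of epoch 1
    if step < warmup_steps:
        return peak_lr * step / warmup_steps           # linear ramp up
    # linear decay over the remaining 90% of training, reaching 0 at the end
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# e.g. epoch 2, iter 27 is global step 275 + 27 = 302:
# linear_schedule_lr(302) -> ~0.000148, as logged
```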
2023-10-06 22:19:18,297 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-06 22:19:25,459 
Results:
- F-score (micro) 0.8817
- F-score (macro) 0.5243
- Accuracy 0.8091

By class:
              precision    recall  f1-score   support

       scope     0.8846    0.9148    0.8994       176
        pers     0.9008    0.9219    0.9112       128
        work     0.8108    0.8108    0.8108        74
      object     0.0000    0.0000    0.0000         2
         loc     0.0000    0.0000    0.0000         2

   micro avg     0.8760    0.8874    0.8817       382
   macro avg     0.5192    0.5295    0.5243       382
weighted avg     0.8665    0.8874    0.8768       382

2023-10-06 22:19:25,460 ----------------------------------------------------------------------------------------------------
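[Editor's note: the gap between micro F1 (0.8817) and macro F1 (0.5243) in the final table comes entirely from the two classes with support 2 (object, loc), which score 0.0 and drag down the unweighted macro average. Both figures can be reproduced from the per-class table:]

```python
# Per-class f1 scores from the "By class" table above.
per_class_f1 = {"scope": 0.8994, "pers": 0.9112, "work": 0.8108, "object": 0.0, "loc": 0.0}

# Macro F1: unweighted mean over classes, regardless of support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)  # -> 0.5243 as reported

# Micro F1: harmonic mean of the aggregate precision and recall.
micro_p, micro_r = 0.8760, 0.8874
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)     # -> 0.8817 as reported
```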