2023-10-07 01:15:22,530 ----------------------------------------------------------------------------------------------------
2023-10-07 01:15:22,532 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-07 01:15:22,532 ----------------------------------------------------------------------------------------------------
2023-10-07 01:15:22,532 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-07 01:15:22,532 ----------------------------------------------------------------------------------------------------
2023-10-07 01:15:22,532 Train: 1100 sentences
2023-10-07 01:15:22,532 (train_with_dev=False, train_with_test=False)
2023-10-07 01:15:22,532 ----------------------------------------------------------------------------------------------------
2023-10-07 01:15:22,532 Training Params:
2023-10-07 01:15:22,532 - learning_rate: "0.00015"
2023-10-07 01:15:22,532 - mini_batch_size: "8"
2023-10-07 01:15:22,532 - max_epochs: "10"
2023-10-07 01:15:22,532 - shuffle: "True"
2023-10-07 01:15:22,532 ----------------------------------------------------------------------------------------------------
2023-10-07 01:15:22,532 Plugins:
2023-10-07 01:15:22,532 - TensorboardLogger
2023-10-07 01:15:22,532 - LinearScheduler | warmup_fraction: '0.1'
2023-10-07 01:15:22,532 ----------------------------------------------------------------------------------------------------
2023-10-07 01:15:22,533 Final evaluation on model from best epoch (best-model.pt)
2023-10-07 01:15:22,533 - metric: "('micro avg', 'f1-score')"
2023-10-07 01:15:22,533 ----------------------------------------------------------------------------------------------------
2023-10-07 01:15:22,533 Computation:
2023-10-07 01:15:22,533 - compute on device: cuda:0
2023-10-07 01:15:22,533 - embedding storage: none
2023-10-07 01:15:22,533 ----------------------------------------------------------------------------------------------------
2023-10-07 01:15:22,533 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-07 01:15:22,533 ----------------------------------------------------------------------------------------------------
2023-10-07 01:15:22,533 ----------------------------------------------------------------------------------------------------
2023-10-07 01:15:22,533 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-07 01:15:31,538 epoch 1 - iter 13/138 - loss 3.24828859 - time (sec): 9.00 - samples/sec: 224.23 - lr: 0.000013 - momentum: 0.000000
2023-10-07 01:15:40,922 epoch 1 - iter 26/138 - loss 3.24306645 - time (sec): 18.39 - samples/sec: 229.50 - lr: 0.000027 - momentum: 0.000000
2023-10-07 01:15:50,749 epoch 1 - iter 39/138 - loss 3.23291278 - time (sec): 28.22 - samples/sec: 232.92 - lr: 0.000041 - momentum: 0.000000
2023-10-07 01:16:00,629 epoch 1 - iter 52/138 - loss 3.21835524 - time (sec): 38.09 - samples/sec: 232.53 - lr: 0.000055 - momentum: 0.000000
2023-10-07 01:16:09,702 epoch 1 - iter 65/138 - loss 3.19377640 - time (sec): 47.17 - samples/sec: 230.75 - lr: 0.000070 - momentum: 0.000000
2023-10-07 01:16:19,190 epoch 1 - iter 78/138 - loss 3.14821775 - time (sec): 56.66 - samples/sec: 228.98 - lr: 0.000084 - momentum: 0.000000
2023-10-07 01:16:28,019 epoch 1 - iter 91/138 - loss 3.08866051 - time (sec): 65.48 - samples/sec: 228.63 - lr: 0.000098 - momentum: 0.000000
2023-10-07 01:16:38,138 epoch 1 - iter 104/138 - loss 3.00670410 - time (sec): 75.60 - samples/sec: 229.38 - lr: 0.000112 - momentum: 0.000000
2023-10-07 01:16:47,382 epoch 1 - iter 117/138 - loss 2.93035609 - time (sec): 84.85 - samples/sec: 229.73 - lr: 0.000126 - momentum: 0.000000
2023-10-07 01:16:56,570 epoch 1 - iter 130/138 - loss 2.85220031 - time (sec): 94.04 - samples/sec: 228.72 - lr: 0.000140 - momentum: 0.000000
2023-10-07 01:17:02,058 ----------------------------------------------------------------------------------------------------
2023-10-07 01:17:02,059 EPOCH 1 done: loss 2.8027 - lr: 0.000140
2023-10-07 01:17:08,371 DEV : loss 1.8371483087539673 - f1-score (micro avg) 0.0
2023-10-07 01:17:08,376 ----------------------------------------------------------------------------------------------------
2023-10-07 01:17:17,992 epoch 2 - iter 13/138 - loss 1.78002652 - time (sec): 9.61 - samples/sec: 232.76 - lr: 0.000149 - momentum: 0.000000
2023-10-07 01:17:27,209 epoch 2 - iter 26/138 - loss 1.70494439 - time (sec): 18.83 - samples/sec: 224.88 - lr: 0.000147 - momentum: 0.000000
2023-10-07 01:17:35,792 epoch 2 - iter 39/138 - loss 1.60987014 - time (sec): 27.42 - samples/sec: 221.41 - lr: 0.000145 - momentum: 0.000000
2023-10-07 01:17:45,114 epoch 2 - iter 52/138 - loss 1.51958579 - time (sec): 36.74 - samples/sec: 224.76 - lr: 0.000144 - momentum: 0.000000
2023-10-07 01:17:54,597 epoch 2 - iter 65/138 - loss 1.42527343 - time (sec): 46.22 - samples/sec: 225.77 - lr: 0.000142 - momentum: 0.000000
2023-10-07 01:18:04,025 epoch 2 - iter 78/138 - loss 1.35043793 - time (sec): 55.65 - samples/sec: 225.44 - lr: 0.000141 - momentum: 0.000000
2023-10-07 01:18:13,985 epoch 2 - iter 91/138 - loss 1.29544695 - time (sec): 65.61 - samples/sec: 226.39 - lr: 0.000139 - momentum: 0.000000
2023-10-07 01:18:23,880 epoch 2 - iter 104/138 - loss 1.23545553 - time (sec): 75.50 - samples/sec: 226.96 - lr: 0.000138 - momentum: 0.000000
2023-10-07 01:18:32,973 epoch 2 - iter 117/138 - loss 1.21725016 - time (sec): 84.60 - samples/sec: 228.38 - lr: 0.000136 - momentum: 0.000000
2023-10-07 01:18:42,496 epoch 2 - iter 130/138 - loss 1.17433177 - time (sec): 94.12 - samples/sec: 228.75 - lr: 0.000134 - momentum: 0.000000
2023-10-07 01:18:47,903 ----------------------------------------------------------------------------------------------------
2023-10-07 01:18:47,903 EPOCH 2 done: loss 1.1550 - lr: 0.000134
2023-10-07 01:18:54,337 DEV : loss 0.845672607421875 - f1-score (micro avg) 0.0
2023-10-07 01:18:54,342 ----------------------------------------------------------------------------------------------------
2023-10-07 01:19:03,480 epoch 3 - iter 13/138 - loss 0.76719476 - time (sec): 9.14 - samples/sec: 225.45 - lr: 0.000132 - momentum: 0.000000
2023-10-07 01:19:12,704 epoch 3 - iter 26/138 - loss 0.75545689 - time (sec): 18.36 - samples/sec: 229.73 - lr: 0.000130 - momentum: 0.000000
2023-10-07 01:19:22,852 epoch 3 - iter 39/138 - loss 0.69073662 - time (sec): 28.51 - samples/sec: 230.21 - lr: 0.000129 - momentum: 0.000000
2023-10-07 01:19:32,459 epoch 3 - iter 52/138 - loss 0.64569560 - time (sec): 38.12 - samples/sec: 229.11 - lr: 0.000127 - momentum: 0.000000
2023-10-07 01:19:41,993 epoch 3 - iter 65/138 - loss 0.63526192 - time (sec): 47.65 - samples/sec: 229.30 - lr: 0.000126 - momentum: 0.000000
2023-10-07 01:19:51,185 epoch 3 - iter 78/138 - loss 0.61936500 - time (sec): 56.84 - samples/sec: 228.12 - lr: 0.000124 - momentum: 0.000000
2023-10-07 01:20:00,681 epoch 3 - iter 91/138 - loss 0.59743899 - time (sec): 66.34 - samples/sec: 229.93 - lr: 0.000123 - momentum: 0.000000
2023-10-07 01:20:10,114 epoch 3 - iter 104/138 - loss 0.58003767 - time (sec): 75.77 - samples/sec: 229.74 - lr: 0.000121 - momentum: 0.000000
2023-10-07 01:20:19,561 epoch 3 - iter 117/138 - loss 0.56742023 - time (sec): 85.22 - samples/sec: 229.06 - lr: 0.000119 - momentum: 0.000000
2023-10-07 01:20:28,595 epoch 3 - iter 130/138 - loss 0.55633486 - time (sec): 94.25 - samples/sec: 228.21 - lr: 0.000118 - momentum: 0.000000
2023-10-07 01:20:34,112 ----------------------------------------------------------------------------------------------------
2023-10-07 01:20:34,113 EPOCH 3 done: loss 0.5499 - lr: 0.000118
2023-10-07 01:20:40,531 DEV : loss 0.4134528338909149 - f1-score (micro avg) 0.5185
2023-10-07 01:20:40,537 saving best model
2023-10-07 01:20:41,383 ----------------------------------------------------------------------------------------------------
2023-10-07 01:20:49,674 epoch 4 - iter 13/138 - loss 0.41651853 - time (sec): 8.29 - samples/sec: 206.66 - lr: 0.000115 - momentum: 0.000000
2023-10-07 01:20:59,027 epoch 4 - iter 26/138 - loss 0.37358104 - time (sec): 17.64 - samples/sec: 219.36 - lr: 0.000114 - momentum: 0.000000
2023-10-07 01:21:08,798 epoch 4 - iter 39/138 - loss 0.37215514 - time (sec): 27.41 - samples/sec: 225.62 - lr: 0.000112 - momentum: 0.000000
2023-10-07 01:21:18,418 epoch 4 - iter 52/138 - loss 0.37238785 - time (sec): 37.03 - samples/sec: 229.25 - lr: 0.000111 - momentum: 0.000000
2023-10-07 01:21:27,308 epoch 4 - iter 65/138 - loss 0.37104112 - time (sec): 45.92 - samples/sec: 228.16 - lr: 0.000109 - momentum: 0.000000
2023-10-07 01:21:36,664 epoch 4 - iter 78/138 - loss 0.35234194 - time (sec): 55.28 - samples/sec: 228.37 - lr: 0.000107 - momentum: 0.000000
2023-10-07 01:21:45,175 epoch 4 - iter 91/138 - loss 0.34394455 - time (sec): 63.79 - samples/sec: 226.31 - lr: 0.000106 - momentum: 0.000000
2023-10-07 01:21:55,033 epoch 4 - iter 104/138 - loss 0.33542247 - time (sec): 73.65 - samples/sec: 227.54 - lr: 0.000104 - momentum: 0.000000
2023-10-07 01:22:04,919 epoch 4 - iter 117/138 - loss 0.33674126 - time (sec): 83.53 - samples/sec: 228.41 - lr: 0.000103 - momentum: 0.000000
2023-10-07 01:22:15,458 epoch 4 - iter 130/138 - loss 0.32805211 - time (sec): 94.07 - samples/sec: 229.30 - lr: 0.000101 - momentum: 0.000000
2023-10-07 01:22:21,001 ----------------------------------------------------------------------------------------------------
2023-10-07 01:22:21,001 EPOCH 4 done: loss 0.3246 - lr: 0.000101
2023-10-07 01:22:27,423 DEV : loss 0.2694088816642761 - f1-score (micro avg) 0.7589
2023-10-07 01:22:27,428 saving best model
2023-10-07 01:22:28,303 ----------------------------------------------------------------------------------------------------
2023-10-07 01:22:37,145 epoch 5 - iter 13/138 - loss 0.30090111 - time (sec): 8.84 - samples/sec: 237.87 - lr: 0.000099 - momentum: 0.000000
2023-10-07 01:22:46,256 epoch 5 - iter 26/138 - loss 0.26577049 - time (sec): 17.95 - samples/sec: 226.73 - lr: 0.000097 - momentum: 0.000000
2023-10-07 01:22:56,278 epoch 5 - iter 39/138 - loss 0.22896971 - time (sec): 27.97 - samples/sec: 229.18 - lr: 0.000096 - momentum: 0.000000
2023-10-07 01:23:05,442 epoch 5 - iter 52/138 - loss 0.21383942 - time (sec): 37.14 - samples/sec: 228.64 - lr: 0.000094 - momentum: 0.000000
2023-10-07 01:23:14,990 epoch 5 - iter 65/138 - loss 0.21585880 - time (sec): 46.69 - samples/sec: 229.79 - lr: 0.000092 - momentum: 0.000000
2023-10-07 01:23:24,460 epoch 5 - iter 78/138 - loss 0.21291348 - time (sec): 56.16 - samples/sec: 229.15 - lr: 0.000091 - momentum: 0.000000
2023-10-07 01:23:33,904 epoch 5 - iter 91/138 - loss 0.21386054 - time (sec): 65.60 - samples/sec: 229.06 - lr: 0.000089 - momentum: 0.000000
2023-10-07 01:23:43,519 epoch 5 - iter 104/138 - loss 0.21509638 - time (sec): 75.21 - samples/sec: 229.56 - lr: 0.000088 - momentum: 0.000000
2023-10-07 01:23:53,245 epoch 5 - iter 117/138 - loss 0.21472409 - time (sec): 84.94 - samples/sec: 230.45 - lr: 0.000086 - momentum: 0.000000
2023-10-07 01:24:02,719 epoch 5 - iter 130/138 - loss 0.21814584 - time (sec): 94.41 - samples/sec: 230.02 - lr: 0.000085 - momentum: 0.000000
2023-10-07 01:24:07,647 ----------------------------------------------------------------------------------------------------
2023-10-07 01:24:07,647 EPOCH 5 done: loss 0.2169 - lr: 0.000085
2023-10-07 01:24:14,043 DEV : loss 0.19746224582195282 - f1-score (micro avg) 0.7865
2023-10-07 01:24:14,048 saving best model
2023-10-07 01:24:14,906 ----------------------------------------------------------------------------------------------------
2023-10-07 01:24:24,467 epoch 6 - iter 13/138 - loss 0.15336532 - time (sec): 9.56 - samples/sec: 241.22 - lr: 0.000082 - momentum: 0.000000
2023-10-07 01:24:34,331 epoch 6 - iter 26/138 - loss 0.16846546 - time (sec): 19.42 - samples/sec: 236.46 - lr: 0.000080 - momentum: 0.000000
2023-10-07 01:24:43,301 epoch 6 - iter 39/138 - loss 0.17300366 - time (sec): 28.39 - samples/sec: 234.63 - lr: 0.000079 - momentum: 0.000000
2023-10-07 01:24:52,559 epoch 6 - iter 52/138 - loss 0.16181288 - time (sec): 37.65 - samples/sec: 230.64 - lr: 0.000077 - momentum: 0.000000
2023-10-07 01:25:01,549 epoch 6 - iter 65/138 - loss 0.16255739 - time (sec): 46.64 - samples/sec: 229.11 - lr: 0.000076 - momentum: 0.000000
2023-10-07 01:25:11,148 epoch 6 - iter 78/138 - loss 0.16099642 - time (sec): 56.24 - samples/sec: 229.16 - lr: 0.000074 - momentum: 0.000000
2023-10-07 01:25:20,519 epoch 6 - iter 91/138 - loss 0.16207104 - time (sec): 65.61 - samples/sec: 228.56 - lr: 0.000073 - momentum: 0.000000
2023-10-07 01:25:30,256 epoch 6 - iter 104/138 - loss 0.15770321 - time (sec): 75.35 - samples/sec: 229.04 - lr: 0.000071 - momentum: 0.000000
2023-10-07 01:25:39,585 epoch 6 - iter 117/138 - loss 0.15414663 - time (sec): 84.68 - samples/sec: 228.34 - lr: 0.000070 - momentum: 0.000000
2023-10-07 01:25:48,985 epoch 6 - iter 130/138 - loss 0.15062209 - time (sec): 94.08 - samples/sec: 227.10 - lr: 0.000068 - momentum: 0.000000
2023-10-07 01:25:55,024 ----------------------------------------------------------------------------------------------------
2023-10-07 01:25:55,025 EPOCH 6 done: loss 0.1525 - lr: 0.000068
2023-10-07 01:26:01,693 DEV : loss 0.15915270149707794 - f1-score (micro avg) 0.8373
2023-10-07 01:26:01,698 saving best model
2023-10-07 01:26:02,575 ----------------------------------------------------------------------------------------------------
2023-10-07 01:26:12,347 epoch 7 - iter 13/138 - loss 0.13219699 - time (sec): 9.77 - samples/sec: 227.53 - lr: 0.000065 - momentum: 0.000000
2023-10-07 01:26:21,486 epoch 7 - iter 26/138 - loss 0.13025357 - time (sec): 18.91 - samples/sec: 227.82 - lr: 0.000064 - momentum: 0.000000
2023-10-07 01:26:30,405 epoch 7 - iter 39/138 - loss 0.13191905 - time (sec): 27.83 - samples/sec: 222.90 - lr: 0.000062 - momentum: 0.000000
2023-10-07 01:26:40,469 epoch 7 - iter 52/138 - loss 0.12075848 - time (sec): 37.89 - samples/sec: 221.21 - lr: 0.000061 - momentum: 0.000000
2023-10-07 01:26:49,595 epoch 7 - iter 65/138 - loss 0.11734169 - time (sec): 47.02 - samples/sec: 221.11 - lr: 0.000059 - momentum: 0.000000
2023-10-07 01:26:58,753 epoch 7 - iter 78/138 - loss 0.11326191 - time (sec): 56.18 - samples/sec: 220.45 - lr: 0.000058 - momentum: 0.000000
2023-10-07 01:27:09,090 epoch 7 - iter 91/138 - loss 0.11408908 - time (sec): 66.51 - samples/sec: 222.26 - lr: 0.000056 - momentum: 0.000000
2023-10-07 01:27:18,707 epoch 7 - iter 104/138 - loss 0.11252134 - time (sec): 76.13 - samples/sec: 223.84 - lr: 0.000054 - momentum: 0.000000
2023-10-07 01:27:28,634 epoch 7 - iter 117/138 - loss 0.11428033 - time (sec): 86.06 - samples/sec: 224.33 - lr: 0.000053 - momentum: 0.000000
2023-10-07 01:27:38,838 epoch 7 - iter 130/138 - loss 0.11554929 - time (sec): 96.26 - samples/sec: 224.36 - lr: 0.000051 - momentum: 0.000000
2023-10-07 01:27:44,390 ----------------------------------------------------------------------------------------------------
2023-10-07 01:27:44,390 EPOCH 7 done: loss 0.1145 - lr: 0.000051
2023-10-07 01:27:51,074 DEV : loss 0.14204637706279755 - f1-score (micro avg) 0.8422
2023-10-07 01:27:51,081 saving best model
2023-10-07 01:27:51,965 ----------------------------------------------------------------------------------------------------
2023-10-07 01:28:01,706 epoch 8 - iter 13/138 - loss 0.07932656 - time (sec): 9.74 - samples/sec: 227.12 - lr: 0.000049 - momentum: 0.000000
2023-10-07 01:28:11,678 epoch 8 - iter 26/138 - loss 0.07910965 - time (sec): 19.71 - samples/sec: 227.39 - lr: 0.000047 - momentum: 0.000000
2023-10-07 01:28:20,730 epoch 8 - iter 39/138 - loss 0.08042982 - time (sec): 28.76 - samples/sec: 224.98 - lr: 0.000046 - momentum: 0.000000
2023-10-07 01:28:30,382 epoch 8 - iter 52/138 - loss 0.08587795 - time (sec): 38.41 - samples/sec: 225.30 - lr: 0.000044 - momentum: 0.000000
2023-10-07 01:28:39,368 epoch 8 - iter 65/138 - loss 0.08723165 - time (sec): 47.40 - samples/sec: 224.41 - lr: 0.000043 - momentum: 0.000000
2023-10-07 01:28:48,684 epoch 8 - iter 78/138 - loss 0.08915768 - time (sec): 56.72 - samples/sec: 223.74 - lr: 0.000041 - momentum: 0.000000
2023-10-07 01:28:58,262 epoch 8 - iter 91/138 - loss 0.09084241 - time (sec): 66.30 - samples/sec: 223.35 - lr: 0.000039 - momentum: 0.000000
2023-10-07 01:29:08,052 epoch 8 - iter 104/138 - loss 0.08892701 - time (sec): 76.08 - samples/sec: 224.54 - lr: 0.000038 - momentum: 0.000000
2023-10-07 01:29:18,376 epoch 8 - iter 117/138 - loss 0.08692053 - time (sec): 86.41 - samples/sec: 225.35 - lr: 0.000036 - momentum: 0.000000
2023-10-07 01:29:27,508 epoch 8 - iter 130/138 - loss 0.08573298 - time (sec): 95.54 - samples/sec: 225.11 - lr: 0.000035 - momentum: 0.000000
2023-10-07 01:29:32,893 ----------------------------------------------------------------------------------------------------
2023-10-07 01:29:32,893 EPOCH 8 done: loss 0.0922 - lr: 0.000035
2023-10-07 01:29:39,367 DEV : loss 0.13560204207897186 - f1-score (micro avg) 0.8487
2023-10-07 01:29:39,372 saving best model
2023-10-07 01:29:40,250 ----------------------------------------------------------------------------------------------------
2023-10-07 01:29:49,586 epoch 9 - iter 13/138 - loss 0.07874872 - time (sec): 9.33 - samples/sec: 229.78 - lr: 0.000032 - momentum: 0.000000
2023-10-07 01:29:59,591 epoch 9 - iter 26/138 - loss 0.07917520 - time (sec): 19.34 - samples/sec: 231.28 - lr: 0.000031 - momentum: 0.000000
2023-10-07 01:30:09,142 epoch 9 - iter 39/138 - loss 0.07841021 - time (sec): 28.89 - samples/sec: 232.68 - lr: 0.000029 - momentum: 0.000000
2023-10-07 01:30:19,580 epoch 9 - iter 52/138 - loss 0.07759241 - time (sec): 39.33 - samples/sec: 234.51 - lr: 0.000027 - momentum: 0.000000
2023-10-07 01:30:28,813 epoch 9 - iter 65/138 - loss 0.07677469 - time (sec): 48.56 - samples/sec: 231.89 - lr: 0.000026 - momentum: 0.000000
2023-10-07 01:30:39,300 epoch 9 - iter 78/138 - loss 0.07753629 - time (sec): 59.05 - samples/sec: 230.42 - lr: 0.000024 - momentum: 0.000000
2023-10-07 01:30:48,249 epoch 9 - iter 91/138 - loss 0.07580801 - time (sec): 68.00 - samples/sec: 228.01 - lr: 0.000023 - momentum: 0.000000
2023-10-07 01:30:57,903 epoch 9 - iter 104/138 - loss 0.07869909 - time (sec): 77.65 - samples/sec: 227.93 - lr: 0.000021 - momentum: 0.000000
2023-10-07 01:31:06,851 epoch 9 - iter 117/138 - loss 0.07986836 - time (sec): 86.60 - samples/sec: 226.44 - lr: 0.000020 - momentum: 0.000000
2023-10-07 01:31:15,797 epoch 9 - iter 130/138 - loss 0.07964601 - time (sec): 95.55 - samples/sec: 226.71 - lr: 0.000018 - momentum: 0.000000
2023-10-07 01:31:21,066 ----------------------------------------------------------------------------------------------------
2023-10-07 01:31:21,066 EPOCH 9 done: loss 0.0797 - lr: 0.000018
2023-10-07 01:31:27,542 DEV : loss 0.13089357316493988 - f1-score (micro avg) 0.8609
2023-10-07 01:31:27,547 saving best model
2023-10-07 01:31:28,443 ----------------------------------------------------------------------------------------------------
2023-10-07 01:31:37,497 epoch 10 - iter 13/138 - loss 0.05251554 - time (sec): 9.05 - samples/sec: 224.90 - lr: 0.000016 - momentum: 0.000000
2023-10-07 01:31:46,636 epoch 10 - iter 26/138 - loss 0.05742682 - time (sec): 18.19 - samples/sec: 226.20 - lr: 0.000014 - momentum: 0.000000
2023-10-07 01:31:55,887 epoch 10 - iter 39/138 - loss 0.06060797 - time (sec): 27.44 - samples/sec: 226.48 - lr: 0.000012 - momentum: 0.000000
2023-10-07 01:32:05,632 epoch 10 - iter 52/138 - loss 0.06450398 - time (sec): 37.19 - samples/sec: 226.39 - lr: 0.000011 - momentum: 0.000000
2023-10-07 01:32:15,827 epoch 10 - iter 65/138 - loss 0.06574069 - time (sec): 47.38 - samples/sec: 226.98 - lr: 0.000009 - momentum: 0.000000
2023-10-07 01:32:25,236 epoch 10 - iter 78/138 - loss 0.06923330 - time (sec): 56.79 - samples/sec: 226.30 - lr: 0.000008 - momentum: 0.000000
2023-10-07 01:32:35,271 epoch 10 - iter 91/138 - loss 0.06896849 - time (sec): 66.83 - samples/sec: 227.11 - lr: 0.000006 - momentum: 0.000000
2023-10-07 01:32:44,202 epoch 10 - iter 104/138 - loss 0.06823662 - time (sec): 75.76 - samples/sec: 225.11 - lr: 0.000005 - momentum: 0.000000
2023-10-07 01:32:54,442 epoch 10 - iter 117/138 - loss 0.06764845 - time (sec): 86.00 - samples/sec: 225.88 - lr: 0.000003 - momentum: 0.000000
2023-10-07 01:33:03,573 epoch 10 - iter 130/138 - loss 0.07308496 - time (sec): 95.13 - samples/sec: 225.56 - lr: 0.000001 - momentum: 0.000000
2023-10-07 01:33:09,302 ----------------------------------------------------------------------------------------------------
2023-10-07 01:33:09,302 EPOCH 10 done: loss 0.0725 - lr: 0.000001
2023-10-07 01:33:15,797 DEV : loss 0.1304098665714264 - f1-score (micro avg) 0.8589
2023-10-07 01:33:16,632 ----------------------------------------------------------------------------------------------------
2023-10-07 01:33:16,634 Loading model from best epoch ...
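The `lr` column traces the LinearScheduler plugin's schedule (warmup_fraction 0.1): a linear ramp from 0 up to the base rate 0.00015, then a linear decay toward 0. The Python sketch below reproduces the printed values. Two points are assumptions inferred from the numbers rather than stated in the log: the step budget is int((1100 + 8 - 1) / 8 * 10) = 1383 with int(0.1 * 1383) = 138 warmup steps, and the value printed at `iter I` of epoch `E` is the rate at zero-based step (E - 1) * 138 + (I - 1). The names `linear_warmup_lr` and `logged_lr` are hypothetical helpers, not Flair API.

```python
import math

# Values taken from the log header.
BASE_LR = 0.00015
TRAIN_SENTENCES = 1100
MINI_BATCH_SIZE = 8
MAX_EPOCHS = 10
WARMUP_FRACTION = 0.1

# Batches actually run per epoch: ceil(1100 / 8) = 138 (matches "iter .../138").
BATCHES_PER_EPOCH = math.ceil(TRAIN_SENTENCES / MINI_BATCH_SIZE)

# ASSUMPTION: the step budget uses a fractional steps-per-epoch value,
# (1100 + 8 - 1) / 8 = 138.375, truncated only after multiplying by the epochs.
NUM_TRAIN_STEPS = int((TRAIN_SENTENCES + MINI_BATCH_SIZE - 1) / MINI_BATCH_SIZE * MAX_EPOCHS)
NUM_WARMUP_STEPS = int(NUM_TRAIN_STEPS * WARMUP_FRACTION)


def linear_warmup_lr(step: int) -> float:
    """Learning rate at a zero-based optimizer step: linear warmup, then linear decay to 0."""
    if step < NUM_WARMUP_STEPS:
        factor = step / max(1, NUM_WARMUP_STEPS)
    else:
        factor = max(0.0, (NUM_TRAIN_STEPS - step) / max(1, NUM_TRAIN_STEPS - NUM_WARMUP_STEPS))
    return BASE_LR * factor


def logged_lr(epoch: int, iteration: int) -> str:
    """Hypothetical helper: the lr string printed at 'epoch E - iter I/138'."""
    # ASSUMPTION: the printed value is the rate used for batch I, i.e. step (E-1)*138 + (I-1).
    step = (epoch - 1) * BATCHES_PER_EPOCH + (iteration - 1)
    return f"{linear_warmup_lr(step):.6f}"


if __name__ == "__main__":
    for epoch, iteration in [(1, 13), (2, 13), (5, 39), (10, 13), (10, 130)]:
        print(epoch, iteration, logged_lr(epoch, iteration))
```

A budget of exactly 10 * 138 = 1380 steps fits epochs 1 and 2 but misses several later values by one in the last digit (epoch 10, iter 13 would print 0.000015 instead of 0.000016), which is why the fractional budget is assumed here.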
2023-10-07 01:33:19,817 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-07 01:33:26,784 Results:
- F-score (micro) 0.8903
- F-score (macro) 0.5298
- Accuracy 0.8257

By class:
              precision    recall  f1-score   support

       scope     0.9148    0.9148    0.9148       176
        pers     0.9070    0.9141    0.9105       128
        work     0.7975    0.8514    0.8235        74
      object     0.0000    0.0000    0.0000         2
         loc     0.0000    0.0000    0.0000         2

   micro avg     0.8880    0.8927    0.8903       382
   macro avg     0.5238    0.5360    0.5298       382
weighted avg     0.8799    0.8927    0.8861       382

2023-10-07 01:33:26,784 ----------------------------------------------------------------------------------------------------
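The final report mixes three averages over the five entity types it lists. Micro averaging pools span counts before computing precision, recall, and F1 (micro F1 is the metric named in the log header); macro averaging takes the unweighted mean of the per-class scores, so object and loc, with 2 gold spans each and all-zero scores, pull it down to 0.5298; weighted averaging weights each class by its support. The Python sketch below recomputes all three summary rows. The per-class counts are an assumption: they are not printed in the log, but they are the integer (true positive, predicted, gold) triples consistent with every precision, recall, and support figure shown, including zero predicted spans for object and loc.

```python
# ASSUMPTION: (true positives, predicted spans, gold spans) per class,
# back-calculated from the printed precision/recall/support; not in the log itself.
COUNTS = {
    "scope": (161, 176, 176),
    "pers": (117, 129, 128),
    "work": (63, 79, 74),
    "object": (0, 0, 2),
    "loc": (0, 0, 2),
}


def prf(tp: int, pred: int, gold: int) -> tuple[float, float, float]:
    """Precision, recall, F1; an empty denominator yields 0.0."""
    precision = tp / pred if pred else 0.0
    recall = tp / gold if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def report(counts):
    per_class = {name: prf(*c) for name, c in counts.items()}
    total_tp = sum(c[0] for c in counts.values())
    total_pred = sum(c[1] for c in counts.values())
    total_gold = sum(c[2] for c in counts.values())

    # Micro: pool counts across classes, then compute P/R/F1 once.
    micro = prf(total_tp, total_pred, total_gold)
    # Macro: unweighted mean of per-class scores (rare classes count fully).
    macro = tuple(sum(s[i] for s in per_class.values()) / len(per_class) for i in range(3))
    # Weighted: mean of per-class scores weighted by gold support.
    weighted = tuple(
        sum(per_class[n][i] * counts[n][2] for n in counts) / total_gold for i in range(3)
    )
    return per_class, micro, macro, weighted


if __name__ == "__main__":
    per_class, micro, macro, weighted = report(COUNTS)
    for label, row in [("micro avg", micro), ("macro avg", macro), ("weighted avg", weighted)]:
        print(f"{label:>12} " + " ".join(f"{v:.4f}" for v in row))
```

Because macro averaging gives the two 2-span classes the same weight as scope (176 spans), a large gap between micro and macro F1 here reflects class imbalance in the test set rather than broadly weak tagging.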