2023-10-12 20:36:34,417 ----------------------------------------------------------------------------------------------------
2023-10-12 20:36:34,419 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 20:36:34,420 ----------------------------------------------------------------------------------------------------
2023-10-12 20:36:34,420 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-12 20:36:34,420 ----------------------------------------------------------------------------------------------------
2023-10-12 20:36:34,420 Train: 7936 sentences
2023-10-12 20:36:34,420 (train_with_dev=False, train_with_test=False)
2023-10-12 20:36:34,421 ----------------------------------------------------------------------------------------------------
2023-10-12 20:36:34,421 Training Params:
2023-10-12 20:36:34,421  - learning_rate: "0.00015"
2023-10-12 20:36:34,421  - mini_batch_size: "8"
2023-10-12 20:36:34,421  - max_epochs: "10"
2023-10-12 20:36:34,421  - shuffle: "True"
2023-10-12 20:36:34,421 ----------------------------------------------------------------------------------------------------
2023-10-12 20:36:34,421 Plugins:
2023-10-12 20:36:34,421  - TensorboardLogger
2023-10-12 20:36:34,421  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 20:36:34,422 ----------------------------------------------------------------------------------------------------
2023-10-12 20:36:34,422 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 20:36:34,422  - metric: "('micro avg', 'f1-score')"
2023-10-12 20:36:34,422
----------------------------------------------------------------------------------------------------
2023-10-12 20:36:34,422 Computation:
2023-10-12 20:36:34,422  - compute on device: cuda:0
2023-10-12 20:36:34,422  - embedding storage: none
2023-10-12 20:36:34,422 ----------------------------------------------------------------------------------------------------
2023-10-12 20:36:34,422 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-12 20:36:34,422 ----------------------------------------------------------------------------------------------------
2023-10-12 20:36:34,423 ----------------------------------------------------------------------------------------------------
2023-10-12 20:36:34,423 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 20:37:23,526 epoch 1 - iter 99/992 - loss 2.53940455 - time (sec): 49.10 - samples/sec: 327.17 - lr: 0.000015 - momentum: 0.000000
2023-10-12 20:38:09,552 epoch 1 - iter 198/992 - loss 2.45209354 - time (sec): 95.13 - samples/sec: 334.01 - lr: 0.000030 - momentum: 0.000000
2023-10-12 20:38:56,579 epoch 1 - iter 297/992 - loss 2.23771454 - time (sec): 142.15 - samples/sec: 342.90 - lr: 0.000045 - momentum: 0.000000
2023-10-12 20:39:47,556 epoch 1 - iter 396/992 - loss 2.00240804 - time (sec): 193.13 - samples/sec: 337.89 - lr: 0.000060 - momentum: 0.000000
2023-10-12 20:40:37,028 epoch 1 - iter 495/992 - loss 1.75078774 - time (sec): 242.60 - samples/sec: 338.44 - lr: 0.000075 - momentum: 0.000000
2023-10-12 20:41:27,744 epoch 1 - iter 594/992 - loss 1.53136792 - time (sec): 293.32 - samples/sec: 335.30 - lr: 0.000090 - momentum: 0.000000
2023-10-12 20:42:16,877 epoch 1 - iter 693/992 - loss 1.36980903 - time (sec): 342.45 - samples/sec: 332.96 - lr: 0.000105 - momentum: 0.000000
2023-10-12 20:43:05,181 epoch 1 - iter 792/992 - loss 1.23246121 - time (sec): 390.76 - samples/sec: 334.01 - lr: 0.000120 - momentum: 0.000000
2023-10-12 20:43:58,462 epoch 1 - iter 891/992 - loss 1.11904567 - time (sec): 444.04 - samples/sec: 332.10 - lr: 0.000135 - momentum: 0.000000
2023-10-12 20:44:49,457 epoch 1 - iter 990/992 - loss 1.02887479 - time (sec): 495.03 - samples/sec: 330.68 - lr: 0.000150 - momentum: 0.000000
2023-10-12 20:44:50,413 ----------------------------------------------------------------------------------------------------
2023-10-12 20:44:50,413 EPOCH 1 done: loss 1.0276 - lr: 0.000150
2023-10-12 20:45:15,991 DEV : loss 0.16679786145687103 - f1-score (micro avg) 0.492
2023-10-12 20:45:16,031 saving best model
2023-10-12 20:45:16,981 ----------------------------------------------------------------------------------------------------
2023-10-12 20:46:06,496 epoch 2 - iter 99/992 - loss 0.17077119 - time (sec): 49.51 - samples/sec: 340.13 - lr: 0.000148 - momentum: 0.000000
2023-10-12 20:46:56,296 epoch 2 - iter 198/992 - loss 0.16840803 - time (sec): 99.31 - samples/sec: 332.86 - lr: 0.000147 - momentum: 0.000000
2023-10-12 20:47:44,810 epoch 2 - iter 297/992 - loss 0.16705195 - time (sec): 147.83 - samples/sec: 333.86 - lr: 0.000145 - momentum: 0.000000
2023-10-12 20:48:32,058 epoch 2 - iter 396/992 - loss 0.15897276 - time (sec): 195.08 - samples/sec: 339.12 - lr: 0.000143 - momentum: 0.000000
2023-10-12 20:49:21,939 epoch 2 - iter 495/992 - loss 0.15604786 - time (sec): 244.96 - samples/sec: 335.31 - lr: 0.000142 - momentum: 0.000000
2023-10-12 20:50:12,969 epoch 2 - iter 594/992 - loss 0.15226274 - time (sec): 295.99 - samples/sec: 334.72 - lr: 0.000140 - momentum: 0.000000
2023-10-12 20:51:01,723 epoch 2 - iter 693/992 - loss 0.14825625 - time (sec): 344.74 - samples/sec: 336.07 - lr: 0.000138 - momentum: 0.000000
2023-10-12 20:51:55,514 epoch 2 - iter 792/992 - loss 0.14565453 - time (sec): 398.53 - samples/sec: 330.04 - lr: 0.000137 - momentum: 0.000000
2023-10-12 20:52:44,252 epoch 2 - iter 891/992 - loss 0.14261297 - time (sec): 447.27 - samples/sec: 329.90 - lr: 0.000135 - momentum: 0.000000
2023-10-12 20:53:32,986 epoch 2 - iter 990/992 - loss 0.14067570 - time (sec): 496.00 - samples/sec: 330.13 - lr: 0.000133 - momentum: 0.000000
2023-10-12 20:53:33,941 ----------------------------------------------------------------------------------------------------
2023-10-12 20:53:33,942 EPOCH 2 done: loss 0.1406 - lr: 0.000133
2023-10-12 20:53:59,736 DEV : loss 0.09202314913272858 - f1-score (micro avg) 0.7419
2023-10-12 20:53:59,777 saving best model
2023-10-12 20:54:02,426 ----------------------------------------------------------------------------------------------------
2023-10-12 20:54:56,733 epoch 3 - iter 99/992 - loss 0.08481471 - time (sec): 54.29 - samples/sec: 316.06 - lr: 0.000132 - momentum: 0.000000
2023-10-12 20:55:48,391 epoch 3 - iter 198/992 - loss 0.08839525 - time (sec): 105.95 - samples/sec: 313.11 - lr: 0.000130 - momentum: 0.000000
2023-10-12 20:56:36,336 epoch 3 - iter 297/992 - loss 0.08581007 - time (sec): 153.89 - samples/sec: 317.80 - lr: 0.000128 - momentum: 0.000000
2023-10-12 20:57:25,973 epoch 3 - iter 396/992 - loss 0.08230993 - time (sec): 203.53 - samples/sec: 317.82 - lr: 0.000127 - momentum: 0.000000
2023-10-12 20:58:15,723 epoch 3 - iter 495/992 - loss 0.08191812 - time (sec): 253.28 - samples/sec: 321.05 - lr: 0.000125 - momentum: 0.000000
2023-10-12 20:59:05,929 epoch 3 - iter 594/992 - loss 0.08257402 - time (sec): 303.49 - samples/sec: 321.37 - lr: 0.000123 - momentum: 0.000000
2023-10-12 20:59:53,635 epoch 3 - iter 693/992 - loss 0.08166994 - time (sec): 351.19 - samples/sec: 322.98 - lr: 0.000122 - momentum: 0.000000
2023-10-12 21:00:42,974 epoch 3 - iter 792/992 - loss 0.08101549 - time (sec): 400.53 - samples/sec: 324.25 - lr: 0.000120 - momentum: 0.000000
2023-10-12 21:01:32,280 epoch 3 - iter 891/992 - loss 0.08026505 - time (sec): 449.84 - samples/sec: 325.56 - lr: 0.000118 - momentum: 0.000000
2023-10-12 21:02:23,827 epoch 3 - iter 990/992 - loss 0.08033861 - time (sec): 501.38 - samples/sec: 326.64 - lr: 0.000117 - momentum: 0.000000
2023-10-12 21:02:24,696 ----------------------------------------------------------------------------------------------------
2023-10-12 21:02:24,696 EPOCH 3 done: loss 0.0803 - lr: 0.000117
2023-10-12 21:02:49,665 DEV : loss 0.08990765362977982 - f1-score (micro avg) 0.7604
2023-10-12 21:02:49,705 saving best model
2023-10-12 21:02:52,306 ----------------------------------------------------------------------------------------------------
2023-10-12 21:03:41,885 epoch 4 - iter 99/992 - loss 0.05709566 - time (sec): 49.57 - samples/sec: 351.31 - lr: 0.000115 - momentum: 0.000000
2023-10-12 21:04:31,315 epoch 4 - iter 198/992 - loss 0.05384534 - time (sec): 99.00 - samples/sec: 335.44 - lr: 0.000113 - momentum: 0.000000
2023-10-12 21:05:20,247 epoch 4 - iter 297/992 - loss 0.05406144 - time (sec): 147.94 - samples/sec: 334.00 - lr: 0.000112 - momentum: 0.000000
2023-10-12 21:06:07,442 epoch 4 - iter 396/992 - loss 0.05302751 - time (sec): 195.13 - samples/sec: 336.83 - lr: 0.000110 - momentum: 0.000000
2023-10-12 21:06:53,930 epoch 4 - iter 495/992 - loss 0.05258435 - time (sec): 241.62 - samples/sec: 339.91 - lr: 0.000108 - momentum: 0.000000
2023-10-12 21:07:42,776 epoch 4 - iter 594/992 - loss 0.05291144 - time (sec): 290.47 - samples/sec: 337.89 - lr: 0.000107 - momentum: 0.000000
2023-10-12 21:08:31,390 epoch 4 - iter 693/992 - loss 0.05440967 - time (sec): 339.08 - samples/sec: 335.36 - lr: 0.000105 - momentum: 0.000000
2023-10-12 21:09:21,204 epoch 4 - iter 792/992 - loss 0.05467823 - time (sec): 388.89 - samples/sec: 335.88 - lr: 0.000103 - momentum: 0.000000
2023-10-12 21:10:13,890 epoch 4 - iter 891/992 - loss 0.05499260 - time (sec): 441.58 - samples/sec: 332.35 - lr: 0.000102 - momentum: 0.000000
2023-10-12 21:11:07,827 epoch 4 - iter 990/992 - loss 0.05480641 - time (sec): 495.52 - samples/sec: 330.33 - lr: 0.000100 - momentum: 0.000000
2023-10-12 21:11:08,918 ----------------------------------------------------------------------------------------------------
2023-10-12 21:11:08,919 EPOCH 4 done: loss 0.0550 - lr: 0.000100
2023-10-12 21:11:34,092 DEV : loss 0.11015438288450241 - f1-score (micro avg) 0.7516
2023-10-12 21:11:34,132 ----------------------------------------------------------------------------------------------------
2023-10-12 21:12:29,220 epoch 5 - iter 99/992 - loss 0.03490950 - time (sec): 55.09 - samples/sec: 281.83 - lr: 0.000098 - momentum: 0.000000
2023-10-12 21:13:19,557 epoch 5 - iter 198/992 - loss 0.03110268 - time (sec): 105.42 - samples/sec: 302.69 - lr: 0.000097 - momentum: 0.000000
2023-10-12 21:14:08,872 epoch 5 - iter 297/992 - loss 0.03567964 - time (sec): 154.74 - samples/sec: 313.09 - lr: 0.000095 - momentum: 0.000000
2023-10-12 21:15:00,906 epoch 5 - iter 396/992 - loss 0.03894388 - time (sec): 206.77 - samples/sec: 307.97 - lr: 0.000093 - momentum: 0.000000
2023-10-12 21:15:50,989 epoch 5 - iter 495/992 - loss 0.03871853 - time (sec): 256.85 - samples/sec: 308.70 - lr: 0.000092 - momentum: 0.000000
2023-10-12 21:16:42,127 epoch 5 - iter 594/992 - loss 0.03866059 - time (sec): 307.99 - samples/sec: 311.40 - lr: 0.000090 - momentum: 0.000000
2023-10-12 21:17:33,881 epoch 5 - iter 693/992 - loss 0.03769547 - time (sec): 359.75 - samples/sec: 316.19 - lr: 0.000088 - momentum: 0.000000
2023-10-12 21:18:22,487 epoch 5 - iter 792/992 - loss 0.03775340 - time (sec): 408.35 - samples/sec: 319.19 - lr: 0.000087 - momentum: 0.000000
2023-10-12 21:19:12,969 epoch 5 - iter 891/992 - loss 0.03942820 - time (sec): 458.83 - samples/sec: 319.35 - lr: 0.000085 - momentum: 0.000000
2023-10-12 21:20:05,611 epoch 5 - iter 990/992 - loss 0.03993968 - time (sec): 511.48 - samples/sec: 319.98 - lr: 0.000083 - momentum: 0.000000
2023-10-12 21:20:06,684 ----------------------------------------------------------------------------------------------------
2023-10-12 21:20:06,685 EPOCH 5 done: loss 0.0399 - lr: 0.000083
2023-10-12 21:20:34,573 DEV : loss 0.12319868057966232 - f1-score (micro avg) 0.7494
2023-10-12 21:20:34,627 ----------------------------------------------------------------------------------------------------
2023-10-12 21:21:25,537 epoch 6 - iter 99/992 - loss 0.03811252 - time (sec): 50.91 - samples/sec: 319.23 - lr: 0.000082 - momentum: 0.000000
2023-10-12 21:22:16,408 epoch 6 - iter 198/992 - loss 0.03228055 - time (sec): 101.78 - samples/sec: 316.51 - lr: 0.000080 - momentum: 0.000000
2023-10-12 21:23:07,761 epoch 6 - iter 297/992 - loss 0.03228149 - time (sec): 153.13 - samples/sec: 318.71 - lr: 0.000078 - momentum: 0.000000
2023-10-12 21:24:00,236 epoch 6 - iter 396/992 - loss 0.03210612 - time (sec): 205.61 - samples/sec: 318.41 - lr: 0.000077 - momentum: 0.000000
2023-10-12 21:24:53,252 epoch 6 - iter 495/992 - loss 0.03193389 - time (sec): 258.62 - samples/sec: 317.54 - lr: 0.000075 - momentum: 0.000000
2023-10-12 21:25:42,174 epoch 6 - iter 594/992 - loss 0.03222843 - time (sec): 307.54 - samples/sec: 318.60 - lr: 0.000073 - momentum: 0.000000
2023-10-12 21:26:32,031 epoch 6 - iter 693/992 - loss 0.03237733 - time (sec): 357.40 - samples/sec: 322.22 - lr: 0.000072 - momentum: 0.000000
2023-10-12 21:27:23,817 epoch 6 - iter 792/992 - loss 0.03084620 - time (sec): 409.19 - samples/sec: 321.91 - lr: 0.000070 - momentum: 0.000000
2023-10-12 21:28:16,696 epoch 6 - iter 891/992 - loss 0.03172155 - time (sec): 462.07 - samples/sec: 321.02 - lr: 0.000068 - momentum: 0.000000
2023-10-12 21:29:06,526 epoch 6 - iter 990/992 - loss 0.03169299 - time (sec): 511.90 - samples/sec: 319.93 - lr: 0.000067 - momentum: 0.000000
2023-10-12 21:29:07,472 ----------------------------------------------------------------------------------------------------
2023-10-12 21:29:07,473 EPOCH 6 done: loss 0.0317 - lr: 0.000067
2023-10-12 21:29:34,854 DEV : loss 0.1523526906967163 - f1-score (micro avg) 0.745
2023-10-12 21:29:34,898
----------------------------------------------------------------------------------------------------
2023-10-12 21:30:29,230 epoch 7 - iter 99/992 - loss 0.01836233 - time (sec): 54.33 - samples/sec: 297.39 - lr: 0.000065 - momentum: 0.000000
2023-10-12 21:31:25,213 epoch 7 - iter 198/992 - loss 0.01888426 - time (sec): 110.31 - samples/sec: 298.53 - lr: 0.000063 - momentum: 0.000000
2023-10-12 21:32:16,337 epoch 7 - iter 297/992 - loss 0.02103087 - time (sec): 161.44 - samples/sec: 303.30 - lr: 0.000062 - momentum: 0.000000
2023-10-12 21:33:10,363 epoch 7 - iter 396/992 - loss 0.02296196 - time (sec): 215.46 - samples/sec: 303.71 - lr: 0.000060 - momentum: 0.000000
2023-10-12 21:34:04,597 epoch 7 - iter 495/992 - loss 0.02381151 - time (sec): 269.70 - samples/sec: 302.28 - lr: 0.000058 - momentum: 0.000000
2023-10-12 21:34:59,557 epoch 7 - iter 594/992 - loss 0.02454321 - time (sec): 324.66 - samples/sec: 302.98 - lr: 0.000057 - momentum: 0.000000
2023-10-12 21:35:53,242 epoch 7 - iter 693/992 - loss 0.02362879 - time (sec): 378.34 - samples/sec: 303.19 - lr: 0.000055 - momentum: 0.000000
2023-10-12 21:36:41,463 epoch 7 - iter 792/992 - loss 0.02358871 - time (sec): 426.56 - samples/sec: 307.73 - lr: 0.000053 - momentum: 0.000000
2023-10-12 21:37:29,049 epoch 7 - iter 891/992 - loss 0.02327723 - time (sec): 474.15 - samples/sec: 310.66 - lr: 0.000052 - momentum: 0.000000
2023-10-12 21:38:17,393 epoch 7 - iter 990/992 - loss 0.02414669 - time (sec): 522.49 - samples/sec: 313.30 - lr: 0.000050 - momentum: 0.000000
2023-10-12 21:38:18,316 ----------------------------------------------------------------------------------------------------
2023-10-12 21:38:18,316 EPOCH 7 done: loss 0.0241 - lr: 0.000050
2023-10-12 21:38:44,375 DEV : loss 0.16238392889499664 - f1-score (micro avg) 0.7575
2023-10-12 21:38:44,416 ----------------------------------------------------------------------------------------------------
2023-10-12 21:39:32,311 epoch 8 - iter 99/992 - loss 0.02098412 - time (sec): 47.89 - samples/sec: 345.39 - lr: 0.000048 - momentum: 0.000000
2023-10-12 21:40:21,125 epoch 8 - iter 198/992 - loss 0.02001400 - time (sec): 96.71 - samples/sec: 330.15 - lr: 0.000047 - momentum: 0.000000
2023-10-12 21:41:11,175 epoch 8 - iter 297/992 - loss 0.01778574 - time (sec): 146.76 - samples/sec: 332.28 - lr: 0.000045 - momentum: 0.000000
2023-10-12 21:42:00,933 epoch 8 - iter 396/992 - loss 0.01851136 - time (sec): 196.51 - samples/sec: 333.89 - lr: 0.000043 - momentum: 0.000000
2023-10-12 21:42:47,636 epoch 8 - iter 495/992 - loss 0.01977844 - time (sec): 243.22 - samples/sec: 337.22 - lr: 0.000042 - momentum: 0.000000
2023-10-12 21:43:35,853 epoch 8 - iter 594/992 - loss 0.01974787 - time (sec): 291.43 - samples/sec: 336.66 - lr: 0.000040 - momentum: 0.000000
2023-10-12 21:44:21,954 epoch 8 - iter 693/992 - loss 0.01908591 - time (sec): 337.54 - samples/sec: 338.15 - lr: 0.000038 - momentum: 0.000000
2023-10-12 21:45:09,680 epoch 8 - iter 792/992 - loss 0.01817565 - time (sec): 385.26 - samples/sec: 340.05 - lr: 0.000037 - momentum: 0.000000
2023-10-12 21:45:56,435 epoch 8 - iter 891/992 - loss 0.01823549 - time (sec): 432.02 - samples/sec: 340.11 - lr: 0.000035 - momentum: 0.000000
2023-10-12 21:46:44,381 epoch 8 - iter 990/992 - loss 0.01881597 - time (sec): 479.96 - samples/sec: 340.91 - lr: 0.000033 - momentum: 0.000000
2023-10-12 21:46:45,354 ----------------------------------------------------------------------------------------------------
2023-10-12 21:46:45,354 EPOCH 8 done: loss 0.0188 - lr: 0.000033
2023-10-12 21:47:10,756 DEV : loss 0.1798100620508194 - f1-score (micro avg) 0.7504
2023-10-12 21:47:10,796 ----------------------------------------------------------------------------------------------------
2023-10-12 21:47:57,915 epoch 9 - iter 99/992 - loss 0.01330830 - time (sec): 47.12 - samples/sec: 328.12 - lr: 0.000032 - momentum: 0.000000
2023-10-12 21:48:46,407 epoch 9 - iter 198/992 - loss 0.01224787 - time (sec): 95.61 - samples/sec: 322.93 - lr: 0.000030 - momentum: 0.000000
2023-10-12 21:49:35,505 epoch 9 - iter 297/992 - loss 0.01382402 - time (sec): 144.71 - samples/sec: 326.64 - lr: 0.000028 - momentum: 0.000000
2023-10-12 21:50:25,041 epoch 9 - iter 396/992 - loss 0.01437517 - time (sec): 194.24 - samples/sec: 331.02 - lr: 0.000027 - momentum: 0.000000
2023-10-12 21:51:12,360 epoch 9 - iter 495/992 - loss 0.01368674 - time (sec): 241.56 - samples/sec: 334.81 - lr: 0.000025 - momentum: 0.000000
2023-10-12 21:51:59,940 epoch 9 - iter 594/992 - loss 0.01453578 - time (sec): 289.14 - samples/sec: 340.98 - lr: 0.000023 - momentum: 0.000000
2023-10-12 21:52:46,742 epoch 9 - iter 693/992 - loss 0.01553987 - time (sec): 335.94 - samples/sec: 343.65 - lr: 0.000022 - momentum: 0.000000
2023-10-12 21:53:34,899 epoch 9 - iter 792/992 - loss 0.01569527 - time (sec): 384.10 - samples/sec: 343.96 - lr: 0.000020 - momentum: 0.000000
2023-10-12 21:54:22,482 epoch 9 - iter 891/992 - loss 0.01579337 - time (sec): 431.68 - samples/sec: 344.20 - lr: 0.000018 - momentum: 0.000000
2023-10-12 21:55:10,034 epoch 9 - iter 990/992 - loss 0.01620441 - time (sec): 479.24 - samples/sec: 341.45 - lr: 0.000017 - momentum: 0.000000
2023-10-12 21:55:10,982 ----------------------------------------------------------------------------------------------------
2023-10-12 21:55:10,982 EPOCH 9 done: loss 0.0162 - lr: 0.000017
2023-10-12 21:55:37,009 DEV : loss 0.18687152862548828 - f1-score (micro avg) 0.7575
2023-10-12 21:55:37,055 ----------------------------------------------------------------------------------------------------
2023-10-12 21:56:24,933 epoch 10 - iter 99/992 - loss 0.00770149 - time (sec): 47.88 - samples/sec: 344.62 - lr: 0.000015 - momentum: 0.000000
2023-10-12 21:57:12,835 epoch 10 - iter 198/992 - loss 0.00989841 - time (sec): 95.78 - samples/sec: 346.95 - lr: 0.000013 - momentum: 0.000000
2023-10-12 21:58:00,731 epoch 10 - iter 297/992 - loss 0.01166225 - time (sec): 143.67 - samples/sec: 348.74 - lr: 0.000012 - momentum: 0.000000
2023-10-12 21:58:46,948 epoch 10 - iter 396/992 - loss 0.01166889 - time (sec): 189.89 - samples/sec: 348.12 - lr: 0.000010 - momentum: 0.000000
2023-10-12 21:59:35,210 epoch 10 - iter 495/992 - loss 0.01103474 - time (sec): 238.15 - samples/sec: 346.06 - lr: 0.000008 - momentum: 0.000000
2023-10-12 22:00:23,313 epoch 10 - iter 594/992 - loss 0.01167473 - time (sec): 286.26 - samples/sec: 343.43 - lr: 0.000007 - momentum: 0.000000
2023-10-12 22:01:12,909 epoch 10 - iter 693/992 - loss 0.01237909 - time (sec): 335.85 - samples/sec: 342.47 - lr: 0.000005 - momentum: 0.000000
2023-10-12 22:01:59,272 epoch 10 - iter 792/992 - loss 0.01251453 - time (sec): 382.21 - samples/sec: 342.30 - lr: 0.000004 - momentum: 0.000000
2023-10-12 22:02:46,892 epoch 10 - iter 891/992 - loss 0.01256245 - time (sec): 429.83 - samples/sec: 343.86 - lr: 0.000002 - momentum: 0.000000
2023-10-12 22:03:34,464 epoch 10 - iter 990/992 - loss 0.01261074 - time (sec): 477.41 - samples/sec: 342.85 - lr: 0.000000 - momentum: 0.000000
2023-10-12 22:03:35,396 ----------------------------------------------------------------------------------------------------
2023-10-12 22:03:35,397 EPOCH 10 done: loss 0.0126 - lr: 0.000000
2023-10-12 22:04:01,331 DEV : loss 0.19634360074996948 - f1-score (micro avg) 0.7592
2023-10-12 22:04:02,299 ----------------------------------------------------------------------------------------------------
2023-10-12 22:04:02,301 Loading model from best epoch ...
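The per-iteration lr values logged above follow the LinearScheduler plugin with warmup_fraction '0.1': linear warmup from 0 to the peak learning rate (0.00015) over the first 10% of steps, then linear decay to 0. A minimal sketch of that schedule, assuming 992 mini-batches × 10 epochs = 9920 total steps (the function name is illustrative, not Flair's API):

```python
def linear_schedule_lr(step: int, peak_lr: float, total_steps: int, warmup_fraction: float) -> float:
    """Linear warmup to peak_lr over the warmup steps, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # warmup phase
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)  # decay phase

total = 992 * 10   # 992 iterations per epoch, 10 epochs
peak = 0.00015

# Reproduces the logged values:
print(round(linear_schedule_lr(99, peak, total, 0.1), 6))    # ~0.000015 (epoch 1, iter 99)
print(round(linear_schedule_lr(992, peak, total, 0.1), 6))   # 0.00015 at the end of warmup (epoch 1 done)
print(round(linear_schedule_lr(1982, peak, total, 0.1), 6))  # 0.000133 (epoch 2, iter 990)
```

With warmup_fraction 0.1 and 10 epochs, the warmup window is exactly epoch 1, which is why the lr climbs to 0.000150 by "EPOCH 1 done" and decays to 0.000000 at the end of epoch 10.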
2023-10-12 22:04:05,962 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 22:04:30,881
Results:
- F-score (micro) 0.7525
- F-score (macro) 0.6889
- Accuracy 0.6284

By class:
              precision    recall  f1-score   support

         LOC     0.7875    0.8092    0.7982       655
         PER     0.7255    0.8296    0.7741       223
         ORG     0.4595    0.5354    0.4945       127

   micro avg     0.7277    0.7791    0.7525      1005
   macro avg     0.6575    0.7247    0.6889      1005
weighted avg     0.7323    0.7791    0.7545      1005

2023-10-12 22:04:30,881 ----------------------------------------------------------------------------------------------------
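The micro and macro aggregates in the table can be reproduced from the per-class rows. The (true positive, predicted, gold) counts below are a back-of-the-envelope reconstruction inferred from the rounded precision/recall/support figures, not values printed in the log:

```python
# Inferred per-class counts: (true positives, predicted spans, gold spans).
# E.g. LOC: recall 0.8092 x support 655 ≈ 530 TP; 530 / precision 0.7875 ≈ 673 predicted.
counts = {
    "LOC": (530, 673, 655),
    "PER": (185, 255, 223),
    "ORG": (68, 148, 127),
}

def f1(tp: int, pred: int, gold: int) -> float:
    """Span-level F1 = 2*TP / (predicted + gold)."""
    return 2 * tp / (pred + gold)

# Micro: pool counts across classes, then compute one F1.
tp = sum(c[0] for c in counts.values())
pred = sum(c[1] for c in counts.values())
gold = sum(c[2] for c in counts.values())
micro_f1 = 2 * tp / (pred + gold)

# Macro: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

print(round(micro_f1, 4), round(macro_f1, 4))  # 0.7525 0.6889
```

The large micro/macro gap (0.7525 vs 0.6889) comes from the weak ORG class (F1 0.4945), which macro averaging weights equally with LOC and PER despite its small support of 127.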