2023-10-13 10:08:42,076 ----------------------------------------------------------------------------------------------------
2023-10-13 10:08:42,079 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 10:08:42,079 ----------------------------------------------------------------------------------------------------
2023-10-13 10:08:42,079 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-13 10:08:42,079 ----------------------------------------------------------------------------------------------------
2023-10-13 10:08:42,079 Train: 14465 sentences
2023-10-13 10:08:42,079 (train_with_dev=False, train_with_test=False)
2023-10-13 10:08:42,079 ----------------------------------------------------------------------------------------------------
2023-10-13 10:08:42,079 Training Params:
2023-10-13 10:08:42,079  - learning_rate: "0.00015"
2023-10-13 10:08:42,080  - mini_batch_size: "8"
2023-10-13 10:08:42,080  - max_epochs: "10"
2023-10-13 10:08:42,080  - shuffle: "True"
2023-10-13 10:08:42,080 ----------------------------------------------------------------------------------------------------
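For reference, the setup logged above (ByT5 encoder, last layer only, first-subtoken pooling, no RNN, no CRF, linear head over 13 tags) roughly corresponds to the Flair sketch below. This is a reconstruction, not the original training script: the embedding model ID is inferred from the model training base path logged further down, the NER_HIPE_2022 dataset arguments and the "ner" label type are assumptions, the logged ByT5Embeddings wrapper is approximated with Flair's stock TransformerWordEmbeddings, and parameter names may differ across Flair versions.

```python
# Reconstruction sketch of the logged run; not the original training script.
# Assumptions are marked below (dataset arguments, embedding model ID, label type).
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# 14465 train / 1392 dev / 2432 test sentences, as logged (dataset args are assumed)
corpus = NER_HIPE_2022(dataset_name="letemps", language="fr")

label_type = "ner"  # assumed label type
label_dict = corpus.make_label_dictionary(label_type=label_type)

# ByT5 encoder; "poolingfirst" and "layers-1" in the base path suggest first-subtoken
# pooling over the last layer only. The model ID is inferred from the base path.
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# "crfFalse" in the base path and the printed modules (LockedDropout + Linear only)
# imply no CRF and no RNN: a single linear layer maps 1472-dim embeddings to 13 tags.
tagger = SequenceTagger(
    hidden_size=256,  # unused when use_rnn=False
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type=label_type,
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# Hyperparameters as logged; TensorBoard logging is omitted from this sketch.
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2",
    learning_rate=0.00015,
    mini_batch_size=8,
    max_epochs=10,
    warmup_fraction=0.1,  # matches the logged LinearScheduler plugin
)
```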
2023-10-13 10:08:42,080 Plugins:
2023-10-13 10:08:42,080  - TensorboardLogger
2023-10-13 10:08:42,080  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 10:08:42,080 ----------------------------------------------------------------------------------------------------
2023-10-13 10:08:42,080 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 10:08:42,080  - metric: "('micro avg', 'f1-score')"
2023-10-13 10:08:42,080 ----------------------------------------------------------------------------------------------------
2023-10-13 10:08:42,080 Computation:
2023-10-13 10:08:42,080  - compute on device: cuda:0
2023-10-13 10:08:42,080  - embedding storage: none
2023-10-13 10:08:42,080 ----------------------------------------------------------------------------------------------------
2023-10-13 10:08:42,081 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-13 10:08:42,081 ----------------------------------------------------------------------------------------------------
2023-10-13 10:08:42,081 ----------------------------------------------------------------------------------------------------
2023-10-13 10:08:42,081 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 10:10:16,670 epoch 1 - iter 180/1809 - loss 2.55193452 - time (sec): 94.59 - samples/sec: 393.52 - lr: 0.000015 - momentum: 0.000000
2023-10-13 10:11:50,746 epoch 1 - iter 360/1809 - loss 2.32664425 - time (sec): 188.66 - samples/sec: 395.41 - lr: 0.000030 - momentum: 0.000000
2023-10-13 10:13:30,449 epoch 1 - iter 540/1809 - loss 1.97458754 - time (sec): 288.37 - samples/sec: 391.31 - lr: 0.000045 - momentum: 0.000000
2023-10-13 10:15:09,687 epoch 1 - iter 720/1809 - loss 1.63149756 - time (sec): 387.60 - samples/sec: 388.96 - lr: 0.000060 - momentum: 0.000000
2023-10-13 10:16:47,097 epoch 1 - iter 900/1809 - loss 1.36469605 - time (sec): 485.01 - samples/sec: 389.16 - lr: 0.000075 - momentum: 0.000000
2023-10-13 10:18:23,052 epoch 1 - iter 1080/1809 - loss 1.17974807 - time (sec): 580.97 - samples/sec: 389.57 - lr: 0.000089 - momentum: 0.000000
2023-10-13 10:19:58,384 epoch 1 - iter 1260/1809 - loss 1.03904632 - time (sec): 676.30 - samples/sec: 390.55 - lr: 0.000104 - momentum: 0.000000
2023-10-13 10:21:34,085 epoch 1 - iter 1440/1809 - loss 0.92828376 - time (sec): 772.00 - samples/sec: 391.23 - lr: 0.000119 - momentum: 0.000000
2023-10-13 10:23:11,299 epoch 1 - iter 1620/1809 - loss 0.84195771 - time (sec): 869.22 - samples/sec: 391.84 - lr: 0.000134 - momentum: 0.000000
2023-10-13 10:24:46,248 epoch 1 - iter 1800/1809 - loss 0.77324166 - time (sec): 964.17 - samples/sec: 392.14 - lr: 0.000149 - momentum: 0.000000
2023-10-13 10:24:50,774 ----------------------------------------------------------------------------------------------------
2023-10-13 10:24:50,774 EPOCH 1 done: loss 0.7705 - lr: 0.000149
2023-10-13 10:25:30,426 DEV : loss 0.14874930679798126 - f1-score (micro avg) 0.4029
2023-10-13 10:25:30,486 saving best model
2023-10-13 10:25:31,350 ----------------------------------------------------------------------------------------------------
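The "lr:" column above follows from the logged LinearScheduler plugin with warmup_fraction 0.1: with 1809 mini-batches per epoch over 10 epochs, warmup covers roughly the first epoch, after which the rate decays linearly to zero. A minimal sketch of that schedule, assuming the scheduler steps once per mini-batch and warms up linearly from zero:

```python
# Sketch of the schedule implied by "LinearScheduler | warmup_fraction: '0.1'":
# linear warmup from 0 to the peak rate over the first 10% of updates, then
# linear decay to 0. Assumes one scheduler step per mini-batch.
PEAK_LR = 0.00015
STEPS_PER_EPOCH = 1809
TOTAL_STEPS = STEPS_PER_EPOCH * 10
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # 1809, i.e. roughly the whole first epoch

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# Checkpoints against the logged "lr:" column:
print(round(lr_at(180), 6))                  # ~0.000015 (epoch 1, iter 180)
print(round(lr_at(1800), 6))                 # ~0.000149 (epoch 1, iter 1800)
print(round(lr_at(2 * STEPS_PER_EPOCH), 6))  # ~0.000133 (end of epoch 2)
```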
2023-10-13 10:27:05,720 epoch 2 - iter 180/1809 - loss 0.13805801 - time (sec): 94.37 - samples/sec: 388.33 - lr: 0.000148 - momentum: 0.000000
2023-10-13 10:28:42,834 epoch 2 - iter 360/1809 - loss 0.12933060 - time (sec): 191.48 - samples/sec: 389.43 - lr: 0.000147 - momentum: 0.000000
2023-10-13 10:30:19,877 epoch 2 - iter 540/1809 - loss 0.12513132 - time (sec): 288.52 - samples/sec: 390.84 - lr: 0.000145 - momentum: 0.000000
2023-10-13 10:31:57,990 epoch 2 - iter 720/1809 - loss 0.12053367 - time (sec): 386.64 - samples/sec: 389.16 - lr: 0.000143 - momentum: 0.000000
2023-10-13 10:33:35,527 epoch 2 - iter 900/1809 - loss 0.11629545 - time (sec): 484.17 - samples/sec: 391.21 - lr: 0.000142 - momentum: 0.000000
2023-10-13 10:35:08,984 epoch 2 - iter 1080/1809 - loss 0.11333380 - time (sec): 577.63 - samples/sec: 391.35 - lr: 0.000140 - momentum: 0.000000
2023-10-13 10:36:43,056 epoch 2 - iter 1260/1809 - loss 0.10988685 - time (sec): 671.70 - samples/sec: 391.95 - lr: 0.000138 - momentum: 0.000000
2023-10-13 10:38:22,782 epoch 2 - iter 1440/1809 - loss 0.10663465 - time (sec): 771.43 - samples/sec: 392.15 - lr: 0.000137 - momentum: 0.000000
2023-10-13 10:40:03,308 epoch 2 - iter 1620/1809 - loss 0.10350687 - time (sec): 871.95 - samples/sec: 390.60 - lr: 0.000135 - momentum: 0.000000
2023-10-13 10:41:38,900 epoch 2 - iter 1800/1809 - loss 0.10224515 - time (sec): 967.55 - samples/sec: 390.97 - lr: 0.000133 - momentum: 0.000000
2023-10-13 10:41:43,136 ----------------------------------------------------------------------------------------------------
2023-10-13 10:41:43,137 EPOCH 2 done: loss 0.1022 - lr: 0.000133
2023-10-13 10:42:24,954 DEV : loss 0.09910175204277039 - f1-score (micro avg) 0.5719
2023-10-13 10:42:25,015 saving best model
2023-10-13 10:42:27,591 ----------------------------------------------------------------------------------------------------
2023-10-13 10:44:04,602 epoch 3 - iter 180/1809 - loss 0.06155697 - time (sec): 97.01 - samples/sec: 403.70 - lr: 0.000132 - momentum: 0.000000
2023-10-13 10:45:38,856 epoch 3 - iter 360/1809 - loss 0.06070254 - time (sec): 191.26 - samples/sec: 395.73 - lr: 0.000130 - momentum: 0.000000
2023-10-13 10:47:14,475 epoch 3 - iter 540/1809 - loss 0.06120311 - time (sec): 286.88 - samples/sec: 394.26 - lr: 0.000128 - momentum: 0.000000
2023-10-13 10:48:53,782 epoch 3 - iter 720/1809 - loss 0.06292844 - time (sec): 386.19 - samples/sec: 390.36 - lr: 0.000127 - momentum: 0.000000
2023-10-13 10:50:32,623 epoch 3 - iter 900/1809 - loss 0.06307044 - time (sec): 485.03 - samples/sec: 389.58 - lr: 0.000125 - momentum: 0.000000
2023-10-13 10:52:10,549 epoch 3 - iter 1080/1809 - loss 0.06402904 - time (sec): 582.95 - samples/sec: 386.68 - lr: 0.000123 - momentum: 0.000000
2023-10-13 10:53:49,156 epoch 3 - iter 1260/1809 - loss 0.06377227 - time (sec): 681.56 - samples/sec: 389.15 - lr: 0.000122 - momentum: 0.000000
2023-10-13 10:55:24,344 epoch 3 - iter 1440/1809 - loss 0.06426367 - time (sec): 776.75 - samples/sec: 388.15 - lr: 0.000120 - momentum: 0.000000
2023-10-13 10:57:00,411 epoch 3 - iter 1620/1809 - loss 0.06362259 - time (sec): 872.82 - samples/sec: 389.28 - lr: 0.000118 - momentum: 0.000000
2023-10-13 10:58:38,665 epoch 3 - iter 1800/1809 - loss 0.06327889 - time (sec): 971.07 - samples/sec: 389.10 - lr: 0.000117 - momentum: 0.000000
2023-10-13 10:58:43,368 ----------------------------------------------------------------------------------------------------
2023-10-13 10:58:43,368 EPOCH 3 done: loss 0.0632 - lr: 0.000117
2023-10-13 10:59:24,367 DEV : loss 0.11729110032320023 - f1-score (micro avg) 0.6357
2023-10-13 10:59:24,427 saving best model
2023-10-13 10:59:26,988 ----------------------------------------------------------------------------------------------------
2023-10-13 11:01:01,873 epoch 4 - iter 180/1809 - loss 0.03980255 - time (sec): 94.88 - samples/sec: 390.73 - lr: 0.000115 - momentum: 0.000000
2023-10-13 11:02:41,954 epoch 4 - iter 360/1809 - loss 0.04218853 - time (sec): 194.96 - samples/sec: 389.89 - lr: 0.000113 - momentum: 0.000000
2023-10-13 11:04:22,545 epoch 4 - iter 540/1809 - loss 0.04514616 - time (sec): 295.55 - samples/sec: 382.99 - lr: 0.000112 - momentum: 0.000000
2023-10-13 11:06:00,574 epoch 4 - iter 720/1809 - loss 0.04552944 - time (sec): 393.58 - samples/sec: 382.84 - lr: 0.000110 - momentum: 0.000000
2023-10-13 11:07:37,214 epoch 4 - iter 900/1809 - loss 0.04691891 - time (sec): 490.22 - samples/sec: 384.07 - lr: 0.000108 - momentum: 0.000000
2023-10-13 11:09:17,007 epoch 4 - iter 1080/1809 - loss 0.04597866 - time (sec): 590.01 - samples/sec: 382.76 - lr: 0.000107 - momentum: 0.000000
2023-10-13 11:10:57,169 epoch 4 - iter 1260/1809 - loss 0.04513610 - time (sec): 690.18 - samples/sec: 381.27 - lr: 0.000105 - momentum: 0.000000
2023-10-13 11:12:40,828 epoch 4 - iter 1440/1809 - loss 0.04438682 - time (sec): 793.83 - samples/sec: 379.78 - lr: 0.000103 - momentum: 0.000000
2023-10-13 11:14:18,442 epoch 4 - iter 1620/1809 - loss 0.04429229 - time (sec): 891.45 - samples/sec: 381.80 - lr: 0.000102 - momentum: 0.000000
2023-10-13 11:16:00,362 epoch 4 - iter 1800/1809 - loss 0.04572766 - time (sec): 993.37 - samples/sec: 380.72 - lr: 0.000100 - momentum: 0.000000
2023-10-13 11:16:04,771 ----------------------------------------------------------------------------------------------------
2023-10-13 11:16:04,772 EPOCH 4 done: loss 0.0457 - lr: 0.000100
2023-10-13 11:16:44,751 DEV : loss 0.16882555186748505 - f1-score (micro avg) 0.6361
2023-10-13 11:16:44,823 saving best model
2023-10-13 11:16:47,519 ----------------------------------------------------------------------------------------------------
2023-10-13 11:18:28,087 epoch 5 - iter 180/1809 - loss 0.02745004 - time (sec): 100.56 - samples/sec: 383.98 - lr: 0.000098 - momentum: 0.000000
2023-10-13 11:20:05,038 epoch 5 - iter 360/1809 - loss 0.02948707 - time (sec): 197.51 - samples/sec: 391.81 - lr: 0.000097 - momentum: 0.000000
2023-10-13 11:21:43,338 epoch 5 - iter 540/1809 - loss 0.02991506 - time (sec): 295.81 - samples/sec: 385.31 - lr: 0.000095 - momentum: 0.000000
2023-10-13 11:23:23,915 epoch 5 - iter 720/1809 - loss 0.03207680 - time (sec): 396.39 - samples/sec: 386.48 - lr: 0.000093 - momentum: 0.000000
2023-10-13 11:25:03,777 epoch 5 - iter 900/1809 - loss 0.03173525 - time (sec): 496.25 - samples/sec: 386.05 - lr: 0.000092 - momentum: 0.000000
2023-10-13 11:26:41,708 epoch 5 - iter 1080/1809 - loss 0.03291552 - time (sec): 594.18 - samples/sec: 383.07 - lr: 0.000090 - momentum: 0.000000
2023-10-13 11:28:20,067 epoch 5 - iter 1260/1809 - loss 0.03293994 - time (sec): 692.54 - samples/sec: 383.92 - lr: 0.000088 - momentum: 0.000000
2023-10-13 11:29:59,571 epoch 5 - iter 1440/1809 - loss 0.03244028 - time (sec): 792.05 - samples/sec: 383.50 - lr: 0.000087 - momentum: 0.000000
2023-10-13 11:31:39,767 epoch 5 - iter 1620/1809 - loss 0.03323533 - time (sec): 892.24 - samples/sec: 381.40 - lr: 0.000085 - momentum: 0.000000
2023-10-13 11:33:19,274 epoch 5 - iter 1800/1809 - loss 0.03363279 - time (sec): 991.75 - samples/sec: 381.47 - lr: 0.000083 - momentum: 0.000000
2023-10-13 11:33:23,698 ----------------------------------------------------------------------------------------------------
2023-10-13 11:33:23,698 EPOCH 5 done: loss 0.0337 - lr: 0.000083
2023-10-13 11:34:04,732 DEV : loss 0.22161424160003662 - f1-score (micro avg) 0.6488
2023-10-13 11:34:04,800 saving best model
2023-10-13 11:34:07,393 ----------------------------------------------------------------------------------------------------
2023-10-13 11:35:48,090 epoch 6 - iter 180/1809 - loss 0.01989408 - time (sec): 100.69 - samples/sec: 377.17 - lr: 0.000082 - momentum: 0.000000
2023-10-13 11:37:24,788 epoch 6 - iter 360/1809 - loss 0.02145466 - time (sec): 197.39 - samples/sec: 380.27 - lr: 0.000080 - momentum: 0.000000
2023-10-13 11:39:00,908 epoch 6 - iter 540/1809 - loss 0.02242469 - time (sec): 293.51 - samples/sec: 381.09 - lr: 0.000078 - momentum: 0.000000
2023-10-13 11:40:34,463 epoch 6 - iter 720/1809 - loss 0.02387583 - time (sec): 387.06 - samples/sec: 388.26 - lr: 0.000077 - momentum: 0.000000
2023-10-13 11:42:09,699 epoch 6 - iter 900/1809 - loss 0.02427821 - time (sec): 482.30 - samples/sec: 388.84 - lr: 0.000075 - momentum: 0.000000
2023-10-13 11:43:42,056 epoch 6 - iter 1080/1809 - loss 0.02375360 - time (sec): 574.66 - samples/sec: 391.77 - lr: 0.000073 - momentum: 0.000000
2023-10-13 11:45:16,686 epoch 6 - iter 1260/1809 - loss 0.02379900 - time (sec): 669.29 - samples/sec: 393.07 - lr: 0.000072 - momentum: 0.000000
2023-10-13 11:46:50,964 epoch 6 - iter 1440/1809 - loss 0.02386266 - time (sec): 763.57 - samples/sec: 395.39 - lr: 0.000070 - momentum: 0.000000
2023-10-13 11:48:23,979 epoch 6 - iter 1620/1809 - loss 0.02406347 - time (sec): 856.58 - samples/sec: 396.80 - lr: 0.000068 - momentum: 0.000000
2023-10-13 11:49:57,153 epoch 6 - iter 1800/1809 - loss 0.02485654 - time (sec): 949.75 - samples/sec: 398.12 - lr: 0.000067 - momentum: 0.000000
2023-10-13 11:50:01,456 ----------------------------------------------------------------------------------------------------
2023-10-13 11:50:01,456 EPOCH 6 done: loss 0.0248 - lr: 0.000067
2023-10-13 11:50:42,664 DEV : loss 0.26427823305130005 - f1-score (micro avg) 0.6499
2023-10-13 11:50:42,729 saving best model
2023-10-13 11:50:45,273 ----------------------------------------------------------------------------------------------------
2023-10-13 11:52:24,565 epoch 7 - iter 180/1809 - loss 0.01829401 - time (sec): 99.29 - samples/sec: 388.56 - lr: 0.000065 - momentum: 0.000000
2023-10-13 11:54:00,473 epoch 7 - iter 360/1809 - loss 0.01662527 - time (sec): 195.20 - samples/sec: 390.08 - lr: 0.000063 - momentum: 0.000000
2023-10-13 11:55:35,250 epoch 7 - iter 540/1809 - loss 0.01711330 - time (sec): 289.97 - samples/sec: 396.97 - lr: 0.000062 - momentum: 0.000000
2023-10-13 11:57:12,171 epoch 7 - iter 720/1809 - loss 0.01763868 - time (sec): 386.89 - samples/sec: 391.83 - lr: 0.000060 - momentum: 0.000000
2023-10-13 11:58:52,384 epoch 7 - iter 900/1809 - loss 0.01892285 - time (sec): 487.11 - samples/sec: 388.52 - lr: 0.000058 - momentum: 0.000000
2023-10-13 12:00:33,274 epoch 7 - iter 1080/1809 - loss 0.01907594 - time (sec): 588.00 - samples/sec: 389.53 - lr: 0.000057 - momentum: 0.000000
2023-10-13 12:02:10,151 epoch 7 - iter 1260/1809 - loss 0.01948090 - time (sec): 684.87 - samples/sec: 389.17 - lr: 0.000055 - momentum: 0.000000
2023-10-13 12:03:44,517 epoch 7 - iter 1440/1809 - loss 0.01959196 - time (sec): 779.24 - samples/sec: 389.11 - lr: 0.000053 - momentum: 0.000000
2023-10-13 12:05:18,459 epoch 7 - iter 1620/1809 - loss 0.01974823 - time (sec): 873.18 - samples/sec: 389.17 - lr: 0.000052 - momentum: 0.000000
2023-10-13 12:06:52,523 epoch 7 - iter 1800/1809 - loss 0.01891607 - time (sec): 967.24 - samples/sec: 390.92 - lr: 0.000050 - momentum: 0.000000
2023-10-13 12:06:56,738 ----------------------------------------------------------------------------------------------------
2023-10-13 12:06:56,738 EPOCH 7 done: loss 0.0190 - lr: 0.000050
2023-10-13 12:07:37,764 DEV : loss 0.3006477653980255 - f1-score (micro avg) 0.6484
2023-10-13 12:07:37,830 ----------------------------------------------------------------------------------------------------
2023-10-13 12:09:14,953 epoch 8 - iter 180/1809 - loss 0.01107605 - time (sec): 97.12 - samples/sec: 390.74 - lr: 0.000048 - momentum: 0.000000
2023-10-13 12:10:55,384 epoch 8 - iter 360/1809 - loss 0.01371757 - time (sec): 197.55 - samples/sec: 391.33 - lr: 0.000047 - momentum: 0.000000
2023-10-13 12:12:34,854 epoch 8 - iter 540/1809 - loss 0.01237565 - time (sec): 297.02 - samples/sec: 389.62 - lr: 0.000045 - momentum: 0.000000
2023-10-13 12:14:13,295 epoch 8 - iter 720/1809 - loss 0.01229570 - time (sec): 395.46 - samples/sec: 389.84 - lr: 0.000043 - momentum: 0.000000
2023-10-13 12:15:49,208 epoch 8 - iter 900/1809 - loss 0.01315215 - time (sec): 491.38 - samples/sec: 387.35 - lr: 0.000042 - momentum: 0.000000
2023-10-13 12:17:23,939 epoch 8 - iter 1080/1809 - loss 0.01311433 - time (sec): 586.11 - samples/sec: 391.13 - lr: 0.000040 - momentum: 0.000000
2023-10-13 12:18:59,994 epoch 8 - iter 1260/1809 - loss 0.01292896 - time (sec): 682.16 - samples/sec: 389.79 - lr: 0.000038 - momentum: 0.000000
2023-10-13 12:20:36,705 epoch 8 - iter 1440/1809 - loss 0.01275007 - time (sec): 778.87 - samples/sec: 389.16 - lr: 0.000037 - momentum: 0.000000
2023-10-13 12:22:13,261 epoch 8 - iter 1620/1809 - loss 0.01281415 - time (sec): 875.43 - samples/sec: 389.99 - lr: 0.000035 - momentum: 0.000000
2023-10-13 12:23:53,434 epoch 8 - iter 1800/1809 - loss 0.01361457 - time (sec): 975.60 - samples/sec: 387.96 - lr: 0.000033 - momentum: 0.000000
2023-10-13 12:23:57,584 ----------------------------------------------------------------------------------------------------
2023-10-13 12:23:57,584 EPOCH 8 done: loss 0.0136 - lr: 0.000033
2023-10-13 12:24:39,214 DEV : loss 0.3133712708950043 - f1-score (micro avg) 0.6443
2023-10-13 12:24:39,281 ----------------------------------------------------------------------------------------------------
2023-10-13 12:26:14,295 epoch 9 - iter 180/1809 - loss 0.00811211 - time (sec): 95.01 - samples/sec: 381.60 - lr: 0.000032 - momentum: 0.000000
2023-10-13 12:27:51,642 epoch 9 - iter 360/1809 - loss 0.01014665 - time (sec): 192.36 - samples/sec: 386.96 - lr: 0.000030 - momentum: 0.000000
2023-10-13 12:29:28,757 epoch 9 - iter 540/1809 - loss 0.01204379 - time (sec): 289.47 - samples/sec: 389.86 - lr: 0.000028 - momentum: 0.000000
2023-10-13 12:31:06,089 epoch 9 - iter 720/1809 - loss 0.01210666 - time (sec): 386.81 - samples/sec: 389.63 - lr: 0.000027 - momentum: 0.000000
2023-10-13 12:32:48,351 epoch 9 - iter 900/1809 - loss 0.01191859 - time (sec): 489.07 - samples/sec: 387.06 - lr: 0.000025 - momentum: 0.000000
2023-10-13 12:34:29,374 epoch 9 - iter 1080/1809 - loss 0.01103504 - time (sec): 590.09 - samples/sec: 385.32 - lr: 0.000023 - momentum: 0.000000
2023-10-13 12:36:06,450 epoch 9 - iter 1260/1809 - loss 0.01185981 - time (sec): 687.17 - samples/sec: 384.86 - lr: 0.000022 - momentum: 0.000000
2023-10-13 12:37:44,048 epoch 9 - iter 1440/1809 - loss 0.01164861 - time (sec): 784.76 - samples/sec: 383.08 - lr: 0.000020 - momentum: 0.000000
2023-10-13 12:39:21,038 epoch 9 - iter 1620/1809 - loss 0.01129939 - time (sec): 881.75 - samples/sec: 384.22 - lr: 0.000018 - momentum: 0.000000
2023-10-13 12:41:04,369 epoch 9 - iter 1800/1809 - loss 0.01127319 - time (sec): 985.09 - samples/sec: 383.82 - lr: 0.000017 - momentum: 0.000000
2023-10-13 12:41:09,282 ----------------------------------------------------------------------------------------------------
2023-10-13 12:41:09,283 EPOCH 9 done: loss 0.0112 - lr: 0.000017
2023-10-13 12:41:52,191 DEV : loss 0.33530837297439575 - f1-score (micro avg) 0.6476
2023-10-13 12:41:52,272 ----------------------------------------------------------------------------------------------------
2023-10-13 12:43:32,493 epoch 10 - iter 180/1809 - loss 0.00503398 - time (sec): 100.22 - samples/sec: 380.94 - lr: 0.000015 - momentum: 0.000000
2023-10-13 12:45:10,018 epoch 10 - iter 360/1809 - loss 0.00514473 - time (sec): 197.74 - samples/sec: 383.57 - lr: 0.000013 - momentum: 0.000000
2023-10-13 12:46:46,023 epoch 10 - iter 540/1809 - loss 0.00587847 - time (sec): 293.75 - samples/sec: 385.18 - lr: 0.000012 - momentum: 0.000000
2023-10-13 12:48:21,614 epoch 10 - iter 720/1809 - loss 0.00651019 - time (sec): 389.34 - samples/sec: 389.69 - lr: 0.000010 - momentum: 0.000000
2023-10-13 12:49:57,424 epoch 10 - iter 900/1809 - loss 0.00679671 - time (sec): 485.15 - samples/sec: 389.14 - lr: 0.000008 - momentum: 0.000000
2023-10-13 12:51:33,136 epoch 10 - iter 1080/1809 - loss 0.00712549 - time (sec): 580.86 - samples/sec: 389.66 - lr: 0.000007 - momentum: 0.000000
2023-10-13 12:53:07,935 epoch 10 - iter 1260/1809 - loss 0.00705147 - time (sec): 675.66 - samples/sec: 391.82 - lr: 0.000005 - momentum: 0.000000
2023-10-13 12:54:42,621 epoch 10 - iter 1440/1809 - loss 0.00701041 - time (sec): 770.35 - samples/sec: 394.49 - lr: 0.000003 - momentum: 0.000000
2023-10-13 12:56:16,729 epoch 10 - iter 1620/1809 - loss 0.00703049 - time (sec): 864.45 - samples/sec: 392.77 - lr: 0.000002 - momentum: 0.000000
2023-10-13 12:57:53,169 epoch 10 - iter 1800/1809 - loss 0.00712609 - time (sec): 960.89 - samples/sec: 393.85 - lr: 0.000000 - momentum: 0.000000
2023-10-13 12:57:57,308 ----------------------------------------------------------------------------------------------------
2023-10-13 12:57:57,308 EPOCH 10 done: loss 0.0071 - lr: 0.000000
2023-10-13 12:58:38,220 DEV : loss 0.3420470952987671 - f1-score (micro avg) 0.6541
2023-10-13 12:58:38,294 saving best model
2023-10-13 12:58:45,503 ----------------------------------------------------------------------------------------------------
2023-10-13 12:58:45,506 Loading model from best epoch ...
2023-10-13 12:58:51,051 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
2023-10-13 12:59:49,682 Results:
- F-score (micro) 0.6291
- F-score (macro) 0.5022
- Accuracy 0.4688

By class:
              precision    recall  f1-score   support

         loc     0.6234    0.7479    0.6800       591
        pers     0.5742    0.6611    0.6146       357
         org     0.2642    0.1772    0.2121        79

   micro avg     0.5899    0.6738    0.6291      1027
   macro avg     0.4873    0.5287    0.5022      1027
weighted avg     0.5787    0.6738    0.6213      1027

2023-10-13 12:59:49,682 ----------------------------------------------------------------------------------------------------
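The checkpoint selected above (best-model.pt, dev micro F1 0.6541 at epoch 10; test micro F1 0.6291) can be used for inference as sketched below. The checkpoint path is assembled from the logged base path; the example sentence and the "ner" label type passed to get_spans() are illustrative assumptions.

```python
# Minimal inference sketch for the saved checkpoint; the example sentence and the
# "ner" label type used in get_spans() are assumptions for illustration.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-letemps/fr-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2/"
    "best-model.pt"
)

sentence = Sentence("Le Temps est un quotidien publié à Genève .")
tagger.predict(sentence)

# Predicted spans use the 13-tag BIOES dictionary listed above (loc / pers / org).
for span in sentence.get_spans("ner"):
    print(span.text, span.tag, span.score)
```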