|
2023-10-12 01:27:16,208 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:27:16,210 Model: "SequenceTagger( |
|
(embeddings): ByT5Embeddings( |
|
(model): T5EncoderModel( |
|
(shared): Embedding(384, 1472) |
|
(encoder): T5Stack( |
|
(embed_tokens): Embedding(384, 1472) |
|
(block): ModuleList( |
|
(0): T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
(relative_attention_bias): Embedding(32, 6) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(1-11): 11 x T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
) |
|
(final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(locked_dropout): LockedDropout(p=0.5) |
|
(linear): Linear(in_features=1472, out_features=17, bias=True) |
|
(loss_function): CrossEntropyLoss() |
|
)" |
|
2023-10-12 01:27:16,210 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:27:16,210 MultiCorpus: 7142 train + 698 dev + 2570 test sentences |
|
- NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator |
|
2023-10-12 01:27:16,210 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:27:16,210 Train: 7142 sentences |
|
2023-10-12 01:27:16,211 (train_with_dev=False, train_with_test=False) |
|
2023-10-12 01:27:16,211 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:27:16,211 Training Params: |
|
2023-10-12 01:27:16,211 - learning_rate: "0.00016" |
|
2023-10-12 01:27:16,211 - mini_batch_size: "4" |
|
2023-10-12 01:27:16,211 - max_epochs: "10" |
|
2023-10-12 01:27:16,211 - shuffle: "True" |
|
2023-10-12 01:27:16,211 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:27:16,211 Plugins: |
|
2023-10-12 01:27:16,211 - TensorboardLogger |
|
2023-10-12 01:27:16,211 - LinearScheduler | warmup_fraction: '0.1' |
|
2023-10-12 01:27:16,211 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:27:16,211 Final evaluation on model from best epoch (best-model.pt) |
|
2023-10-12 01:27:16,211 - metric: "('micro avg', 'f1-score')" |
|
2023-10-12 01:27:16,211 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:27:16,212 Computation: |
|
2023-10-12 01:27:16,212 - compute on device: cuda:0 |
|
2023-10-12 01:27:16,212 - embedding storage: none |
|
2023-10-12 01:27:16,212 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:27:16,212 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5" |
|
2023-10-12 01:27:16,212 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:27:16,212 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:27:16,212 Logging anything other than scalars to TensorBoard is currently not supported. |
|
2023-10-12 01:28:09,890 epoch 1 - iter 178/1786 - loss 2.80507616 - time (sec): 53.68 - samples/sec: 504.62 - lr: 0.000016 - momentum: 0.000000 |
|
2023-10-12 01:29:02,680 epoch 1 - iter 356/1786 - loss 2.63434995 - time (sec): 106.47 - samples/sec: 496.51 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-12 01:29:57,335 epoch 1 - iter 534/1786 - loss 2.33660891 - time (sec): 161.12 - samples/sec: 494.59 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-12 01:30:49,640 epoch 1 - iter 712/1786 - loss 2.05010669 - time (sec): 213.43 - samples/sec: 491.15 - lr: 0.000064 - momentum: 0.000000 |
|
2023-10-12 01:31:40,298 epoch 1 - iter 890/1786 - loss 1.79302625 - time (sec): 264.08 - samples/sec: 489.91 - lr: 0.000080 - momentum: 0.000000 |
|
2023-10-12 01:32:31,757 epoch 1 - iter 1068/1786 - loss 1.60141163 - time (sec): 315.54 - samples/sec: 484.97 - lr: 0.000096 - momentum: 0.000000 |
|
2023-10-12 01:33:23,002 epoch 1 - iter 1246/1786 - loss 1.43923476 - time (sec): 366.79 - samples/sec: 482.67 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-12 01:34:17,377 epoch 1 - iter 1424/1786 - loss 1.31726143 - time (sec): 421.16 - samples/sec: 473.61 - lr: 0.000127 - momentum: 0.000000 |
|
2023-10-12 01:35:08,642 epoch 1 - iter 1602/1786 - loss 1.20681398 - time (sec): 472.43 - samples/sec: 473.34 - lr: 0.000143 - momentum: 0.000000 |
|
2023-10-12 01:35:58,724 epoch 1 - iter 1780/1786 - loss 1.11235817 - time (sec): 522.51 - samples/sec: 474.81 - lr: 0.000159 - momentum: 0.000000 |
|
2023-10-12 01:36:00,213 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:36:00,213 EPOCH 1 done: loss 1.1097 - lr: 0.000159 |
|
2023-10-12 01:36:18,867 DEV : loss 0.16794759035110474 - f1-score (micro avg) 0.5603 |
|
2023-10-12 01:36:18,895 saving best model |
|
2023-10-12 01:36:19,749 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:37:09,858 epoch 2 - iter 178/1786 - loss 0.18793256 - time (sec): 50.11 - samples/sec: 498.04 - lr: 0.000158 - momentum: 0.000000 |
|
2023-10-12 01:38:01,560 epoch 2 - iter 356/1786 - loss 0.17776196 - time (sec): 101.81 - samples/sec: 494.37 - lr: 0.000156 - momentum: 0.000000 |
|
2023-10-12 01:38:56,898 epoch 2 - iter 534/1786 - loss 0.16472447 - time (sec): 157.15 - samples/sec: 479.83 - lr: 0.000155 - momentum: 0.000000 |
|
2023-10-12 01:39:48,644 epoch 2 - iter 712/1786 - loss 0.15515345 - time (sec): 208.89 - samples/sec: 477.97 - lr: 0.000153 - momentum: 0.000000 |
|
2023-10-12 01:40:41,766 epoch 2 - iter 890/1786 - loss 0.14481166 - time (sec): 262.02 - samples/sec: 481.61 - lr: 0.000151 - momentum: 0.000000 |
|
2023-10-12 01:41:32,730 epoch 2 - iter 1068/1786 - loss 0.14070964 - time (sec): 312.98 - samples/sec: 477.58 - lr: 0.000149 - momentum: 0.000000 |
|
2023-10-12 01:42:23,528 epoch 2 - iter 1246/1786 - loss 0.13733140 - time (sec): 363.78 - samples/sec: 476.43 - lr: 0.000148 - momentum: 0.000000 |
|
2023-10-12 01:43:15,628 epoch 2 - iter 1424/1786 - loss 0.13364336 - time (sec): 415.88 - samples/sec: 477.99 - lr: 0.000146 - momentum: 0.000000 |
|
2023-10-12 01:44:06,504 epoch 2 - iter 1602/1786 - loss 0.13125581 - time (sec): 466.75 - samples/sec: 477.55 - lr: 0.000144 - momentum: 0.000000 |
|
2023-10-12 01:45:00,109 epoch 2 - iter 1780/1786 - loss 0.12730060 - time (sec): 520.36 - samples/sec: 475.83 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-12 01:45:02,016 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:45:02,016 EPOCH 2 done: loss 0.1274 - lr: 0.000142 |
|
2023-10-12 01:45:24,922 DEV : loss 0.10730913281440735 - f1-score (micro avg) 0.7623 |
|
2023-10-12 01:45:24,958 saving best model |
|
2023-10-12 01:45:28,483 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:46:20,438 epoch 3 - iter 178/1786 - loss 0.06605315 - time (sec): 51.95 - samples/sec: 472.43 - lr: 0.000140 - momentum: 0.000000 |
|
2023-10-12 01:47:12,817 epoch 3 - iter 356/1786 - loss 0.06559355 - time (sec): 104.33 - samples/sec: 480.15 - lr: 0.000139 - momentum: 0.000000 |
|
2023-10-12 01:48:08,733 epoch 3 - iter 534/1786 - loss 0.06577095 - time (sec): 160.24 - samples/sec: 460.10 - lr: 0.000137 - momentum: 0.000000 |
|
2023-10-12 01:48:58,726 epoch 3 - iter 712/1786 - loss 0.06735674 - time (sec): 210.24 - samples/sec: 465.58 - lr: 0.000135 - momentum: 0.000000 |
|
2023-10-12 01:49:49,932 epoch 3 - iter 890/1786 - loss 0.06649420 - time (sec): 261.44 - samples/sec: 472.92 - lr: 0.000133 - momentum: 0.000000 |
|
2023-10-12 01:50:47,225 epoch 3 - iter 1068/1786 - loss 0.06836462 - time (sec): 318.74 - samples/sec: 468.44 - lr: 0.000132 - momentum: 0.000000 |
|
2023-10-12 01:51:41,972 epoch 3 - iter 1246/1786 - loss 0.06785363 - time (sec): 373.48 - samples/sec: 466.11 - lr: 0.000130 - momentum: 0.000000 |
|
2023-10-12 01:52:37,998 epoch 3 - iter 1424/1786 - loss 0.06934020 - time (sec): 429.51 - samples/sec: 459.33 - lr: 0.000128 - momentum: 0.000000 |
|
2023-10-12 01:53:30,221 epoch 3 - iter 1602/1786 - loss 0.07082217 - time (sec): 481.73 - samples/sec: 460.16 - lr: 0.000126 - momentum: 0.000000 |
|
2023-10-12 01:54:22,946 epoch 3 - iter 1780/1786 - loss 0.06960424 - time (sec): 534.46 - samples/sec: 463.92 - lr: 0.000125 - momentum: 0.000000 |
|
2023-10-12 01:54:24,559 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:54:24,559 EPOCH 3 done: loss 0.0697 - lr: 0.000125 |
|
2023-10-12 01:54:46,566 DEV : loss 0.12944428622722626 - f1-score (micro avg) 0.7745 |
|
2023-10-12 01:54:46,599 saving best model |
|
2023-10-12 01:54:49,221 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 01:55:44,398 epoch 4 - iter 178/1786 - loss 0.05577792 - time (sec): 55.17 - samples/sec: 486.15 - lr: 0.000123 - momentum: 0.000000 |
|
2023-10-12 01:56:39,284 epoch 4 - iter 356/1786 - loss 0.05538162 - time (sec): 110.06 - samples/sec: 461.68 - lr: 0.000121 - momentum: 0.000000 |
|
2023-10-12 01:57:31,509 epoch 4 - iter 534/1786 - loss 0.05233655 - time (sec): 162.28 - samples/sec: 467.73 - lr: 0.000119 - momentum: 0.000000 |
|
2023-10-12 01:58:23,217 epoch 4 - iter 712/1786 - loss 0.05259265 - time (sec): 213.99 - samples/sec: 469.50 - lr: 0.000117 - momentum: 0.000000 |
|
2023-10-12 01:59:13,920 epoch 4 - iter 890/1786 - loss 0.05373927 - time (sec): 264.69 - samples/sec: 468.70 - lr: 0.000116 - momentum: 0.000000 |
|
2023-10-12 02:00:06,383 epoch 4 - iter 1068/1786 - loss 0.05149636 - time (sec): 317.16 - samples/sec: 470.91 - lr: 0.000114 - momentum: 0.000000 |
|
2023-10-12 02:00:58,159 epoch 4 - iter 1246/1786 - loss 0.05173730 - time (sec): 368.93 - samples/sec: 469.30 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-12 02:01:49,397 epoch 4 - iter 1424/1786 - loss 0.05114958 - time (sec): 420.17 - samples/sec: 469.30 - lr: 0.000110 - momentum: 0.000000 |
|
2023-10-12 02:02:43,021 epoch 4 - iter 1602/1786 - loss 0.05108851 - time (sec): 473.80 - samples/sec: 472.58 - lr: 0.000109 - momentum: 0.000000 |
|
2023-10-12 02:03:34,895 epoch 4 - iter 1780/1786 - loss 0.05115817 - time (sec): 525.67 - samples/sec: 471.96 - lr: 0.000107 - momentum: 0.000000 |
|
2023-10-12 02:03:36,495 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:03:36,495 EPOCH 4 done: loss 0.0511 - lr: 0.000107 |
|
2023-10-12 02:03:56,766 DEV : loss 0.14689753949642181 - f1-score (micro avg) 0.785 |
|
2023-10-12 02:03:56,795 saving best model |
|
2023-10-12 02:03:59,385 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:04:55,346 epoch 5 - iter 178/1786 - loss 0.03039438 - time (sec): 55.96 - samples/sec: 438.41 - lr: 0.000105 - momentum: 0.000000 |
|
2023-10-12 02:05:49,136 epoch 5 - iter 356/1786 - loss 0.03164096 - time (sec): 109.75 - samples/sec: 446.21 - lr: 0.000103 - momentum: 0.000000 |
|
2023-10-12 02:06:42,843 epoch 5 - iter 534/1786 - loss 0.03344778 - time (sec): 163.45 - samples/sec: 454.29 - lr: 0.000101 - momentum: 0.000000 |
|
2023-10-12 02:07:35,189 epoch 5 - iter 712/1786 - loss 0.03152047 - time (sec): 215.80 - samples/sec: 452.98 - lr: 0.000100 - momentum: 0.000000 |
|
2023-10-12 02:08:30,997 epoch 5 - iter 890/1786 - loss 0.03361595 - time (sec): 271.61 - samples/sec: 448.12 - lr: 0.000098 - momentum: 0.000000 |
|
2023-10-12 02:09:23,289 epoch 5 - iter 1068/1786 - loss 0.03322299 - time (sec): 323.90 - samples/sec: 451.89 - lr: 0.000096 - momentum: 0.000000 |
|
2023-10-12 02:10:20,398 epoch 5 - iter 1246/1786 - loss 0.03402782 - time (sec): 381.01 - samples/sec: 455.72 - lr: 0.000094 - momentum: 0.000000 |
|
2023-10-12 02:11:15,612 epoch 5 - iter 1424/1786 - loss 0.03608606 - time (sec): 436.22 - samples/sec: 454.93 - lr: 0.000093 - momentum: 0.000000 |
|
2023-10-12 02:12:11,200 epoch 5 - iter 1602/1786 - loss 0.03657160 - time (sec): 491.81 - samples/sec: 453.93 - lr: 0.000091 - momentum: 0.000000 |
|
2023-10-12 02:13:08,194 epoch 5 - iter 1780/1786 - loss 0.03671433 - time (sec): 548.80 - samples/sec: 452.08 - lr: 0.000089 - momentum: 0.000000 |
|
2023-10-12 02:13:09,930 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:13:09,931 EPOCH 5 done: loss 0.0367 - lr: 0.000089 |
|
2023-10-12 02:13:31,881 DEV : loss 0.16735321283340454 - f1-score (micro avg) 0.7933 |
|
2023-10-12 02:13:31,912 saving best model |
|
2023-10-12 02:13:34,616 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:14:29,724 epoch 6 - iter 178/1786 - loss 0.02884951 - time (sec): 55.10 - samples/sec: 466.32 - lr: 0.000087 - momentum: 0.000000 |
|
2023-10-12 02:15:24,276 epoch 6 - iter 356/1786 - loss 0.02844827 - time (sec): 109.66 - samples/sec: 454.00 - lr: 0.000085 - momentum: 0.000000 |
|
2023-10-12 02:16:22,180 epoch 6 - iter 534/1786 - loss 0.02788463 - time (sec): 167.56 - samples/sec: 460.39 - lr: 0.000084 - momentum: 0.000000 |
|
2023-10-12 02:17:16,497 epoch 6 - iter 712/1786 - loss 0.02904160 - time (sec): 221.88 - samples/sec: 456.66 - lr: 0.000082 - momentum: 0.000000 |
|
2023-10-12 02:18:11,328 epoch 6 - iter 890/1786 - loss 0.02995947 - time (sec): 276.71 - samples/sec: 460.24 - lr: 0.000080 - momentum: 0.000000 |
|
2023-10-12 02:19:04,986 epoch 6 - iter 1068/1786 - loss 0.02837177 - time (sec): 330.37 - samples/sec: 459.17 - lr: 0.000078 - momentum: 0.000000 |
|
2023-10-12 02:19:57,466 epoch 6 - iter 1246/1786 - loss 0.02758746 - time (sec): 382.85 - samples/sec: 459.16 - lr: 0.000077 - momentum: 0.000000 |
|
2023-10-12 02:20:51,594 epoch 6 - iter 1424/1786 - loss 0.02731558 - time (sec): 436.97 - samples/sec: 459.85 - lr: 0.000075 - momentum: 0.000000 |
|
2023-10-12 02:21:44,296 epoch 6 - iter 1602/1786 - loss 0.02738174 - time (sec): 489.68 - samples/sec: 457.93 - lr: 0.000073 - momentum: 0.000000 |
|
2023-10-12 02:22:38,466 epoch 6 - iter 1780/1786 - loss 0.02826862 - time (sec): 543.85 - samples/sec: 455.42 - lr: 0.000071 - momentum: 0.000000 |
|
2023-10-12 02:22:40,335 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:22:40,335 EPOCH 6 done: loss 0.0282 - lr: 0.000071 |
|
2023-10-12 02:23:01,339 DEV : loss 0.1834934949874878 - f1-score (micro avg) 0.7908 |
|
2023-10-12 02:23:01,368 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:23:53,130 epoch 7 - iter 178/1786 - loss 0.02428625 - time (sec): 51.76 - samples/sec: 464.99 - lr: 0.000069 - momentum: 0.000000 |
|
2023-10-12 02:24:46,191 epoch 7 - iter 356/1786 - loss 0.01938786 - time (sec): 104.82 - samples/sec: 475.11 - lr: 0.000068 - momentum: 0.000000 |
|
2023-10-12 02:25:37,985 epoch 7 - iter 534/1786 - loss 0.01994075 - time (sec): 156.62 - samples/sec: 470.15 - lr: 0.000066 - momentum: 0.000000 |
|
2023-10-12 02:26:35,234 epoch 7 - iter 712/1786 - loss 0.01883127 - time (sec): 213.86 - samples/sec: 465.13 - lr: 0.000064 - momentum: 0.000000 |
|
2023-10-12 02:27:29,734 epoch 7 - iter 890/1786 - loss 0.02000040 - time (sec): 268.36 - samples/sec: 463.92 - lr: 0.000062 - momentum: 0.000000 |
|
2023-10-12 02:28:24,145 epoch 7 - iter 1068/1786 - loss 0.01988337 - time (sec): 322.78 - samples/sec: 462.58 - lr: 0.000061 - momentum: 0.000000 |
|
2023-10-12 02:29:21,554 epoch 7 - iter 1246/1786 - loss 0.01983623 - time (sec): 380.18 - samples/sec: 456.64 - lr: 0.000059 - momentum: 0.000000 |
|
2023-10-12 02:30:17,602 epoch 7 - iter 1424/1786 - loss 0.02013644 - time (sec): 436.23 - samples/sec: 453.81 - lr: 0.000057 - momentum: 0.000000 |
|
2023-10-12 02:31:11,178 epoch 7 - iter 1602/1786 - loss 0.02053497 - time (sec): 489.81 - samples/sec: 455.80 - lr: 0.000055 - momentum: 0.000000 |
|
2023-10-12 02:32:06,509 epoch 7 - iter 1780/1786 - loss 0.02060644 - time (sec): 545.14 - samples/sec: 454.69 - lr: 0.000053 - momentum: 0.000000 |
|
2023-10-12 02:32:08,217 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:32:08,217 EPOCH 7 done: loss 0.0206 - lr: 0.000053 |
|
2023-10-12 02:32:28,843 DEV : loss 0.2012760192155838 - f1-score (micro avg) 0.7981 |
|
2023-10-12 02:32:28,873 saving best model |
|
2023-10-12 02:32:31,463 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:33:25,115 epoch 8 - iter 178/1786 - loss 0.02276046 - time (sec): 53.65 - samples/sec: 466.86 - lr: 0.000052 - momentum: 0.000000 |
|
2023-10-12 02:34:19,214 epoch 8 - iter 356/1786 - loss 0.01914228 - time (sec): 107.75 - samples/sec: 466.87 - lr: 0.000050 - momentum: 0.000000 |
|
2023-10-12 02:35:14,651 epoch 8 - iter 534/1786 - loss 0.01763461 - time (sec): 163.18 - samples/sec: 460.55 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-12 02:36:05,888 epoch 8 - iter 712/1786 - loss 0.01587325 - time (sec): 214.42 - samples/sec: 458.52 - lr: 0.000046 - momentum: 0.000000 |
|
2023-10-12 02:36:56,876 epoch 8 - iter 890/1786 - loss 0.01556667 - time (sec): 265.41 - samples/sec: 459.13 - lr: 0.000044 - momentum: 0.000000 |
|
2023-10-12 02:37:50,858 epoch 8 - iter 1068/1786 - loss 0.01500536 - time (sec): 319.39 - samples/sec: 463.67 - lr: 0.000043 - momentum: 0.000000 |
|
2023-10-12 02:38:42,418 epoch 8 - iter 1246/1786 - loss 0.01483176 - time (sec): 370.95 - samples/sec: 461.23 - lr: 0.000041 - momentum: 0.000000 |
|
2023-10-12 02:39:40,033 epoch 8 - iter 1424/1786 - loss 0.01450104 - time (sec): 428.57 - samples/sec: 460.07 - lr: 0.000039 - momentum: 0.000000 |
|
2023-10-12 02:40:35,374 epoch 8 - iter 1602/1786 - loss 0.01489033 - time (sec): 483.91 - samples/sec: 461.79 - lr: 0.000037 - momentum: 0.000000 |
|
2023-10-12 02:41:31,522 epoch 8 - iter 1780/1786 - loss 0.01450964 - time (sec): 540.05 - samples/sec: 459.03 - lr: 0.000036 - momentum: 0.000000 |
|
2023-10-12 02:41:33,346 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:41:33,347 EPOCH 8 done: loss 0.0145 - lr: 0.000036 |
|
2023-10-12 02:41:58,535 DEV : loss 0.2015368640422821 - f1-score (micro avg) 0.8133 |
|
2023-10-12 02:41:58,571 saving best model |
|
2023-10-12 02:42:01,252 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:42:54,587 epoch 9 - iter 178/1786 - loss 0.00872584 - time (sec): 53.33 - samples/sec: 486.07 - lr: 0.000034 - momentum: 0.000000 |
|
2023-10-12 02:43:46,379 epoch 9 - iter 356/1786 - loss 0.00630023 - time (sec): 105.12 - samples/sec: 475.08 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-12 02:44:39,948 epoch 9 - iter 534/1786 - loss 0.00756441 - time (sec): 158.69 - samples/sec: 468.31 - lr: 0.000030 - momentum: 0.000000 |
|
2023-10-12 02:45:33,997 epoch 9 - iter 712/1786 - loss 0.00695522 - time (sec): 212.74 - samples/sec: 462.16 - lr: 0.000028 - momentum: 0.000000 |
|
2023-10-12 02:46:30,457 epoch 9 - iter 890/1786 - loss 0.00724403 - time (sec): 269.20 - samples/sec: 456.26 - lr: 0.000027 - momentum: 0.000000 |
|
2023-10-12 02:47:24,449 epoch 9 - iter 1068/1786 - loss 0.00812575 - time (sec): 323.19 - samples/sec: 461.32 - lr: 0.000025 - momentum: 0.000000 |
|
2023-10-12 02:48:19,624 epoch 9 - iter 1246/1786 - loss 0.00937037 - time (sec): 378.37 - samples/sec: 463.37 - lr: 0.000023 - momentum: 0.000000 |
|
2023-10-12 02:49:13,453 epoch 9 - iter 1424/1786 - loss 0.00930061 - time (sec): 432.20 - samples/sec: 463.61 - lr: 0.000021 - momentum: 0.000000 |
|
2023-10-12 02:50:05,206 epoch 9 - iter 1602/1786 - loss 0.00976462 - time (sec): 483.95 - samples/sec: 463.83 - lr: 0.000020 - momentum: 0.000000 |
|
2023-10-12 02:51:00,266 epoch 9 - iter 1780/1786 - loss 0.00953790 - time (sec): 539.01 - samples/sec: 460.23 - lr: 0.000018 - momentum: 0.000000 |
|
2023-10-12 02:51:02,088 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:51:02,088 EPOCH 9 done: loss 0.0095 - lr: 0.000018 |
|
2023-10-12 02:51:23,679 DEV : loss 0.2193737030029297 - f1-score (micro avg) 0.8133 |
|
2023-10-12 02:51:23,714 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 02:52:17,448 epoch 10 - iter 178/1786 - loss 0.00370170 - time (sec): 53.73 - samples/sec: 469.72 - lr: 0.000016 - momentum: 0.000000 |
|
2023-10-12 02:53:11,609 epoch 10 - iter 356/1786 - loss 0.00580275 - time (sec): 107.89 - samples/sec: 468.02 - lr: 0.000014 - momentum: 0.000000 |
|
2023-10-12 02:54:05,708 epoch 10 - iter 534/1786 - loss 0.00544806 - time (sec): 161.99 - samples/sec: 468.33 - lr: 0.000012 - momentum: 0.000000 |
|
2023-10-12 02:54:59,294 epoch 10 - iter 712/1786 - loss 0.00565942 - time (sec): 215.58 - samples/sec: 469.11 - lr: 0.000011 - momentum: 0.000000 |
|
2023-10-12 02:55:53,156 epoch 10 - iter 890/1786 - loss 0.00585775 - time (sec): 269.44 - samples/sec: 469.40 - lr: 0.000009 - momentum: 0.000000 |
|
2023-10-12 02:56:46,577 epoch 10 - iter 1068/1786 - loss 0.00554799 - time (sec): 322.86 - samples/sec: 466.02 - lr: 0.000007 - momentum: 0.000000 |
|
2023-10-12 02:57:40,202 epoch 10 - iter 1246/1786 - loss 0.00618939 - time (sec): 376.49 - samples/sec: 468.04 - lr: 0.000005 - momentum: 0.000000 |
|
2023-10-12 02:58:32,386 epoch 10 - iter 1424/1786 - loss 0.00593712 - time (sec): 428.67 - samples/sec: 467.12 - lr: 0.000004 - momentum: 0.000000 |
|
2023-10-12 02:59:24,805 epoch 10 - iter 1602/1786 - loss 0.00597006 - time (sec): 481.09 - samples/sec: 466.32 - lr: 0.000002 - momentum: 0.000000 |
|
2023-10-12 03:00:19,462 epoch 10 - iter 1780/1786 - loss 0.00611996 - time (sec): 535.75 - samples/sec: 463.25 - lr: 0.000000 - momentum: 0.000000 |
|
2023-10-12 03:00:20,969 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 03:00:20,970 EPOCH 10 done: loss 0.0061 - lr: 0.000000 |
|
2023-10-12 03:00:43,438 DEV : loss 0.2238311767578125 - f1-score (micro avg) 0.8046 |
|
2023-10-12 03:00:44,547 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 03:00:44,549 Loading model from best epoch ... |
|
2023-10-12 03:00:48,482 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd |
|
2023-10-12 03:02:01,023 |
|
Results: |
|
- F-score (micro) 0.7119 |
|
- F-score (macro) 0.6266 |
|
- Accuracy 0.5663 |
|
|
|
By class: |
|
             precision    recall  f1-score   support |
|
|
|
LOC             0.7359    0.7251    0.7305      1095 |
|
PER             0.7911    0.7747    0.7828      1012 |
|
ORG             0.4551    0.5826    0.5111       357 |
|
HumanProd       0.4000    0.6061    0.4819        33 |
|
|
|
micro avg       0.7008    0.7233    0.7119      2497 |
|
macro avg       0.5955    0.6721    0.6266      2497 |
|
weighted avg    0.7137    0.7233    0.7170      2497 |
|
|
|
2023-10-12 03:02:01,023 ---------------------------------------------------------------------------------------------------- |
|
|