2022-08-06 15:12:29,180 ----------------------------------------------------------------------------------------------------
2022-08-06 15:12:29,182 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(42000, 768, padding_idx=0)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (rnn): LSTM(768, 512, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=1024, out_features=30, bias=True)
  (beta): 1.0
  (weights): None
  (weight_tensor) None
)"
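For orientation, a tagger with exactly this shape (BERT-base word embeddings feeding a bidirectional LSTM with hidden size 512 and a 30-tag linear output, no CRF) would be assembled in the Flair 0.x API roughly as in the sketch below. The corpus column layout and the embedding checkpoint name are assumptions for illustration; the log does not record them.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

# Hypothetical column layout: token in column 0, POS tag in column 1.
corpus = ColumnCorpus("data/pos-Uppsala", {0: "text", 1: "pos"})

# Placeholder checkpoint: any BERT-base model with a 42000-token vocabulary
# would reproduce the Embedding(42000, 768, padding_idx=0) printed above.
embeddings = TransformerWordEmbeddings("some-bert-base-checkpoint")

# 30 distinct tags -> the Linear(in_features=1024, out_features=30) output layer.
tag_dictionary = corpus.make_tag_dictionary(tag_type="pos")

tagger = SequenceTagger(
    hidden_size=512,   # -> LSTM(768, 512, batch_first=True, bidirectional=True)
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="pos",
    use_crf=False,     # the printed model ends in a plain Linear layer, no CRF
)
```

The WordDropout(p=0.05) and LockedDropout(p=0.5) modules in the printout are SequenceTagger defaults, so they need not be passed explicitly.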
2022-08-06 15:12:29,182 ----------------------------------------------------------------------------------------------------
2022-08-06 15:12:29,183 Corpus: "Corpus: 24000 train + 3000 dev + 3000 test sentences"
2022-08-06 15:12:29,183 ----------------------------------------------------------------------------------------------------
2022-08-06 15:12:29,183 Parameters:
2022-08-06 15:12:29,183 - learning_rate: "0.1"
2022-08-06 15:12:29,183 - mini_batch_size: "8"
2022-08-06 15:12:29,183 - patience: "3"
2022-08-06 15:12:29,183 - anneal_factor: "0.5"
2022-08-06 15:12:29,183 - max_epochs: "10"
2022-08-06 15:12:29,183 - shuffle: "True"
2022-08-06 15:12:29,183 - train_with_dev: "True"
2022-08-06 15:12:29,183 - batch_growth_annealing: "False"
2022-08-06 15:12:29,183 ----------------------------------------------------------------------------------------------------
2022-08-06 15:12:29,183 Model training base path: "data/pos-Uppsala/model"
2022-08-06 15:12:29,183 ----------------------------------------------------------------------------------------------------
2022-08-06 15:12:29,183 Device: cuda:0
2022-08-06 15:12:29,183 ----------------------------------------------------------------------------------------------------
2022-08-06 15:12:29,184 Embeddings storage mode: gpu
2022-08-06 15:12:29,185 ----------------------------------------------------------------------------------------------------
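The parameter block above maps one-to-one onto a Flair ModelTrainer.train() call. A minimal sketch, assuming tagger and corpus from the previous snippet:

```python
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# Each keyword mirrors a line in the "Parameters:" block of the log.
trainer.train(
    "data/pos-Uppsala/model",       # model training base path
    learning_rate=0.1,
    mini_batch_size=8,
    patience=3,                     # epochs without improvement before annealing
    anneal_factor=0.5,              # multiply the learning rate by 0.5 on anneal
    max_epochs=10,
    shuffle=True,
    train_with_dev=True,            # fold the dev split into training
    embeddings_storage_mode="gpu",  # keep computed embeddings cached on the GPU
)
```

With train_with_dev=True, annealing is driven by the training loss; since that loss falls in every epoch below, each epoch reports "BAD EPOCHS (no improvement): 0", the learning rate never anneals from 0.1, and testing is done with the last state of the model.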
2022-08-06 15:13:18,972 epoch 1 - iter 337/3375 - loss 0.74289984 - samples/sec: 54.18 - lr: 0.100000
2022-08-06 15:14:15,036 epoch 1 - iter 674/3375 - loss 0.53599298 - samples/sec: 48.11 - lr: 0.100000
2022-08-06 15:15:12,610 epoch 1 - iter 1011/3375 - loss 0.45754038 - samples/sec: 46.85 - lr: 0.100000
2022-08-06 15:16:09,043 epoch 1 - iter 1348/3375 - loss 0.40111208 - samples/sec: 47.79 - lr: 0.100000
2022-08-06 15:17:04,137 epoch 1 - iter 1685/3375 - loss 0.36712663 - samples/sec: 48.96 - lr: 0.100000
2022-08-06 15:17:58,402 epoch 1 - iter 2022/3375 - loss 0.34049225 - samples/sec: 49.70 - lr: 0.100000
2022-08-06 15:18:55,276 epoch 1 - iter 2359/3375 - loss 0.32076226 - samples/sec: 47.42 - lr: 0.100000
2022-08-06 15:19:49,979 epoch 1 - iter 2696/3375 - loss 0.31015506 - samples/sec: 49.31 - lr: 0.100000
2022-08-06 15:20:48,410 epoch 1 - iter 3033/3375 - loss 0.29391699 - samples/sec: 46.16 - lr: 0.100000
2022-08-06 15:21:47,572 epoch 1 - iter 3370/3375 - loss 0.27989028 - samples/sec: 45.59 - lr: 0.100000
2022-08-06 15:21:48,555 ----------------------------------------------------------------------------------------------------
2022-08-06 15:21:48,555 EPOCH 1 done: loss 0.2795 - lr 0.1000000
2022-08-06 15:21:48,555 BAD EPOCHS (no improvement): 0
2022-08-06 15:21:48,555 ----------------------------------------------------------------------------------------------------
2022-08-06 15:22:45,590 epoch 2 - iter 337/3375 - loss 0.18085661 - samples/sec: 47.29 - lr: 0.100000
2022-08-06 15:23:42,698 epoch 2 - iter 674/3375 - loss 0.17216272 - samples/sec: 47.23 - lr: 0.100000
2022-08-06 15:24:38,534 epoch 2 - iter 1011/3375 - loss 0.16694117 - samples/sec: 48.31 - lr: 0.100000
2022-08-06 15:25:36,464 epoch 2 - iter 1348/3375 - loss 0.16500505 - samples/sec: 46.56 - lr: 0.100000
2022-08-06 15:26:32,174 epoch 2 - iter 1685/3375 - loss 0.16167195 - samples/sec: 48.42 - lr: 0.100000
2022-08-06 15:27:28,418 epoch 2 - iter 2022/3375 - loss 0.15991464 - samples/sec: 47.96 - lr: 0.100000
2022-08-06 15:28:30,730 epoch 2 - iter 2359/3375 - loss 0.15942296 - samples/sec: 43.29 - lr: 0.100000
2022-08-06 15:29:27,444 epoch 2 - iter 2696/3375 - loss 0.15779417 - samples/sec: 47.56 - lr: 0.100000
2022-08-06 15:30:25,187 epoch 2 - iter 3033/3375 - loss 0.15553239 - samples/sec: 46.71 - lr: 0.100000
2022-08-06 15:31:21,714 epoch 2 - iter 3370/3375 - loss 0.15352182 - samples/sec: 47.72 - lr: 0.100000
2022-08-06 15:31:22,712 ----------------------------------------------------------------------------------------------------
2022-08-06 15:31:22,712 EPOCH 2 done: loss 0.1537 - lr 0.1000000
2022-08-06 15:31:22,712 BAD EPOCHS (no improvement): 0
2022-08-06 15:31:22,712 ----------------------------------------------------------------------------------------------------
2022-08-06 15:32:23,790 epoch 3 - iter 337/3375 - loss 0.11867195 - samples/sec: 44.16 - lr: 0.100000
2022-08-06 15:33:21,161 epoch 3 - iter 674/3375 - loss 0.11878234 - samples/sec: 47.02 - lr: 0.100000
2022-08-06 15:34:20,702 epoch 3 - iter 1011/3375 - loss 0.11942785 - samples/sec: 45.31 - lr: 0.100000
2022-08-06 15:35:18,259 epoch 3 - iter 1348/3375 - loss 0.11958903 - samples/sec: 46.86 - lr: 0.100000
2022-08-06 15:36:16,967 epoch 3 - iter 1685/3375 - loss 0.11914369 - samples/sec: 45.94 - lr: 0.100000
2022-08-06 15:37:13,560 epoch 3 - iter 2022/3375 - loss 0.11916365 - samples/sec: 47.66 - lr: 0.100000
2022-08-06 15:38:10,624 epoch 3 - iter 2359/3375 - loss 0.12096981 - samples/sec: 47.27 - lr: 0.100000
2022-08-06 15:39:10,034 epoch 3 - iter 2696/3375 - loss 0.11987245 - samples/sec: 45.40 - lr: 0.100000
2022-08-06 15:40:07,877 epoch 3 - iter 3033/3375 - loss 0.11973164 - samples/sec: 46.63 - lr: 0.100000
2022-08-06 15:41:05,610 epoch 3 - iter 3370/3375 - loss 0.12003917 - samples/sec: 46.72 - lr: 0.100000
2022-08-06 15:41:06,450 ----------------------------------------------------------------------------------------------------
2022-08-06 15:41:06,450 EPOCH 3 done: loss 0.1200 - lr 0.1000000
2022-08-06 15:41:06,450 BAD EPOCHS (no improvement): 0
2022-08-06 15:41:06,451 ----------------------------------------------------------------------------------------------------
2022-08-06 15:42:04,442 epoch 4 - iter 337/3375 - loss 0.09805702 - samples/sec: 46.51 - lr: 0.100000
2022-08-06 15:43:05,164 epoch 4 - iter 674/3375 - loss 0.09888569 - samples/sec: 44.42 - lr: 0.100000
2022-08-06 15:44:02,546 epoch 4 - iter 1011/3375 - loss 0.10053644 - samples/sec: 47.01 - lr: 0.100000
2022-08-06 15:45:01,384 epoch 4 - iter 1348/3375 - loss 0.10119574 - samples/sec: 45.84 - lr: 0.100000
2022-08-06 15:46:00,229 epoch 4 - iter 1685/3375 - loss 0.10374826 - samples/sec: 45.84 - lr: 0.100000
2022-08-06 15:46:59,791 epoch 4 - iter 2022/3375 - loss 0.10405522 - samples/sec: 45.28 - lr: 0.100000
2022-08-06 15:47:57,607 epoch 4 - iter 2359/3375 - loss 0.10411718 - samples/sec: 46.65 - lr: 0.100000
2022-08-06 15:48:55,410 epoch 4 - iter 2696/3375 - loss 0.10394934 - samples/sec: 46.66 - lr: 0.100000
2022-08-06 15:49:56,783 epoch 4 - iter 3033/3375 - loss 0.10374714 - samples/sec: 43.95 - lr: 0.100000
2022-08-06 15:50:54,113 epoch 4 - iter 3370/3375 - loss 0.10333066 - samples/sec: 47.05 - lr: 0.100000
2022-08-06 15:50:54,961 ----------------------------------------------------------------------------------------------------
2022-08-06 15:50:54,961 EPOCH 4 done: loss 0.1033 - lr 0.1000000
2022-08-06 15:50:54,961 BAD EPOCHS (no improvement): 0
2022-08-06 15:50:54,961 ----------------------------------------------------------------------------------------------------
2022-08-06 15:51:52,151 epoch 5 - iter 337/3375 - loss 0.08744228 - samples/sec: 47.17 - lr: 0.100000
2022-08-06 15:52:49,910 epoch 5 - iter 674/3375 - loss 0.08896766 - samples/sec: 46.70 - lr: 0.100000
2022-08-06 15:53:50,861 epoch 5 - iter 1011/3375 - loss 0.09000325 - samples/sec: 44.25 - lr: 0.100000
2022-08-06 15:54:48,357 epoch 5 - iter 1348/3375 - loss 0.09103779 - samples/sec: 46.91 - lr: 0.100000
2022-08-06 15:55:48,122 epoch 5 - iter 1685/3375 - loss 0.09107958 - samples/sec: 45.13 - lr: 0.100000
2022-08-06 15:56:49,324 epoch 5 - iter 2022/3375 - loss 0.09135469 - samples/sec: 44.07 - lr: 0.100000
2022-08-06 15:57:47,393 epoch 5 - iter 2359/3375 - loss 0.09172710 - samples/sec: 46.45 - lr: 0.100000
2022-08-06 15:58:45,694 epoch 5 - iter 2696/3375 - loss 0.09238154 - samples/sec: 46.27 - lr: 0.100000
2022-08-06 15:59:42,885 epoch 5 - iter 3033/3375 - loss 0.09253470 - samples/sec: 47.16 - lr: 0.100000
2022-08-06 16:00:44,492 epoch 5 - iter 3370/3375 - loss 0.09240350 - samples/sec: 43.78 - lr: 0.100000
2022-08-06 16:00:45,327 ----------------------------------------------------------------------------------------------------
2022-08-06 16:00:45,328 EPOCH 5 done: loss 0.0924 - lr 0.1000000
2022-08-06 16:00:45,328 BAD EPOCHS (no improvement): 0
2022-08-06 16:00:45,328 ----------------------------------------------------------------------------------------------------
2022-08-06 16:01:42,167 epoch 6 - iter 337/3375 - loss 0.08075428 - samples/sec: 47.46 - lr: 0.100000
2022-08-06 16:02:39,509 epoch 6 - iter 674/3375 - loss 0.08099115 - samples/sec: 47.04 - lr: 0.100000
2022-08-06 16:03:37,688 epoch 6 - iter 1011/3375 - loss 0.08140463 - samples/sec: 46.36 - lr: 0.100000
2022-08-06 16:04:38,640 epoch 6 - iter 1348/3375 - loss 0.08175190 - samples/sec: 44.25 - lr: 0.100000
2022-08-06 16:05:35,459 epoch 6 - iter 1685/3375 - loss 0.08233525 - samples/sec: 47.47 - lr: 0.100000
2022-08-06 16:06:33,941 epoch 6 - iter 2022/3375 - loss 0.08333964 - samples/sec: 46.12 - lr: 0.100000
2022-08-06 16:07:34,247 epoch 6 - iter 2359/3375 - loss 0.08370656 - samples/sec: 44.73 - lr: 0.100000
2022-08-06 16:08:32,546 epoch 6 - iter 2696/3375 - loss 0.08503503 - samples/sec: 46.27 - lr: 0.100000
2022-08-06 16:09:30,447 epoch 6 - iter 3033/3375 - loss 0.08526801 - samples/sec: 46.58 - lr: 0.100000
2022-08-06 16:10:29,216 epoch 6 - iter 3370/3375 - loss 0.08506276 - samples/sec: 45.90 - lr: 0.100000
2022-08-06 16:10:29,946 ----------------------------------------------------------------------------------------------------
2022-08-06 16:10:29,947 EPOCH 6 done: loss 0.0851 - lr 0.1000000
2022-08-06 16:10:29,947 BAD EPOCHS (no improvement): 0
2022-08-06 16:10:29,947 ----------------------------------------------------------------------------------------------------
2022-08-06 16:11:31,042 epoch 7 - iter 337/3375 - loss 0.07328964 - samples/sec: 44.15 - lr: 0.100000
2022-08-06 16:12:31,218 epoch 7 - iter 674/3375 - loss 0.07556648 - samples/sec: 44.82 - lr: 0.100000
2022-08-06 16:13:28,468 epoch 7 - iter 1011/3375 - loss 0.07578294 - samples/sec: 47.11 - lr: 0.100000
2022-08-06 16:14:28,318 epoch 7 - iter 1348/3375 - loss 0.07581855 - samples/sec: 45.07 - lr: 0.100000
2022-08-06 16:15:27,119 epoch 7 - iter 1685/3375 - loss 0.07674717 - samples/sec: 45.87 - lr: 0.100000
2022-08-06 16:16:25,205 epoch 7 - iter 2022/3375 - loss 0.07800463 - samples/sec: 46.44 - lr: 0.100000
2022-08-06 16:17:25,635 epoch 7 - iter 2359/3375 - loss 0.07788540 - samples/sec: 44.64 - lr: 0.100000
2022-08-06 16:18:25,934 epoch 7 - iter 2696/3375 - loss 0.07823310 - samples/sec: 44.73 - lr: 0.100000
2022-08-06 16:19:25,742 epoch 7 - iter 3033/3375 - loss 0.07862489 - samples/sec: 45.10 - lr: 0.100000
2022-08-06 16:20:24,514 epoch 7 - iter 3370/3375 - loss 0.07864779 - samples/sec: 45.89 - lr: 0.100000
2022-08-06 16:20:25,316 ----------------------------------------------------------------------------------------------------
2022-08-06 16:20:25,317 EPOCH 7 done: loss 0.0786 - lr 0.1000000
2022-08-06 16:20:25,317 BAD EPOCHS (no improvement): 0
2022-08-06 16:20:25,317 ----------------------------------------------------------------------------------------------------
2022-08-06 16:21:23,040 epoch 8 - iter 337/3375 - loss 0.06876001 - samples/sec: 46.73 - lr: 0.100000
2022-08-06 16:22:25,028 epoch 8 - iter 674/3375 - loss 0.06867038 - samples/sec: 43.51 - lr: 0.100000
2022-08-06 16:23:25,046 epoch 8 - iter 1011/3375 - loss 0.07011779 - samples/sec: 44.94 - lr: 0.100000
2022-08-06 16:24:23,287 epoch 8 - iter 1348/3375 - loss 0.07118411 - samples/sec: 46.31 - lr: 0.100000
2022-08-06 16:25:24,939 epoch 8 - iter 1685/3375 - loss 0.07159055 - samples/sec: 43.75 - lr: 0.100000
2022-08-06 16:26:23,316 epoch 8 - iter 2022/3375 - loss 0.07167687 - samples/sec: 46.21 - lr: 0.100000
2022-08-06 16:27:22,234 epoch 8 - iter 2359/3375 - loss 0.07190781 - samples/sec: 45.78 - lr: 0.100000
2022-08-06 16:28:20,921 epoch 8 - iter 2696/3375 - loss 0.07263123 - samples/sec: 45.96 - lr: 0.100000
2022-08-06 16:29:21,637 epoch 8 - iter 3033/3375 - loss 0.07345723 - samples/sec: 44.42 - lr: 0.100000
2022-08-06 16:30:20,403 epoch 8 - iter 3370/3375 - loss 0.07338627 - samples/sec: 45.90 - lr: 0.100000
2022-08-06 16:30:21,375 ----------------------------------------------------------------------------------------------------
2022-08-06 16:30:21,375 EPOCH 8 done: loss 0.0734 - lr 0.1000000
2022-08-06 16:30:21,375 BAD EPOCHS (no improvement): 0
2022-08-06 16:30:21,376 ----------------------------------------------------------------------------------------------------
2022-08-06 16:31:18,803 epoch 9 - iter 337/3375 - loss 0.06314787 - samples/sec: 46.97 - lr: 0.100000
2022-08-06 16:32:16,661 epoch 9 - iter 674/3375 - loss 0.06638022 - samples/sec: 46.62 - lr: 0.100000
2022-08-06 16:33:15,745 epoch 9 - iter 1011/3375 - loss 0.06547021 - samples/sec: 45.65 - lr: 0.100000
2022-08-06 16:34:14,632 epoch 9 - iter 1348/3375 - loss 0.06593581 - samples/sec: 45.81 - lr: 0.100000
2022-08-06 16:35:13,668 epoch 9 - iter 1685/3375 - loss 0.06772817 - samples/sec: 45.69 - lr: 0.100000
2022-08-06 16:36:15,567 epoch 9 - iter 2022/3375 - loss 0.06808051 - samples/sec: 43.58 - lr: 0.100000
2022-08-06 16:37:16,651 epoch 9 - iter 2359/3375 - loss 0.06796916 - samples/sec: 44.16 - lr: 0.100000
2022-08-06 16:38:14,513 epoch 9 - iter 2696/3375 - loss 0.06906572 - samples/sec: 46.62 - lr: 0.100000
2022-08-06 16:39:13,107 epoch 9 - iter 3033/3375 - loss 0.06917054 - samples/sec: 46.03 - lr: 0.100000
2022-08-06 16:40:12,475 epoch 9 - iter 3370/3375 - loss 0.06913866 - samples/sec: 45.43 - lr: 0.100000
2022-08-06 16:40:13,344 ----------------------------------------------------------------------------------------------------
2022-08-06 16:40:13,344 EPOCH 9 done: loss 0.0691 - lr 0.1000000
2022-08-06 16:40:13,344 BAD EPOCHS (no improvement): 0
2022-08-06 16:40:13,345 ----------------------------------------------------------------------------------------------------
2022-08-06 16:41:11,629 epoch 10 - iter 337/3375 - loss 0.05727560 - samples/sec: 46.28 - lr: 0.100000
2022-08-06 16:42:09,047 epoch 10 - iter 674/3375 - loss 0.06063155 - samples/sec: 46.98 - lr: 0.100000
2022-08-06 16:43:09,515 epoch 10 - iter 1011/3375 - loss 0.06369582 - samples/sec: 44.61 - lr: 0.100000
2022-08-06 16:44:07,978 epoch 10 - iter 1348/3375 - loss 0.06421773 - samples/sec: 46.14 - lr: 0.100000
2022-08-06 16:45:07,015 epoch 10 - iter 1685/3375 - loss 0.06397856 - samples/sec: 45.69 - lr: 0.100000
2022-08-06 16:46:05,736 epoch 10 - iter 2022/3375 - loss 0.06424947 - samples/sec: 45.93 - lr: 0.100000
2022-08-06 16:47:06,945 epoch 10 - iter 2359/3375 - loss 0.06511606 - samples/sec: 44.07 - lr: 0.100000
2022-08-06 16:48:05,819 epoch 10 - iter 2696/3375 - loss 0.06574495 - samples/sec: 45.82 - lr: 0.100000
2022-08-06 16:49:03,924 epoch 10 - iter 3033/3375 - loss 0.06552271 - samples/sec: 46.42 - lr: 0.100000
2022-08-06 16:50:00,641 epoch 10 - iter 3370/3375 - loss 0.06594147 - samples/sec: 47.56 - lr: 0.100000
2022-08-06 16:50:01,493 ----------------------------------------------------------------------------------------------------
2022-08-06 16:50:01,493 EPOCH 10 done: loss 0.0659 - lr 0.1000000
2022-08-06 16:50:01,493 BAD EPOCHS (no improvement): 0
2022-08-06 16:50:02,708 ----------------------------------------------------------------------------------------------------
2022-08-06 16:50:02,709 Testing using last state of model ...
2022-08-06 16:53:40,214 0.9632	0.9632	0.9632	0.9632
2022-08-06 16:53:40,215 Results:
- F-score (micro) 0.9632
- F-score (macro) 0.9031
- Accuracy 0.9632

By class:
              precision    recall  f1-score   support

      N_SING     0.9691    0.9565    0.9627     30553
           P     0.9560    0.9937    0.9745      9951
        DELM     0.9936    0.9906    0.9921      8122
         ADJ     0.9205    0.9152    0.9179      7466
         CON     0.9892    0.9799    0.9845      6823
        N_PL     0.9476    0.9642    0.9558      5163
        V_PA     0.9729    0.9746    0.9737      2873
       V_PRS     0.9825    0.9898    0.9861      2841
         PRO     0.9656    0.9455    0.9555      2258
         NUM     0.9937    0.9933    0.9935      2232
         DET     0.9423    0.9698    0.9559      1853
      CLITIC     0.9992    1.0000    0.9996      1259
        V_PP     0.9699    0.9741    0.9720      1158
       V_SUB     0.9620    0.9573    0.9596      1031
         ADV     0.7784    0.8182    0.7978       880
    ADV_TIME     0.9126    0.9611    0.9363       489
       V_AUX     0.9869    0.9974    0.9921       379
     ADJ_SUP     0.9851    0.9815    0.9833       270
    ADJ_CMPR     0.9246    0.9534    0.9388       193
     ADJ_INO     0.7294    0.7381    0.7337       168
     ADV_NEG     0.9034    0.8792    0.8912       149
       ADV_I     0.8926    0.7714    0.8276       140
          FW     0.6893    0.5772    0.6283       123
    ADV_COMP     0.8267    0.8158    0.8212        76
     ADV_LOC     0.9722    0.9589    0.9655        73
       V_IMP     0.7292    0.6250    0.6731        56
        PREV     0.9286    0.8125    0.8667        32
         INT     0.9231    0.5000    0.6486        24

   micro avg     0.9632    0.9632    0.9632     86635
   macro avg     0.9195    0.8926    0.9031     86635
weighted avg     0.9633    0.9632    0.9631     86635
 samples avg     0.9632    0.9632    0.9632     86635

2022-08-06 16:53:40,215 ----------------------------------------------------------------------------------------------------
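Once training finishes, the model saved under the base path can be reloaded for tagging. A minimal sketch, assuming the final-model.pt file Flair writes when train_with_dev=True (no best-model checkpoint exists in that mode):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the last model state, i.e. the one evaluated above (micro F1 0.9632).
tagger = SequenceTagger.load("data/pos-Uppsala/model/final-model.pt")

sentence = Sentence("...")  # a tokenized input sentence in the corpus language
tagger.predict(sentence)
print(sentence.to_tagged_string())
```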