Token Classification
Flair
PyTorch
Spanish
sequence-tagger-model
Jens Grivolla committed on
Commit fb1f685 · 1 Parent(s): 6025fb2

add initial model files

Files changed (7)
  1. best-model.pt +3 -0
  2. dev.tsv +0 -0
  3. final-model.pt +3 -0
  4. loss.tsv +21 -0
  5. test.tsv +0 -0
  6. training.log +394 -0
  7. weights.txt +0 -0
best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b7cea919da61f8f323e4ca03ebcb43c6e5ba5e06bdbccce37fc9e7e0e9a3e128
+ size 2256908487
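The three added lines are a Git LFS pointer, not the model weights themselves: each line is a key followed by a value. A minimal sketch of reading such a pointer is shown below; the function name `parse_lfs_pointer` and the returned dictionary keys are illustrative choices, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is written as "<hash-algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from the best-model.pt diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b7cea919da61f8f323e4ca03ebcb43c6e5ba5e06bdbccce37fc9e7e0e9a3e128
size 2256908487
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], info["size_bytes"])
```

The `size` field is the byte length of the real file stored in LFS, so it can be used to sanity-check a download before loading the checkpoint.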
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f3e8a2744ee6be82974bdacf8310fd6a531114bc6fe626880576cdf85e6dadcd
+ size 2256908884
loss.tsv ADDED
@@ -0,0 +1,21 @@
+ EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 19:16:41 0.0000 0.19026665994182357 0.08347803354263306 0.2119 0.2807 0.2415 0.1517
+ 2 19:20:19 0.0000 0.07510200789904357 0.06594807654619217 0.6336 0.7281 0.6776 0.5355
+ 3 19:23:56 0.0000 0.051059842073423525 0.06074140965938568 0.6457 0.7193 0.6805 0.543
+ 4 19:27:35 0.0000 0.04975723893997647 0.06857836991548538 0.7523 0.7193 0.7354 0.5942
+ 5 19:31:13 0.0000 0.03971213988859546 0.08110673725605011 0.7018 0.7018 0.7018 0.5674
+ 6 19:34:49 0.0000 0.03546830191468582 0.07366479188203812 0.7107 0.7544 0.7319 0.5972
+ 7 19:38:25 0.0000 0.03473060495712673 0.07581108063459396 0.661 0.6842 0.6724 0.5417
+ 8 19:42:01 0.0000 0.031095477296456932 0.07848106324672699 0.7107 0.7544 0.7319 0.6014
+ 9 19:45:38 0.0000 0.025149721216192127 0.08854210376739502 0.7143 0.7456 0.7296 0.5986
+ 10 19:49:14 0.0000 0.023745396892456357 0.06483861804008484 0.6535 0.7281 0.6888 0.5461
+ 11 19:52:50 0.0000 0.022273265507065026 0.06557412445545197 0.6942 0.7368 0.7149 0.5833
+ 12 19:56:25 0.0000 0.021611838273819493 0.06691710650920868 0.7016 0.7632 0.7311 0.5959
+ 13 20:00:01 0.0000 0.01710983789516771 0.06450295448303223 0.7373 0.7632 0.75 0.6214
+ 14 20:03:39 0.0000 0.015293634907960702 0.0911756381392479 0.7049 0.7544 0.7288 0.5972
+ 15 20:07:15 0.0000 0.012690879626054398 0.0763971135020256 0.7391 0.7456 0.7424 0.6159
+ 16 20:10:51 0.0000 0.013051816123982237 0.07494457066059113 0.7436 0.7632 0.7532 0.6304
+ 17 20:14:28 0.0000 0.01346253304226053 0.07131695002317429 0.7083 0.7456 0.7265 0.5944
+ 18 20:18:04 0.0000 0.011784355731974503 0.06895702332258224 0.6855 0.7456 0.7143 0.5822
+ 19 20:21:39 0.0000 0.011475109279551204 0.06978413462638855 0.7143 0.7456 0.7296 0.5986
+ 20 20:25:15 0.0000 0.01079436888010318 0.06992758810520172 0.7143 0.7456 0.7296 0.5986
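The DEV_F1 column can be cross-checked against DEV_PRECISION and DEV_RECALL with the usual harmonic-mean formula, and the best epoch can be read off the same table. The sketch below does both on a subset of rows copied from the loss.tsv diff above; the helper names (`read_rows`, `f1`) are illustrative, and the columns are split on whitespace because that is how they appear on this page.

```python
# A subset of the loss.tsv rows shown above (header plus epochs 1, 4, 13, 16, 20).
ROWS = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 19:16:41 0.0000 0.19026665994182357 0.08347803354263306 0.2119 0.2807 0.2415 0.1517
4 19:27:35 0.0000 0.04975723893997647 0.06857836991548538 0.7523 0.7193 0.7354 0.5942
13 20:00:01 0.0000 0.01710983789516771 0.06450295448303223 0.7373 0.7632 0.75 0.6214
16 20:10:51 0.0000 0.013051816123982237 0.07494457066059113 0.7436 0.7632 0.7532 0.6304
20 20:25:15 0.0000 0.01079436888010318 0.06992758810520172 0.7143 0.7456 0.7296 0.5986
"""


def read_rows(text):
    """Split whitespace-separated rows into dicts keyed by the header names."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


rows = read_rows(ROWS)
for row in rows:
    p = float(row["DEV_PRECISION"])
    r = float(row["DEV_RECALL"])
    # Recomputed from already-rounded inputs, so expect agreement only to ~3 decimals.
    row["F1_RECOMPUTED"] = f1(p, r)

# Epoch with the highest reported dev F1 among the rows included here.
best = max(rows, key=lambda row: float(row["DEV_F1"]))
print(best["EPOCH"], best["DEV_F1"])
```

Because precision and recall are themselves rounded to four decimals in the file, the recomputed value can differ from the reported DEV_F1 in the last digit.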
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,394 @@
+ 2024-04-29 19:13:06,967 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:13:06,968 Model: "SequenceTagger(
+   (embeddings): TransformerWordEmbeddings(
+     (model): XLMRobertaModel(
+       (embeddings): XLMRobertaEmbeddings(
+         (word_embeddings): Embedding(250003, 1024)
+         (position_embeddings): Embedding(514, 1024, padding_idx=1)
+         (token_type_embeddings): Embedding(1, 1024)
+         (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+       (encoder): XLMRobertaEncoder(
+         (layer): ModuleList(
+           (0-23): 24 x XLMRobertaLayer(
+             (attention): XLMRobertaAttention(
+               (self): XLMRobertaSelfAttention(
+                 (query): Linear(in_features=1024, out_features=1024, bias=True)
+                 (key): Linear(in_features=1024, out_features=1024, bias=True)
+                 (value): Linear(in_features=1024, out_features=1024, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): XLMRobertaSelfOutput(
+                 (dense): Linear(in_features=1024, out_features=1024, bias=True)
+                 (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): XLMRobertaIntermediate(
+               (dense): Linear(in_features=1024, out_features=4096, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): XLMRobertaOutput(
+               (dense): Linear(in_features=4096, out_features=1024, bias=True)
+               (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+         )
+       )
+       (pooler): XLMRobertaPooler(
+         (dense): Linear(in_features=1024, out_features=1024, bias=True)
+         (activation): Tanh()
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1024, out_features=25, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
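The layer shapes printed in the model summary are enough to estimate the parameter count by hand. The following is a rough sketch under two assumptions that the log does not state: every `Linear` and `LayerNorm` holds a weight and a bias as printed, and the dropout, activation, and loss modules hold no parameters. The helper names are illustrative.

```python
def linear(n_in, n_out, bias=True):
    """Parameter count of a fully connected layer: weight matrix plus optional bias."""
    return n_in * n_out + (n_out if bias else 0)


def layer_norm(dim):
    """Parameter count of LayerNorm with elementwise_affine=True: weight and bias."""
    return 2 * dim


HIDDEN, FFN, LAYERS, TAGS = 1024, 4096, 24, 25  # sizes read from the summary above

embeddings = (
    250003 * HIDDEN        # word_embeddings
    + 514 * HIDDEN         # position_embeddings
    + 1 * HIDDEN           # token_type_embeddings
    + layer_norm(HIDDEN)
)
per_layer = (
    3 * linear(HIDDEN, HIDDEN)   # query, key, value
    + linear(HIDDEN, HIDDEN)     # attention output dense
    + layer_norm(HIDDEN)
    + linear(HIDDEN, FFN)        # intermediate dense
    + linear(FFN, HIDDEN)        # output dense
    + layer_norm(HIDDEN)
)
pooler = linear(HIDDEN, HIDDEN)
tag_head = linear(HIDDEN, TAGS)  # the final (linear) layer of the SequenceTagger

total = embeddings + LAYERS * per_layer + pooler + tag_head
# Assumption: weights stored as 4-byte float32 values.
print(f"{total:,} parameters, about {total * 4 / 1e9:.2f} GB as float32")
```

Under these assumptions the estimate is of the same order as the checkpoint sizes recorded in the LFS pointers (about 2.26 GB); the sketch does not model anything else that may be serialized alongside the weights.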
+ 2024-04-29 19:13:06,968 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:13:06,968 Corpus: "Corpus: 5301 train + 589 dev + 654 test sentences"
+ 2024-04-29 19:13:06,968 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:13:06,968 Parameters:
+ 2024-04-29 19:13:06,968 - learning_rate: "0.000005"
+ 2024-04-29 19:13:06,968 - mini_batch_size: "4"
+ 2024-04-29 19:13:06,968 - patience: "3"
+ 2024-04-29 19:13:06,968 - anneal_factor: "0.5"
+ 2024-04-29 19:13:06,968 - max_epochs: "20"
+ 2024-04-29 19:13:06,968 - shuffle: "True"
+ 2024-04-29 19:13:06,968 - train_with_dev: "False"
+ 2024-04-29 19:13:06,968 - batch_growth_annealing: "False"
+ 2024-04-29 19:13:06,968 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:13:06,968 Model training base path: "resources/taggers/ner-spanish-large-np-finetune"
+ 2024-04-29 19:13:06,968 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:13:06,968 Device: cuda:0
+ 2024-04-29 19:13:06,969 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:13:06,969 Embeddings storage mode: none
+ 2024-04-29 19:13:06,969 ----------------------------------------------------------------------------------------------------
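The corpus size and mini-batch size above account for the `iter N/1326` counters in the log lines that follow. A small sketch of the arithmetic is given here; the ten-reports-per-epoch interval is an inference from the logged iteration numbers (132, 264, ..., 1320), not something the log states.

```python
import math

train_sentences = 5301   # from the "Corpus" line
mini_batch_size = 4      # from the "Parameters" block
max_epochs = 20          # from the "Parameters" block

# The last mini-batch of an epoch may be smaller than the others, hence the ceiling.
iters_per_epoch = math.ceil(train_sentences / mini_batch_size)
last_batch_size = train_sentences - (iters_per_epoch - 1) * mini_batch_size

# Assumption: progress is reported ten times per epoch, with the interval rounded down.
log_interval = iters_per_epoch // 10

total_steps = iters_per_epoch * max_epochs
print(iters_per_epoch, last_batch_size, log_interval, total_steps)
```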
+ 2024-04-29 19:13:30,396 epoch 1 - iter 132/1326 - loss 0.62261396 - time (sec): 23.43 - samples/sec: 1442.78 - lr: 0.000005
+ 2024-04-29 19:13:52,923 epoch 1 - iter 264/1326 - loss 0.38574243 - time (sec): 45.95 - samples/sec: 1428.48 - lr: 0.000005
+ 2024-04-29 19:14:13,281 epoch 1 - iter 396/1326 - loss 0.33621740 - time (sec): 66.31 - samples/sec: 1214.90 - lr: 0.000005
+ 2024-04-29 19:14:33,173 epoch 1 - iter 528/1326 - loss 0.27934057 - time (sec): 86.20 - samples/sec: 1133.17 - lr: 0.000005
+ 2024-04-29 19:14:54,027 epoch 1 - iter 660/1326 - loss 0.28667008 - time (sec): 107.06 - samples/sec: 1088.07 - lr: 0.000005
+ 2024-04-29 19:15:13,282 epoch 1 - iter 792/1326 - loss 0.26728950 - time (sec): 126.31 - samples/sec: 1005.56 - lr: 0.000005
+ 2024-04-29 19:15:32,488 epoch 1 - iter 924/1326 - loss 0.24258707 - time (sec): 145.52 - samples/sec: 961.73 - lr: 0.000005
+ 2024-04-29 19:15:52,452 epoch 1 - iter 1056/1326 - loss 0.22206141 - time (sec): 165.48 - samples/sec: 923.87 - lr: 0.000005
+ 2024-04-29 19:16:12,585 epoch 1 - iter 1188/1326 - loss 0.21078620 - time (sec): 185.62 - samples/sec: 898.59 - lr: 0.000005
+ 2024-04-29 19:16:35,102 epoch 1 - iter 1320/1326 - loss 0.18962778 - time (sec): 208.13 - samples/sec: 944.88 - lr: 0.000005
+ 2024-04-29 19:16:36,057 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:16:36,057 EPOCH 1 done: loss 0.1903 - lr 0.000005
+ 2024-04-29 19:16:41,931 Evaluating as a multi-label problem: False
+ 2024-04-29 19:16:41,938 DEV : loss 0.08347803354263306 - f1-score (micro avg) 0.2415
+ 2024-04-29 19:16:41,946 saving best model
+ 2024-04-29 19:16:43,696 ----------------------------------------------------------------------------------------------------
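The per-iteration and DEV lines follow a fixed textual pattern, so they can be pulled into structured records with regular expressions. The sketch below is written against the lines as they appear in this log only; the pattern names and the `parse_line` helper are illustrative and are not part of Flair.

```python
import re

# Pattern for the per-iteration progress lines, as they appear in this log.
ITER_RE = re.compile(
    r"epoch (?P<epoch>\d+) - iter (?P<iter>\d+)/(?P<total>\d+)"
    r" - loss (?P<loss>[\d.]+)"
    r" - time \(sec\): (?P<seconds>[\d.]+)"
    r" - samples/sec: (?P<rate>[\d.]+)"
    r" - lr: (?P<lr>[\d.]+)"
)
# Pattern for the end-of-epoch evaluation line on the dev split.
DEV_RE = re.compile(
    r"DEV : loss (?P<loss>[\d.]+) - f1-score \(micro avg\) (?P<f1>[\d.]+)"
)


def parse_line(line):
    """Return a dict for iteration or DEV lines, or None for anything else."""
    m = ITER_RE.search(line)
    if m:
        return {
            "kind": "iter",
            "epoch": int(m["epoch"]),
            "iter": int(m["iter"]),
            "total": int(m["total"]),
            "loss": float(m["loss"]),
            "seconds": float(m["seconds"]),
        }
    m = DEV_RE.search(line)
    if m:
        return {"kind": "dev", "loss": float(m["loss"]), "f1": float(m["f1"])}
    return None


# Three lines copied from the end of epoch 1 above.
SAMPLE = [
    "2024-04-29 19:16:35,102 epoch 1 - iter 1320/1326 - loss 0.18962778"
    " - time (sec): 208.13 - samples/sec: 944.88 - lr: 0.000005",
    "2024-04-29 19:16:41,938 DEV : loss 0.08347803354263306 - f1-score (micro avg) 0.2415",
    "2024-04-29 19:16:41,946 saving best model",
]
records = [parse_line(line) for line in SAMPLE]
print(records)
```

Lines that match neither pattern, such as separators and "saving best model", come back as `None` and can be filtered out before plotting or aggregation.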
+ 2024-04-29 19:17:04,299 epoch 2 - iter 132/1326 - loss 0.05433737 - time (sec): 20.60 - samples/sec: 939.30 - lr: 0.000005
+ 2024-04-29 19:17:25,134 epoch 2 - iter 264/1326 - loss 0.07221647 - time (sec): 41.44 - samples/sec: 959.97 - lr: 0.000005
+ 2024-04-29 19:17:45,581 epoch 2 - iter 396/1326 - loss 0.06800126 - time (sec): 61.88 - samples/sec: 926.79 - lr: 0.000005
+ 2024-04-29 19:18:05,931 epoch 2 - iter 528/1326 - loss 0.06976185 - time (sec): 82.24 - samples/sec: 908.61 - lr: 0.000005
+ 2024-04-29 19:18:26,703 epoch 2 - iter 660/1326 - loss 0.07144914 - time (sec): 103.01 - samples/sec: 916.36 - lr: 0.000005
+ 2024-04-29 19:18:47,551 epoch 2 - iter 792/1326 - loss 0.07129850 - time (sec): 123.86 - samples/sec: 921.43 - lr: 0.000005
+ 2024-04-29 19:19:08,831 epoch 2 - iter 924/1326 - loss 0.07364315 - time (sec): 145.13 - samples/sec: 937.72 - lr: 0.000005
+ 2024-04-29 19:19:30,404 epoch 2 - iter 1056/1326 - loss 0.07429358 - time (sec): 166.71 - samples/sec: 947.01 - lr: 0.000005
+ 2024-04-29 19:19:51,382 epoch 2 - iter 1188/1326 - loss 0.07472375 - time (sec): 187.69 - samples/sec: 951.54 - lr: 0.000005
+ 2024-04-29 19:20:11,837 epoch 2 - iter 1320/1326 - loss 0.07525889 - time (sec): 208.14 - samples/sec: 945.83 - lr: 0.000005
+ 2024-04-29 19:20:12,667 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:20:12,667 EPOCH 2 done: loss 0.0751 - lr 0.000005
+ 2024-04-29 19:20:19,289 Evaluating as a multi-label problem: False
+ 2024-04-29 19:20:19,296 DEV : loss 0.06594807654619217 - f1-score (micro avg) 0.6776
+ 2024-04-29 19:20:19,305 saving best model
+ 2024-04-29 19:20:21,153 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:20:42,076 epoch 3 - iter 132/1326 - loss 0.03075654 - time (sec): 20.92 - samples/sec: 1045.61 - lr: 0.000005
+ 2024-04-29 19:21:03,363 epoch 3 - iter 264/1326 - loss 0.06844082 - time (sec): 42.21 - samples/sec: 1076.19 - lr: 0.000005
+ 2024-04-29 19:21:24,222 epoch 3 - iter 396/1326 - loss 0.06805718 - time (sec): 63.07 - samples/sec: 1049.91 - lr: 0.000005
+ 2024-04-29 19:21:44,517 epoch 3 - iter 528/1326 - loss 0.06164637 - time (sec): 83.36 - samples/sec: 980.80 - lr: 0.000005
+ 2024-04-29 19:22:05,198 epoch 3 - iter 660/1326 - loss 0.05812095 - time (sec): 104.04 - samples/sec: 971.12 - lr: 0.000005
+ 2024-04-29 19:22:25,741 epoch 3 - iter 792/1326 - loss 0.05654579 - time (sec): 124.59 - samples/sec: 951.60 - lr: 0.000005
+ 2024-04-29 19:22:46,750 epoch 3 - iter 924/1326 - loss 0.05279483 - time (sec): 145.60 - samples/sec: 952.78 - lr: 0.000005
+ 2024-04-29 19:23:07,347 epoch 3 - iter 1056/1326 - loss 0.05517769 - time (sec): 166.19 - samples/sec: 948.83 - lr: 0.000005
+ 2024-04-29 19:23:28,146 epoch 3 - iter 1188/1326 - loss 0.05270269 - time (sec): 186.99 - samples/sec: 942.41 - lr: 0.000005
+ 2024-04-29 19:23:49,190 epoch 3 - iter 1320/1326 - loss 0.05130536 - time (sec): 208.04 - samples/sec: 943.77 - lr: 0.000005
+ 2024-04-29 19:23:50,082 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:23:50,082 EPOCH 3 done: loss 0.0511 - lr 0.000005
+ 2024-04-29 19:23:56,695 Evaluating as a multi-label problem: False
+ 2024-04-29 19:23:56,702 DEV : loss 0.06074140965938568 - f1-score (micro avg) 0.6805
+ 2024-04-29 19:23:56,711 saving best model
+ 2024-04-29 19:23:58,467 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:24:19,456 epoch 4 - iter 132/1326 - loss 0.05354733 - time (sec): 20.99 - samples/sec: 965.04 - lr: 0.000005
+ 2024-04-29 19:24:40,634 epoch 4 - iter 264/1326 - loss 0.05477016 - time (sec): 42.17 - samples/sec: 987.83 - lr: 0.000005
+ 2024-04-29 19:25:02,087 epoch 4 - iter 396/1326 - loss 0.04696782 - time (sec): 63.62 - samples/sec: 1021.52 - lr: 0.000005
+ 2024-04-29 19:25:22,809 epoch 4 - iter 528/1326 - loss 0.04301686 - time (sec): 84.34 - samples/sec: 989.27 - lr: 0.000005
+ 2024-04-29 19:25:43,866 epoch 4 - iter 660/1326 - loss 0.04884093 - time (sec): 105.40 - samples/sec: 969.26 - lr: 0.000005
+ 2024-04-29 19:26:04,584 epoch 4 - iter 792/1326 - loss 0.04698956 - time (sec): 126.12 - samples/sec: 952.53 - lr: 0.000005
+ 2024-04-29 19:26:25,832 epoch 4 - iter 924/1326 - loss 0.05039226 - time (sec): 147.36 - samples/sec: 964.02 - lr: 0.000005
+ 2024-04-29 19:26:46,770 epoch 4 - iter 1056/1326 - loss 0.05071681 - time (sec): 168.30 - samples/sec: 958.30 - lr: 0.000005
+ 2024-04-29 19:27:07,471 epoch 4 - iter 1188/1326 - loss 0.05135564 - time (sec): 189.00 - samples/sec: 954.33 - lr: 0.000005
+ 2024-04-29 19:27:27,862 epoch 4 - iter 1320/1326 - loss 0.04986070 - time (sec): 209.40 - samples/sec: 940.94 - lr: 0.000005
+ 2024-04-29 19:27:28,674 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:27:28,674 EPOCH 4 done: loss 0.0498 - lr 0.000005
+ 2024-04-29 19:27:35,422 Evaluating as a multi-label problem: False
+ 2024-04-29 19:27:35,430 DEV : loss 0.06857836991548538 - f1-score (micro avg) 0.7354
+ 2024-04-29 19:27:35,440 saving best model
+ 2024-04-29 19:27:37,201 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:27:57,750 epoch 5 - iter 132/1326 - loss 0.05922121 - time (sec): 20.55 - samples/sec: 900.56 - lr: 0.000004
+ 2024-04-29 19:28:18,733 epoch 5 - iter 264/1326 - loss 0.04787888 - time (sec): 41.53 - samples/sec: 920.18 - lr: 0.000004
+ 2024-04-29 19:28:39,884 epoch 5 - iter 396/1326 - loss 0.04606074 - time (sec): 62.68 - samples/sec: 933.19 - lr: 0.000004
+ 2024-04-29 19:29:00,559 epoch 5 - iter 528/1326 - loss 0.04040456 - time (sec): 83.36 - samples/sec: 939.85 - lr: 0.000004
+ 2024-04-29 19:29:21,358 epoch 5 - iter 660/1326 - loss 0.03768252 - time (sec): 104.16 - samples/sec: 939.20 - lr: 0.000004
+ 2024-04-29 19:29:42,554 epoch 5 - iter 792/1326 - loss 0.03721055 - time (sec): 125.35 - samples/sec: 954.40 - lr: 0.000004
+ 2024-04-29 19:30:03,515 epoch 5 - iter 924/1326 - loss 0.04173413 - time (sec): 146.31 - samples/sec: 952.84 - lr: 0.000004
+ 2024-04-29 19:30:24,180 epoch 5 - iter 1056/1326 - loss 0.04019307 - time (sec): 166.98 - samples/sec: 945.63 - lr: 0.000004
+ 2024-04-29 19:30:44,955 epoch 5 - iter 1188/1326 - loss 0.04065091 - time (sec): 187.75 - samples/sec: 945.36 - lr: 0.000004
+ 2024-04-29 19:31:05,582 epoch 5 - iter 1320/1326 - loss 0.03986708 - time (sec): 208.38 - samples/sec: 944.48 - lr: 0.000004
+ 2024-04-29 19:31:06,418 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:31:06,418 EPOCH 5 done: loss 0.0397 - lr 0.000004
+ 2024-04-29 19:31:13,174 Evaluating as a multi-label problem: False
+ 2024-04-29 19:31:13,181 DEV : loss 0.08110673725605011 - f1-score (micro avg) 0.7018
+ 2024-04-29 19:31:13,191 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:31:34,400 epoch 6 - iter 132/1326 - loss 0.03815522 - time (sec): 21.21 - samples/sec: 956.70 - lr: 0.000004
+ 2024-04-29 19:31:55,259 epoch 6 - iter 264/1326 - loss 0.02812200 - time (sec): 42.07 - samples/sec: 942.47 - lr: 0.000004
+ 2024-04-29 19:32:16,061 epoch 6 - iter 396/1326 - loss 0.02891512 - time (sec): 62.87 - samples/sec: 943.72 - lr: 0.000004
+ 2024-04-29 19:32:36,977 epoch 6 - iter 528/1326 - loss 0.03180526 - time (sec): 83.79 - samples/sec: 958.40 - lr: 0.000004
+ 2024-04-29 19:32:57,437 epoch 6 - iter 660/1326 - loss 0.03231067 - time (sec): 104.25 - samples/sec: 933.33 - lr: 0.000004
+ 2024-04-29 19:33:18,594 epoch 6 - iter 792/1326 - loss 0.03713967 - time (sec): 125.40 - samples/sec: 948.30 - lr: 0.000004
+ 2024-04-29 19:33:39,684 epoch 6 - iter 924/1326 - loss 0.04015671 - time (sec): 146.49 - samples/sec: 954.01 - lr: 0.000004
+ 2024-04-29 19:34:00,351 epoch 6 - iter 1056/1326 - loss 0.03794536 - time (sec): 167.16 - samples/sec: 955.37 - lr: 0.000004
+ 2024-04-29 19:34:21,179 epoch 6 - iter 1188/1326 - loss 0.03738144 - time (sec): 187.99 - samples/sec: 950.05 - lr: 0.000004
+ 2024-04-29 19:34:41,735 epoch 6 - iter 1320/1326 - loss 0.03564541 - time (sec): 208.54 - samples/sec: 942.71 - lr: 0.000004
+ 2024-04-29 19:34:42,592 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:34:42,592 EPOCH 6 done: loss 0.0355 - lr 0.000004
+ 2024-04-29 19:34:49,634 Evaluating as a multi-label problem: False
+ 2024-04-29 19:34:49,640 DEV : loss 0.07366479188203812 - f1-score (micro avg) 0.7319
+ 2024-04-29 19:34:49,648 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:35:10,664 epoch 7 - iter 132/1326 - loss 0.02986449 - time (sec): 21.02 - samples/sec: 1005.59 - lr: 0.000004
+ 2024-04-29 19:35:31,301 epoch 7 - iter 264/1326 - loss 0.02579215 - time (sec): 41.65 - samples/sec: 979.37 - lr: 0.000004
+ 2024-04-29 19:35:51,752 epoch 7 - iter 396/1326 - loss 0.02557348 - time (sec): 62.10 - samples/sec: 929.89 - lr: 0.000004
+ 2024-04-29 19:36:12,434 epoch 7 - iter 528/1326 - loss 0.02216509 - time (sec): 82.79 - samples/sec: 937.33 - lr: 0.000004
+ 2024-04-29 19:36:33,493 epoch 7 - iter 660/1326 - loss 0.03440919 - time (sec): 103.84 - samples/sec: 940.35 - lr: 0.000004
+ 2024-04-29 19:36:54,140 epoch 7 - iter 792/1326 - loss 0.03607959 - time (sec): 124.49 - samples/sec: 937.29 - lr: 0.000004
+ 2024-04-29 19:37:14,771 epoch 7 - iter 924/1326 - loss 0.03383704 - time (sec): 145.12 - samples/sec: 934.65 - lr: 0.000004
+ 2024-04-29 19:37:36,142 epoch 7 - iter 1056/1326 - loss 0.03426834 - time (sec): 166.49 - samples/sec: 949.38 - lr: 0.000004
+ 2024-04-29 19:37:57,002 epoch 7 - iter 1188/1326 - loss 0.03352186 - time (sec): 187.35 - samples/sec: 943.93 - lr: 0.000004
+ 2024-04-29 19:38:17,959 epoch 7 - iter 1320/1326 - loss 0.03484361 - time (sec): 208.31 - samples/sec: 945.40 - lr: 0.000004
+ 2024-04-29 19:38:18,781 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:38:18,781 EPOCH 7 done: loss 0.0347 - lr 0.000004
+ 2024-04-29 19:38:25,463 Evaluating as a multi-label problem: False
+ 2024-04-29 19:38:25,471 DEV : loss 0.07581108063459396 - f1-score (micro avg) 0.6724
+ 2024-04-29 19:38:25,479 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:38:47,172 epoch 8 - iter 132/1326 - loss 0.01665357 - time (sec): 21.69 - samples/sec: 1085.75 - lr: 0.000004
+ 2024-04-29 19:39:08,019 epoch 8 - iter 264/1326 - loss 0.01646746 - time (sec): 42.54 - samples/sec: 1005.05 - lr: 0.000004
+ 2024-04-29 19:39:28,795 epoch 8 - iter 396/1326 - loss 0.02015931 - time (sec): 63.32 - samples/sec: 966.81 - lr: 0.000004
+ 2024-04-29 19:39:49,482 epoch 8 - iter 528/1326 - loss 0.01847810 - time (sec): 84.00 - samples/sec: 950.00 - lr: 0.000003
+ 2024-04-29 19:40:10,218 epoch 8 - iter 660/1326 - loss 0.02597538 - time (sec): 104.74 - samples/sec: 938.07 - lr: 0.000003
+ 2024-04-29 19:40:30,984 epoch 8 - iter 792/1326 - loss 0.03006068 - time (sec): 125.51 - samples/sec: 939.37 - lr: 0.000003
+ 2024-04-29 19:40:51,804 epoch 8 - iter 924/1326 - loss 0.03293034 - time (sec): 146.33 - samples/sec: 934.87 - lr: 0.000003
+ 2024-04-29 19:41:12,625 epoch 8 - iter 1056/1326 - loss 0.03359083 - time (sec): 167.15 - samples/sec: 933.82 - lr: 0.000003
+ 2024-04-29 19:41:33,314 epoch 8 - iter 1188/1326 - loss 0.03258698 - time (sec): 187.84 - samples/sec: 935.06 - lr: 0.000003
+ 2024-04-29 19:41:54,067 epoch 8 - iter 1320/1326 - loss 0.03148140 - time (sec): 208.59 - samples/sec: 935.60 - lr: 0.000003
+ 2024-04-29 19:41:55,132 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:41:55,133 EPOCH 8 done: loss 0.0311 - lr 0.000003
+ 2024-04-29 19:42:01,776 Evaluating as a multi-label problem: False
+ 2024-04-29 19:42:01,783 DEV : loss 0.07848106324672699 - f1-score (micro avg) 0.7319
+ 2024-04-29 19:42:01,793 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:42:23,088 epoch 9 - iter 132/1326 - loss 0.05663547 - time (sec): 21.30 - samples/sec: 957.54 - lr: 0.000003
+ 2024-04-29 19:42:43,801 epoch 9 - iter 264/1326 - loss 0.03800695 - time (sec): 42.01 - samples/sec: 912.51 - lr: 0.000003
+ 2024-04-29 19:43:04,820 epoch 9 - iter 396/1326 - loss 0.04253517 - time (sec): 63.03 - samples/sec: 935.29 - lr: 0.000003
+ 2024-04-29 19:43:26,182 epoch 9 - iter 528/1326 - loss 0.03301648 - time (sec): 84.39 - samples/sec: 979.99 - lr: 0.000003
+ 2024-04-29 19:43:46,865 epoch 9 - iter 660/1326 - loss 0.03207756 - time (sec): 105.07 - samples/sec: 970.86 - lr: 0.000003
+ 2024-04-29 19:44:07,837 epoch 9 - iter 792/1326 - loss 0.02954204 - time (sec): 126.04 - samples/sec: 968.05 - lr: 0.000003
+ 2024-04-29 19:44:28,325 epoch 9 - iter 924/1326 - loss 0.02863646 - time (sec): 146.53 - samples/sec: 957.30 - lr: 0.000003
+ 2024-04-29 19:44:49,246 epoch 9 - iter 1056/1326 - loss 0.02794484 - time (sec): 167.45 - samples/sec: 953.28 - lr: 0.000003
+ 2024-04-29 19:45:09,934 epoch 9 - iter 1188/1326 - loss 0.02632601 - time (sec): 188.14 - samples/sec: 957.31 - lr: 0.000003
+ 2024-04-29 19:45:30,508 epoch 9 - iter 1320/1326 - loss 0.02520696 - time (sec): 208.72 - samples/sec: 944.49 - lr: 0.000003
+ 2024-04-29 19:45:31,338 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:45:31,338 EPOCH 9 done: loss 0.0251 - lr 0.000003
+ 2024-04-29 19:45:38,067 Evaluating as a multi-label problem: False
+ 2024-04-29 19:45:38,074 DEV : loss 0.08854210376739502 - f1-score (micro avg) 0.7296
+ 2024-04-29 19:45:38,082 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:45:59,093 epoch 10 - iter 132/1326 - loss 0.01233456 - time (sec): 21.01 - samples/sec: 955.33 - lr: 0.000003
+ 2024-04-29 19:46:19,871 epoch 10 - iter 264/1326 - loss 0.01442453 - time (sec): 41.79 - samples/sec: 952.73 - lr: 0.000003
+ 2024-04-29 19:46:40,768 epoch 10 - iter 396/1326 - loss 0.02226487 - time (sec): 62.69 - samples/sec: 949.03 - lr: 0.000003
+ 2024-04-29 19:47:01,240 epoch 10 - iter 528/1326 - loss 0.02393851 - time (sec): 83.16 - samples/sec: 934.21 - lr: 0.000003
+ 2024-04-29 19:47:22,284 epoch 10 - iter 660/1326 - loss 0.02377848 - time (sec): 104.20 - samples/sec: 950.07 - lr: 0.000003
+ 2024-04-29 19:47:42,834 epoch 10 - iter 792/1326 - loss 0.02245760 - time (sec): 124.75 - samples/sec: 938.46 - lr: 0.000003
+ 2024-04-29 19:48:04,028 epoch 10 - iter 924/1326 - loss 0.02302608 - time (sec): 145.95 - samples/sec: 946.93 - lr: 0.000003
+ 2024-04-29 19:48:24,992 epoch 10 - iter 1056/1326 - loss 0.02390901 - time (sec): 166.91 - samples/sec: 950.20 - lr: 0.000003
+ 2024-04-29 19:48:45,625 epoch 10 - iter 1188/1326 - loss 0.02196178 - time (sec): 187.54 - samples/sec: 947.51 - lr: 0.000003
+ 2024-04-29 19:49:06,386 epoch 10 - iter 1320/1326 - loss 0.02391247 - time (sec): 208.30 - samples/sec: 941.85 - lr: 0.000003
+ 2024-04-29 19:49:07,302 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:49:07,302 EPOCH 10 done: loss 0.0237 - lr 0.000003
+ 2024-04-29 19:49:14,037 Evaluating as a multi-label problem: False
+ 2024-04-29 19:49:14,044 DEV : loss 0.06483861804008484 - f1-score (micro avg) 0.6888
+ 2024-04-29 19:49:14,054 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:49:35,184 epoch 11 - iter 132/1326 - loss 0.01376016 - time (sec): 21.13 - samples/sec: 954.01 - lr: 0.000002
+ 2024-04-29 19:49:56,040 epoch 11 - iter 264/1326 - loss 0.00968912 - time (sec): 41.99 - samples/sec: 983.60 - lr: 0.000002
+ 2024-04-29 19:50:17,256 epoch 11 - iter 396/1326 - loss 0.01934988 - time (sec): 63.20 - samples/sec: 982.16 - lr: 0.000002
+ 2024-04-29 19:50:37,928 epoch 11 - iter 528/1326 - loss 0.02163005 - time (sec): 83.87 - samples/sec: 960.39 - lr: 0.000002
+ 2024-04-29 19:50:58,697 epoch 11 - iter 660/1326 - loss 0.02394657 - time (sec): 104.64 - samples/sec: 960.85 - lr: 0.000002
+ 2024-04-29 19:51:19,441 epoch 11 - iter 792/1326 - loss 0.02269684 - time (sec): 125.39 - samples/sec: 961.89 - lr: 0.000002
+ 2024-04-29 19:51:40,444 epoch 11 - iter 924/1326 - loss 0.02139990 - time (sec): 146.39 - samples/sec: 962.55 - lr: 0.000002
+ 2024-04-29 19:52:00,980 epoch 11 - iter 1056/1326 - loss 0.02027515 - time (sec): 166.93 - samples/sec: 952.95 - lr: 0.000002
+ 2024-04-29 19:52:21,518 epoch 11 - iter 1188/1326 - loss 0.02049033 - time (sec): 187.46 - samples/sec: 947.71 - lr: 0.000002
+ 2024-04-29 19:52:42,468 epoch 11 - iter 1320/1326 - loss 0.02230984 - time (sec): 208.41 - samples/sec: 946.46 - lr: 0.000002
+ 2024-04-29 19:52:43,279 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:52:43,279 EPOCH 11 done: loss 0.0223 - lr 0.000002
+ 2024-04-29 19:52:50,027 Evaluating as a multi-label problem: False
+ 2024-04-29 19:52:50,034 DEV : loss 0.06557412445545197 - f1-score (micro avg) 0.7149
+ 2024-04-29 19:52:50,043 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:53:11,582 epoch 12 - iter 132/1326 - loss 0.01024615 - time (sec): 21.54 - samples/sec: 1071.15 - lr: 0.000002
+ 2024-04-29 19:53:32,395 epoch 12 - iter 264/1326 - loss 0.00787421 - time (sec): 42.35 - samples/sec: 1031.69 - lr: 0.000002
+ 2024-04-29 19:53:52,905 epoch 12 - iter 396/1326 - loss 0.01420230 - time (sec): 62.86 - samples/sec: 977.17 - lr: 0.000002
+ 2024-04-29 19:54:13,373 epoch 12 - iter 528/1326 - loss 0.01885577 - time (sec): 83.33 - samples/sec: 946.77 - lr: 0.000002
+ 2024-04-29 19:54:33,970 epoch 12 - iter 660/1326 - loss 0.01639885 - time (sec): 103.93 - samples/sec: 939.51 - lr: 0.000002
+ 2024-04-29 19:54:54,465 epoch 12 - iter 792/1326 - loss 0.01795589 - time (sec): 124.42 - samples/sec: 926.71 - lr: 0.000002
+ 2024-04-29 19:55:15,425 epoch 12 - iter 924/1326 - loss 0.01955026 - time (sec): 145.38 - samples/sec: 932.15 - lr: 0.000002
+ 2024-04-29 19:55:36,226 epoch 12 - iter 1056/1326 - loss 0.01918242 - time (sec): 166.18 - samples/sec: 937.12 - lr: 0.000002
+ 2024-04-29 19:55:57,215 epoch 12 - iter 1188/1326 - loss 0.02175725 - time (sec): 187.17 - samples/sec: 940.08 - lr: 0.000002
+ 2024-04-29 19:56:18,316 epoch 12 - iter 1320/1326 - loss 0.02169361 - time (sec): 208.27 - samples/sec: 945.08 - lr: 0.000002
+ 2024-04-29 19:56:19,169 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:56:19,169 EPOCH 12 done: loss 0.0216 - lr 0.000002
+ 2024-04-29 19:56:25,901 Evaluating as a multi-label problem: False
+ 2024-04-29 19:56:25,908 DEV : loss 0.06691710650920868 - f1-score (micro avg) 0.7311
+ 2024-04-29 19:56:25,917 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:56:46,852 epoch 13 - iter 132/1326 - loss 0.03298837 - time (sec): 20.93 - samples/sec: 897.53 - lr: 0.000002
+ 2024-04-29 19:57:07,720 epoch 13 - iter 264/1326 - loss 0.02150903 - time (sec): 41.80 - samples/sec: 951.62 - lr: 0.000002
+ 2024-04-29 19:57:28,830 epoch 13 - iter 396/1326 - loss 0.02136301 - time (sec): 62.91 - samples/sec: 965.03 - lr: 0.000002
+ 2024-04-29 19:57:49,321 epoch 13 - iter 528/1326 - loss 0.01898453 - time (sec): 83.40 - samples/sec: 945.01 - lr: 0.000002
+ 2024-04-29 19:58:09,888 epoch 13 - iter 660/1326 - loss 0.01879336 - time (sec): 103.97 - samples/sec: 935.56 - lr: 0.000002
+ 2024-04-29 19:58:30,626 epoch 13 - iter 792/1326 - loss 0.01660965 - time (sec): 124.71 - samples/sec: 939.56 - lr: 0.000002
+ 2024-04-29 19:58:50,952 epoch 13 - iter 924/1326 - loss 0.01499323 - time (sec): 145.03 - samples/sec: 927.70 - lr: 0.000001
+ 2024-04-29 19:59:12,003 epoch 13 - iter 1056/1326 - loss 0.01833583 - time (sec): 166.09 - samples/sec: 931.71 - lr: 0.000001
+ 2024-04-29 19:59:32,777 epoch 13 - iter 1188/1326 - loss 0.01712627 - time (sec): 186.86 - samples/sec: 940.76 - lr: 0.000001
+ 2024-04-29 19:59:53,925 epoch 13 - iter 1320/1326 - loss 0.01715871 - time (sec): 208.01 - samples/sec: 947.16 - lr: 0.000001
+ 2024-04-29 19:59:54,752 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 19:59:54,752 EPOCH 13 done: loss 0.0171 - lr 0.000001
+ 2024-04-29 20:00:01,490 Evaluating as a multi-label problem: False
+ 2024-04-29 20:00:01,498 DEV : loss 0.06450295448303223 - f1-score (micro avg) 0.75
+ 2024-04-29 20:00:01,507 saving best model
+ 2024-04-29 20:00:03,658 ----------------------------------------------------------------------------------------------------
+ 2024-04-29 20:00:24,702 epoch 14 - iter 132/1326 - loss 0.03889764 - time (sec): 21.04 - samples/sec: 941.38 - lr: 0.000001
+ 2024-04-29 20:00:45,459 epoch 14 - iter 264/1326 - loss 0.02605007 - time (sec): 41.80 - samples/sec: 966.52 - lr: 0.000001
+ 2024-04-29 20:01:06,329 epoch 14 - iter 396/1326 - loss 0.01987798 - time (sec): 62.67 - samples/sec: 978.59 - lr: 0.000001
+ 2024-04-29 20:01:27,084 epoch 14 - iter 528/1326 - loss 0.01886847 - time (sec): 83.43 - samples/sec: 974.50 - lr: 0.000001
+ 2024-04-29 20:01:47,683 epoch 14 - iter 660/1326 - loss 0.01798242 - time (sec): 104.02 - samples/sec: 955.07 - lr: 0.000001
+ 2024-04-29 20:02:08,157 epoch 14 - iter 792/1326 - loss 0.01593590 - time (sec): 124.50 - samples/sec: 943.49 - lr: 0.000001
+ 2024-04-29 20:02:29,259 epoch 14 - iter 924/1326 - loss 0.01623625 - time (sec): 145.60 - samples/sec: 947.55 - lr: 0.000001
+ 2024-04-29 20:02:49,779 epoch 14 - iter 1056/1326 - loss 0.01708562 - time (sec): 166.12 - samples/sec: 936.69 - lr: 0.000001
+ 2024-04-29 20:03:10,855 epoch 14 - iter 1188/1326 - loss 0.01556387 - time (sec): 187.20 - samples/sec: 949.03 - lr: 0.000001
278
+ 2024-04-29 20:03:31,683 epoch 14 - iter 1320/1326 - loss 0.01533842 - time (sec): 208.02 - samples/sec: 947.01 - lr: 0.000001
279
+ 2024-04-29 20:03:32,470 ----------------------------------------------------------------------------------------------------
280
+ 2024-04-29 20:03:32,470 EPOCH 14 done: loss 0.0153 - lr 0.000001
281
+ 2024-04-29 20:03:39,240 Evaluating as a multi-label problem: False
282
+ 2024-04-29 20:03:39,247 DEV : loss 0.0911756381392479 - f1-score (micro avg) 0.7288
283
+ 2024-04-29 20:03:39,257 ----------------------------------------------------------------------------------------------------
284
+ 2024-04-29 20:04:00,018 epoch 15 - iter 132/1326 - loss 0.01237652 - time (sec): 20.76 - samples/sec: 878.16 - lr: 0.000001
285
+ 2024-04-29 20:04:20,615 epoch 15 - iter 264/1326 - loss 0.01436397 - time (sec): 41.36 - samples/sec: 879.70 - lr: 0.000001
286
+ 2024-04-29 20:04:41,840 epoch 15 - iter 396/1326 - loss 0.01188224 - time (sec): 62.58 - samples/sec: 935.60 - lr: 0.000001
287
+ 2024-04-29 20:05:02,449 epoch 15 - iter 528/1326 - loss 0.01191348 - time (sec): 83.19 - samples/sec: 931.41 - lr: 0.000001
288
+ 2024-04-29 20:05:23,576 epoch 15 - iter 660/1326 - loss 0.01318250 - time (sec): 104.32 - samples/sec: 936.74 - lr: 0.000001
289
+ 2024-04-29 20:05:44,259 epoch 15 - iter 792/1326 - loss 0.01610301 - time (sec): 125.00 - samples/sec: 935.86 - lr: 0.000001
290
+ 2024-04-29 20:06:05,148 epoch 15 - iter 924/1326 - loss 0.01402320 - time (sec): 145.89 - samples/sec: 935.57 - lr: 0.000001
291
+ 2024-04-29 20:06:26,080 epoch 15 - iter 1056/1326 - loss 0.01456286 - time (sec): 166.82 - samples/sec: 943.62 - lr: 0.000001
292
+ 2024-04-29 20:06:46,684 epoch 15 - iter 1188/1326 - loss 0.01366503 - time (sec): 187.43 - samples/sec: 941.11 - lr: 0.000001
293
+ 2024-04-29 20:07:07,514 epoch 15 - iter 1320/1326 - loss 0.01271998 - time (sec): 208.26 - samples/sec: 946.56 - lr: 0.000001
294
+ 2024-04-29 20:07:08,330 ----------------------------------------------------------------------------------------------------
295
+ 2024-04-29 20:07:08,330 EPOCH 15 done: loss 0.0127 - lr 0.000001
296
+ 2024-04-29 20:07:15,376 Evaluating as a multi-label problem: False
297
+ 2024-04-29 20:07:15,383 DEV : loss 0.0763971135020256 - f1-score (micro avg) 0.7424
298
+ 2024-04-29 20:07:15,392 ----------------------------------------------------------------------------------------------------
299
+ 2024-04-29 20:07:35,973 epoch 16 - iter 132/1326 - loss 0.00604141 - time (sec): 20.58 - samples/sec: 925.87 - lr: 0.000001
300
+ 2024-04-29 20:07:56,877 epoch 16 - iter 264/1326 - loss 0.00962106 - time (sec): 41.48 - samples/sec: 944.64 - lr: 0.000001
301
+ 2024-04-29 20:08:17,697 epoch 16 - iter 396/1326 - loss 0.00897610 - time (sec): 62.30 - samples/sec: 966.51 - lr: 0.000001
302
+ 2024-04-29 20:08:38,452 epoch 16 - iter 528/1326 - loss 0.00930250 - time (sec): 83.06 - samples/sec: 969.77 - lr: 0.000001
303
+ 2024-04-29 20:08:58,968 epoch 16 - iter 660/1326 - loss 0.01240910 - time (sec): 103.58 - samples/sec: 948.32 - lr: 0.000001
304
+ 2024-04-29 20:09:19,865 epoch 16 - iter 792/1326 - loss 0.01194240 - time (sec): 124.47 - samples/sec: 955.31 - lr: 0.000001
305
+ 2024-04-29 20:09:40,936 epoch 16 - iter 924/1326 - loss 0.01136229 - time (sec): 145.54 - samples/sec: 954.94 - lr: 0.000001
306
+ 2024-04-29 20:10:01,211 epoch 16 - iter 1056/1326 - loss 0.01241511 - time (sec): 165.82 - samples/sec: 941.01 - lr: 0.000001
307
+ 2024-04-29 20:10:22,278 epoch 16 - iter 1188/1326 - loss 0.01237126 - time (sec): 186.89 - samples/sec: 943.29 - lr: 0.000001
308
+ 2024-04-29 20:10:43,285 epoch 16 - iter 1320/1326 - loss 0.01312447 - time (sec): 207.89 - samples/sec: 945.12 - lr: 0.000000
309
+ 2024-04-29 20:10:44,167 ----------------------------------------------------------------------------------------------------
310
+ 2024-04-29 20:10:44,168 EPOCH 16 done: loss 0.0131 - lr 0.000000
311
+ 2024-04-29 20:10:51,196 Evaluating as a multi-label problem: False
312
+ 2024-04-29 20:10:51,203 DEV : loss 0.07494457066059113 - f1-score (micro avg) 0.7532
313
+ 2024-04-29 20:10:51,212 saving best model
314
+ 2024-04-29 20:10:53,114 ----------------------------------------------------------------------------------------------------
315
+ 2024-04-29 20:11:14,140 epoch 17 - iter 132/1326 - loss 0.01086358 - time (sec): 21.03 - samples/sec: 984.72 - lr: 0.000000
316
+ 2024-04-29 20:11:34,547 epoch 17 - iter 264/1326 - loss 0.00784167 - time (sec): 41.43 - samples/sec: 909.28 - lr: 0.000000
317
+ 2024-04-29 20:11:55,214 epoch 17 - iter 396/1326 - loss 0.00810142 - time (sec): 62.10 - samples/sec: 903.48 - lr: 0.000000
318
+ 2024-04-29 20:12:16,277 epoch 17 - iter 528/1326 - loss 0.01276904 - time (sec): 83.16 - samples/sec: 959.18 - lr: 0.000000
319
+ 2024-04-29 20:12:37,331 epoch 17 - iter 660/1326 - loss 0.01460244 - time (sec): 104.22 - samples/sec: 968.65 - lr: 0.000000
320
+ 2024-04-29 20:12:58,109 epoch 17 - iter 792/1326 - loss 0.01585436 - time (sec): 124.99 - samples/sec: 959.39 - lr: 0.000000
321
+ 2024-04-29 20:13:18,704 epoch 17 - iter 924/1326 - loss 0.01492635 - time (sec): 145.59 - samples/sec: 951.40 - lr: 0.000000
322
+ 2024-04-29 20:13:39,820 epoch 17 - iter 1056/1326 - loss 0.01390514 - time (sec): 166.71 - samples/sec: 959.19 - lr: 0.000000
323
+ 2024-04-29 20:14:00,123 epoch 17 - iter 1188/1326 - loss 0.01331555 - time (sec): 187.01 - samples/sec: 942.26 - lr: 0.000000
324
+ 2024-04-29 20:14:21,036 epoch 17 - iter 1320/1326 - loss 0.01345542 - time (sec): 207.92 - samples/sec: 947.22 - lr: 0.000000
325
+ 2024-04-29 20:14:21,870 ----------------------------------------------------------------------------------------------------
326
+ 2024-04-29 20:14:21,870 EPOCH 17 done: loss 0.0135 - lr 0.000000
327
+ 2024-04-29 20:14:28,607 Evaluating as a multi-label problem: False
328
+ 2024-04-29 20:14:28,614 DEV : loss 0.07131695002317429 - f1-score (micro avg) 0.7265
329
+ 2024-04-29 20:14:28,624 ----------------------------------------------------------------------------------------------------
330
+ 2024-04-29 20:14:49,351 epoch 18 - iter 132/1326 - loss 0.00639355 - time (sec): 20.73 - samples/sec: 843.00 - lr: 0.000000
331
+ 2024-04-29 20:15:10,862 epoch 18 - iter 264/1326 - loss 0.00675824 - time (sec): 42.24 - samples/sec: 972.00 - lr: 0.000000
332
+ 2024-04-29 20:15:31,579 epoch 18 - iter 396/1326 - loss 0.00743064 - time (sec): 62.95 - samples/sec: 969.54 - lr: 0.000000
333
+ 2024-04-29 20:15:52,280 epoch 18 - iter 528/1326 - loss 0.00891165 - time (sec): 83.66 - samples/sec: 964.18 - lr: 0.000000
334
+ 2024-04-29 20:16:13,313 epoch 18 - iter 660/1326 - loss 0.01136151 - time (sec): 104.69 - samples/sec: 974.25 - lr: 0.000000
335
+ 2024-04-29 20:16:34,312 epoch 18 - iter 792/1326 - loss 0.01088078 - time (sec): 125.69 - samples/sec: 974.73 - lr: 0.000000
336
+ 2024-04-29 20:16:54,598 epoch 18 - iter 924/1326 - loss 0.01056691 - time (sec): 145.97 - samples/sec: 955.73 - lr: 0.000000
337
+ 2024-04-29 20:17:15,485 epoch 18 - iter 1056/1326 - loss 0.01338623 - time (sec): 166.86 - samples/sec: 954.02 - lr: 0.000000
338
+ 2024-04-29 20:17:36,152 epoch 18 - iter 1188/1326 - loss 0.01294660 - time (sec): 187.53 - samples/sec: 946.42 - lr: 0.000000
339
+ 2024-04-29 20:17:56,803 epoch 18 - iter 1320/1326 - loss 0.01185896 - time (sec): 208.18 - samples/sec: 943.11 - lr: 0.000000
340
+ 2024-04-29 20:17:57,706 ----------------------------------------------------------------------------------------------------
341
+ 2024-04-29 20:17:57,707 EPOCH 18 done: loss 0.0118 - lr 0.000000
342
+ 2024-04-29 20:18:04,460 Evaluating as a multi-label problem: False
343
+ 2024-04-29 20:18:04,467 DEV : loss 0.06895702332258224 - f1-score (micro avg) 0.7143
344
+ 2024-04-29 20:18:04,477 ----------------------------------------------------------------------------------------------------
345
+ 2024-04-29 20:18:25,241 epoch 19 - iter 132/1326 - loss 0.00779178 - time (sec): 20.76 - samples/sec: 859.34 - lr: 0.000000
346
+ 2024-04-29 20:18:45,931 epoch 19 - iter 264/1326 - loss 0.00883730 - time (sec): 41.45 - samples/sec: 903.23 - lr: 0.000000
347
+ 2024-04-29 20:19:06,584 epoch 19 - iter 396/1326 - loss 0.00865701 - time (sec): 62.11 - samples/sec: 929.04 - lr: 0.000000
348
+ 2024-04-29 20:19:27,143 epoch 19 - iter 528/1326 - loss 0.00931012 - time (sec): 82.67 - samples/sec: 933.53 - lr: 0.000000
349
+ 2024-04-29 20:19:47,967 epoch 19 - iter 660/1326 - loss 0.00893505 - time (sec): 103.49 - samples/sec: 941.05 - lr: 0.000000
350
+ 2024-04-29 20:20:09,276 epoch 19 - iter 792/1326 - loss 0.00983372 - time (sec): 124.80 - samples/sec: 965.06 - lr: 0.000000
351
+ 2024-04-29 20:20:29,681 epoch 19 - iter 924/1326 - loss 0.01071250 - time (sec): 145.20 - samples/sec: 946.81 - lr: 0.000000
352
+ 2024-04-29 20:20:50,714 epoch 19 - iter 1056/1326 - loss 0.01008226 - time (sec): 166.24 - samples/sec: 945.81 - lr: 0.000000
353
+ 2024-04-29 20:21:11,577 epoch 19 - iter 1188/1326 - loss 0.01218936 - time (sec): 187.10 - samples/sec: 948.77 - lr: 0.000000
354
+ 2024-04-29 20:21:32,191 epoch 19 - iter 1320/1326 - loss 0.01151174 - time (sec): 207.71 - samples/sec: 946.17 - lr: 0.000000
355
+ 2024-04-29 20:21:33,090 ----------------------------------------------------------------------------------------------------
356
+ 2024-04-29 20:21:33,090 EPOCH 19 done: loss 0.0115 - lr 0.000000
357
+ 2024-04-29 20:21:39,713 Evaluating as a multi-label problem: False
358
+ 2024-04-29 20:21:39,722 DEV : loss 0.06978413462638855 - f1-score (micro avg) 0.7296
359
+ 2024-04-29 20:21:39,734 ----------------------------------------------------------------------------------------------------
360
+ 2024-04-29 20:22:00,944 epoch 20 - iter 132/1326 - loss 0.00412526 - time (sec): 21.21 - samples/sec: 947.92 - lr: 0.000000
361
+ 2024-04-29 20:22:21,834 epoch 20 - iter 264/1326 - loss 0.00391955 - time (sec): 42.10 - samples/sec: 947.36 - lr: 0.000000
362
+ 2024-04-29 20:22:42,606 epoch 20 - iter 396/1326 - loss 0.00617386 - time (sec): 62.87 - samples/sec: 941.76 - lr: 0.000000
363
+ 2024-04-29 20:23:02,937 epoch 20 - iter 528/1326 - loss 0.00598707 - time (sec): 83.20 - samples/sec: 924.06 - lr: 0.000000
364
+ 2024-04-29 20:23:23,893 epoch 20 - iter 660/1326 - loss 0.00815138 - time (sec): 104.16 - samples/sec: 928.58 - lr: 0.000000
365
+ 2024-04-29 20:23:45,135 epoch 20 - iter 792/1326 - loss 0.00815129 - time (sec): 125.40 - samples/sec: 958.62 - lr: 0.000000
366
+ 2024-04-29 20:24:05,808 epoch 20 - iter 924/1326 - loss 0.00848857 - time (sec): 146.07 - samples/sec: 951.78 - lr: 0.000000
367
+ 2024-04-29 20:24:26,900 epoch 20 - iter 1056/1326 - loss 0.00786540 - time (sec): 167.17 - samples/sec: 960.67 - lr: 0.000000
368
+ 2024-04-29 20:24:47,236 epoch 20 - iter 1188/1326 - loss 0.00819986 - time (sec): 187.50 - samples/sec: 949.58 - lr: 0.000000
369
+ 2024-04-29 20:25:07,654 epoch 20 - iter 1320/1326 - loss 0.01085452 - time (sec): 207.92 - samples/sec: 944.93 - lr: 0.000000
370
+ 2024-04-29 20:25:08,614 ----------------------------------------------------------------------------------------------------
371
+ 2024-04-29 20:25:08,614 EPOCH 20 done: loss 0.0108 - lr 0.000000
372
+ 2024-04-29 20:25:15,217 Evaluating as a multi-label problem: False
373
+ 2024-04-29 20:25:15,224 DEV : loss 0.06992758810520172 - f1-score (micro avg) 0.7296
374
+ 2024-04-29 20:25:16,992 ----------------------------------------------------------------------------------------------------
375
+ 2024-04-29 20:25:47,102 SequenceTagger predicts: Dictionary with 25 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-MISC, B-MISC, E-MISC, I-MISC, S-UTE, B-UTE, E-UTE, I-UTE, S-SINGLE_COMPANY, B-SINGLE_COMPANY, E-SINGLE_COMPANY, I-SINGLE_COMPANY
376
+ 2024-04-29 20:25:54,356 Evaluating as a multi-label problem: False
377
+ 2024-04-29 20:25:54,363 0.7039 0.7868 0.7431 0.5944
378
+ 2024-04-29 20:25:54,363
379
+ Results:
380
+ - F-score (micro) 0.7431
381
+ - F-score (macro) 0.7429
382
+ - Accuracy 0.5944
383
+
384
+ By class:
385
+                 precision    recall  f1-score   support
386
+
387
+            UTE     0.7568    0.7887    0.7724        71
388
+ SINGLE_COMPANY     0.6538    0.7846    0.7133        65
389
+
390
+      micro avg     0.7039    0.7868    0.7431       136
391
+      macro avg     0.7053    0.7867    0.7429       136
392
+   weighted avg     0.7076    0.7868    0.7442       136
393
+
394
+ 2024-04-29 20:25:54,363 ----------------------------------------------------------------------------------------------------
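The averaged rows of the "By class" report can be cross-checked against the per-class rows. The sketch below is not part of the training run. It assumes the usual definitions: macro F1 is the unweighted mean of per-class F1, weighted F1 is the support-weighted mean, and micro F1 is the harmonic mean of micro precision and micro recall. The names `per_class`, `macro_f1`, `weighted_f1` and `f1_from_pr` are hypothetical helpers introduced here for illustration. Because the inputs are already rounded to four decimals, the recomputed values should only be expected to agree with the log to about three decimals.

```python
# Per-class figures copied from the "By class" report in training.log:
# label -> (precision, recall, f1, support)
per_class = {
    "UTE": (0.7568, 0.7887, 0.7724, 71),
    "SINGLE_COMPANY": (0.6538, 0.7846, 0.7133, 65),
}


def macro_f1(report):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1, _ in report.values()) / len(report)


def weighted_f1(report):
    """Mean of the per-class F1 scores, weighted by class support."""
    total_support = sum(support for _, _, _, support in report.values())
    return sum(f1 * support for _, _, f1, support in report.values()) / total_support


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


macro = macro_f1(per_class)
weighted = weighted_f1(per_class)
# Micro precision and recall are taken from the "micro avg" row of the report.
micro = f1_from_pr(0.7039, 0.7868)

print(f"macro F1    ~ {macro:.4f} (log reports 0.7429)")
print(f"weighted F1 ~ {weighted:.4f} (log reports 0.7442)")
print(f"micro F1    ~ {micro:.4f} (log reports 0.7431)")
```

Small differences in the last digit are expected, since the log rounds each figure independently before printing.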
weights.txt ADDED
File without changes
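To see which epoch produced the best dev score without reading the whole of training.log, the `EPOCH n done` and `DEV : loss ... f1-score (micro avg) ...` lines can be extracted with a regular expression. This is a minimal sketch that assumes the line layout shown in the log above. The helper `dev_scores` and the `sample` string are hypothetical, and the sample contains only three epochs copied from the log.

```python
import re

# Patterns matched against the line layout seen in training.log.
EPOCH_DONE = re.compile(r"EPOCH (?P<epoch>\d+) done")
DEV_LINE = re.compile(
    r"DEV\s*:\s*loss\s+(?P<loss>[\d.]+)\s+-\s+f1-score \(micro avg\)\s+(?P<f1>[\d.]+)"
)


def dev_scores(log_text):
    """Return {epoch: (dev_loss, dev_micro_f1)} parsed from a training log."""
    scores = {}
    current_epoch = None
    for line in log_text.splitlines():
        done = EPOCH_DONE.search(line)
        if done:
            # Remember the epoch so the next DEV line can be attributed to it.
            current_epoch = int(done.group("epoch"))
            continue
        dev = DEV_LINE.search(line)
        if dev and current_epoch is not None:
            scores[current_epoch] = (float(dev.group("loss")), float(dev.group("f1")))
    return scores


# Three epochs copied verbatim from the log, used here as sample input.
sample = """\
2024-04-29 19:59:54,752 EPOCH 13 done: loss 0.0171 - lr 0.000001
2024-04-29 20:00:01,498 DEV : loss 0.06450295448303223 - f1-score (micro avg) 0.75
2024-04-29 20:10:44,168 EPOCH 16 done: loss 0.0131 - lr 0.000000
2024-04-29 20:10:51,203 DEV : loss 0.07494457066059113 - f1-score (micro avg) 0.7532
2024-04-29 20:14:21,870 EPOCH 17 done: loss 0.0135 - lr 0.000000
2024-04-29 20:14:28,614 DEV : loss 0.07131695002317429 - f1-score (micro avg) 0.7265
"""

scores = dev_scores(sample)
best_epoch = max(scores, key=lambda epoch: scores[epoch][1])
print(best_epoch, scores[best_epoch])
```

On the sample, the highest dev micro-F1 falls on epoch 16, which is also the last epoch followed by a "saving best model" line in the log. To run it over the full file, pass the contents of training.log to `dev_scores` instead of `sample`.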