Jens Grivolla committed on
Commit · fb1f685
1 Parent(s): 6025fb2
add initial model files
- best-model.pt +3 -0
- dev.tsv +0 -0
- final-model.pt +3 -0
- loss.tsv +21 -0
- test.tsv +0 -0
- training.log +394 -0
- weights.txt +0 -0
best-model.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b7cea919da61f8f323e4ca03ebcb43c6e5ba5e06bdbccce37fc9e7e0e9a3e128
+size 2256908487
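The three added lines above form a Git LFS pointer file: the spec version, the SHA-256 object id, and the object size in bytes. As a minimal sketch, assuming the pointer text is available as a string, the fields can be read like this. The helper name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the best-model.pt diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b7cea919da61f8f323e4ca03ebcb43c6e5ba5e06bdbccce37fc9e7e0e9a3e128
size 2256908487
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size_bytes"])
```

The repository stores only this pointer; the roughly 2.26 GB model file itself lives in LFS storage and is identified by the digest.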
dev.tsv
ADDED
The diff for this file is too large to render.
final-model.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f3e8a2744ee6be82974bdacf8310fd6a531114bc6fe626880576cdf85e6dadcd
+size 2256908884
loss.tsv
ADDED
@@ -0,0 +1,21 @@
+EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+1 19:16:41 0.0000 0.19026665994182357 0.08347803354263306 0.2119 0.2807 0.2415 0.1517
+2 19:20:19 0.0000 0.07510200789904357 0.06594807654619217 0.6336 0.7281 0.6776 0.5355
+3 19:23:56 0.0000 0.051059842073423525 0.06074140965938568 0.6457 0.7193 0.6805 0.543
+4 19:27:35 0.0000 0.04975723893997647 0.06857836991548538 0.7523 0.7193 0.7354 0.5942
+5 19:31:13 0.0000 0.03971213988859546 0.08110673725605011 0.7018 0.7018 0.7018 0.5674
+6 19:34:49 0.0000 0.03546830191468582 0.07366479188203812 0.7107 0.7544 0.7319 0.5972
+7 19:38:25 0.0000 0.03473060495712673 0.07581108063459396 0.661 0.6842 0.6724 0.5417
+8 19:42:01 0.0000 0.031095477296456932 0.07848106324672699 0.7107 0.7544 0.7319 0.6014
+9 19:45:38 0.0000 0.025149721216192127 0.08854210376739502 0.7143 0.7456 0.7296 0.5986
+10 19:49:14 0.0000 0.023745396892456357 0.06483861804008484 0.6535 0.7281 0.6888 0.5461
+11 19:52:50 0.0000 0.022273265507065026 0.06557412445545197 0.6942 0.7368 0.7149 0.5833
+12 19:56:25 0.0000 0.021611838273819493 0.06691710650920868 0.7016 0.7632 0.7311 0.5959
+13 20:00:01 0.0000 0.01710983789516771 0.06450295448303223 0.7373 0.7632 0.75 0.6214
+14 20:03:39 0.0000 0.015293634907960702 0.0911756381392479 0.7049 0.7544 0.7288 0.5972
+15 20:07:15 0.0000 0.012690879626054398 0.0763971135020256 0.7391 0.7456 0.7424 0.6159
+16 20:10:51 0.0000 0.013051816123982237 0.07494457066059113 0.7436 0.7632 0.7532 0.6304
+17 20:14:28 0.0000 0.01346253304226053 0.07131695002317429 0.7083 0.7456 0.7265 0.5944
+18 20:18:04 0.0000 0.011784355731974503 0.06895702332258224 0.6855 0.7456 0.7143 0.5822
+19 20:21:39 0.0000 0.011475109279551204 0.06978413462638855 0.7143 0.7456 0.7296 0.5986
+20 20:25:15 0.0000 0.01079436888010318 0.06992758810520172 0.7143 0.7456 0.7296 0.5986
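The DEV_F1 column in loss.tsv can be cross-checked against DEV_PRECISION and DEV_RECALL, since F1 is their harmonic mean, 2PR / (P + R). The sketch below is illustrative only: it embeds a subset of the rows above (epochs 1, 4, 13, 16 and 20) rather than reading the file, and the helper names `load_rows` and `f1` are hypothetical.

```python
# Subset of loss.tsv rows (epochs 1, 4, 13, 16, 20) copied from the diff above.
LOSS_TSV = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 19:16:41 0.0000 0.19026665994182357 0.08347803354263306 0.2119 0.2807 0.2415 0.1517
4 19:27:35 0.0000 0.04975723893997647 0.06857836991548538 0.7523 0.7193 0.7354 0.5942
13 20:00:01 0.0000 0.01710983789516771 0.06450295448303223 0.7373 0.7632 0.75 0.6214
16 20:10:51 0.0000 0.013051816123982237 0.07494457066059113 0.7436 0.7632 0.7532 0.6304
20 20:25:15 0.0000 0.01079436888010318 0.06992758810520172 0.7143 0.7456 0.7296 0.5986
"""


def load_rows(text):
    """Split whitespace-separated rows into dicts keyed by the header names."""
    header, *lines = text.strip().splitlines()
    names = header.split()
    return [dict(zip(names, line.split())) for line in lines]


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


rows = load_rows(LOSS_TSV)
for row in rows:
    recomputed = f1(float(row["DEV_PRECISION"]), float(row["DEV_RECALL"]))
    row["F1_DIFF"] = abs(recomputed - float(row["DEV_F1"]))

# Epoch with the highest reported dev F1 among the embedded rows.
best = max(rows, key=lambda row: float(row["DEV_F1"]))
print(best["EPOCH"], best["DEV_F1"])
```

Because precision and recall are rounded to four decimals in the file, the recomputed value can differ from DEV_F1 in the last digit. Epoch 16 also has the highest DEV_F1 in the full 20-row table.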
test.tsv
ADDED
The diff for this file is too large to render.
training.log
ADDED
@@ -0,0 +1,394 @@
+2024-04-29 19:13:06,967 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:13:06,968 Model: "SequenceTagger(
+  (embeddings): TransformerWordEmbeddings(
+    (model): XLMRobertaModel(
+      (embeddings): XLMRobertaEmbeddings(
+        (word_embeddings): Embedding(250003, 1024)
+        (position_embeddings): Embedding(514, 1024, padding_idx=1)
+        (token_type_embeddings): Embedding(1, 1024)
+        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+        (dropout): Dropout(p=0.1, inplace=False)
+      )
+      (encoder): XLMRobertaEncoder(
+        (layer): ModuleList(
+          (0-23): 24 x XLMRobertaLayer(
+            (attention): XLMRobertaAttention(
+              (self): XLMRobertaSelfAttention(
+                (query): Linear(in_features=1024, out_features=1024, bias=True)
+                (key): Linear(in_features=1024, out_features=1024, bias=True)
+                (value): Linear(in_features=1024, out_features=1024, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): XLMRobertaSelfOutput(
+                (dense): Linear(in_features=1024, out_features=1024, bias=True)
+                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): XLMRobertaIntermediate(
+              (dense): Linear(in_features=1024, out_features=4096, bias=True)
+              (intermediate_act_fn): GELUActivation()
+            )
+            (output): XLMRobertaOutput(
+              (dense): Linear(in_features=4096, out_features=1024, bias=True)
+              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+        )
+      )
+      (pooler): XLMRobertaPooler(
+        (dense): Linear(in_features=1024, out_features=1024, bias=True)
+        (activation): Tanh()
+      )
+    )
+  )
+  (locked_dropout): LockedDropout(p=0.5)
+  (linear): Linear(in_features=1024, out_features=25, bias=True)
+  (loss_function): CrossEntropyLoss()
+)"
+2024-04-29 19:13:06,968 ----------------------------------------------------------------------------------------------------
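The layer shapes printed in the model summary are enough to estimate the parameter count. The sketch below adds up weights and biases for the modules listed above (Embedding, Linear with `bias=True`, LayerNorm with `elementwise_affine=True`) and compares the result with the 2256908487-byte size recorded for best-model.pt. The 4 bytes per parameter figure is an assumption (32-bit floats); the log does not state the dtype. The helper names are hypothetical.

```python
def linear(n_in, n_out):
    """Parameters of a Linear layer with bias: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Parameters of an affine LayerNorm: one scale and one shift per feature."""
    return 2 * n


def embedding(rows, dim):
    """Parameters of an Embedding table."""
    return rows * dim


HIDDEN, FEED_FORWARD, LAYERS, TAGS = 1024, 4096, 24, 25

embeddings = (
    embedding(250003, HIDDEN)   # word_embeddings
    + embedding(514, HIDDEN)    # position_embeddings
    + embedding(1, HIDDEN)      # token_type_embeddings
    + layer_norm(HIDDEN)
)
# query, key, value and the attention output projection, plus one LayerNorm.
attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
# intermediate and output dense layers, plus one LayerNorm.
feed_forward = (
    linear(HIDDEN, FEED_FORWARD) + linear(FEED_FORWARD, HIDDEN) + layer_norm(HIDDEN)
)
encoder = LAYERS * (attention + feed_forward)
pooler = linear(HIDDEN, HIDDEN)
tagger_head = linear(HIDDEN, TAGS)  # the final (linear) layer of the SequenceTagger

total_params = embeddings + encoder + pooler + tagger_head

# Assumption: 4 bytes per parameter (float32); not stated in the log.
approx_bytes = 4 * total_params
checkpoint_bytes = 2256908487  # size from the best-model.pt LFS pointer
print(total_params, approx_bytes / checkpoint_bytes)
```

Under that assumption the weights account for about 99% of the checkpoint size; the remainder would be whatever else the file stores alongside them, which the log does not itemize.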
+2024-04-29 19:13:06,968 Corpus: "Corpus: 5301 train + 589 dev + 654 test sentences"
+2024-04-29 19:13:06,968 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:13:06,968 Parameters:
+2024-04-29 19:13:06,968  - learning_rate: "0.000005"
+2024-04-29 19:13:06,968  - mini_batch_size: "4"
+2024-04-29 19:13:06,968  - patience: "3"
+2024-04-29 19:13:06,968  - anneal_factor: "0.5"
+2024-04-29 19:13:06,968  - max_epochs: "20"
+2024-04-29 19:13:06,968  - shuffle: "True"
+2024-04-29 19:13:06,968  - train_with_dev: "False"
+2024-04-29 19:13:06,968  - batch_growth_annealing: "False"
+2024-04-29 19:13:06,968 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:13:06,968 Model training base path: "resources/taggers/ner-spanish-large-np-finetune"
+2024-04-29 19:13:06,968 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:13:06,968 Device: cuda:0
+2024-04-29 19:13:06,969 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:13:06,969 Embeddings storage mode: none
+2024-04-29 19:13:06,969 ----------------------------------------------------------------------------------------------------
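The corpus size and `mini_batch_size` reported above determine the iteration counter used in the per-epoch lines of this log. A small sketch of that arithmetic, using only values from the log (the variable names are illustrative):

```python
import math

train_sentences = 5301   # from the "Corpus:" line
mini_batch_size = 4      # from the "Parameters:" block
max_epochs = 20          # from the "Parameters:" block

# The last batch of an epoch may be partial, so round up.
iters_per_epoch = math.ceil(train_sentences / mini_batch_size)
total_iters = iters_per_epoch * max_epochs

# The log prints progress every 132 iterations, which is consistent with
# one tenth of an epoch rounded down (an observation, not a documented rule).
log_every = iters_per_epoch // 10

print(iters_per_epoch, total_iters, log_every)
```

This gives 1326 iterations per epoch, matching the `iter …/1326` counter in the epoch lines, and 26520 iterations over the 20 configured epochs.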
+2024-04-29 19:13:30,396 epoch 1 - iter 132/1326 - loss 0.62261396 - time (sec): 23.43 - samples/sec: 1442.78 - lr: 0.000005
+2024-04-29 19:13:52,923 epoch 1 - iter 264/1326 - loss 0.38574243 - time (sec): 45.95 - samples/sec: 1428.48 - lr: 0.000005
+2024-04-29 19:14:13,281 epoch 1 - iter 396/1326 - loss 0.33621740 - time (sec): 66.31 - samples/sec: 1214.90 - lr: 0.000005
+2024-04-29 19:14:33,173 epoch 1 - iter 528/1326 - loss 0.27934057 - time (sec): 86.20 - samples/sec: 1133.17 - lr: 0.000005
+2024-04-29 19:14:54,027 epoch 1 - iter 660/1326 - loss 0.28667008 - time (sec): 107.06 - samples/sec: 1088.07 - lr: 0.000005
+2024-04-29 19:15:13,282 epoch 1 - iter 792/1326 - loss 0.26728950 - time (sec): 126.31 - samples/sec: 1005.56 - lr: 0.000005
+2024-04-29 19:15:32,488 epoch 1 - iter 924/1326 - loss 0.24258707 - time (sec): 145.52 - samples/sec: 961.73 - lr: 0.000005
+2024-04-29 19:15:52,452 epoch 1 - iter 1056/1326 - loss 0.22206141 - time (sec): 165.48 - samples/sec: 923.87 - lr: 0.000005
+2024-04-29 19:16:12,585 epoch 1 - iter 1188/1326 - loss 0.21078620 - time (sec): 185.62 - samples/sec: 898.59 - lr: 0.000005
+2024-04-29 19:16:35,102 epoch 1 - iter 1320/1326 - loss 0.18962778 - time (sec): 208.13 - samples/sec: 944.88 - lr: 0.000005
+2024-04-29 19:16:36,057 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:16:36,057 EPOCH 1 done: loss 0.1903 - lr 0.000005
+2024-04-29 19:16:41,931 Evaluating as a multi-label problem: False
+2024-04-29 19:16:41,938 DEV : loss 0.08347803354263306 - f1-score (micro avg) 0.2415
+2024-04-29 19:16:41,946 saving best model
+2024-04-29 19:16:43,696 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:17:04,299 epoch 2 - iter 132/1326 - loss 0.05433737 - time (sec): 20.60 - samples/sec: 939.30 - lr: 0.000005
+2024-04-29 19:17:25,134 epoch 2 - iter 264/1326 - loss 0.07221647 - time (sec): 41.44 - samples/sec: 959.97 - lr: 0.000005
+2024-04-29 19:17:45,581 epoch 2 - iter 396/1326 - loss 0.06800126 - time (sec): 61.88 - samples/sec: 926.79 - lr: 0.000005
+2024-04-29 19:18:05,931 epoch 2 - iter 528/1326 - loss 0.06976185 - time (sec): 82.24 - samples/sec: 908.61 - lr: 0.000005
+2024-04-29 19:18:26,703 epoch 2 - iter 660/1326 - loss 0.07144914 - time (sec): 103.01 - samples/sec: 916.36 - lr: 0.000005
+2024-04-29 19:18:47,551 epoch 2 - iter 792/1326 - loss 0.07129850 - time (sec): 123.86 - samples/sec: 921.43 - lr: 0.000005
+2024-04-29 19:19:08,831 epoch 2 - iter 924/1326 - loss 0.07364315 - time (sec): 145.13 - samples/sec: 937.72 - lr: 0.000005
+2024-04-29 19:19:30,404 epoch 2 - iter 1056/1326 - loss 0.07429358 - time (sec): 166.71 - samples/sec: 947.01 - lr: 0.000005
+2024-04-29 19:19:51,382 epoch 2 - iter 1188/1326 - loss 0.07472375 - time (sec): 187.69 - samples/sec: 951.54 - lr: 0.000005
+2024-04-29 19:20:11,837 epoch 2 - iter 1320/1326 - loss 0.07525889 - time (sec): 208.14 - samples/sec: 945.83 - lr: 0.000005
+2024-04-29 19:20:12,667 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:20:12,667 EPOCH 2 done: loss 0.0751 - lr 0.000005
+2024-04-29 19:20:19,289 Evaluating as a multi-label problem: False
+2024-04-29 19:20:19,296 DEV : loss 0.06594807654619217 - f1-score (micro avg) 0.6776
+2024-04-29 19:20:19,305 saving best model
+2024-04-29 19:20:21,153 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:20:42,076 epoch 3 - iter 132/1326 - loss 0.03075654 - time (sec): 20.92 - samples/sec: 1045.61 - lr: 0.000005
+2024-04-29 19:21:03,363 epoch 3 - iter 264/1326 - loss 0.06844082 - time (sec): 42.21 - samples/sec: 1076.19 - lr: 0.000005
+2024-04-29 19:21:24,222 epoch 3 - iter 396/1326 - loss 0.06805718 - time (sec): 63.07 - samples/sec: 1049.91 - lr: 0.000005
+2024-04-29 19:21:44,517 epoch 3 - iter 528/1326 - loss 0.06164637 - time (sec): 83.36 - samples/sec: 980.80 - lr: 0.000005
+2024-04-29 19:22:05,198 epoch 3 - iter 660/1326 - loss 0.05812095 - time (sec): 104.04 - samples/sec: 971.12 - lr: 0.000005
+2024-04-29 19:22:25,741 epoch 3 - iter 792/1326 - loss 0.05654579 - time (sec): 124.59 - samples/sec: 951.60 - lr: 0.000005
+2024-04-29 19:22:46,750 epoch 3 - iter 924/1326 - loss 0.05279483 - time (sec): 145.60 - samples/sec: 952.78 - lr: 0.000005
+2024-04-29 19:23:07,347 epoch 3 - iter 1056/1326 - loss 0.05517769 - time (sec): 166.19 - samples/sec: 948.83 - lr: 0.000005
+2024-04-29 19:23:28,146 epoch 3 - iter 1188/1326 - loss 0.05270269 - time (sec): 186.99 - samples/sec: 942.41 - lr: 0.000005
+2024-04-29 19:23:49,190 epoch 3 - iter 1320/1326 - loss 0.05130536 - time (sec): 208.04 - samples/sec: 943.77 - lr: 0.000005
+2024-04-29 19:23:50,082 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:23:50,082 EPOCH 3 done: loss 0.0511 - lr 0.000005
+2024-04-29 19:23:56,695 Evaluating as a multi-label problem: False
+2024-04-29 19:23:56,702 DEV : loss 0.06074140965938568 - f1-score (micro avg) 0.6805
+2024-04-29 19:23:56,711 saving best model
+2024-04-29 19:23:58,467 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:24:19,456 epoch 4 - iter 132/1326 - loss 0.05354733 - time (sec): 20.99 - samples/sec: 965.04 - lr: 0.000005
+2024-04-29 19:24:40,634 epoch 4 - iter 264/1326 - loss 0.05477016 - time (sec): 42.17 - samples/sec: 987.83 - lr: 0.000005
+2024-04-29 19:25:02,087 epoch 4 - iter 396/1326 - loss 0.04696782 - time (sec): 63.62 - samples/sec: 1021.52 - lr: 0.000005
+2024-04-29 19:25:22,809 epoch 4 - iter 528/1326 - loss 0.04301686 - time (sec): 84.34 - samples/sec: 989.27 - lr: 0.000005
+2024-04-29 19:25:43,866 epoch 4 - iter 660/1326 - loss 0.04884093 - time (sec): 105.40 - samples/sec: 969.26 - lr: 0.000005
+2024-04-29 19:26:04,584 epoch 4 - iter 792/1326 - loss 0.04698956 - time (sec): 126.12 - samples/sec: 952.53 - lr: 0.000005
+2024-04-29 19:26:25,832 epoch 4 - iter 924/1326 - loss 0.05039226 - time (sec): 147.36 - samples/sec: 964.02 - lr: 0.000005
+2024-04-29 19:26:46,770 epoch 4 - iter 1056/1326 - loss 0.05071681 - time (sec): 168.30 - samples/sec: 958.30 - lr: 0.000005
+2024-04-29 19:27:07,471 epoch 4 - iter 1188/1326 - loss 0.05135564 - time (sec): 189.00 - samples/sec: 954.33 - lr: 0.000005
+2024-04-29 19:27:27,862 epoch 4 - iter 1320/1326 - loss 0.04986070 - time (sec): 209.40 - samples/sec: 940.94 - lr: 0.000005
+2024-04-29 19:27:28,674 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:27:28,674 EPOCH 4 done: loss 0.0498 - lr 0.000005
+2024-04-29 19:27:35,422 Evaluating as a multi-label problem: False
+2024-04-29 19:27:35,430 DEV : loss 0.06857836991548538 - f1-score (micro avg) 0.7354
+2024-04-29 19:27:35,440 saving best model
+2024-04-29 19:27:37,201 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:27:57,750 epoch 5 - iter 132/1326 - loss 0.05922121 - time (sec): 20.55 - samples/sec: 900.56 - lr: 0.000004
+2024-04-29 19:28:18,733 epoch 5 - iter 264/1326 - loss 0.04787888 - time (sec): 41.53 - samples/sec: 920.18 - lr: 0.000004
+2024-04-29 19:28:39,884 epoch 5 - iter 396/1326 - loss 0.04606074 - time (sec): 62.68 - samples/sec: 933.19 - lr: 0.000004
+2024-04-29 19:29:00,559 epoch 5 - iter 528/1326 - loss 0.04040456 - time (sec): 83.36 - samples/sec: 939.85 - lr: 0.000004
+2024-04-29 19:29:21,358 epoch 5 - iter 660/1326 - loss 0.03768252 - time (sec): 104.16 - samples/sec: 939.20 - lr: 0.000004
+2024-04-29 19:29:42,554 epoch 5 - iter 792/1326 - loss 0.03721055 - time (sec): 125.35 - samples/sec: 954.40 - lr: 0.000004
+2024-04-29 19:30:03,515 epoch 5 - iter 924/1326 - loss 0.04173413 - time (sec): 146.31 - samples/sec: 952.84 - lr: 0.000004
+2024-04-29 19:30:24,180 epoch 5 - iter 1056/1326 - loss 0.04019307 - time (sec): 166.98 - samples/sec: 945.63 - lr: 0.000004
+2024-04-29 19:30:44,955 epoch 5 - iter 1188/1326 - loss 0.04065091 - time (sec): 187.75 - samples/sec: 945.36 - lr: 0.000004
+2024-04-29 19:31:05,582 epoch 5 - iter 1320/1326 - loss 0.03986708 - time (sec): 208.38 - samples/sec: 944.48 - lr: 0.000004
+2024-04-29 19:31:06,418 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:31:06,418 EPOCH 5 done: loss 0.0397 - lr 0.000004
+2024-04-29 19:31:13,174 Evaluating as a multi-label problem: False
+2024-04-29 19:31:13,181 DEV : loss 0.08110673725605011 - f1-score (micro avg) 0.7018
+2024-04-29 19:31:13,191 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:31:34,400 epoch 6 - iter 132/1326 - loss 0.03815522 - time (sec): 21.21 - samples/sec: 956.70 - lr: 0.000004
+2024-04-29 19:31:55,259 epoch 6 - iter 264/1326 - loss 0.02812200 - time (sec): 42.07 - samples/sec: 942.47 - lr: 0.000004
+2024-04-29 19:32:16,061 epoch 6 - iter 396/1326 - loss 0.02891512 - time (sec): 62.87 - samples/sec: 943.72 - lr: 0.000004
+2024-04-29 19:32:36,977 epoch 6 - iter 528/1326 - loss 0.03180526 - time (sec): 83.79 - samples/sec: 958.40 - lr: 0.000004
+2024-04-29 19:32:57,437 epoch 6 - iter 660/1326 - loss 0.03231067 - time (sec): 104.25 - samples/sec: 933.33 - lr: 0.000004
+2024-04-29 19:33:18,594 epoch 6 - iter 792/1326 - loss 0.03713967 - time (sec): 125.40 - samples/sec: 948.30 - lr: 0.000004
+2024-04-29 19:33:39,684 epoch 6 - iter 924/1326 - loss 0.04015671 - time (sec): 146.49 - samples/sec: 954.01 - lr: 0.000004
+2024-04-29 19:34:00,351 epoch 6 - iter 1056/1326 - loss 0.03794536 - time (sec): 167.16 - samples/sec: 955.37 - lr: 0.000004
+2024-04-29 19:34:21,179 epoch 6 - iter 1188/1326 - loss 0.03738144 - time (sec): 187.99 - samples/sec: 950.05 - lr: 0.000004
+2024-04-29 19:34:41,735 epoch 6 - iter 1320/1326 - loss 0.03564541 - time (sec): 208.54 - samples/sec: 942.71 - lr: 0.000004
+2024-04-29 19:34:42,592 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:34:42,592 EPOCH 6 done: loss 0.0355 - lr 0.000004
+2024-04-29 19:34:49,634 Evaluating as a multi-label problem: False
+2024-04-29 19:34:49,640 DEV : loss 0.07366479188203812 - f1-score (micro avg) 0.7319
+2024-04-29 19:34:49,648 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:35:10,664 epoch 7 - iter 132/1326 - loss 0.02986449 - time (sec): 21.02 - samples/sec: 1005.59 - lr: 0.000004
+2024-04-29 19:35:31,301 epoch 7 - iter 264/1326 - loss 0.02579215 - time (sec): 41.65 - samples/sec: 979.37 - lr: 0.000004
+2024-04-29 19:35:51,752 epoch 7 - iter 396/1326 - loss 0.02557348 - time (sec): 62.10 - samples/sec: 929.89 - lr: 0.000004
+2024-04-29 19:36:12,434 epoch 7 - iter 528/1326 - loss 0.02216509 - time (sec): 82.79 - samples/sec: 937.33 - lr: 0.000004
+2024-04-29 19:36:33,493 epoch 7 - iter 660/1326 - loss 0.03440919 - time (sec): 103.84 - samples/sec: 940.35 - lr: 0.000004
+2024-04-29 19:36:54,140 epoch 7 - iter 792/1326 - loss 0.03607959 - time (sec): 124.49 - samples/sec: 937.29 - lr: 0.000004
+2024-04-29 19:37:14,771 epoch 7 - iter 924/1326 - loss 0.03383704 - time (sec): 145.12 - samples/sec: 934.65 - lr: 0.000004
+2024-04-29 19:37:36,142 epoch 7 - iter 1056/1326 - loss 0.03426834 - time (sec): 166.49 - samples/sec: 949.38 - lr: 0.000004
+2024-04-29 19:37:57,002 epoch 7 - iter 1188/1326 - loss 0.03352186 - time (sec): 187.35 - samples/sec: 943.93 - lr: 0.000004
+2024-04-29 19:38:17,959 epoch 7 - iter 1320/1326 - loss 0.03484361 - time (sec): 208.31 - samples/sec: 945.40 - lr: 0.000004
+2024-04-29 19:38:18,781 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:38:18,781 EPOCH 7 done: loss 0.0347 - lr 0.000004
+2024-04-29 19:38:25,463 Evaluating as a multi-label problem: False
+2024-04-29 19:38:25,471 DEV : loss 0.07581108063459396 - f1-score (micro avg) 0.6724
+2024-04-29 19:38:25,479 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:38:47,172 epoch 8 - iter 132/1326 - loss 0.01665357 - time (sec): 21.69 - samples/sec: 1085.75 - lr: 0.000004
+2024-04-29 19:39:08,019 epoch 8 - iter 264/1326 - loss 0.01646746 - time (sec): 42.54 - samples/sec: 1005.05 - lr: 0.000004
+2024-04-29 19:39:28,795 epoch 8 - iter 396/1326 - loss 0.02015931 - time (sec): 63.32 - samples/sec: 966.81 - lr: 0.000004
+2024-04-29 19:39:49,482 epoch 8 - iter 528/1326 - loss 0.01847810 - time (sec): 84.00 - samples/sec: 950.00 - lr: 0.000003
+2024-04-29 19:40:10,218 epoch 8 - iter 660/1326 - loss 0.02597538 - time (sec): 104.74 - samples/sec: 938.07 - lr: 0.000003
+2024-04-29 19:40:30,984 epoch 8 - iter 792/1326 - loss 0.03006068 - time (sec): 125.51 - samples/sec: 939.37 - lr: 0.000003
+2024-04-29 19:40:51,804 epoch 8 - iter 924/1326 - loss 0.03293034 - time (sec): 146.33 - samples/sec: 934.87 - lr: 0.000003
+2024-04-29 19:41:12,625 epoch 8 - iter 1056/1326 - loss 0.03359083 - time (sec): 167.15 - samples/sec: 933.82 - lr: 0.000003
+2024-04-29 19:41:33,314 epoch 8 - iter 1188/1326 - loss 0.03258698 - time (sec): 187.84 - samples/sec: 935.06 - lr: 0.000003
+2024-04-29 19:41:54,067 epoch 8 - iter 1320/1326 - loss 0.03148140 - time (sec): 208.59 - samples/sec: 935.60 - lr: 0.000003
+2024-04-29 19:41:55,132 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:41:55,133 EPOCH 8 done: loss 0.0311 - lr 0.000003
+2024-04-29 19:42:01,776 Evaluating as a multi-label problem: False
+2024-04-29 19:42:01,783 DEV : loss 0.07848106324672699 - f1-score (micro avg) 0.7319
+2024-04-29 19:42:01,793 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:42:23,088 epoch 9 - iter 132/1326 - loss 0.05663547 - time (sec): 21.30 - samples/sec: 957.54 - lr: 0.000003
+2024-04-29 19:42:43,801 epoch 9 - iter 264/1326 - loss 0.03800695 - time (sec): 42.01 - samples/sec: 912.51 - lr: 0.000003
+2024-04-29 19:43:04,820 epoch 9 - iter 396/1326 - loss 0.04253517 - time (sec): 63.03 - samples/sec: 935.29 - lr: 0.000003
+2024-04-29 19:43:26,182 epoch 9 - iter 528/1326 - loss 0.03301648 - time (sec): 84.39 - samples/sec: 979.99 - lr: 0.000003
+2024-04-29 19:43:46,865 epoch 9 - iter 660/1326 - loss 0.03207756 - time (sec): 105.07 - samples/sec: 970.86 - lr: 0.000003
+2024-04-29 19:44:07,837 epoch 9 - iter 792/1326 - loss 0.02954204 - time (sec): 126.04 - samples/sec: 968.05 - lr: 0.000003
+2024-04-29 19:44:28,325 epoch 9 - iter 924/1326 - loss 0.02863646 - time (sec): 146.53 - samples/sec: 957.30 - lr: 0.000003
+2024-04-29 19:44:49,246 epoch 9 - iter 1056/1326 - loss 0.02794484 - time (sec): 167.45 - samples/sec: 953.28 - lr: 0.000003
+2024-04-29 19:45:09,934 epoch 9 - iter 1188/1326 - loss 0.02632601 - time (sec): 188.14 - samples/sec: 957.31 - lr: 0.000003
+2024-04-29 19:45:30,508 epoch 9 - iter 1320/1326 - loss 0.02520696 - time (sec): 208.72 - samples/sec: 944.49 - lr: 0.000003
+2024-04-29 19:45:31,338 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:45:31,338 EPOCH 9 done: loss 0.0251 - lr 0.000003
+2024-04-29 19:45:38,067 Evaluating as a multi-label problem: False
+2024-04-29 19:45:38,074 DEV : loss 0.08854210376739502 - f1-score (micro avg) 0.7296
+2024-04-29 19:45:38,082 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:45:59,093 epoch 10 - iter 132/1326 - loss 0.01233456 - time (sec): 21.01 - samples/sec: 955.33 - lr: 0.000003
+2024-04-29 19:46:19,871 epoch 10 - iter 264/1326 - loss 0.01442453 - time (sec): 41.79 - samples/sec: 952.73 - lr: 0.000003
+2024-04-29 19:46:40,768 epoch 10 - iter 396/1326 - loss 0.02226487 - time (sec): 62.69 - samples/sec: 949.03 - lr: 0.000003
+2024-04-29 19:47:01,240 epoch 10 - iter 528/1326 - loss 0.02393851 - time (sec): 83.16 - samples/sec: 934.21 - lr: 0.000003
+2024-04-29 19:47:22,284 epoch 10 - iter 660/1326 - loss 0.02377848 - time (sec): 104.20 - samples/sec: 950.07 - lr: 0.000003
+2024-04-29 19:47:42,834 epoch 10 - iter 792/1326 - loss 0.02245760 - time (sec): 124.75 - samples/sec: 938.46 - lr: 0.000003
+2024-04-29 19:48:04,028 epoch 10 - iter 924/1326 - loss 0.02302608 - time (sec): 145.95 - samples/sec: 946.93 - lr: 0.000003
+2024-04-29 19:48:24,992 epoch 10 - iter 1056/1326 - loss 0.02390901 - time (sec): 166.91 - samples/sec: 950.20 - lr: 0.000003
+2024-04-29 19:48:45,625 epoch 10 - iter 1188/1326 - loss 0.02196178 - time (sec): 187.54 - samples/sec: 947.51 - lr: 0.000003
+2024-04-29 19:49:06,386 epoch 10 - iter 1320/1326 - loss 0.02391247 - time (sec): 208.30 - samples/sec: 941.85 - lr: 0.000003
+2024-04-29 19:49:07,302 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:49:07,302 EPOCH 10 done: loss 0.0237 - lr 0.000003
+2024-04-29 19:49:14,037 Evaluating as a multi-label problem: False
+2024-04-29 19:49:14,044 DEV : loss 0.06483861804008484 - f1-score (micro avg) 0.6888
+2024-04-29 19:49:14,054 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:49:35,184 epoch 11 - iter 132/1326 - loss 0.01376016 - time (sec): 21.13 - samples/sec: 954.01 - lr: 0.000002
+2024-04-29 19:49:56,040 epoch 11 - iter 264/1326 - loss 0.00968912 - time (sec): 41.99 - samples/sec: 983.60 - lr: 0.000002
+2024-04-29 19:50:17,256 epoch 11 - iter 396/1326 - loss 0.01934988 - time (sec): 63.20 - samples/sec: 982.16 - lr: 0.000002
+2024-04-29 19:50:37,928 epoch 11 - iter 528/1326 - loss 0.02163005 - time (sec): 83.87 - samples/sec: 960.39 - lr: 0.000002
+2024-04-29 19:50:58,697 epoch 11 - iter 660/1326 - loss 0.02394657 - time (sec): 104.64 - samples/sec: 960.85 - lr: 0.000002
+2024-04-29 19:51:19,441 epoch 11 - iter 792/1326 - loss 0.02269684 - time (sec): 125.39 - samples/sec: 961.89 - lr: 0.000002
+2024-04-29 19:51:40,444 epoch 11 - iter 924/1326 - loss 0.02139990 - time (sec): 146.39 - samples/sec: 962.55 - lr: 0.000002
+2024-04-29 19:52:00,980 epoch 11 - iter 1056/1326 - loss 0.02027515 - time (sec): 166.93 - samples/sec: 952.95 - lr: 0.000002
+2024-04-29 19:52:21,518 epoch 11 - iter 1188/1326 - loss 0.02049033 - time (sec): 187.46 - samples/sec: 947.71 - lr: 0.000002
+2024-04-29 19:52:42,468 epoch 11 - iter 1320/1326 - loss 0.02230984 - time (sec): 208.41 - samples/sec: 946.46 - lr: 0.000002
+2024-04-29 19:52:43,279 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:52:43,279 EPOCH 11 done: loss 0.0223 - lr 0.000002
+2024-04-29 19:52:50,027 Evaluating as a multi-label problem: False
+2024-04-29 19:52:50,034 DEV : loss 0.06557412445545197 - f1-score (micro avg) 0.7149
+2024-04-29 19:52:50,043 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:53:11,582 epoch 12 - iter 132/1326 - loss 0.01024615 - time (sec): 21.54 - samples/sec: 1071.15 - lr: 0.000002
+2024-04-29 19:53:32,395 epoch 12 - iter 264/1326 - loss 0.00787421 - time (sec): 42.35 - samples/sec: 1031.69 - lr: 0.000002
+2024-04-29 19:53:52,905 epoch 12 - iter 396/1326 - loss 0.01420230 - time (sec): 62.86 - samples/sec: 977.17 - lr: 0.000002
+2024-04-29 19:54:13,373 epoch 12 - iter 528/1326 - loss 0.01885577 - time (sec): 83.33 - samples/sec: 946.77 - lr: 0.000002
+2024-04-29 19:54:33,970 epoch 12 - iter 660/1326 - loss 0.01639885 - time (sec): 103.93 - samples/sec: 939.51 - lr: 0.000002
+2024-04-29 19:54:54,465 epoch 12 - iter 792/1326 - loss 0.01795589 - time (sec): 124.42 - samples/sec: 926.71 - lr: 0.000002
+2024-04-29 19:55:15,425 epoch 12 - iter 924/1326 - loss 0.01955026 - time (sec): 145.38 - samples/sec: 932.15 - lr: 0.000002
+2024-04-29 19:55:36,226 epoch 12 - iter 1056/1326 - loss 0.01918242 - time (sec): 166.18 - samples/sec: 937.12 - lr: 0.000002
+2024-04-29 19:55:57,215 epoch 12 - iter 1188/1326 - loss 0.02175725 - time (sec): 187.17 - samples/sec: 940.08 - lr: 0.000002
+2024-04-29 19:56:18,316 epoch 12 - iter 1320/1326 - loss 0.02169361 - time (sec): 208.27 - samples/sec: 945.08 - lr: 0.000002
+2024-04-29 19:56:19,169 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:56:19,169 EPOCH 12 done: loss 0.0216 - lr 0.000002
+2024-04-29 19:56:25,901 Evaluating as a multi-label problem: False
+2024-04-29 19:56:25,908 DEV : loss 0.06691710650920868 - f1-score (micro avg) 0.7311
+2024-04-29 19:56:25,917 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:56:46,852 epoch 13 - iter 132/1326 - loss 0.03298837 - time (sec): 20.93 - samples/sec: 897.53 - lr: 0.000002
+2024-04-29 19:57:07,720 epoch 13 - iter 264/1326 - loss 0.02150903 - time (sec): 41.80 - samples/sec: 951.62 - lr: 0.000002
+2024-04-29 19:57:28,830 epoch 13 - iter 396/1326 - loss 0.02136301 - time (sec): 62.91 - samples/sec: 965.03 - lr: 0.000002
+2024-04-29 19:57:49,321 epoch 13 - iter 528/1326 - loss 0.01898453 - time (sec): 83.40 - samples/sec: 945.01 - lr: 0.000002
+2024-04-29 19:58:09,888 epoch 13 - iter 660/1326 - loss 0.01879336 - time (sec): 103.97 - samples/sec: 935.56 - lr: 0.000002
+2024-04-29 19:58:30,626 epoch 13 - iter 792/1326 - loss 0.01660965 - time (sec): 124.71 - samples/sec: 939.56 - lr: 0.000002
+2024-04-29 19:58:50,952 epoch 13 - iter 924/1326 - loss 0.01499323 - time (sec): 145.03 - samples/sec: 927.70 - lr: 0.000001
+2024-04-29 19:59:12,003 epoch 13 - iter 1056/1326 - loss 0.01833583 - time (sec): 166.09 - samples/sec: 931.71 - lr: 0.000001
+2024-04-29 19:59:32,777 epoch 13 - iter 1188/1326 - loss 0.01712627 - time (sec): 186.86 - samples/sec: 940.76 - lr: 0.000001
+2024-04-29 19:59:53,925 epoch 13 - iter 1320/1326 - loss 0.01715871 - time (sec): 208.01 - samples/sec: 947.16 - lr: 0.000001
+2024-04-29 19:59:54,752 ----------------------------------------------------------------------------------------------------
+2024-04-29 19:59:54,752 EPOCH 13 done: loss 0.0171 - lr 0.000001
+2024-04-29 20:00:01,490 Evaluating as a multi-label problem: False
+2024-04-29 20:00:01,498 DEV : loss 0.06450295448303223 - f1-score (micro avg) 0.75
+2024-04-29 20:00:01,507 saving best model
+2024-04-29 20:00:03,658 ----------------------------------------------------------------------------------------------------
|
269 |
+
2024-04-29 20:00:24,702 epoch 14 - iter 132/1326 - loss 0.03889764 - time (sec): 21.04 - samples/sec: 941.38 - lr: 0.000001
|
270 |
+
2024-04-29 20:00:45,459 epoch 14 - iter 264/1326 - loss 0.02605007 - time (sec): 41.80 - samples/sec: 966.52 - lr: 0.000001
|
271 |
+
2024-04-29 20:01:06,329 epoch 14 - iter 396/1326 - loss 0.01987798 - time (sec): 62.67 - samples/sec: 978.59 - lr: 0.000001
|
272 |
+
2024-04-29 20:01:27,084 epoch 14 - iter 528/1326 - loss 0.01886847 - time (sec): 83.43 - samples/sec: 974.50 - lr: 0.000001
|
273 |
+
2024-04-29 20:01:47,683 epoch 14 - iter 660/1326 - loss 0.01798242 - time (sec): 104.02 - samples/sec: 955.07 - lr: 0.000001
|
274 |
+
2024-04-29 20:02:08,157 epoch 14 - iter 792/1326 - loss 0.01593590 - time (sec): 124.50 - samples/sec: 943.49 - lr: 0.000001
|
275 |
+
2024-04-29 20:02:29,259 epoch 14 - iter 924/1326 - loss 0.01623625 - time (sec): 145.60 - samples/sec: 947.55 - lr: 0.000001
|
276 |
+
2024-04-29 20:02:49,779 epoch 14 - iter 1056/1326 - loss 0.01708562 - time (sec): 166.12 - samples/sec: 936.69 - lr: 0.000001
|
277 |
+
2024-04-29 20:03:10,855 epoch 14 - iter 1188/1326 - loss 0.01556387 - time (sec): 187.20 - samples/sec: 949.03 - lr: 0.000001
|
278 |
+
2024-04-29 20:03:31,683 epoch 14 - iter 1320/1326 - loss 0.01533842 - time (sec): 208.02 - samples/sec: 947.01 - lr: 0.000001
|
279 |
+
2024-04-29 20:03:32,470 ----------------------------------------------------------------------------------------------------
|
280 |
+
2024-04-29 20:03:32,470 EPOCH 14 done: loss 0.0153 - lr 0.000001
|
281 |
+
2024-04-29 20:03:39,240 Evaluating as a multi-label problem: False
|
282 |
+
2024-04-29 20:03:39,247 DEV : loss 0.0911756381392479 - f1-score (micro avg) 0.7288
|
283 |
+
2024-04-29 20:03:39,257 ----------------------------------------------------------------------------------------------------
|
284 |
+
2024-04-29 20:04:00,018 epoch 15 - iter 132/1326 - loss 0.01237652 - time (sec): 20.76 - samples/sec: 878.16 - lr: 0.000001
|
285 |
+
2024-04-29 20:04:20,615 epoch 15 - iter 264/1326 - loss 0.01436397 - time (sec): 41.36 - samples/sec: 879.70 - lr: 0.000001
|
286 |
+
2024-04-29 20:04:41,840 epoch 15 - iter 396/1326 - loss 0.01188224 - time (sec): 62.58 - samples/sec: 935.60 - lr: 0.000001
|
287 |
+
2024-04-29 20:05:02,449 epoch 15 - iter 528/1326 - loss 0.01191348 - time (sec): 83.19 - samples/sec: 931.41 - lr: 0.000001
|
288 |
+
2024-04-29 20:05:23,576 epoch 15 - iter 660/1326 - loss 0.01318250 - time (sec): 104.32 - samples/sec: 936.74 - lr: 0.000001
|
289 |
+
2024-04-29 20:05:44,259 epoch 15 - iter 792/1326 - loss 0.01610301 - time (sec): 125.00 - samples/sec: 935.86 - lr: 0.000001
|
290 |
+
2024-04-29 20:06:05,148 epoch 15 - iter 924/1326 - loss 0.01402320 - time (sec): 145.89 - samples/sec: 935.57 - lr: 0.000001
|
291 |
+
2024-04-29 20:06:26,080 epoch 15 - iter 1056/1326 - loss 0.01456286 - time (sec): 166.82 - samples/sec: 943.62 - lr: 0.000001
|
292 |
+
2024-04-29 20:06:46,684 epoch 15 - iter 1188/1326 - loss 0.01366503 - time (sec): 187.43 - samples/sec: 941.11 - lr: 0.000001
|
293 |
+
2024-04-29 20:07:07,514 epoch 15 - iter 1320/1326 - loss 0.01271998 - time (sec): 208.26 - samples/sec: 946.56 - lr: 0.000001
|
294 |
+
2024-04-29 20:07:08,330 ----------------------------------------------------------------------------------------------------
|
295 |
+
2024-04-29 20:07:08,330 EPOCH 15 done: loss 0.0127 - lr 0.000001
|
296 |
+
2024-04-29 20:07:15,376 Evaluating as a multi-label problem: False
|
297 |
+
2024-04-29 20:07:15,383 DEV : loss 0.0763971135020256 - f1-score (micro avg) 0.7424
|
298 |
+
2024-04-29 20:07:15,392 ----------------------------------------------------------------------------------------------------
|
299 |
+
2024-04-29 20:07:35,973 epoch 16 - iter 132/1326 - loss 0.00604141 - time (sec): 20.58 - samples/sec: 925.87 - lr: 0.000001
|
300 |
+
2024-04-29 20:07:56,877 epoch 16 - iter 264/1326 - loss 0.00962106 - time (sec): 41.48 - samples/sec: 944.64 - lr: 0.000001
|
301 |
+
2024-04-29 20:08:17,697 epoch 16 - iter 396/1326 - loss 0.00897610 - time (sec): 62.30 - samples/sec: 966.51 - lr: 0.000001
|
302 |
+
2024-04-29 20:08:38,452 epoch 16 - iter 528/1326 - loss 0.00930250 - time (sec): 83.06 - samples/sec: 969.77 - lr: 0.000001
|
303 |
+
2024-04-29 20:08:58,968 epoch 16 - iter 660/1326 - loss 0.01240910 - time (sec): 103.58 - samples/sec: 948.32 - lr: 0.000001
|
304 |
+
2024-04-29 20:09:19,865 epoch 16 - iter 792/1326 - loss 0.01194240 - time (sec): 124.47 - samples/sec: 955.31 - lr: 0.000001
|
305 |
+
2024-04-29 20:09:40,936 epoch 16 - iter 924/1326 - loss 0.01136229 - time (sec): 145.54 - samples/sec: 954.94 - lr: 0.000001
|
306 |
+
2024-04-29 20:10:01,211 epoch 16 - iter 1056/1326 - loss 0.01241511 - time (sec): 165.82 - samples/sec: 941.01 - lr: 0.000001
|
307 |
+
2024-04-29 20:10:22,278 epoch 16 - iter 1188/1326 - loss 0.01237126 - time (sec): 186.89 - samples/sec: 943.29 - lr: 0.000001
|
308 |
+
2024-04-29 20:10:43,285 epoch 16 - iter 1320/1326 - loss 0.01312447 - time (sec): 207.89 - samples/sec: 945.12 - lr: 0.000000
|
309 |
+
2024-04-29 20:10:44,167 ----------------------------------------------------------------------------------------------------
|
310 |
+
2024-04-29 20:10:44,168 EPOCH 16 done: loss 0.0131 - lr 0.000000
|
311 |
+
2024-04-29 20:10:51,196 Evaluating as a multi-label problem: False
|
312 |
+
2024-04-29 20:10:51,203 DEV : loss 0.07494457066059113 - f1-score (micro avg) 0.7532
|
313 |
+
2024-04-29 20:10:51,212 saving best model
|
314 |
+
2024-04-29 20:10:53,114 ----------------------------------------------------------------------------------------------------
|
315 |
+
2024-04-29 20:11:14,140 epoch 17 - iter 132/1326 - loss 0.01086358 - time (sec): 21.03 - samples/sec: 984.72 - lr: 0.000000
|
316 |
+
2024-04-29 20:11:34,547 epoch 17 - iter 264/1326 - loss 0.00784167 - time (sec): 41.43 - samples/sec: 909.28 - lr: 0.000000
|
317 |
+
2024-04-29 20:11:55,214 epoch 17 - iter 396/1326 - loss 0.00810142 - time (sec): 62.10 - samples/sec: 903.48 - lr: 0.000000
|
318 |
+
2024-04-29 20:12:16,277 epoch 17 - iter 528/1326 - loss 0.01276904 - time (sec): 83.16 - samples/sec: 959.18 - lr: 0.000000
|
319 |
+
2024-04-29 20:12:37,331 epoch 17 - iter 660/1326 - loss 0.01460244 - time (sec): 104.22 - samples/sec: 968.65 - lr: 0.000000
|
320 |
+
2024-04-29 20:12:58,109 epoch 17 - iter 792/1326 - loss 0.01585436 - time (sec): 124.99 - samples/sec: 959.39 - lr: 0.000000
|
321 |
+
2024-04-29 20:13:18,704 epoch 17 - iter 924/1326 - loss 0.01492635 - time (sec): 145.59 - samples/sec: 951.40 - lr: 0.000000
|
322 |
+
2024-04-29 20:13:39,820 epoch 17 - iter 1056/1326 - loss 0.01390514 - time (sec): 166.71 - samples/sec: 959.19 - lr: 0.000000
|
323 |
+
2024-04-29 20:14:00,123 epoch 17 - iter 1188/1326 - loss 0.01331555 - time (sec): 187.01 - samples/sec: 942.26 - lr: 0.000000
|
324 |
+
2024-04-29 20:14:21,036 epoch 17 - iter 1320/1326 - loss 0.01345542 - time (sec): 207.92 - samples/sec: 947.22 - lr: 0.000000
|
325 |
+
2024-04-29 20:14:21,870 ----------------------------------------------------------------------------------------------------
|
326 |
+
2024-04-29 20:14:21,870 EPOCH 17 done: loss 0.0135 - lr 0.000000
|
327 |
+
2024-04-29 20:14:28,607 Evaluating as a multi-label problem: False
|
328 |
+
2024-04-29 20:14:28,614 DEV : loss 0.07131695002317429 - f1-score (micro avg) 0.7265
|
329 |
+
2024-04-29 20:14:28,624 ----------------------------------------------------------------------------------------------------
|
330 |
+
2024-04-29 20:14:49,351 epoch 18 - iter 132/1326 - loss 0.00639355 - time (sec): 20.73 - samples/sec: 843.00 - lr: 0.000000
|
331 |
+
2024-04-29 20:15:10,862 epoch 18 - iter 264/1326 - loss 0.00675824 - time (sec): 42.24 - samples/sec: 972.00 - lr: 0.000000
|
332 |
+
2024-04-29 20:15:31,579 epoch 18 - iter 396/1326 - loss 0.00743064 - time (sec): 62.95 - samples/sec: 969.54 - lr: 0.000000
|
333 |
+
2024-04-29 20:15:52,280 epoch 18 - iter 528/1326 - loss 0.00891165 - time (sec): 83.66 - samples/sec: 964.18 - lr: 0.000000
|
334 |
+
2024-04-29 20:16:13,313 epoch 18 - iter 660/1326 - loss 0.01136151 - time (sec): 104.69 - samples/sec: 974.25 - lr: 0.000000
|
335 |
+
2024-04-29 20:16:34,312 epoch 18 - iter 792/1326 - loss 0.01088078 - time (sec): 125.69 - samples/sec: 974.73 - lr: 0.000000
|
336 |
+
2024-04-29 20:16:54,598 epoch 18 - iter 924/1326 - loss 0.01056691 - time (sec): 145.97 - samples/sec: 955.73 - lr: 0.000000
|
337 |
+
2024-04-29 20:17:15,485 epoch 18 - iter 1056/1326 - loss 0.01338623 - time (sec): 166.86 - samples/sec: 954.02 - lr: 0.000000
|
338 |
+
2024-04-29 20:17:36,152 epoch 18 - iter 1188/1326 - loss 0.01294660 - time (sec): 187.53 - samples/sec: 946.42 - lr: 0.000000
|
339 |
+
2024-04-29 20:17:56,803 epoch 18 - iter 1320/1326 - loss 0.01185896 - time (sec): 208.18 - samples/sec: 943.11 - lr: 0.000000
|
340 |
+
2024-04-29 20:17:57,706 ----------------------------------------------------------------------------------------------------
|
341 |
+
2024-04-29 20:17:57,707 EPOCH 18 done: loss 0.0118 - lr 0.000000
|
342 |
+
2024-04-29 20:18:04,460 Evaluating as a multi-label problem: False
|
343 |
+
2024-04-29 20:18:04,467 DEV : loss 0.06895702332258224 - f1-score (micro avg) 0.7143
|
344 |
+
2024-04-29 20:18:04,477 ----------------------------------------------------------------------------------------------------
|
345 |
+
2024-04-29 20:18:25,241 epoch 19 - iter 132/1326 - loss 0.00779178 - time (sec): 20.76 - samples/sec: 859.34 - lr: 0.000000
|
346 |
+
2024-04-29 20:18:45,931 epoch 19 - iter 264/1326 - loss 0.00883730 - time (sec): 41.45 - samples/sec: 903.23 - lr: 0.000000
|
347 |
+
2024-04-29 20:19:06,584 epoch 19 - iter 396/1326 - loss 0.00865701 - time (sec): 62.11 - samples/sec: 929.04 - lr: 0.000000
|
348 |
+
2024-04-29 20:19:27,143 epoch 19 - iter 528/1326 - loss 0.00931012 - time (sec): 82.67 - samples/sec: 933.53 - lr: 0.000000
|
349 |
+
2024-04-29 20:19:47,967 epoch 19 - iter 660/1326 - loss 0.00893505 - time (sec): 103.49 - samples/sec: 941.05 - lr: 0.000000
|
350 |
+
2024-04-29 20:20:09,276 epoch 19 - iter 792/1326 - loss 0.00983372 - time (sec): 124.80 - samples/sec: 965.06 - lr: 0.000000
|
351 |
+
2024-04-29 20:20:29,681 epoch 19 - iter 924/1326 - loss 0.01071250 - time (sec): 145.20 - samples/sec: 946.81 - lr: 0.000000
|
352 |
+
2024-04-29 20:20:50,714 epoch 19 - iter 1056/1326 - loss 0.01008226 - time (sec): 166.24 - samples/sec: 945.81 - lr: 0.000000
|
353 |
+
2024-04-29 20:21:11,577 epoch 19 - iter 1188/1326 - loss 0.01218936 - time (sec): 187.10 - samples/sec: 948.77 - lr: 0.000000
|
354 |
+
2024-04-29 20:21:32,191 epoch 19 - iter 1320/1326 - loss 0.01151174 - time (sec): 207.71 - samples/sec: 946.17 - lr: 0.000000
|
355 |
+
2024-04-29 20:21:33,090 ----------------------------------------------------------------------------------------------------
|
356 |
+
2024-04-29 20:21:33,090 EPOCH 19 done: loss 0.0115 - lr 0.000000
|
357 |
+
2024-04-29 20:21:39,713 Evaluating as a multi-label problem: False
|
358 |
+
2024-04-29 20:21:39,722 DEV : loss 0.06978413462638855 - f1-score (micro avg) 0.7296
|
359 |
+
2024-04-29 20:21:39,734 ----------------------------------------------------------------------------------------------------
|
360 |
+
2024-04-29 20:22:00,944 epoch 20 - iter 132/1326 - loss 0.00412526 - time (sec): 21.21 - samples/sec: 947.92 - lr: 0.000000
|
361 |
+
2024-04-29 20:22:21,834 epoch 20 - iter 264/1326 - loss 0.00391955 - time (sec): 42.10 - samples/sec: 947.36 - lr: 0.000000
|
362 |
+
2024-04-29 20:22:42,606 epoch 20 - iter 396/1326 - loss 0.00617386 - time (sec): 62.87 - samples/sec: 941.76 - lr: 0.000000
|
363 |
+
2024-04-29 20:23:02,937 epoch 20 - iter 528/1326 - loss 0.00598707 - time (sec): 83.20 - samples/sec: 924.06 - lr: 0.000000
|
364 |
+
2024-04-29 20:23:23,893 epoch 20 - iter 660/1326 - loss 0.00815138 - time (sec): 104.16 - samples/sec: 928.58 - lr: 0.000000
|
365 |
+
2024-04-29 20:23:45,135 epoch 20 - iter 792/1326 - loss 0.00815129 - time (sec): 125.40 - samples/sec: 958.62 - lr: 0.000000
|
366 |
+
2024-04-29 20:24:05,808 epoch 20 - iter 924/1326 - loss 0.00848857 - time (sec): 146.07 - samples/sec: 951.78 - lr: 0.000000
|
367 |
+
2024-04-29 20:24:26,900 epoch 20 - iter 1056/1326 - loss 0.00786540 - time (sec): 167.17 - samples/sec: 960.67 - lr: 0.000000
|
368 |
+
2024-04-29 20:24:47,236 epoch 20 - iter 1188/1326 - loss 0.00819986 - time (sec): 187.50 - samples/sec: 949.58 - lr: 0.000000
|
369 |
+
2024-04-29 20:25:07,654 epoch 20 - iter 1320/1326 - loss 0.01085452 - time (sec): 207.92 - samples/sec: 944.93 - lr: 0.000000
|
370 |
+
2024-04-29 20:25:08,614 ----------------------------------------------------------------------------------------------------
|
371 |
+
2024-04-29 20:25:08,614 EPOCH 20 done: loss 0.0108 - lr 0.000000
|
372 |
+
2024-04-29 20:25:15,217 Evaluating as a multi-label problem: False
|
373 |
+
2024-04-29 20:25:15,224 DEV : loss 0.06992758810520172 - f1-score (micro avg) 0.7296
|
374 |
+
2024-04-29 20:25:16,992 ----------------------------------------------------------------------------------------------------
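The per-epoch `EPOCH n done` and `DEV :` lines above can be scanned to find which epoch had the highest dev F1. The following is a minimal sketch, not part of the commit: it assumes the two line formats are exactly as they appear in this log, and the names `parse_dev_scores`, `EPOCH_RE`, `DEV_RE` and `LOG_EXCERPT` are hypothetical. The excerpt holds a few lines copied from the log; in practice the full `training.log` would be read instead.

```python
import re

# A few lines copied verbatim from the log above (epochs 12, 13, 16 and 20).
LOG_EXCERPT = """\
2024-04-29 19:56:19,169 EPOCH 12 done: loss 0.0216 - lr 0.000002
2024-04-29 19:56:25,908 DEV : loss 0.06691710650920868 - f1-score (micro avg) 0.7311
2024-04-29 19:59:54,752 EPOCH 13 done: loss 0.0171 - lr 0.000001
2024-04-29 20:00:01,498 DEV : loss 0.06450295448303223 - f1-score (micro avg) 0.75
2024-04-29 20:10:44,168 EPOCH 16 done: loss 0.0131 - lr 0.000000
2024-04-29 20:10:51,203 DEV : loss 0.07494457066059113 - f1-score (micro avg) 0.7532
2024-04-29 20:25:08,614 EPOCH 20 done: loss 0.0108 - lr 0.000000
2024-04-29 20:25:15,224 DEV : loss 0.06992758810520172 - f1-score (micro avg) 0.7296
"""

# Assumed line formats, taken from the log text itself.
EPOCH_RE = re.compile(r"EPOCH (\d+) done: loss ([\d.]+)")
DEV_RE = re.compile(r"DEV : loss ([\d.]+) - f1-score \(micro avg\)\s+([\d.]+)")


def parse_dev_scores(text):
    """Pair each 'EPOCH n done' line with the next 'DEV :' line."""
    results = []
    current = None  # (epoch, train_loss) waiting for its DEV line
    for line in text.splitlines():
        epoch_match = EPOCH_RE.search(line)
        if epoch_match:
            current = (int(epoch_match.group(1)), float(epoch_match.group(2)))
            continue
        dev_match = DEV_RE.search(line)
        if dev_match and current is not None:
            epoch, train_loss = current
            results.append({
                "epoch": epoch,
                "train_loss": train_loss,
                "dev_loss": float(dev_match.group(1)),
                "dev_f1": float(dev_match.group(2)),
            })
            current = None
    return results


scores = parse_dev_scores(LOG_EXCERPT)
best = max(scores, key=lambda row: row["dev_f1"])
print(f"best epoch in excerpt: {best['epoch']} (dev F1 {best['dev_f1']})")
```

For the excerpt, the highest dev F1 falls on epoch 16, which matches the last `saving best model` line in the log.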
|
375 |
+
2024-04-29 20:25:47,102 SequenceTagger predicts: Dictionary with 25 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-MISC, B-MISC, E-MISC, I-MISC, S-UTE, B-UTE, E-UTE, I-UTE, S-SINGLE_COMPANY, B-SINGLE_COMPANY, E-SINGLE_COMPANY, I-SINGLE_COMPANY
|
376 |
+
2024-04-29 20:25:54,356 Evaluating as a multi-label problem: False
|
377 |
+
2024-04-29 20:25:54,363 0.7039 0.7868 0.7431 0.5944
|
378 |
+
2024-04-29 20:25:54,363
Results:
- F-score (micro) 0.7431
- F-score (macro) 0.7429
- Accuracy 0.5944

By class:
                precision    recall  f1-score   support

           UTE     0.7568    0.7887    0.7724        71
SINGLE_COMPANY     0.6538    0.7846    0.7133        65

     micro avg     0.7039    0.7868    0.7431       136
     macro avg     0.7053    0.7867    0.7429       136
  weighted avg     0.7076    0.7868    0.7442       136

2024-04-29 20:25:54,363 ----------------------------------------------------------------------------------------------------
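The averaged rows in the test report can be cross-checked against the per-class rows with plain arithmetic. The sketch below is only a consistency check on the rounded values printed in the log, not the evaluation code that produced them; the helper `f1` and the dictionary `per_class` are hypothetical names. Because the inputs are rounded to four decimals, recomputed values may differ from the logged ones in the last digit.

```python
# Per-class values copied from the "By class" report in the log.
per_class = {
    "UTE": {"precision": 0.7568, "recall": 0.7887, "f1": 0.7724, "support": 71},
    "SINGLE_COMPANY": {"precision": 0.6538, "recall": 0.7846, "f1": 0.7133, "support": 65},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


total_support = sum(c["support"] for c in per_class.values())

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(c["f1"] for c in per_class.values()) / len(per_class)

# Weighted average: per-class F1 weighted by support.
weighted_f1 = sum(c["f1"] * c["support"] for c in per_class.values()) / total_support

# Micro F1 recomputed from the logged micro precision and recall.
micro_f1 = f1(0.7039, 0.7868)

print(f"support total: {total_support}")
print(f"macro F1:      {macro_f1:.4f}")
print(f"weighted F1:   {weighted_f1:.4f}")
print(f"micro F1:      {micro_f1:.4f}")
```

The recomputed figures agree with the logged 0.7429 (macro), 0.7442 (weighted) and 0.7431 (micro) to within rounding of the inputs.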
|
weights.txt
ADDED
File without changes
|