stefan-it committed
Commit c340f63 · 1 Parent(s): 2b11a01

Upload folder using huggingface_hub
best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10bc21c3de0a2f5b38e85eeb787ce3c3d3a3c9432d40392ded32d959bc135dff
+ size 870793839
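The LFS-tracked files in this commit (best-model.pt, final-model.pt, and the tfevents file) appear in the diff only as Git LFS pointer files: a spec version line, a sha256 `oid`, and a byte `size`. As a minimal sketch, here is one way to parse such a pointer and check a downloaded file against it; the helper names are ours, not part of git-lfs or huggingface_hub:

```python
import hashlib
from pathlib import Path

def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_lfs_object(pointer: dict, path: Path) -> bool:
    """True if the local file matches the pointer's sha256 oid and byte size."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return (pointer["oid"] == f"sha256:{digest}"
            and int(pointer["size"]) == path.stat().st_size)

pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:10bc21c3de0a2f5b38e85eeb787ce3c3d3a3c9432d40392ded32d959bc135dff\n"
    "size 870793839\n"
)
print(pointer["size"])  # 870793839
```

Note that `read_bytes` loads the whole file into memory; for an 870 MB checkpoint a chunked `hashlib` update would be preferable, but it would lengthen the sketch.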
dev.tsv ADDED
The diff for this file is too large to render.
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e2611c33efa8d4a223a6421bd6462bf56f83c70c635d2a8a4f4b7b644b34309
+ size 870793956
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH	TIMESTAMP	LEARNING_RATE	TRAIN_LOSS	DEV_LOSS	DEV_PRECISION	DEV_RECALL	DEV_F1	DEV_ACCURACY
+ 1	17:22:32	0.0001	1.0824	0.0979	0.0000	0.0000	0.0000	0.0000
+ 2	17:29:34	0.0001	0.1042	0.0605	0.7925	0.7089	0.7483	0.6154
+ 3	17:36:43	0.0001	0.0635	0.0590	0.7286	0.8270	0.7747	0.6533
+ 4	17:43:48	0.0001	0.0409	0.0613	0.7683	0.8397	0.8024	0.6838
+ 5	17:50:51	0.0001	0.0271	0.0721	0.7260	0.8608	0.7876	0.6602
+ 6	17:57:52	0.0001	0.0193	0.0800	0.7519	0.8312	0.7896	0.6701
+ 7	18:04:53	0.0001	0.0124	0.0946	0.7529	0.8354	0.7920	0.6712
+ 8	18:11:57	0.0000	0.0098	0.0935	0.7388	0.8354	0.7842	0.6622
+ 9	18:18:59	0.0000	0.0075	0.0998	0.7395	0.8143	0.7751	0.6498
+ 10	18:26:00	0.0000	0.0057	0.1049	0.7433	0.8186	0.7791	0.6554
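loss.tsv above is the tab-separated per-epoch summary that Flair writes during training. As a small sketch (the helper name is ours), here is one way to pick out the epoch with the best dev F1, which per the table is epoch 4:

```python
import csv
import io

def best_dev_epoch(tsv_text: str) -> tuple[int, float]:
    """Return (epoch, dev_f1) for the row with the highest DEV_F1."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    best = max(rows, key=lambda r: float(r["DEV_F1"]))
    return int(best["EPOCH"]), float(best["DEV_F1"])

# Abbreviated sample using just the two relevant columns from the table above:
sample = "EPOCH\tDEV_F1\n" + "\n".join(
    f"{epoch}\t{f1}"
    for epoch, f1 in enumerate(
        [0.0000, 0.7483, 0.7747, 0.8024, 0.7876,
         0.7896, 0.7920, 0.7842, 0.7751, 0.7791], start=1)
)
print(best_dev_epoch(sample))  # (4, 0.8024)
```

This agrees with the "saving best model" entries in training.log: the last checkpoint saved as best-model.pt is the one from epoch 4.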
runs/events.out.tfevents.1697217331.6d4c7681f95b.3224.8 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bbc3501e6deaafb54c3a346ac31977ade4e857620752b197077c7a82082dfe4c
+ size 434848
test.tsv ADDED
The diff for this file is too large to render.
 
training.log ADDED
@@ -0,0 +1,260 @@
+ 2023-10-13 17:15:31,806 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:15:31,809 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=13, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-13 17:15:31,809 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:15:31,810 MultiCorpus: 6183 train + 680 dev + 2113 test sentences
+  - NER_HIPE_2022 Corpus: 6183 train + 680 dev + 2113 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/topres19th/en/with_doc_seperator
+ 2023-10-13 17:15:31,810 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:15:31,810 Train: 6183 sentences
+ 2023-10-13 17:15:31,810 (train_with_dev=False, train_with_test=False)
+ 2023-10-13 17:15:31,810 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:15:31,810 Training Params:
+ 2023-10-13 17:15:31,810  - learning_rate: "0.00015"
+ 2023-10-13 17:15:31,810  - mini_batch_size: "8"
+ 2023-10-13 17:15:31,810  - max_epochs: "10"
+ 2023-10-13 17:15:31,810  - shuffle: "True"
+ 2023-10-13 17:15:31,811 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:15:31,811 Plugins:
+ 2023-10-13 17:15:31,811  - TensorboardLogger
+ 2023-10-13 17:15:31,811  - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-13 17:15:31,811 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:15:31,811 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-13 17:15:31,811  - metric: "('micro avg', 'f1-score')"
+ 2023-10-13 17:15:31,811 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:15:31,811 Computation:
+ 2023-10-13 17:15:31,811  - compute on device: cuda:0
+ 2023-10-13 17:15:31,811  - embedding storage: none
+ 2023-10-13 17:15:31,811 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:15:31,811 Model training base path: "hmbench-topres19th/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
+ 2023-10-13 17:15:31,812 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:15:31,812 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:15:31,812 Logging anything other than scalars to TensorBoard is currently not supported.
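The per-iteration `lr` values in the log below follow the LinearScheduler announced above: linear warmup over the first `warmup_fraction` of all steps up to the peak learning rate 0.00015, then linear decay to zero. With 773 iterations per epoch over 10 epochs, that is 7730 steps, 773 of them warmup. A sketch of that schedule (our own reconstruction, not Flair's implementation):

```python
def linear_scheduler_lr(step: int, total_steps: int,
                        peak_lr: float, warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # warmup phase
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Reproduces the logged values, e.g. lr 0.000015 at epoch 1 iter 77
# and lr 0.000149 at epoch 1 iter 770:
print(round(linear_scheduler_lr(77, 7730, 0.00015), 6))   # 1.5e-05
print(round(linear_scheduler_lr(770, 7730, 0.00015), 6))  # 0.000149
```

The decay phase also matches: step 850 (epoch 2, iter 77) gives roughly 0.000148, as logged.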
+ 2023-10-13 17:16:11,330 epoch 1 - iter 77/773 - loss 2.53777300 - time (sec): 39.52 - samples/sec: 292.16 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-13 17:16:51,228 epoch 1 - iter 154/773 - loss 2.49369541 - time (sec): 79.41 - samples/sec: 303.18 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-13 17:17:31,929 epoch 1 - iter 231/773 - loss 2.32948188 - time (sec): 120.11 - samples/sec: 304.80 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-13 17:18:13,110 epoch 1 - iter 308/773 - loss 2.11470827 - time (sec): 161.30 - samples/sec: 304.89 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-13 17:18:54,504 epoch 1 - iter 385/773 - loss 1.90288187 - time (sec): 202.69 - samples/sec: 301.50 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-13 17:19:34,785 epoch 1 - iter 462/773 - loss 1.67712086 - time (sec): 242.97 - samples/sec: 301.73 - lr: 0.000089 - momentum: 0.000000
+ 2023-10-13 17:20:14,388 epoch 1 - iter 539/773 - loss 1.47746186 - time (sec): 282.57 - samples/sec: 303.06 - lr: 0.000104 - momentum: 0.000000
+ 2023-10-13 17:20:54,452 epoch 1 - iter 616/773 - loss 1.31731180 - time (sec): 322.64 - samples/sec: 304.67 - lr: 0.000119 - momentum: 0.000000
+ 2023-10-13 17:21:34,169 epoch 1 - iter 693/773 - loss 1.19495894 - time (sec): 362.35 - samples/sec: 305.47 - lr: 0.000134 - momentum: 0.000000
+ 2023-10-13 17:22:14,919 epoch 1 - iter 770/773 - loss 1.08518740 - time (sec): 403.10 - samples/sec: 307.41 - lr: 0.000149 - momentum: 0.000000
+ 2023-10-13 17:22:16,338 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:22:16,338 EPOCH 1 done: loss 1.0824 - lr: 0.000149
+ 2023-10-13 17:22:32,706 DEV : loss 0.09791397303342819 - f1-score (micro avg)  0.0
+ 2023-10-13 17:22:32,733 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:23:12,872 epoch 2 - iter 77/773 - loss 0.13435438 - time (sec): 40.14 - samples/sec: 279.67 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-13 17:23:53,937 epoch 2 - iter 154/773 - loss 0.12660142 - time (sec): 81.20 - samples/sec: 285.64 - lr: 0.000147 - momentum: 0.000000
+ 2023-10-13 17:24:34,663 epoch 2 - iter 231/773 - loss 0.12315260 - time (sec): 121.93 - samples/sec: 297.53 - lr: 0.000145 - momentum: 0.000000
+ 2023-10-13 17:25:14,614 epoch 2 - iter 308/773 - loss 0.12161588 - time (sec): 161.88 - samples/sec: 302.05 - lr: 0.000143 - momentum: 0.000000
+ 2023-10-13 17:25:54,874 epoch 2 - iter 385/773 - loss 0.11903730 - time (sec): 202.14 - samples/sec: 304.42 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-13 17:26:34,375 epoch 2 - iter 462/773 - loss 0.11583604 - time (sec): 241.64 - samples/sec: 302.69 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-13 17:27:13,810 epoch 2 - iter 539/773 - loss 0.11302760 - time (sec): 281.07 - samples/sec: 301.81 - lr: 0.000138 - momentum: 0.000000
+ 2023-10-13 17:27:54,601 epoch 2 - iter 616/773 - loss 0.10939658 - time (sec): 321.87 - samples/sec: 306.16 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-13 17:28:34,289 epoch 2 - iter 693/773 - loss 0.10642948 - time (sec): 361.55 - samples/sec: 305.63 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-13 17:29:15,133 epoch 2 - iter 770/773 - loss 0.10414025 - time (sec): 402.40 - samples/sec: 308.04 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-13 17:29:16,532 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:29:16,533 EPOCH 2 done: loss 0.1042 - lr: 0.000133
+ 2023-10-13 17:29:34,169 DEV : loss 0.06052974984049797 - f1-score (micro avg)  0.7483
+ 2023-10-13 17:29:34,204 saving best model
+ 2023-10-13 17:29:35,171 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:30:15,916 epoch 3 - iter 77/773 - loss 0.06637866 - time (sec): 40.74 - samples/sec: 315.46 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-13 17:30:57,103 epoch 3 - iter 154/773 - loss 0.07367969 - time (sec): 81.93 - samples/sec: 307.59 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-13 17:31:38,243 epoch 3 - iter 231/773 - loss 0.06770262 - time (sec): 123.07 - samples/sec: 303.97 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-13 17:32:19,573 epoch 3 - iter 308/773 - loss 0.06804529 - time (sec): 164.40 - samples/sec: 301.03 - lr: 0.000127 - momentum: 0.000000
+ 2023-10-13 17:33:01,236 epoch 3 - iter 385/773 - loss 0.06731084 - time (sec): 206.06 - samples/sec: 297.59 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-13 17:33:42,967 epoch 3 - iter 462/773 - loss 0.06679348 - time (sec): 247.79 - samples/sec: 298.22 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-13 17:34:22,539 epoch 3 - iter 539/773 - loss 0.06486323 - time (sec): 287.37 - samples/sec: 300.80 - lr: 0.000122 - momentum: 0.000000
+ 2023-10-13 17:35:02,752 epoch 3 - iter 616/773 - loss 0.06256280 - time (sec): 327.58 - samples/sec: 301.75 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-13 17:35:43,326 epoch 3 - iter 693/773 - loss 0.06248414 - time (sec): 368.15 - samples/sec: 302.07 - lr: 0.000118 - momentum: 0.000000
+ 2023-10-13 17:36:23,839 epoch 3 - iter 770/773 - loss 0.06337434 - time (sec): 408.67 - samples/sec: 302.55 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-13 17:36:25,542 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:36:25,543 EPOCH 3 done: loss 0.0635 - lr: 0.000117
+ 2023-10-13 17:36:43,553 DEV : loss 0.05895433574914932 - f1-score (micro avg)  0.7747
+ 2023-10-13 17:36:43,582 saving best model
+ 2023-10-13 17:36:46,243 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:37:26,456 epoch 4 - iter 77/773 - loss 0.03920921 - time (sec): 40.21 - samples/sec: 297.47 - lr: 0.000115 - momentum: 0.000000
+ 2023-10-13 17:38:05,917 epoch 4 - iter 154/773 - loss 0.04471010 - time (sec): 79.67 - samples/sec: 302.53 - lr: 0.000113 - momentum: 0.000000
+ 2023-10-13 17:38:46,036 epoch 4 - iter 231/773 - loss 0.04357768 - time (sec): 119.79 - samples/sec: 306.88 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-13 17:39:25,576 epoch 4 - iter 308/773 - loss 0.04110929 - time (sec): 159.33 - samples/sec: 303.72 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-13 17:40:06,355 epoch 4 - iter 385/773 - loss 0.04203085 - time (sec): 200.11 - samples/sec: 303.61 - lr: 0.000108 - momentum: 0.000000
+ 2023-10-13 17:40:47,876 epoch 4 - iter 462/773 - loss 0.04164789 - time (sec): 241.63 - samples/sec: 305.88 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-13 17:41:27,686 epoch 4 - iter 539/773 - loss 0.04178221 - time (sec): 281.44 - samples/sec: 305.45 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-13 17:42:08,800 epoch 4 - iter 616/773 - loss 0.04237077 - time (sec): 322.55 - samples/sec: 306.22 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-13 17:42:49,799 epoch 4 - iter 693/773 - loss 0.04162761 - time (sec): 363.55 - samples/sec: 307.00 - lr: 0.000102 - momentum: 0.000000
+ 2023-10-13 17:43:30,072 epoch 4 - iter 770/773 - loss 0.04094877 - time (sec): 403.82 - samples/sec: 306.52 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-13 17:43:31,577 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:43:31,577 EPOCH 4 done: loss 0.0409 - lr: 0.000100
+ 2023-10-13 17:43:48,933 DEV : loss 0.061349667608737946 - f1-score (micro avg)  0.8024
+ 2023-10-13 17:43:48,960 saving best model
+ 2023-10-13 17:43:51,557 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:44:32,555 epoch 5 - iter 77/773 - loss 0.02670187 - time (sec): 40.99 - samples/sec: 319.39 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-13 17:45:12,578 epoch 5 - iter 154/773 - loss 0.02458544 - time (sec): 81.02 - samples/sec: 301.83 - lr: 0.000097 - momentum: 0.000000
+ 2023-10-13 17:45:52,902 epoch 5 - iter 231/773 - loss 0.02523199 - time (sec): 121.34 - samples/sec: 307.11 - lr: 0.000095 - momentum: 0.000000
+ 2023-10-13 17:46:33,070 epoch 5 - iter 308/773 - loss 0.02445345 - time (sec): 161.51 - samples/sec: 308.37 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-13 17:47:14,170 epoch 5 - iter 385/773 - loss 0.02715882 - time (sec): 202.61 - samples/sec: 309.92 - lr: 0.000092 - momentum: 0.000000
+ 2023-10-13 17:47:54,217 epoch 5 - iter 462/773 - loss 0.02748993 - time (sec): 242.66 - samples/sec: 311.16 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-13 17:48:34,167 epoch 5 - iter 539/773 - loss 0.02757364 - time (sec): 282.61 - samples/sec: 310.41 - lr: 0.000088 - momentum: 0.000000
+ 2023-10-13 17:49:13,961 epoch 5 - iter 616/773 - loss 0.02745903 - time (sec): 322.40 - samples/sec: 311.14 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-13 17:49:53,667 epoch 5 - iter 693/773 - loss 0.02735164 - time (sec): 362.11 - samples/sec: 310.47 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-13 17:50:32,927 epoch 5 - iter 770/773 - loss 0.02703869 - time (sec): 401.37 - samples/sec: 308.69 - lr: 0.000083 - momentum: 0.000000
+ 2023-10-13 17:50:34,344 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:50:34,345 EPOCH 5 done: loss 0.0271 - lr: 0.000083
+ 2023-10-13 17:50:51,329 DEV : loss 0.07207323610782623 - f1-score (micro avg)  0.7876
+ 2023-10-13 17:50:51,359 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:51:32,093 epoch 6 - iter 77/773 - loss 0.02037186 - time (sec): 40.73 - samples/sec: 324.37 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-13 17:52:11,821 epoch 6 - iter 154/773 - loss 0.02091576 - time (sec): 80.46 - samples/sec: 306.02 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-13 17:52:53,494 epoch 6 - iter 231/773 - loss 0.02128225 - time (sec): 122.13 - samples/sec: 310.25 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-13 17:53:34,630 epoch 6 - iter 308/773 - loss 0.02091718 - time (sec): 163.27 - samples/sec: 307.22 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-13 17:54:14,102 epoch 6 - iter 385/773 - loss 0.01902102 - time (sec): 202.74 - samples/sec: 304.66 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-13 17:54:54,541 epoch 6 - iter 462/773 - loss 0.01975430 - time (sec): 243.18 - samples/sec: 306.46 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-13 17:55:33,990 epoch 6 - iter 539/773 - loss 0.01975584 - time (sec): 282.63 - samples/sec: 305.55 - lr: 0.000072 - momentum: 0.000000
+ 2023-10-13 17:56:13,256 epoch 6 - iter 616/773 - loss 0.01933327 - time (sec): 321.89 - samples/sec: 304.57 - lr: 0.000070 - momentum: 0.000000
+ 2023-10-13 17:56:53,888 epoch 6 - iter 693/773 - loss 0.01883937 - time (sec): 362.53 - samples/sec: 304.22 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-13 17:57:34,581 epoch 6 - iter 770/773 - loss 0.01913272 - time (sec): 403.22 - samples/sec: 307.28 - lr: 0.000067 - momentum: 0.000000
+ 2023-10-13 17:57:36,011 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:57:36,012 EPOCH 6 done: loss 0.0193 - lr: 0.000067
+ 2023-10-13 17:57:52,925 DEV : loss 0.08004289120435715 - f1-score (micro avg)  0.7896
+ 2023-10-13 17:57:52,953 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 17:58:33,473 epoch 7 - iter 77/773 - loss 0.01052829 - time (sec): 40.52 - samples/sec: 315.07 - lr: 0.000065 - momentum: 0.000000
+ 2023-10-13 17:59:12,688 epoch 7 - iter 154/773 - loss 0.01217112 - time (sec): 79.73 - samples/sec: 311.68 - lr: 0.000063 - momentum: 0.000000
+ 2023-10-13 17:59:52,552 epoch 7 - iter 231/773 - loss 0.01258506 - time (sec): 119.60 - samples/sec: 311.55 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-13 18:00:33,331 epoch 7 - iter 308/773 - loss 0.01270538 - time (sec): 160.38 - samples/sec: 311.57 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-13 18:01:13,969 epoch 7 - iter 385/773 - loss 0.01276005 - time (sec): 201.01 - samples/sec: 309.45 - lr: 0.000058 - momentum: 0.000000
+ 2023-10-13 18:01:53,522 epoch 7 - iter 462/773 - loss 0.01229893 - time (sec): 240.57 - samples/sec: 308.18 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-13 18:02:33,945 epoch 7 - iter 539/773 - loss 0.01220461 - time (sec): 280.99 - samples/sec: 307.69 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-13 18:03:14,733 epoch 7 - iter 616/773 - loss 0.01161193 - time (sec): 321.78 - samples/sec: 308.13 - lr: 0.000054 - momentum: 0.000000
+ 2023-10-13 18:03:55,166 epoch 7 - iter 693/773 - loss 0.01260338 - time (sec): 362.21 - samples/sec: 307.12 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-13 18:04:34,914 epoch 7 - iter 770/773 - loss 0.01235583 - time (sec): 401.96 - samples/sec: 307.96 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-13 18:04:36,433 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 18:04:36,433 EPOCH 7 done: loss 0.0124 - lr: 0.000050
+ 2023-10-13 18:04:53,194 DEV : loss 0.09459361433982849 - f1-score (micro avg)  0.792
+ 2023-10-13 18:04:53,223 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 18:05:33,830 epoch 8 - iter 77/773 - loss 0.01144622 - time (sec): 40.60 - samples/sec: 328.26 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-13 18:06:15,374 epoch 8 - iter 154/773 - loss 0.00970948 - time (sec): 82.15 - samples/sec: 316.58 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-13 18:06:56,115 epoch 8 - iter 231/773 - loss 0.00944804 - time (sec): 122.89 - samples/sec: 308.35 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-13 18:07:37,287 epoch 8 - iter 308/773 - loss 0.00923867 - time (sec): 164.06 - samples/sec: 306.77 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-13 18:08:18,386 epoch 8 - iter 385/773 - loss 0.00978032 - time (sec): 205.16 - samples/sec: 310.98 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-13 18:08:59,496 epoch 8 - iter 462/773 - loss 0.01173064 - time (sec): 246.27 - samples/sec: 308.52 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-13 18:09:39,687 epoch 8 - iter 539/773 - loss 0.01090035 - time (sec): 286.46 - samples/sec: 307.75 - lr: 0.000039 - momentum: 0.000000
+ 2023-10-13 18:10:18,679 epoch 8 - iter 616/773 - loss 0.01064762 - time (sec): 325.45 - samples/sec: 304.49 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-13 18:10:58,847 epoch 8 - iter 693/773 - loss 0.01013823 - time (sec): 365.62 - samples/sec: 303.87 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-13 18:11:38,667 epoch 8 - iter 770/773 - loss 0.00975702 - time (sec): 405.44 - samples/sec: 305.54 - lr: 0.000034 - momentum: 0.000000
+ 2023-10-13 18:11:40,105 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 18:11:40,106 EPOCH 8 done: loss 0.0098 - lr: 0.000034
+ 2023-10-13 18:11:57,039 DEV : loss 0.09352090209722519 - f1-score (micro avg)  0.7842
+ 2023-10-13 18:11:57,070 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 18:12:37,400 epoch 9 - iter 77/773 - loss 0.00652820 - time (sec): 40.33 - samples/sec: 318.29 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-13 18:13:18,257 epoch 9 - iter 154/773 - loss 0.00570655 - time (sec): 81.18 - samples/sec: 317.61 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-13 18:13:57,276 epoch 9 - iter 231/773 - loss 0.00585034 - time (sec): 120.20 - samples/sec: 310.14 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-13 18:14:37,023 epoch 9 - iter 308/773 - loss 0.00612878 - time (sec): 159.95 - samples/sec: 308.35 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-13 18:15:16,631 epoch 9 - iter 385/773 - loss 0.00671168 - time (sec): 199.56 - samples/sec: 307.49 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-13 18:15:56,285 epoch 9 - iter 462/773 - loss 0.00718371 - time (sec): 239.21 - samples/sec: 304.49 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-13 18:16:37,807 epoch 9 - iter 539/773 - loss 0.00797348 - time (sec): 280.74 - samples/sec: 305.58 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-13 18:17:18,932 epoch 9 - iter 616/773 - loss 0.00801701 - time (sec): 321.86 - samples/sec: 304.66 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-13 18:17:59,375 epoch 9 - iter 693/773 - loss 0.00786144 - time (sec): 362.30 - samples/sec: 304.33 - lr: 0.000019 - momentum: 0.000000
+ 2023-10-13 18:18:40,458 epoch 9 - iter 770/773 - loss 0.00750063 - time (sec): 403.39 - samples/sec: 307.03 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-13 18:18:41,956 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 18:18:41,956 EPOCH 9 done: loss 0.0075 - lr: 0.000017
+ 2023-10-13 18:18:59,133 DEV : loss 0.09976237267255783 - f1-score (micro avg)  0.7751
+ 2023-10-13 18:18:59,164 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 18:19:39,491 epoch 10 - iter 77/773 - loss 0.00581833 - time (sec): 40.33 - samples/sec: 301.35 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-13 18:20:18,915 epoch 10 - iter 154/773 - loss 0.00435431 - time (sec): 79.75 - samples/sec: 296.06 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-13 18:20:59,083 epoch 10 - iter 231/773 - loss 0.00547099 - time (sec): 119.92 - samples/sec: 294.88 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-13 18:21:38,863 epoch 10 - iter 308/773 - loss 0.00540846 - time (sec): 159.70 - samples/sec: 301.38 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-13 18:22:18,093 epoch 10 - iter 385/773 - loss 0.00499552 - time (sec): 198.93 - samples/sec: 304.11 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-13 18:22:58,689 epoch 10 - iter 462/773 - loss 0.00533351 - time (sec): 239.52 - samples/sec: 307.37 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-13 18:23:39,202 epoch 10 - iter 539/773 - loss 0.00590520 - time (sec): 280.04 - samples/sec: 307.79 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-13 18:24:20,024 epoch 10 - iter 616/773 - loss 0.00588295 - time (sec): 320.86 - samples/sec: 309.60 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-13 18:25:00,654 epoch 10 - iter 693/773 - loss 0.00607221 - time (sec): 361.49 - samples/sec: 308.83 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-13 18:25:40,608 epoch 10 - iter 770/773 - loss 0.00573109 - time (sec): 401.44 - samples/sec: 308.29 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-13 18:25:42,149 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 18:25:42,149 EPOCH 10 done: loss 0.0057 - lr: 0.000000
+ 2023-10-13 18:25:59,998 DEV : loss 0.10486873239278793 - f1-score (micro avg)  0.7791
+ 2023-10-13 18:26:01,376 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 18:26:01,377 Loading model from best epoch ...
+ 2023-10-13 18:26:05,333 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-BUILDING, B-BUILDING, E-BUILDING, I-BUILDING, S-STREET, B-STREET, E-STREET, I-STREET
+ 2023-10-13 18:27:01,370 
+ Results:
+ - F-score (micro) 0.7957
+ - F-score (macro) 0.7121
+ - Accuracy 0.6857
+ 
+ By class:
+               precision    recall  f1-score   support
+ 
+          LOC     0.8471    0.8436    0.8453       946
+     BUILDING     0.5362    0.6811    0.6000       185
+       STREET     0.7037    0.6786    0.6909        56
+ 
+    micro avg     0.7815    0.8104    0.7957      1187
+    macro avg     0.6957    0.7344    0.7121      1187
+ weighted avg     0.7919    0.8104    0.7998      1187
+ 
+ 2023-10-13 18:27:01,370 ----------------------------------------------------------------------------------------------------
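The micro and macro rows of the final test report follow from the per-class rows. As a quick sanity check (the helper name and argument layout are ours), one can recover each class's true-positive and predicted-span counts from its precision, recall, and support, then re-aggregate:

```python
def micro_macro(per_class):
    """per_class: list of (precision, recall, f1, support) tuples.
    Recover per-class TP / predicted counts, then aggregate."""
    tp = pred = supp = 0
    for p, r, f1, s in per_class:
        c_tp = round(r * s)        # true positives for this class
        tp += c_tp
        pred += round(c_tp / p)    # predicted spans for this class
        supp += s
    micro_p, micro_r = tp / pred, tp / supp
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
    macro_f1 = sum(f1 for _, _, f1, _ in per_class) / len(per_class)
    return (round(micro_p, 4), round(micro_r, 4),
            round(micro_f1, 4), round(macro_f1, 4))

classes = [(0.8471, 0.8436, 0.8453, 946),   # LOC
           (0.5362, 0.6811, 0.6000, 185),   # BUILDING
           (0.7037, 0.6786, 0.6909, 56)]    # STREET
print(micro_macro(classes))  # (0.7815, 0.8104, 0.7957, 0.7121)
```

The recovered aggregates match the "micro avg" and "macro avg" rows of the report, including the headline F-score (micro) of 0.7957.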