stefan-it committed
Commit 400e985 · 1 Parent(s): d367c0a

Upload folder using huggingface_hub

best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe4fd1db139a08efbec1ac2ee507b5c21d03fc4bc321b5fbf7ccd6555c779157
+ size 870793839
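The three lines above are a Git LFS pointer, not the model weights themselves. As a minimal sketch, a downloaded file could be checked against such a pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any library; only the pointer text is taken from this commit.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Compare a local file's size and SHA-256 digest with the pointer (hypothetical helper)."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text for best-model.pt as shown in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:fe4fd1db139a08efbec1ac2ee507b5c21d03fc4bc321b5fbf7ccd6555c779157
size 870793839
"""
pointer = parse_lfs_pointer(POINTER)
```

Calling `matches_pointer(Path("best-model.pt"), pointer)` on a local copy would then return `True` only if both size and digest agree.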
dev.tsv ADDED
The diff for this file is too large to render.
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4318a2b527bd1158cd8cc1abdd37b762c39ae39743ca2e8df44c50d4bed32d98
+ size 870793956
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 22:58:31 0.0001 0.7800 0.1442 0.4530 0.3089 0.3673 0.2298
+ 2 23:14:36 0.0001 0.0981 0.0997 0.5403 0.7288 0.6206 0.4570
+ 3 23:30:54 0.0001 0.0608 0.1474 0.5379 0.7471 0.6255 0.4654
+ 4 23:47:04 0.0001 0.0437 0.1757 0.5276 0.7426 0.6169 0.4561
+ 5 00:03:20 0.0001 0.0331 0.2358 0.5226 0.7529 0.6170 0.4563
+ 6 00:19:56 0.0001 0.0243 0.2818 0.5456 0.7735 0.6398 0.4801
+ 7 00:36:26 0.0001 0.0182 0.3023 0.5556 0.7838 0.6502 0.4917
+ 8 00:52:16 0.0000 0.0134 0.3305 0.5673 0.7574 0.6487 0.4893
+ 9 01:08:26 0.0000 0.0096 0.3649 0.5591 0.7689 0.6474 0.4877
+ 10 01:24:25 0.0000 0.0081 0.3651 0.5601 0.7414 0.6381 0.4782
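The table above can be read programmatically to see which epoch the checkpoint selection favours. The following is a rough sketch that embeds the rows exactly as rendered here (whitespace-separated, whereas the real file is presumably tab-separated) and picks the epoch with the highest `DEV_F1` and the one with the lowest `DEV_LOSS`. The function name `read_rows` is hypothetical.

```python
# Rows copied from loss.tsv as rendered in this diff; separators are assumed to be whitespace.
LOSS_TSV = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 22:58:31 0.0001 0.7800 0.1442 0.4530 0.3089 0.3673 0.2298
2 23:14:36 0.0001 0.0981 0.0997 0.5403 0.7288 0.6206 0.4570
3 23:30:54 0.0001 0.0608 0.1474 0.5379 0.7471 0.6255 0.4654
4 23:47:04 0.0001 0.0437 0.1757 0.5276 0.7426 0.6169 0.4561
5 00:03:20 0.0001 0.0331 0.2358 0.5226 0.7529 0.6170 0.4563
6 00:19:56 0.0001 0.0243 0.2818 0.5456 0.7735 0.6398 0.4801
7 00:36:26 0.0001 0.0182 0.3023 0.5556 0.7838 0.6502 0.4917
8 00:52:16 0.0000 0.0134 0.3305 0.5673 0.7574 0.6487 0.4893
9 01:08:26 0.0000 0.0096 0.3649 0.5591 0.7689 0.6474 0.4877
10 01:24:25 0.0000 0.0081 0.3651 0.5601 0.7414 0.6381 0.4782
"""


def read_rows(text: str) -> list[dict]:
    """Split the header and data rows on whitespace and pair them up (hypothetical helper)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = read_rows(LOSS_TSV)
best_f1 = max(rows, key=lambda r: float(r["DEV_F1"]))
best_loss = min(rows, key=lambda r: float(r["DEV_LOSS"]))
print("highest DEV_F1:", best_f1["EPOCH"], best_f1["DEV_F1"])
print("lowest DEV_LOSS:", best_loss["EPOCH"], best_loss["DEV_LOSS"])
```

The two criteria disagree in this run: dev loss is lowest at epoch 2, while dev F1 peaks at epoch 7, which is the last epoch at which the training log reports "saving best model".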
runs/events.out.tfevents.1697150545.c8b2203b18a8.2923.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d6e14f7a1a32b2e198d0569c5b2320ee0b19787bc8204ac6571b4423effb1285
+ size 1018100
test.tsv ADDED
The diff for this file is too large to render.
training.log ADDED
@@ -0,0 +1,262 @@
+ 2023-10-12 22:42:25,718 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 22:42:25,721 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=13, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
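The layer shapes in the model dump above are enough to estimate the parameter count by hand. The sketch below does that arithmetic under three assumptions that the log does not state: `shared` and `embed_tokens` are the same tied weight matrix, each `FusedRMSNorm` holds a single weight vector of size 1472, and the weights are stored as float32. The constant names are illustrative only.

```python
# Dimensions read from the module dump.
D_MODEL = 1472      # hidden size
INNER_DIM = 384     # out_features of q/k/v (and in_features of o)
D_FF = 3584         # feed-forward width
VOCAB = 384         # rows of the embedding matrix
N_BLOCKS = 12       # block 0 plus "11 x T5Block"
N_TAGS = 13         # output size of the tagging head

# Per-block weights; all Linear layers inside the blocks have bias=False.
attention = 3 * D_MODEL * INNER_DIM + INNER_DIM * D_MODEL   # q, k, v, o
feed_forward = 2 * D_MODEL * D_FF + D_FF * D_MODEL          # wi_0, wi_1, wo
norms = 2 * D_MODEL                                         # one norm weight per sub-layer (assumed)
per_block = attention + feed_forward + norms

encoder = (
    VOCAB * D_MODEL          # embedding, counted once (tying is an assumption)
    + N_BLOCKS * per_block
    + 32 * 6                 # relative_attention_bias, listed only in block 0
    + D_MODEL                # final_layer_norm
)
head = D_MODEL * N_TAGS + N_TAGS   # linear tagging layer, bias=True
total = encoder + head

print(f"{total:,} parameters")
print(f"about {total * 4 / 1e6:.1f} MB if stored as float32 (assumed)")
```

Under these assumptions the float32 footprint comes out slightly below the 870793839-byte size recorded for `best-model.pt`, which would be consistent with a checkpoint that stores the weights plus a small amount of metadata.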
+ 2023-10-12 22:42:25,721 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 22:42:25,721 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
+ - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
+ 2023-10-12 22:42:25,721 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 22:42:25,721 Train: 14465 sentences
+ 2023-10-12 22:42:25,721 (train_with_dev=False, train_with_test=False)
+ 2023-10-12 22:42:25,722 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 22:42:25,722 Training Params:
+ 2023-10-12 22:42:25,722 - learning_rate: "0.00015"
+ 2023-10-12 22:42:25,722 - mini_batch_size: "8"
+ 2023-10-12 22:42:25,722 - max_epochs: "10"
+ 2023-10-12 22:42:25,722 - shuffle: "True"
+ 2023-10-12 22:42:25,722 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 22:42:25,722 Plugins:
+ 2023-10-12 22:42:25,722 - TensorboardLogger
+ 2023-10-12 22:42:25,722 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-12 22:42:25,722 ----------------------------------------------------------------------------------------------------
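The parameters above (peak learning rate 0.00015, batch size 8, 10 epochs, `LinearScheduler` with a warmup fraction of 0.1) suggest a linear warmup followed by a linear decay to zero. The sketch below implements that common formula in plain Python. It is an assumption that Flair's `LinearScheduler` follows exactly this rule; the function name `linear_schedule_lr` is hypothetical, and the step arithmetic uses the 14465 training sentences reported in this log.

```python
import math


def linear_schedule_lr(step: int, peak_lr: float, total_steps: int, warmup_fraction: float) -> float:
    """Linear warmup to peak_lr, then linear decay to zero (assumed schedule shape)."""
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


PEAK_LR = 0.00015
steps_per_epoch = math.ceil(14465 / 8)   # mini-batches per epoch
total_steps = steps_per_epoch * 10       # max_epochs = 10

# Global steps corresponding to a few "iter" positions in the log.
for epoch, it in [(1, 180), (1, 1800), (2, 180), (4, 1800), (10, 1809)]:
    step = (epoch - 1) * steps_per_epoch + it
    lr = linear_schedule_lr(step, PEAK_LR, total_steps, 0.1)
    print(f"epoch {epoch} iter {it}: lr {lr:.6f}")
```

With a warmup fraction of 0.1 over 10 epochs, the warmup spans roughly the first epoch, so the peak is reached near the end of epoch 1 and the rate falls linearly afterwards.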
+ 2023-10-12 22:42:25,722 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-12 22:42:25,722 - metric: "('micro avg', 'f1-score')"
+ 2023-10-12 22:42:25,722 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 22:42:25,722 Computation:
+ 2023-10-12 22:42:25,723 - compute on device: cuda:0
+ 2023-10-12 22:42:25,723 - embedding storage: none
+ 2023-10-12 22:42:25,723 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 22:42:25,723 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
+ 2023-10-12 22:42:25,723 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 22:42:25,723 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 22:42:25,723 Logging anything other than scalars to TensorBoard is currently not supported.
+ 2023-10-12 22:44:07,805 epoch 1 - iter 180/1809 - loss 2.57222062 - time (sec): 102.08 - samples/sec: 376.55 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-12 22:45:47,049 epoch 1 - iter 360/1809 - loss 2.35786106 - time (sec): 201.32 - samples/sec: 377.04 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-12 22:47:19,208 epoch 1 - iter 540/1809 - loss 2.01770182 - time (sec): 293.48 - samples/sec: 384.68 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-12 22:48:49,641 epoch 1 - iter 720/1809 - loss 1.66056163 - time (sec): 383.92 - samples/sec: 395.07 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-12 22:50:18,165 epoch 1 - iter 900/1809 - loss 1.38841808 - time (sec): 472.44 - samples/sec: 400.83 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-12 22:51:47,112 epoch 1 - iter 1080/1809 - loss 1.19834151 - time (sec): 561.39 - samples/sec: 402.83 - lr: 0.000089 - momentum: 0.000000
+ 2023-10-12 22:53:15,665 epoch 1 - iter 1260/1809 - loss 1.05788394 - time (sec): 649.94 - samples/sec: 404.31 - lr: 0.000104 - momentum: 0.000000
+ 2023-10-12 22:54:46,068 epoch 1 - iter 1440/1809 - loss 0.94563350 - time (sec): 740.34 - samples/sec: 406.31 - lr: 0.000119 - momentum: 0.000000
+ 2023-10-12 22:56:17,923 epoch 1 - iter 1620/1809 - loss 0.85880262 - time (sec): 832.20 - samples/sec: 407.21 - lr: 0.000134 - momentum: 0.000000
+ 2023-10-12 22:57:51,086 epoch 1 - iter 1800/1809 - loss 0.78348215 - time (sec): 925.36 - samples/sec: 408.35 - lr: 0.000149 - momentum: 0.000000
+ 2023-10-12 22:57:55,498 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 22:57:55,498 EPOCH 1 done: loss 0.7800 - lr: 0.000149
+ 2023-10-12 22:58:31,841 DEV : loss 0.1441633254289627 - f1-score (micro avg) 0.3673
+ 2023-10-12 22:58:31,902 saving best model
+ 2023-10-12 22:58:32,765 ----------------------------------------------------------------------------------------------------
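The per-iteration lines above follow a fixed layout, so they can be turned into records with a regular expression. This is a small sketch, assuming the layout stays exactly as printed in this log; `ITER_RE` and `parse_iter_line` are hypothetical names, and the sample line is copied from epoch 1.

```python
import re

# Pattern for lines such as "epoch 1 - iter 180/1809 - loss ... - lr: ...".
ITER_RE = re.compile(
    r"epoch (?P<epoch>\d+) - iter (?P<iter>\d+)/(?P<total>\d+)"
    r" - loss (?P<loss>[\d.]+)"
    r" - time \(sec\): (?P<time>[\d.]+)"
    r" - samples/sec: (?P<rate>[\d.]+)"
    r" - lr: (?P<lr>[\d.]+)"
)


def parse_iter_line(line: str):
    """Return a dict for an iteration line, or None for any other log line."""
    m = ITER_RE.search(line)
    if m is None:
        return None
    return {
        "epoch": int(m["epoch"]),
        "iter": int(m["iter"]),
        "total": int(m["total"]),
        "loss": float(m["loss"]),
        "time_sec": float(m["time"]),
        "samples_per_sec": float(m["rate"]),
        "lr": float(m["lr"]),
    }


line = (
    "2023-10-12 22:57:51,086 epoch 1 - iter 1800/1809 - loss 0.78348215 "
    "- time (sec): 925.36 - samples/sec: 408.35 - lr: 0.000149 - momentum: 0.000000"
)
record = parse_iter_line(line)
print(record)
```

Lines such as `EPOCH 1 done: ...` or the `DEV : ...` summaries do not match the pattern and yield `None`, so the same function can be mapped over the whole log and the `None` results filtered out.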
+ 2023-10-12 23:00:09,462 epoch 2 - iter 180/1809 - loss 0.11170747 - time (sec): 96.69 - samples/sec: 400.61 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-12 23:01:44,813 epoch 2 - iter 360/1809 - loss 0.11397447 - time (sec): 192.05 - samples/sec: 403.34 - lr: 0.000147 - momentum: 0.000000
+ 2023-10-12 23:03:16,132 epoch 2 - iter 540/1809 - loss 0.11058507 - time (sec): 283.36 - samples/sec: 406.96 - lr: 0.000145 - momentum: 0.000000
+ 2023-10-12 23:04:47,107 epoch 2 - iter 720/1809 - loss 0.10972006 - time (sec): 374.34 - samples/sec: 406.43 - lr: 0.000143 - momentum: 0.000000
+ 2023-10-12 23:06:17,438 epoch 2 - iter 900/1809 - loss 0.10675904 - time (sec): 464.67 - samples/sec: 405.71 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-12 23:07:48,550 epoch 2 - iter 1080/1809 - loss 0.10643316 - time (sec): 555.78 - samples/sec: 408.15 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-12 23:09:19,398 epoch 2 - iter 1260/1809 - loss 0.10422146 - time (sec): 646.63 - samples/sec: 409.03 - lr: 0.000138 - momentum: 0.000000
+ 2023-10-12 23:10:52,238 epoch 2 - iter 1440/1809 - loss 0.10187179 - time (sec): 739.47 - samples/sec: 409.41 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-12 23:12:23,351 epoch 2 - iter 1620/1809 - loss 0.09902556 - time (sec): 830.58 - samples/sec: 410.91 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-12 23:13:54,328 epoch 2 - iter 1800/1809 - loss 0.09837045 - time (sec): 921.56 - samples/sec: 410.24 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-12 23:13:58,390 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 23:13:58,390 EPOCH 2 done: loss 0.0981 - lr: 0.000133
+ 2023-10-12 23:14:36,178 DEV : loss 0.0997246503829956 - f1-score (micro avg) 0.6206
+ 2023-10-12 23:14:36,235 saving best model
+ 2023-10-12 23:14:38,819 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 23:16:13,637 epoch 3 - iter 180/1809 - loss 0.06267834 - time (sec): 94.81 - samples/sec: 402.42 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-12 23:17:46,059 epoch 3 - iter 360/1809 - loss 0.06023491 - time (sec): 187.24 - samples/sec: 408.98 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-12 23:19:16,706 epoch 3 - iter 540/1809 - loss 0.06190782 - time (sec): 277.88 - samples/sec: 407.79 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-12 23:20:52,077 epoch 3 - iter 720/1809 - loss 0.06109607 - time (sec): 373.25 - samples/sec: 403.10 - lr: 0.000127 - momentum: 0.000000
+ 2023-10-12 23:22:25,696 epoch 3 - iter 900/1809 - loss 0.06165193 - time (sec): 466.87 - samples/sec: 403.46 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-12 23:23:59,309 epoch 3 - iter 1080/1809 - loss 0.06206747 - time (sec): 560.49 - samples/sec: 403.93 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-12 23:25:33,753 epoch 3 - iter 1260/1809 - loss 0.06194083 - time (sec): 654.93 - samples/sec: 404.63 - lr: 0.000122 - momentum: 0.000000
+ 2023-10-12 23:27:05,673 epoch 3 - iter 1440/1809 - loss 0.06118851 - time (sec): 746.85 - samples/sec: 404.71 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-12 23:28:38,308 epoch 3 - iter 1620/1809 - loss 0.06177140 - time (sec): 839.48 - samples/sec: 405.10 - lr: 0.000118 - momentum: 0.000000
+ 2023-10-12 23:30:11,083 epoch 3 - iter 1800/1809 - loss 0.06083773 - time (sec): 932.26 - samples/sec: 405.59 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-12 23:30:15,308 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 23:30:15,309 EPOCH 3 done: loss 0.0608 - lr: 0.000117
+ 2023-10-12 23:30:54,908 DEV : loss 0.14741627871990204 - f1-score (micro avg) 0.6255
+ 2023-10-12 23:30:54,971 saving best model
+ 2023-10-12 23:30:57,602 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 23:32:29,140 epoch 4 - iter 180/1809 - loss 0.04432308 - time (sec): 91.53 - samples/sec: 402.88 - lr: 0.000115 - momentum: 0.000000
+ 2023-10-12 23:34:01,093 epoch 4 - iter 360/1809 - loss 0.04554957 - time (sec): 183.49 - samples/sec: 415.77 - lr: 0.000113 - momentum: 0.000000
+ 2023-10-12 23:35:32,639 epoch 4 - iter 540/1809 - loss 0.04344483 - time (sec): 275.03 - samples/sec: 412.63 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-12 23:37:05,804 epoch 4 - iter 720/1809 - loss 0.04221052 - time (sec): 368.20 - samples/sec: 408.78 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-12 23:38:39,875 epoch 4 - iter 900/1809 - loss 0.04316114 - time (sec): 462.27 - samples/sec: 405.17 - lr: 0.000108 - momentum: 0.000000
+ 2023-10-12 23:40:13,662 epoch 4 - iter 1080/1809 - loss 0.04415013 - time (sec): 556.05 - samples/sec: 405.42 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-12 23:41:44,160 epoch 4 - iter 1260/1809 - loss 0.04402235 - time (sec): 646.55 - samples/sec: 406.88 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-12 23:43:15,665 epoch 4 - iter 1440/1809 - loss 0.04423013 - time (sec): 738.06 - samples/sec: 407.84 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-12 23:44:47,822 epoch 4 - iter 1620/1809 - loss 0.04340254 - time (sec): 830.22 - samples/sec: 409.92 - lr: 0.000102 - momentum: 0.000000
+ 2023-10-12 23:46:22,508 epoch 4 - iter 1800/1809 - loss 0.04342319 - time (sec): 924.90 - samples/sec: 408.87 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-12 23:46:26,848 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 23:46:26,848 EPOCH 4 done: loss 0.0437 - lr: 0.000100
+ 2023-10-12 23:47:04,639 DEV : loss 0.1756805181503296 - f1-score (micro avg) 0.6169
+ 2023-10-12 23:47:04,696 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 23:48:38,776 epoch 5 - iter 180/1809 - loss 0.03157894 - time (sec): 94.08 - samples/sec: 407.31 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-12 23:50:11,001 epoch 5 - iter 360/1809 - loss 0.02830269 - time (sec): 186.30 - samples/sec: 411.52 - lr: 0.000097 - momentum: 0.000000
+ 2023-10-12 23:51:41,162 epoch 5 - iter 540/1809 - loss 0.02843117 - time (sec): 276.46 - samples/sec: 409.58 - lr: 0.000095 - momentum: 0.000000
+ 2023-10-12 23:53:15,346 epoch 5 - iter 720/1809 - loss 0.03067151 - time (sec): 370.65 - samples/sec: 403.73 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-12 23:54:49,546 epoch 5 - iter 900/1809 - loss 0.03160142 - time (sec): 464.85 - samples/sec: 402.68 - lr: 0.000092 - momentum: 0.000000
+ 2023-10-12 23:56:20,496 epoch 5 - iter 1080/1809 - loss 0.03108926 - time (sec): 555.80 - samples/sec: 405.37 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-12 23:57:47,978 epoch 5 - iter 1260/1809 - loss 0.03185705 - time (sec): 643.28 - samples/sec: 408.31 - lr: 0.000088 - momentum: 0.000000
+ 2023-10-12 23:59:20,267 epoch 5 - iter 1440/1809 - loss 0.03333433 - time (sec): 735.57 - samples/sec: 407.23 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-13 00:00:56,869 epoch 5 - iter 1620/1809 - loss 0.03273045 - time (sec): 832.17 - samples/sec: 408.49 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-13 00:02:34,892 epoch 5 - iter 1800/1809 - loss 0.03316525 - time (sec): 930.19 - samples/sec: 406.50 - lr: 0.000083 - momentum: 0.000000
+ 2023-10-13 00:02:39,155 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 00:02:39,155 EPOCH 5 done: loss 0.0331 - lr: 0.000083
+ 2023-10-13 00:03:20,597 DEV : loss 0.23575519025325775 - f1-score (micro avg) 0.617
+ 2023-10-13 00:03:20,661 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 00:04:52,695 epoch 6 - iter 180/1809 - loss 0.02388711 - time (sec): 92.03 - samples/sec: 408.81 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-13 00:06:24,791 epoch 6 - iter 360/1809 - loss 0.02401339 - time (sec): 184.13 - samples/sec: 411.47 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-13 00:07:59,259 epoch 6 - iter 540/1809 - loss 0.02310670 - time (sec): 278.60 - samples/sec: 406.14 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-13 00:09:31,467 epoch 6 - iter 720/1809 - loss 0.02334831 - time (sec): 370.80 - samples/sec: 408.26 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-13 00:11:06,383 epoch 6 - iter 900/1809 - loss 0.02452192 - time (sec): 465.72 - samples/sec: 407.11 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-13 00:12:41,135 epoch 6 - iter 1080/1809 - loss 0.02459648 - time (sec): 560.47 - samples/sec: 405.40 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-13 00:14:17,989 epoch 6 - iter 1260/1809 - loss 0.02521649 - time (sec): 657.33 - samples/sec: 403.03 - lr: 0.000072 - momentum: 0.000000
+ 2023-10-13 00:15:54,582 epoch 6 - iter 1440/1809 - loss 0.02521885 - time (sec): 753.92 - samples/sec: 399.88 - lr: 0.000070 - momentum: 0.000000
+ 2023-10-13 00:17:34,466 epoch 6 - iter 1620/1809 - loss 0.02506005 - time (sec): 853.80 - samples/sec: 397.24 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-13 00:19:10,740 epoch 6 - iter 1800/1809 - loss 0.02442039 - time (sec): 950.08 - samples/sec: 397.99 - lr: 0.000067 - momentum: 0.000000
+ 2023-10-13 00:19:15,046 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 00:19:15,047 EPOCH 6 done: loss 0.0243 - lr: 0.000067
+ 2023-10-13 00:19:55,967 DEV : loss 0.2817913591861725 - f1-score (micro avg) 0.6398
+ 2023-10-13 00:19:56,045 saving best model
+ 2023-10-13 00:19:57,138 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 00:21:32,179 epoch 7 - iter 180/1809 - loss 0.01493884 - time (sec): 95.04 - samples/sec: 388.91 - lr: 0.000065 - momentum: 0.000000
+ 2023-10-13 00:23:05,610 epoch 7 - iter 360/1809 - loss 0.01507536 - time (sec): 188.47 - samples/sec: 401.81 - lr: 0.000063 - momentum: 0.000000
+ 2023-10-13 00:24:41,229 epoch 7 - iter 540/1809 - loss 0.01649935 - time (sec): 284.09 - samples/sec: 399.44 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-13 00:26:17,385 epoch 7 - iter 720/1809 - loss 0.01746454 - time (sec): 380.24 - samples/sec: 397.51 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-13 00:27:48,257 epoch 7 - iter 900/1809 - loss 0.01772093 - time (sec): 471.12 - samples/sec: 400.88 - lr: 0.000058 - momentum: 0.000000
+ 2023-10-13 00:29:15,753 epoch 7 - iter 1080/1809 - loss 0.01691637 - time (sec): 558.61 - samples/sec: 404.51 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-13 00:30:47,556 epoch 7 - iter 1260/1809 - loss 0.01716291 - time (sec): 650.42 - samples/sec: 405.67 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-13 00:32:24,463 epoch 7 - iter 1440/1809 - loss 0.01829192 - time (sec): 747.32 - samples/sec: 403.28 - lr: 0.000053 - momentum: 0.000000
+ 2023-10-13 00:34:06,279 epoch 7 - iter 1620/1809 - loss 0.01866002 - time (sec): 849.14 - samples/sec: 400.09 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-13 00:35:43,193 epoch 7 - iter 1800/1809 - loss 0.01826084 - time (sec): 946.05 - samples/sec: 399.82 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-13 00:35:47,355 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 00:35:47,356 EPOCH 7 done: loss 0.0182 - lr: 0.000050
+ 2023-10-13 00:36:26,299 DEV : loss 0.3023461401462555 - f1-score (micro avg) 0.6502
+ 2023-10-13 00:36:26,356 saving best model
+ 2023-10-13 00:36:28,945 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 00:37:59,940 epoch 8 - iter 180/1809 - loss 0.01451939 - time (sec): 90.99 - samples/sec: 412.53 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-13 00:39:33,475 epoch 8 - iter 360/1809 - loss 0.01255235 - time (sec): 184.52 - samples/sec: 411.41 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-13 00:41:07,633 epoch 8 - iter 540/1809 - loss 0.01182338 - time (sec): 278.68 - samples/sec: 408.87 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-13 00:42:39,556 epoch 8 - iter 720/1809 - loss 0.01274256 - time (sec): 370.61 - samples/sec: 412.85 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-13 00:44:10,716 epoch 8 - iter 900/1809 - loss 0.01253653 - time (sec): 461.77 - samples/sec: 414.98 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-13 00:45:38,696 epoch 8 - iter 1080/1809 - loss 0.01269966 - time (sec): 549.75 - samples/sec: 414.13 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-13 00:47:06,823 epoch 8 - iter 1260/1809 - loss 0.01283451 - time (sec): 637.87 - samples/sec: 415.05 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-13 00:48:38,750 epoch 8 - iter 1440/1809 - loss 0.01300464 - time (sec): 729.80 - samples/sec: 415.22 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-13 00:50:07,404 epoch 8 - iter 1620/1809 - loss 0.01320616 - time (sec): 818.45 - samples/sec: 416.97 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-13 00:51:35,459 epoch 8 - iter 1800/1809 - loss 0.01348805 - time (sec): 906.51 - samples/sec: 417.43 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-13 00:51:39,256 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 00:51:39,256 EPOCH 8 done: loss 0.0134 - lr: 0.000033
+ 2023-10-13 00:52:16,475 DEV : loss 0.3305288553237915 - f1-score (micro avg) 0.6487
+ 2023-10-13 00:52:16,533 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 00:53:46,221 epoch 9 - iter 180/1809 - loss 0.00593172 - time (sec): 89.69 - samples/sec: 414.24 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-13 00:55:18,160 epoch 9 - iter 360/1809 - loss 0.00911510 - time (sec): 181.63 - samples/sec: 411.53 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-13 00:56:48,486 epoch 9 - iter 540/1809 - loss 0.00925733 - time (sec): 271.95 - samples/sec: 412.89 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-13 00:58:22,172 epoch 9 - iter 720/1809 - loss 0.00972486 - time (sec): 365.64 - samples/sec: 416.61 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-13 00:59:59,270 epoch 9 - iter 900/1809 - loss 0.00982915 - time (sec): 462.74 - samples/sec: 410.07 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-13 01:01:31,167 epoch 9 - iter 1080/1809 - loss 0.00992239 - time (sec): 554.63 - samples/sec: 410.23 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-13 01:03:03,196 epoch 9 - iter 1260/1809 - loss 0.00987652 - time (sec): 646.66 - samples/sec: 408.94 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-13 01:04:36,184 epoch 9 - iter 1440/1809 - loss 0.00999639 - time (sec): 739.65 - samples/sec: 409.36 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-13 01:06:10,125 epoch 9 - iter 1620/1809 - loss 0.00981387 - time (sec): 833.59 - samples/sec: 409.53 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-13 01:07:42,547 epoch 9 - iter 1800/1809 - loss 0.00963508 - time (sec): 926.01 - samples/sec: 408.65 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-13 01:07:46,624 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 01:07:46,624 EPOCH 9 done: loss 0.0096 - lr: 0.000017
+ 2023-10-13 01:08:26,098 DEV : loss 0.36490538716316223 - f1-score (micro avg) 0.6474
+ 2023-10-13 01:08:26,158 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 01:09:58,926 epoch 10 - iter 180/1809 - loss 0.01093935 - time (sec): 92.77 - samples/sec: 407.46 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-13 01:11:31,538 epoch 10 - iter 360/1809 - loss 0.00895096 - time (sec): 185.38 - samples/sec: 404.83 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-13 01:13:04,293 epoch 10 - iter 540/1809 - loss 0.00986715 - time (sec): 278.13 - samples/sec: 405.74 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-13 01:14:35,898 epoch 10 - iter 720/1809 - loss 0.00934801 - time (sec): 369.74 - samples/sec: 405.71 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-13 01:16:09,995 epoch 10 - iter 900/1809 - loss 0.00920377 - time (sec): 463.84 - samples/sec: 405.65 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-13 01:17:41,994 epoch 10 - iter 1080/1809 - loss 0.00843499 - time (sec): 555.83 - samples/sec: 406.93 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-13 01:19:11,885 epoch 10 - iter 1260/1809 - loss 0.00837965 - time (sec): 645.72 - samples/sec: 409.24 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-13 01:20:42,654 epoch 10 - iter 1440/1809 - loss 0.00861846 - time (sec): 736.49 - samples/sec: 410.76 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-13 01:22:12,500 epoch 10 - iter 1620/1809 - loss 0.00843137 - time (sec): 826.34 - samples/sec: 412.02 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-13 01:23:41,817 epoch 10 - iter 1800/1809 - loss 0.00810910 - time (sec): 915.66 - samples/sec: 413.30 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-13 01:23:45,653 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 01:23:45,653 EPOCH 10 done: loss 0.0081 - lr: 0.000000
+ 2023-10-13 01:24:25,128 DEV : loss 0.3651101589202881 - f1-score (micro avg) 0.6381
+ 2023-10-13 01:24:26,124 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 01:24:26,126 Loading model from best epoch ...
+ 2023-10-13 01:24:31,655 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
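The 13 tags listed above are `O` plus the four BIOES prefixes (`S`, `B`, `I`, `E`) for each of `loc`, `pers` and `org`. As an illustration of how such tags map to entity spans, here is a strict decoder sketch. It is not Flair's own decoding logic, which may handle malformed sequences differently; `bioes_to_spans` is a hypothetical name.

```python
def bioes_to_spans(tags: list[str]) -> list[tuple[int, int, str]]:
    """Convert a BIOES tag sequence into (start, end_exclusive, label) spans.

    Strict variant: a span is emitted only for S-x, or for B-x ... E-x with
    matching labels; anything malformed is dropped.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, kind = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, kind))
            start, label = None, None
        elif prefix == "B":
            start, label = i, kind
        elif prefix == "I":
            if kind != label:
                # Stray or mismatched I- tag: abandon the open span.
                start, label = None, None
        elif prefix == "E":
            if start is not None and kind == label:
                spans.append((start, i + 1, kind))
            start, label = None, None
        else:
            # "O" closes any span that was left open.
            start, label = None, None
    return spans


tags = ["B-pers", "E-pers", "O", "S-loc", "O", "B-org", "I-org", "E-org"]
spans = bioes_to_spans(tags)
print(spans)
```

Span-level precision, recall and F1, as reported in the results, are then computed over such spans rather than over individual tokens.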
+ 2023-10-13 01:25:29,437
+ Results:
+ - F-score (micro) 0.6478
+ - F-score (macro) 0.5104
+ - Accuracy 0.4908
+
+ By class:
+               precision    recall  f1-score   support
+
+          loc     0.6357    0.7766    0.6992       591
+         pers     0.5565    0.7591    0.6422       357
+          org     0.2241    0.1646    0.1898        79
+
+    micro avg     0.5864    0.7235    0.6478      1027
+    macro avg     0.4721    0.5668    0.5104      1027
+ weighted avg     0.5765    0.7235    0.6402      1027
+
+ 2023-10-13 01:25:29,437 ----------------------------------------------------------------------------------------------------
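The averaged rows in the results can be re-derived from the per-class rows. The sketch below recomputes macro F1 (unweighted mean of class F1), weighted F1 (mean weighted by support) and micro F1 (harmonic mean of the micro-averaged precision and recall) from the rounded figures printed in the log, so agreement is only expected to about four decimal places. The dictionary layout is illustrative.

```python
# Per-class figures copied from the "By class" report.
report = {
    "loc":  {"precision": 0.6357, "recall": 0.7766, "f1": 0.6992, "support": 591},
    "pers": {"precision": 0.5565, "recall": 0.7591, "f1": 0.6422, "support": 357},
    "org":  {"precision": 0.2241, "recall": 0.1646, "f1": 0.1898, "support": 79},
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


total_support = sum(c["support"] for c in report.values())
macro_f1 = sum(c["f1"] for c in report.values()) / len(report)
weighted_f1 = sum(c["f1"] * c["support"] for c in report.values()) / total_support
micro_f1 = f1(0.5864, 0.7235)  # micro-averaged precision and recall from the report

print(f"support {total_support}")
print(f"macro F1 {macro_f1:.4f}, weighted F1 {weighted_f1:.4f}, micro F1 {micro_f1:.4f}")
```

The gap between micro and macro F1 comes mostly from the `org` class, which has few test mentions and a much lower F1 than `loc` and `pers`.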