utakumi committed
Commit 7292330 (verified)
1 Parent(s): 0f7a211

End of training

Files changed (5)
  1. README.md +4 -2
  2. all_results.json +16 -0
  3. eval_results.json +10 -0
  4. train_results.json +9 -0
  5. trainer_state.json +897 -0
README.md CHANGED
@@ -3,6 +3,8 @@ library_name: transformers
  license: apache-2.0
  base_model: rinna/japanese-hubert-base
  tags:
+ - automatic-speech-recognition
+ - original_kakeiken_W_closed
  - generated_from_trainer
  metrics:
  - wer
@@ -16,9 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
 
  # Hubert-kakeiken-W-closed
 
- This model is a fine-tuned version of [rinna/japanese-hubert-base](https://huggingface.co/rinna/japanese-hubert-base) on the None dataset.
+ This model is a fine-tuned version of [rinna/japanese-hubert-base](https://huggingface.co/rinna/japanese-hubert-base) on the ORIGINAL_KAKEIKEN_W_CLOSED - JA dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0269
+ - Loss: 0.0273
  - Wer: 0.9988
  - Cer: 1.0164
 
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+     "epoch": 39.95203400121433,
+     "eval_cer": 1.0164177335229967,
+     "eval_loss": 0.027329571545124054,
+     "eval_runtime": 62.1595,
+     "eval_samples": 6840,
+     "eval_samples_per_second": 110.039,
+     "eval_steps_per_second": 13.755,
+     "eval_wer": 0.9988304093567252,
+     "total_flos": 1.7147211678918107e+19,
+     "train_loss": 1.2561371190290116,
+     "train_runtime": 30821.1471,
+     "train_samples": 52680,
+     "train_samples_per_second": 68.369,
+     "train_steps_per_second": 1.068
+ }
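The throughput fields in `all_results.json` can be cross-checked against the raw counts. The sketch below is a sanity check, not the Trainer's own code: it assumes the training rate is computed as samples times the configured 40 epochs divided by runtime, which is an inference from the numbers rather than something the commit states. The helper name `samples_per_second` is hypothetical.

```python
# Sanity check of the logged throughput figures (values copied from all_results.json).
# Assumption: the training rate counts train_samples once per configured epoch (40).

def samples_per_second(num_samples: int, epochs: float, runtime_s: float) -> float:
    """Return processed samples per second over a run of `runtime_s` seconds."""
    return num_samples * epochs / runtime_s

TRAIN_SAMPLES = 52680
EVAL_SAMPLES = 6840
NUM_EPOCHS = 40            # "num_train_epochs" in trainer_state.json
GLOBAL_STEP = 32920        # "global_step" in trainer_state.json
TRAIN_RUNTIME = 30821.1471
EVAL_RUNTIME = 62.1595

train_sps = samples_per_second(TRAIN_SAMPLES, NUM_EPOCHS, TRAIN_RUNTIME)
eval_sps = samples_per_second(EVAL_SAMPLES, 1, EVAL_RUNTIME)
train_steps_ps = GLOBAL_STEP / TRAIN_RUNTIME

print(f"train samples/s: {train_sps:.3f}")
print(f"eval samples/s:  {eval_sps:.3f}")
print(f"train steps/s:   {train_steps_ps:.3f}")
```

Under that assumption the recomputed values agree with the logged `train_samples_per_second`, `eval_samples_per_second`, and `train_steps_per_second` to three decimal places.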
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "epoch": 39.95203400121433,
+     "eval_cer": 1.0164177335229967,
+     "eval_loss": 0.027329571545124054,
+     "eval_runtime": 62.1595,
+     "eval_samples": 6840,
+     "eval_samples_per_second": 110.039,
+     "eval_steps_per_second": 13.755,
+     "eval_wer": 0.9988304093567252
+ }
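For readers interpreting `eval_wer` and `eval_cer`: both are edit-distance rates normalized by reference length, so they are not capped at 1.0. The sketch below is a generic pure-Python illustration of that definition. It is not the metric code used for this run, which the commit does not show, and the function names are hypothetical. It assumes non-empty references.

```python
from typing import Sequence

def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over characters / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over whitespace-separated tokens / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

print(cer("abc", "abc"))      # identical strings
print(cer("ab", "xyzw"))      # hypothesis longer than reference: rate exceeds 1.0
print(wer("a b c", "a x c"))  # one substituted word out of three
```

Because insertions count as errors, a hypothesis that is longer than its reference can push the rate above 1.0, which is how a CER such as 1.0164 can arise.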
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 39.95203400121433,
+     "total_flos": 1.7147211678918107e+19,
+     "train_loss": 1.2561371190290116,
+     "train_runtime": 30821.1471,
+     "train_samples": 52680,
+     "train_samples_per_second": 68.369,
+     "train_steps_per_second": 1.068
+ }
trainer_state.json ADDED
@@ -0,0 +1,897 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 39.95203400121433,
+   "eval_steps": 100.0,
+   "global_step": 32920,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.607164541590771,
+       "grad_norm": 57.35422134399414,
+       "learning_rate": 1.188e-06,
+       "loss": 28.44,
+       "step": 500
+     },
+     {
+       "epoch": 1.0,
+       "eval_cer": 1.1284080132764343,
+       "eval_loss": 10.8607177734375,
+       "eval_runtime": 145.137,
+       "eval_samples_per_second": 47.128,
+       "eval_steps_per_second": 5.891,
+       "eval_wer": 1.0,
+       "step": 824
+     },
+     {
+       "epoch": 1.2137219186399515,
+       "grad_norm": 44.843685150146484,
+       "learning_rate": 2.3880000000000003e-06,
+       "loss": 11.3,
+       "step": 1000
+     },
+     {
+       "epoch": 1.8208864602307226,
+       "grad_norm": 36.54290008544922,
+       "learning_rate": 3.588e-06,
+       "loss": 9.2184,
+       "step": 1500
+     },
+     {
+       "epoch": 2.0,
+       "eval_cer": 1.1284376481744902,
+       "eval_loss": 7.510649681091309,
+       "eval_runtime": 131.1044,
+       "eval_samples_per_second": 52.172,
+       "eval_steps_per_second": 6.522,
+       "eval_wer": 1.0,
+       "step": 1648
+     },
+     {
+       "epoch": 2.427443837279903,
+       "grad_norm": 22.59714126586914,
+       "learning_rate": 4.788e-06,
+       "loss": 7.0297,
+       "step": 2000
+     },
+     {
+       "epoch": 3.0,
+       "eval_cer": 1.1283783783783783,
+       "eval_loss": 4.190179347991943,
+       "eval_runtime": 131.4162,
+       "eval_samples_per_second": 52.048,
+       "eval_steps_per_second": 6.506,
+       "eval_wer": 1.0,
+       "step": 2472
+     },
+     {
+       "epoch": 3.0340012143290833,
+       "grad_norm": 10.585729598999023,
+       "learning_rate": 5.988e-06,
+       "loss": 4.8903,
+       "step": 2500
+     },
+     {
+       "epoch": 3.6411657559198543,
+       "grad_norm": 2.5189406871795654,
+       "learning_rate": 7.1880000000000005e-06,
+       "loss": 3.6874,
+       "step": 3000
+     },
+     {
+       "epoch": 4.0,
+       "eval_cer": 1.1283783783783783,
+       "eval_loss": 3.1195626258850098,
+       "eval_runtime": 129.8504,
+       "eval_samples_per_second": 52.676,
+       "eval_steps_per_second": 6.584,
+       "eval_wer": 1.0,
+       "step": 3296
+     },
+     {
+       "epoch": 4.247723132969035,
+       "grad_norm": 1.423240303993225,
+       "learning_rate": 8.388e-06,
+       "loss": 3.1803,
+       "step": 3500
+     },
+     {
+       "epoch": 4.854887674559806,
+       "grad_norm": 1.8786451816558838,
+       "learning_rate": 9.588e-06,
+       "loss": 2.7259,
+       "step": 4000
+     },
+     {
+       "epoch": 5.0,
+       "eval_cer": 1.1283783783783783,
+       "eval_loss": 2.2997305393218994,
+       "eval_runtime": 128.0868,
+       "eval_samples_per_second": 53.401,
+       "eval_steps_per_second": 6.675,
+       "eval_wer": 1.0,
+       "step": 4120
+     },
+     {
+       "epoch": 5.461445051608986,
+       "grad_norm": 3.4784176349639893,
+       "learning_rate": 1.0787999999999999e-05,
+       "loss": 2.1431,
+       "step": 4500
+     },
+     {
+       "epoch": 6.0,
+       "eval_cer": 1.0313537221431959,
+       "eval_loss": 1.0496456623077393,
+       "eval_runtime": 138.552,
+       "eval_samples_per_second": 49.368,
+       "eval_steps_per_second": 6.171,
+       "eval_wer": 0.9997076023391813,
+       "step": 4944
+     },
+     {
+       "epoch": 6.068002428658167,
+       "grad_norm": 2.7191321849823,
+       "learning_rate": 1.1988000000000001e-05,
+       "loss": 1.3807,
+       "step": 5000
+     },
+     {
+       "epoch": 6.675166970248937,
+       "grad_norm": 2.5248982906341553,
+       "learning_rate": 1.3188e-05,
+       "loss": 0.8891,
+       "step": 5500
+     },
+     {
+       "epoch": 7.0,
+       "eval_cer": 1.017425320056899,
+       "eval_loss": 0.6379755735397339,
+       "eval_runtime": 135.841,
+       "eval_samples_per_second": 50.353,
+       "eval_steps_per_second": 6.294,
+       "eval_wer": 0.9998538011695907,
+       "step": 5768
+     },
+     {
+       "epoch": 7.281724347298118,
+       "grad_norm": 7.416316509246826,
+       "learning_rate": 1.4388000000000002e-05,
+       "loss": 0.6223,
+       "step": 6000
+     },
+     {
+       "epoch": 7.888888888888889,
+       "grad_norm": 6.211233615875244,
+       "learning_rate": 1.5588e-05,
+       "loss": 0.4891,
+       "step": 6500
+     },
+     {
+       "epoch": 8.0,
+       "eval_cer": 1.0305239449976291,
+       "eval_loss": 0.25890296697616577,
+       "eval_runtime": 76.2932,
+       "eval_samples_per_second": 89.654,
+       "eval_steps_per_second": 11.207,
+       "eval_wer": 0.9991228070175439,
+       "step": 6592
+     },
+     {
+       "epoch": 8.49544626593807,
+       "grad_norm": 2.76125431060791,
+       "learning_rate": 1.6788e-05,
+       "loss": 0.3675,
+       "step": 7000
+     },
+     {
+       "epoch": 9.0,
+       "eval_cer": 1.0932313892840209,
+       "eval_loss": 0.6099491119384766,
+       "eval_runtime": 74.5927,
+       "eval_samples_per_second": 91.698,
+       "eval_steps_per_second": 11.462,
+       "eval_wer": 0.9994152046783625,
+       "step": 7416
+     },
+     {
+       "epoch": 9.102003642987249,
+       "grad_norm": 8.747713088989258,
+       "learning_rate": 1.7988e-05,
+       "loss": 0.3164,
+       "step": 7500
+     },
+     {
+       "epoch": 9.70916818457802,
+       "grad_norm": 7.222991943359375,
+       "learning_rate": 1.9188e-05,
+       "loss": 0.2744,
+       "step": 8000
+     },
+     {
+       "epoch": 10.0,
+       "eval_cer": 1.023944997629208,
+       "eval_loss": 0.1262398213148117,
+       "eval_runtime": 62.8915,
+       "eval_samples_per_second": 108.759,
+       "eval_steps_per_second": 13.595,
+       "eval_wer": 0.9989766081871345,
+       "step": 8240
+     },
+     {
+       "epoch": 10.3157255616272,
+       "grad_norm": 7.39332389831543,
+       "learning_rate": 2.0388e-05,
+       "loss": 0.2525,
+       "step": 8500
+     },
+     {
+       "epoch": 10.922890103217972,
+       "grad_norm": 3.6922662258148193,
+       "learning_rate": 2.1588e-05,
+       "loss": 0.2278,
+       "step": 9000
+     },
+     {
+       "epoch": 11.0,
+       "eval_cer": 1.023411569464201,
+       "eval_loss": 0.11068873107433319,
+       "eval_runtime": 67.9091,
+       "eval_samples_per_second": 100.723,
+       "eval_steps_per_second": 12.59,
+       "eval_wer": 0.9989766081871345,
+       "step": 9064
+     },
+     {
+       "epoch": 11.529447480267152,
+       "grad_norm": 5.623218536376953,
+       "learning_rate": 2.2788000000000003e-05,
+       "loss": 0.2148,
+       "step": 9500
+     },
+     {
+       "epoch": 12.0,
+       "eval_cer": 1.0260787102892366,
+       "eval_loss": 0.06716426461935043,
+       "eval_runtime": 89.5293,
+       "eval_samples_per_second": 76.4,
+       "eval_steps_per_second": 9.55,
+       "eval_wer": 0.9989766081871345,
+       "step": 9888
+     },
+     {
+       "epoch": 12.136004857316333,
+       "grad_norm": 4.379143714904785,
+       "learning_rate": 2.3988e-05,
+       "loss": 0.2055,
+       "step": 10000
+     },
+     {
+       "epoch": 12.743169398907105,
+       "grad_norm": 5.908865451812744,
+       "learning_rate": 2.5188e-05,
+       "loss": 0.1927,
+       "step": 10500
+     },
+     {
+       "epoch": 13.0,
+       "eval_cer": 1.0219298245614035,
+       "eval_loss": 0.059190813452005386,
+       "eval_runtime": 80.5762,
+       "eval_samples_per_second": 84.889,
+       "eval_steps_per_second": 10.611,
+       "eval_wer": 0.9989766081871345,
+       "step": 10712
+     },
+     {
+       "epoch": 13.349726775956285,
+       "grad_norm": 3.240206003189087,
+       "learning_rate": 2.63856e-05,
+       "loss": 0.1926,
+       "step": 11000
+     },
+     {
+       "epoch": 13.956891317547056,
+       "grad_norm": 5.876375675201416,
+       "learning_rate": 2.7585600000000002e-05,
+       "loss": 0.1723,
+       "step": 11500
+     },
+     {
+       "epoch": 14.0,
+       "eval_cer": 1.022848506401138,
+       "eval_loss": 0.08003176748752594,
+       "eval_runtime": 62.2149,
+       "eval_samples_per_second": 109.941,
+       "eval_steps_per_second": 13.743,
+       "eval_wer": 0.9988304093567252,
+       "step": 11536
+     },
+     {
+       "epoch": 14.563448694596236,
+       "grad_norm": 3.714435577392578,
+       "learning_rate": 2.87856e-05,
+       "loss": 0.1725,
+       "step": 12000
+     },
+     {
+       "epoch": 15.0,
+       "eval_cer": 1.021100047415837,
+       "eval_loss": 0.051837269216775894,
+       "eval_runtime": 62.5224,
+       "eval_samples_per_second": 109.401,
+       "eval_steps_per_second": 13.675,
+       "eval_wer": 0.9989766081871345,
+       "step": 12360
+     },
+     {
+       "epoch": 15.170006071645416,
+       "grad_norm": 9.124956130981445,
+       "learning_rate": 2.99856e-05,
+       "loss": 0.1695,
+       "step": 12500
+     },
+     {
+       "epoch": 15.777170613236187,
+       "grad_norm": 3.6504106521606445,
+       "learning_rate": 2.9956874399074467e-05,
+       "loss": 0.1628,
+       "step": 13000
+     },
+     {
+       "epoch": 16.0,
+       "eval_cer": 1.0142247510668563,
+       "eval_loss": 0.13594070076942444,
+       "eval_runtime": 75.4469,
+       "eval_samples_per_second": 90.66,
+       "eval_steps_per_second": 11.332,
+       "eval_wer": 0.9988304093567252,
+       "step": 13184
+     },
+     {
+       "epoch": 16.38372799028537,
+       "grad_norm": 3.7022886276245117,
+       "learning_rate": 2.9825295862461663e-05,
+       "loss": 0.1626,
+       "step": 13500
+     },
+     {
+       "epoch": 16.99089253187614,
+       "grad_norm": 1.3335272073745728,
+       "learning_rate": 2.9606033905859603e-05,
+       "loss": 0.1567,
+       "step": 14000
+     },
+     {
+       "epoch": 17.0,
+       "eval_cer": 1.0195886676149835,
+       "eval_loss": 0.04436279088258743,
+       "eval_runtime": 64.8228,
+       "eval_samples_per_second": 105.518,
+       "eval_steps_per_second": 13.19,
+       "eval_wer": 0.9989766081871345,
+       "step": 14008
+     },
+     {
+       "epoch": 17.59744990892532,
+       "grad_norm": 2.4595658779144287,
+       "learning_rate": 2.9300385342391396e-05,
+       "loss": 0.1436,
+       "step": 14500
+     },
+     {
+       "epoch": 18.0,
+       "eval_cer": 1.0192626837363679,
+       "eval_loss": 0.04214438423514366,
+       "eval_runtime": 63.292,
+       "eval_samples_per_second": 108.07,
+       "eval_steps_per_second": 13.509,
+       "eval_wer": 0.9988304093567252,
+       "step": 14832
+     },
+     {
+       "epoch": 18.204007285974498,
+       "grad_norm": 0.9890690445899963,
+       "learning_rate": 2.891015791414923e-05,
+       "loss": 0.1495,
+       "step": 15000
+     },
+     {
+       "epoch": 18.81117182756527,
+       "grad_norm": 8.133474349975586,
+       "learning_rate": 2.843765960040039e-05,
+       "loss": 0.1351,
+       "step": 15500
+     },
+     {
+       "epoch": 19.0,
+       "eval_cer": 1.0173067804646752,
+       "eval_loss": 0.03748102858662605,
+       "eval_runtime": 63.0874,
+       "eval_samples_per_second": 108.421,
+       "eval_steps_per_second": 13.553,
+       "eval_wer": 0.9988304093567252,
+       "step": 15656
+     },
+     {
+       "epoch": 19.41772920461445,
+       "grad_norm": 5.602595806121826,
+       "learning_rate": 2.7885684967167233e-05,
+       "loss": 0.1454,
+       "step": 16000
+     },
+     {
+       "epoch": 20.0,
+       "eval_cer": 1.018640350877193,
+       "eval_loss": 0.03039967454969883,
+       "eval_runtime": 77.538,
+       "eval_samples_per_second": 88.215,
+       "eval_steps_per_second": 11.027,
+       "eval_wer": 0.9988304093567252,
+       "step": 16480
+     },
+     {
+       "epoch": 20.02428658166363,
+       "grad_norm": 2.98942494392395,
+       "learning_rate": 2.7257498638915816e-05,
+       "loss": 0.1353,
+       "step": 16500
+     },
+     {
+       "epoch": 20.6314511232544,
+       "grad_norm": 6.270744800567627,
+       "learning_rate": 2.6558287021276313e-05,
+       "loss": 0.1252,
+       "step": 17000
+     },
+     {
+       "epoch": 21.0,
+       "eval_cer": 1.0237079184447606,
+       "eval_loss": 0.05677889287471771,
+       "eval_runtime": 62.9546,
+       "eval_samples_per_second": 108.65,
+       "eval_steps_per_second": 13.581,
+       "eval_wer": 0.9988304093567252,
+       "step": 17304
+     },
+     {
+       "epoch": 21.238008500303582,
+       "grad_norm": 3.5701606273651123,
+       "learning_rate": 2.578938449744228e-05,
+       "loss": 0.1249,
+       "step": 17500
+     },
+     {
+       "epoch": 21.845173041894352,
+       "grad_norm": 6.276644706726074,
+       "learning_rate": 2.4956668735674143e-05,
+       "loss": 0.1233,
+       "step": 18000
+     },
+     {
+       "epoch": 22.0,
+       "eval_cer": 1.0176031294452348,
+       "eval_loss": 0.029143376275897026,
+       "eval_runtime": 63.804,
+       "eval_samples_per_second": 107.203,
+       "eval_steps_per_second": 13.4,
+       "eval_wer": 0.9988304093567252,
+       "step": 18128
+     },
+     {
+       "epoch": 22.451730418943534,
+       "grad_norm": 3.3094053268432617,
+       "learning_rate": 2.40650647888375e-05,
+       "loss": 0.1179,
+       "step": 18500
+     },
+     {
+       "epoch": 23.0,
+       "eval_cer": 1.016743717401612,
+       "eval_loss": 0.02712642401456833,
+       "eval_runtime": 63.365,
+       "eval_samples_per_second": 107.946,
+       "eval_steps_per_second": 13.493,
+       "eval_wer": 0.9988304093567252,
+       "step": 18952
+     },
+     {
+       "epoch": 23.058287795992715,
+       "grad_norm": 4.888893127441406,
+       "learning_rate": 2.3123726366487132e-05,
+       "loss": 0.1141,
+       "step": 19000
+     },
+     {
+       "epoch": 23.665452337583485,
+       "grad_norm": 1.941120982170105,
+       "learning_rate": 2.2130663756909194e-05,
+       "loss": 0.1108,
+       "step": 19500
+     },
+     {
+       "epoch": 24.0,
+       "eval_cer": 1.0178402086296823,
+       "eval_loss": 0.027879294008016586,
+       "eval_runtime": 65.3857,
+       "eval_samples_per_second": 104.61,
+       "eval_steps_per_second": 13.076,
+       "eval_wer": 0.9988304093567252,
+       "step": 19776
+     },
+     {
+       "epoch": 24.272009714632667,
+       "grad_norm": 1.846130132675171,
+       "learning_rate": 2.1095427217664034e-05,
+       "loss": 0.1089,
+       "step": 20000
+     },
+     {
+       "epoch": 24.879174256223436,
+       "grad_norm": 3.097496271133423,
+       "learning_rate": 2.002413959993121e-05,
+       "loss": 0.1031,
+       "step": 20500
+     },
+     {
+       "epoch": 25.0,
+       "eval_cer": 1.0182847321005215,
+       "eval_loss": 0.03511003032326698,
+       "eval_runtime": 65.1106,
+       "eval_samples_per_second": 105.052,
+       "eval_steps_per_second": 13.132,
+       "eval_wer": 0.9989766081871345,
+       "step": 20600
+     },
+     {
+       "epoch": 25.485731633272618,
+       "grad_norm": 5.548646926879883,
+       "learning_rate": 1.8923136977067138e-05,
+       "loss": 0.1006,
+       "step": 21000
+     },
+     {
+       "epoch": 26.0,
+       "eval_cer": 1.0173364153627311,
+       "eval_loss": 0.04413418844342232,
+       "eval_runtime": 62.3139,
+       "eval_samples_per_second": 109.767,
+       "eval_steps_per_second": 13.721,
+       "eval_wer": 0.9989766081871345,
+       "step": 21424
+     },
+     {
+       "epoch": 26.092289010321796,
+       "grad_norm": 1.0187814235687256,
+       "learning_rate": 1.779893117023784e-05,
+       "loss": 0.1032,
+       "step": 21500
+     },
+     {
+       "epoch": 26.69945355191257,
+       "grad_norm": 1.9439737796783447,
+       "learning_rate": 1.665817123460074e-05,
+       "loss": 0.0946,
+       "step": 22000
+     },
+     {
+       "epoch": 27.0,
+       "eval_cer": 1.0169511616880038,
+       "eval_loss": 0.03058658167719841,
+       "eval_runtime": 64.6664,
+       "eval_samples_per_second": 105.774,
+       "eval_steps_per_second": 13.222,
+       "eval_wer": 0.9988304093567252,
+       "step": 22248
+     },
+     {
+       "epoch": 27.306010928961747,
+       "grad_norm": 1.2627675533294678,
+       "learning_rate": 1.55076041338233e-05,
+       "loss": 0.0936,
+       "step": 22500
+     },
+     {
+       "epoch": 27.91317547055252,
+       "grad_norm": 0.8128781318664551,
+       "learning_rate": 1.4354034835527018e-05,
+       "loss": 0.09,
+       "step": 23000
+     },
+     {
+       "epoch": 28.0,
+       "eval_cer": 1.0176920341394025,
+       "eval_loss": 0.030218515545129776,
+       "eval_runtime": 64.0242,
+       "eval_samples_per_second": 106.835,
+       "eval_steps_per_second": 13.354,
+       "eval_wer": 0.9988304093567252,
+       "step": 23072
+     },
+     {
+       "epoch": 28.5197328476017,
+       "grad_norm": 4.884032249450684,
+       "learning_rate": 1.3206577220714804e-05,
+       "loss": 0.0813,
+       "step": 23500
+     },
+     {
+       "epoch": 29.0,
+       "eval_cer": 1.0178402086296823,
+       "eval_loss": 0.03485483676195145,
+       "eval_runtime": 62.0604,
+       "eval_samples_per_second": 110.215,
+       "eval_steps_per_second": 13.777,
+       "eval_wer": 0.9988304093567252,
+       "step": 23896
+     },
+     {
+       "epoch": 29.12629022465088,
+       "grad_norm": 0.34174931049346924,
+       "learning_rate": 1.2067421110204709e-05,
+       "loss": 0.0844,
+       "step": 24000
+     },
+     {
+       "epoch": 29.73345476624165,
+       "grad_norm": 0.053012751042842865,
+       "learning_rate": 1.0945609580796467e-05,
+       "loss": 0.0806,
+       "step": 24500
+     },
+     {
+       "epoch": 30.0,
+       "eval_cer": 1.0178698435277382,
+       "eval_loss": 0.03685862198472023,
+       "eval_runtime": 66.7781,
+       "eval_samples_per_second": 102.429,
+       "eval_steps_per_second": 12.804,
+       "eval_wer": 0.9988304093567252,
+       "step": 24720
+     },
+     {
+       "epoch": 30.34001214329083,
+       "grad_norm": 3.6521289348602295,
+       "learning_rate": 9.847777526821669e-06,
+       "loss": 0.0758,
+       "step": 25000
+     },
+     {
+       "epoch": 30.947176684881605,
+       "grad_norm": 2.7911853790283203,
+       "learning_rate": 8.780418017286117e-06,
+       "loss": 0.0763,
+       "step": 25500
+     },
+     {
+       "epoch": 31.0,
+       "eval_cer": 1.0164770033191086,
+       "eval_loss": 0.04343733936548233,
+       "eval_runtime": 63.0385,
+       "eval_samples_per_second": 108.505,
+       "eval_steps_per_second": 13.563,
+       "eval_wer": 0.9989766081871345,
+       "step": 25544
+     },
+     {
+       "epoch": 31.553734061930783,
+       "grad_norm": 1.7392168045043945,
+       "learning_rate": 7.749843892960228e-06,
+       "loss": 0.075,
+       "step": 26000
+     },
+     {
+       "epoch": 32.0,
+       "eval_cer": 1.0160028449502134,
+       "eval_loss": 0.030797116458415985,
+       "eval_runtime": 63.3897,
+       "eval_samples_per_second": 107.904,
+       "eval_steps_per_second": 13.488,
+       "eval_wer": 0.9988304093567252,
+       "step": 26368
+     },
+     {
+       "epoch": 32.16029143897996,
+       "grad_norm": 3.0110056400299072,
+       "learning_rate": 6.764079092952775e-06,
+       "loss": 0.0703,
+       "step": 26500
+     },
+     {
+       "epoch": 32.76745598057074,
+       "grad_norm": 1.4404278993606567,
+       "learning_rate": 5.8250048617236015e-06,
+       "loss": 0.0708,
+       "step": 27000
+     },
+     {
+       "epoch": 33.0,
+       "eval_cer": 1.0166844476055001,
+       "eval_loss": 0.030816324055194855,
+       "eval_runtime": 62.6517,
+       "eval_samples_per_second": 109.175,
+       "eval_steps_per_second": 13.647,
+       "eval_wer": 0.9988304093567252,
+       "step": 27192
+     },
+     {
+       "epoch": 33.374013357619916,
+       "grad_norm": 0.1344936639070511,
+       "learning_rate": 4.940195648850366e-06,
+       "loss": 0.0684,
+       "step": 27500
+     },
+     {
+       "epoch": 33.981177899210685,
+       "grad_norm": 4.214531421661377,
+       "learning_rate": 4.114884611130932e-06,
+       "loss": 0.0668,
+       "step": 28000
+     },
+     {
+       "epoch": 34.0,
+       "eval_cer": 1.0165659080132765,
+       "eval_loss": 0.02984553575515747,
+       "eval_runtime": 63.6993,
+       "eval_samples_per_second": 107.38,
+       "eval_steps_per_second": 13.422,
+       "eval_wer": 0.9988304093567252,
+       "step": 28016
+     },
+     {
+       "epoch": 34.58773527625986,
+       "grad_norm": 4.261690139770508,
+       "learning_rate": 3.353953006586277e-06,
+       "loss": 0.0639,
+       "step": 28500
+     },
+     {
+       "epoch": 35.0,
+       "eval_cer": 1.0163880986249407,
+       "eval_loss": 0.02722448669373989,
+       "eval_runtime": 64.9453,
+       "eval_samples_per_second": 105.319,
+       "eval_steps_per_second": 13.165,
+       "eval_wer": 0.9988304093567252,
+       "step": 28840
+     },
+     {
+       "epoch": 35.19429265330905,
+       "grad_norm": 0.7786476016044617,
+       "learning_rate": 2.6619013245208524e-06,
+       "loss": 0.0622,
+       "step": 29000
+     },
+     {
+       "epoch": 35.80145719489982,
+       "grad_norm": 0.006692953407764435,
+       "learning_rate": 2.0439854900570527e-06,
+       "loss": 0.0628,
+       "step": 29500
+     },
+     {
+       "epoch": 36.0,
+       "eval_cer": 1.0160621147463254,
+       "eval_loss": 0.02650887332856655,
+       "eval_runtime": 64.4273,
+       "eval_samples_per_second": 106.166,
+       "eval_steps_per_second": 13.271,
+       "eval_wer": 0.9988304093567252,
+       "step": 29664
+     },
+     {
+       "epoch": 36.408014571948996,
+       "grad_norm": 2.9724912643432617,
+       "learning_rate": 1.501384740615621e-06,
+       "loss": 0.0628,
+       "step": 30000
+     },
+     {
+       "epoch": 37.0,
+       "eval_cer": 1.0163288288288288,
+       "eval_loss": 0.026661457493901253,
+       "eval_runtime": 63.7838,
+       "eval_samples_per_second": 107.237,
+       "eval_steps_per_second": 13.405,
+       "eval_wer": 0.9988304093567252,
+       "step": 30488
+     },
+     {
+       "epoch": 37.01457194899818,
+       "grad_norm": 5.568892478942871,
+       "learning_rate": 1.0386208296455812e-06,
+       "loss": 0.0618,
+       "step": 30500
+     },
+     {
+       "epoch": 37.62173649058895,
+       "grad_norm": 2.4032373428344727,
+       "learning_rate": 6.584307495643449e-07,
+       "loss": 0.0586,
+       "step": 31000
+     },
+     {
+       "epoch": 38.0,
+       "eval_cer": 1.016151019440493,
+       "eval_loss": 0.02632048726081848,
+       "eval_runtime": 63.4705,
+       "eval_samples_per_second": 107.767,
+       "eval_steps_per_second": 13.471,
+       "eval_wer": 0.9988304093567252,
+       "step": 31312
+     },
+     {
+       "epoch": 38.22829386763813,
+       "grad_norm": 0.8156293630599976,
+       "learning_rate": 3.6306311427998064e-07,
+       "loss": 0.0599,
+       "step": 31500
+     },
+     {
+       "epoch": 38.8354584092289,
+       "grad_norm": 6.638393878936768,
+       "learning_rate": 1.5426485988442763e-07,
+       "loss": 0.058,
+       "step": 32000
+     },
+     {
+       "epoch": 39.0,
+       "eval_cer": 1.0164473684210527,
+       "eval_loss": 0.028048371896147728,
+       "eval_runtime": 62.7293,
+       "eval_samples_per_second": 109.04,
+       "eval_steps_per_second": 13.63,
+       "eval_wer": 0.9988304093567252,
+       "step": 32136
+     },
+     {
+       "epoch": 39.442015786278084,
+       "grad_norm": 0.12612353265285492,
+       "learning_rate": 3.327091249336667e-08,
+       "loss": 0.0588,
+       "step": 32500
+     },
+     {
+       "epoch": 39.95203400121433,
+       "eval_cer": 1.0163584637268848,
+       "eval_loss": 0.02687981352210045,
+       "eval_runtime": 63.6993,
+       "eval_samples_per_second": 107.38,
+       "eval_steps_per_second": 13.422,
+       "eval_wer": 0.9988304093567252,
+       "step": 32920
+     },
+     {
+       "epoch": 39.95203400121433,
+       "step": 32920,
+       "total_flos": 1.7147211678918107e+19,
+       "train_loss": 1.2561371190290116,
+       "train_runtime": 30821.1471,
+       "train_samples_per_second": 68.369,
+       "train_steps_per_second": 1.068
+     }
+   ],
+   "logging_steps": 500,
+   "max_steps": 32920,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 40,
+   "save_steps": 400,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 1.7147211678918107e+19,
+   "train_batch_size": 32,
+   "trial_name": null,
+   "trial_params": null
+ }
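`trainer_state.json` records `best_metric` and `best_model_checkpoint` as `null`, so the run did not track a best checkpoint. The sketch below shows one way to pick the lowest-`eval_loss` entry out of `log_history` after the fact. The helper names are hypothetical, and `SAMPLE_STATE` is a small excerpt copied from the log above rather than the full file. In practice you would `json.load` the real file instead.

```python
import json

# Excerpt of trainer_state.json (a few entries copied from the log above).
SAMPLE_STATE = json.loads("""
{
  "log_history": [
    {"epoch": 0.607164541590771, "loss": 28.44, "step": 500},
    {"epoch": 1.0, "eval_loss": 10.8607177734375, "eval_wer": 1.0, "step": 824},
    {"epoch": 38.0, "eval_loss": 0.02632048726081848, "eval_wer": 0.9988304093567252, "step": 31312},
    {"epoch": 39.95203400121433, "eval_loss": 0.02687981352210045, "eval_wer": 0.9988304093567252, "step": 32920},
    {"epoch": 39.95203400121433, "step": 32920, "train_loss": 1.2561371190290116}
  ]
}
""")

def split_log_history(state: dict):
    """Separate training-loss entries from evaluation entries.

    The final summary entry carries "train_loss" rather than "loss",
    so it lands in neither list.
    """
    train = [e for e in state["log_history"] if "loss" in e]
    evals = [e for e in state["log_history"] if "eval_loss" in e]
    return train, evals

def best_eval(evals: list, key: str = "eval_loss") -> dict:
    """Return the evaluation entry with the smallest value of `key`."""
    return min(evals, key=lambda e: e[key])

train_entries, eval_entries = split_log_history(SAMPLE_STATE)
best = best_eval(eval_entries)
print(best["step"], best["eval_loss"])
```

Across the full log the lowest `eval_loss` is the epoch 38 entry (step 31312), while `eval_wer` stays at or above 0.9988 throughout.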