2023-12-20 17:30:48,670 INFO [train.py:953] (2/4) Training started
2023-12-20 17:30:48,670 INFO [train.py:963] (2/4) Device: cuda:2
2023-12-20 17:30:48,670 INFO [train.py:965] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '0.0.0+unknown.version', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_tagging', 'icefall-git-sha1': 'bd01c212-clean', 'icefall-git-date': 'Tue Dec 19 17:20:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_audio_tagging', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_at/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-7-1218101249-5bcbfb5567-jsftr', 'IP address': '10.177.6.147'}, 'world_size': 4, 'master_port': 13455, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_as_full'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'num_events': 527, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2023-12-20 17:30:48,670 INFO [train.py:967] (2/4) About to create model
2023-12-20 17:30:54,289 INFO [train.py:971] (2/4) Number of model parameters: 64264454
2023-12-20 17:30:56,960 INFO [train.py:986] (2/4) Using DDP
2023-12-20 17:30:57,435 INFO [at_datamodule.py:398] (2/4) About to get the audioset cuts for KD.
2023-12-20 17:30:57,498 INFO [at_datamodule.py:223] (2/4) Enable MUSAN
2023-12-20 17:30:57,498 INFO [at_datamodule.py:224] (2/4) About to get Musan cuts
2023-12-20 17:30:59,983 INFO [at_datamodule.py:248] (2/4) Enable SpecAugment
2023-12-20 17:30:59,983 INFO [at_datamodule.py:249] (2/4) Time warp factor: 80
2023-12-20 17:30:59,984 INFO [at_datamodule.py:259] (2/4) Num frame mask: 10
2023-12-20 17:30:59,984 INFO [at_datamodule.py:272] (2/4) About to create train dataset
2023-12-20 17:30:59,984 INFO [at_datamodule.py:299] (2/4) Using DynamicBucketingSampler.
2023-12-20 17:31:02,097 INFO [at_datamodule.py:315] (2/4) About to create train dataloader
2023-12-20 17:31:02,098 INFO [at_datamodule.py:410] (2/4) About to get test-other cuts
2023-12-20 17:31:02,100 INFO [at_datamodule.py:346] (2/4) About to create dev dataset
2023-12-20 17:31:02,576 INFO [at_datamodule.py:363] (2/4) About to create dev dataloader
2023-12-20 17:31:25,020 INFO [train.py:886] (2/4) Epoch 1, batch 0, loss[loss=2.283, audio_tagging_loss=2.283, over 20581.00 frames. ], tot_loss[loss=2.283, audio_tagging_loss=2.283, over 20581.00 frames. ], batch size: 106, lr: 2.25e-02, grad_scale: 2.0
2023-12-20 17:31:25,021 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:31:46,185 INFO [train.py:917] (2/4) Epoch 1, validation: loss=1.716, audio_tagging_loss=1.716, over 3737520.00 frames. 
2023-12-20 17:31:46,186 INFO [train.py:918] (2/4) Maximum memory allocated so far is 13081MB
2023-12-20 17:31:48,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=0.0, ans=0.5
2023-12-20 17:31:50,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=0.0, ans=0.3
2023-12-20 17:31:53,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=0.0, ans=0.9
2023-12-20 17:31:54,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=0.0, ans=0.5
2023-12-20 17:31:56,787 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+02 8.568e+02 1.002e+03 1.369e+03 1.715e+03, threshold=4.006e+03, percent-clipped=0.0
2023-12-20 17:31:58,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=66.66666666666667, ans=0.8976666666666667
2023-12-20 17:31:59,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=148.42 vs. limit=7.525
2023-12-20 17:32:01,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=66.66666666666667, ans=0.496875
2023-12-20 17:32:07,424 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 3.256e+02 7.044e+02 1.161e+03 1.783e+03, threshold=2.818e+03, percent-clipped=0.0
2023-12-20 17:32:09,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=133.33333333333334, ans=0.7513333333333333
2023-12-20 17:32:13,323 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=60.35 vs. limit=4.053333333333334
2023-12-20 17:32:27,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=60.54 vs. limit=7.575
2023-12-20 17:32:30,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=222.77 vs. limit=5.133333333333334
2023-12-20 17:32:30,896 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 1.290e+02 2.793e+02 8.337e+02 1.783e+03, threshold=1.117e+03, percent-clipped=0.0
2023-12-20 17:32:32,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=17.53 vs. limit=7.6
2023-12-20 17:32:33,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=266.6666666666667, ans=0.4875
2023-12-20 17:32:34,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=313.04 vs. limit=7.6
2023-12-20 17:32:37,228 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=21.24 vs. limit=4.1066666666666665
2023-12-20 17:32:38,469 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=110.90 vs. limit=7.7
2023-12-20 17:32:39,231 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=111.36 vs. limit=4.1066666666666665
2023-12-20 17:32:40,041 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-20 17:32:41,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=255.36 vs. limit=7.75
2023-12-20 17:32:41,374 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=366.89 vs. limit=7.625
2023-12-20 17:32:42,072 INFO [train.py:886] (2/4) Epoch 1, batch 50, loss[loss=0.06074, audio_tagging_loss=0.06074, over 25000.00 frames. ], tot_loss[loss=0.3051, audio_tagging_loss=0.3051, over 1114689.49 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 2.0
2023-12-20 17:33:00,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=54.11 vs. limit=7.63
2023-12-20 17:33:07,720 INFO [train.py:886] (2/4) Epoch 2, batch 0, loss[loss=0.06753, audio_tagging_loss=0.06753, over 21552.00 frames. ], tot_loss[loss=0.06753, audio_tagging_loss=0.06753, over 21552.00 frames. ], batch size: 106, lr: 2.44e-02, grad_scale: 4.0
2023-12-20 17:33:07,721 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:33:15,956 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0380, 5.2993, 4.8652, 5.2336], device='cuda:2')
2023-12-20 17:33:23,169 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.8107, 4.8136, 4.8127, 4.8180], device='cuda:2')
2023-12-20 17:33:28,174 INFO [train.py:917] (2/4) Epoch 2, validation: loss=0.0597, audio_tagging_loss=0.0597, over 3737520.00 frames. 
2023-12-20 17:33:28,175 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14643MB
2023-12-20 17:33:32,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=213.93 vs. limit=5.173333333333334
2023-12-20 17:33:36,055 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=496.21 vs. limit=7.63
2023-12-20 17:33:40,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=78.05 vs. limit=7.655
2023-12-20 17:33:41,696 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=332.07 vs. limit=7.81
2023-12-20 17:33:42,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=413.56 vs. limit=7.655
2023-12-20 17:33:47,594 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=40.32 vs. limit=7.655
2023-12-20 17:33:50,301 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=230.04 vs. limit=7.655
2023-12-20 17:33:51,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=263.30 vs. limit=7.68
2023-12-20 17:33:51,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=480.0, ans=7.68
2023-12-20 17:33:52,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=480.0, ans=0.4775
2023-12-20 17:34:03,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=3.082
2023-12-20 17:34:06,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=546.6666666666666, ans=0.17950000000000002
2023-12-20 17:34:06,711 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=200.49 vs. limit=5.273333333333333
2023-12-20 17:34:14,753 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=4.245333333333333
2023-12-20 17:34:19,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=15.54 vs. limit=4.245333333333333
2023-12-20 17:34:21,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=20.41 vs. limit=5.153333333333333
2023-12-20 17:34:25,459 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.968e+01 6.154e+01 2.791e+02 2.019e+03, threshold=1.231e+02, percent-clipped=1.0
2023-12-20 17:34:26,577 INFO [train.py:886] (2/4) Epoch 2, batch 50, loss[loss=0.05319, audio_tagging_loss=0.05319, over 25000.00 frames. ], tot_loss[loss=0.05741, audio_tagging_loss=0.05741, over 1123870.55 frames. ], batch size: 100, lr: 2.66e-02, grad_scale: 2.0
2023-12-20 17:34:44,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=301.26 vs. limit=7.76
2023-12-20 17:34:52,047 INFO [train.py:886] (2/4) Epoch 3, batch 0, loss[loss=0.06629, audio_tagging_loss=0.06629, over 20834.00 frames. ], tot_loss[loss=0.06629, audio_tagging_loss=0.06629, over 20834.00 frames. ], batch size: 106, lr: 2.54e-02, grad_scale: 4.0
2023-12-20 17:34:52,048 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:35:12,448 INFO [train.py:917] (2/4) Epoch 3, validation: loss=0.05878, audio_tagging_loss=0.05878, over 3737520.00 frames. 
2023-12-20 17:35:12,448 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14778MB
2023-12-20 17:35:13,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=693.3333333333334, ans=0.4675
2023-12-20 17:35:22,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=4.277333333333333
2023-12-20 17:35:29,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=760.0, ans=5.475
2023-12-20 17:35:31,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=146.93 vs. limit=8.07
2023-12-20 17:35:33,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=760.0, ans=0.1715
2023-12-20 17:35:41,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=182.75 vs. limit=7.81
2023-12-20 17:35:45,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=238.44 vs. limit=7.81
2023-12-20 17:35:48,261 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=22.94 vs. limit=7.835
2023-12-20 17:35:51,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=893.3333333333334, ans=0.458125
2023-12-20 17:35:55,313 WARNING [optim.py:500] (2/4) Scaling gradients by 0.09217905253171921, model_norm_threshold=123.07855224609375
2023-12-20 17:35:55,463 WARNING [optim.py:572] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.614e+05, grad_sumsq=6.752e+08, orig_rms_sq=1.276e-03
2023-12-20 17:35:55,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=348.28 vs. limit=7.835
2023-12-20 17:35:56,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=893.3333333333334, ans=0.458125
2023-12-20 17:36:06,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=320.93 vs. limit=7.86
2023-12-20 17:36:09,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=159.67 vs. limit=5.48
2023-12-20 17:36:11,142 INFO [train.py:886] (2/4) Epoch 3, batch 50, loss[loss=0.0548, audio_tagging_loss=0.0548, over 25000.00 frames. ], tot_loss[loss=0.05632, audio_tagging_loss=0.05632, over 1116987.31 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 4.0
2023-12-20 17:36:11,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=200.40 vs. limit=5.513333333333334
2023-12-20 17:36:35,802 INFO [train.py:886] (2/4) Epoch 4, batch 0, loss[loss=0.05267, audio_tagging_loss=0.05267, over 25000.00 frames. ], tot_loss[loss=0.05267, audio_tagging_loss=0.05267, over 25000.00 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 8.0
2023-12-20 17:36:35,803 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:36:54,828 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2928, 5.0731, 4.1999, 4.6702], device='cuda:2')
2023-12-20 17:36:55,849 INFO [train.py:917] (2/4) Epoch 4, validation: loss=0.05673, audio_tagging_loss=0.05673, over 3737520.00 frames. 
2023-12-20 17:36:55,850 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14778MB
2023-12-20 17:37:05,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=8.28
2023-12-20 17:37:11,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=134.90 vs. limit=5.553333333333334
2023-12-20 17:37:11,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=103.01 vs. limit=7.915
2023-12-20 17:37:15,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=4.442666666666667
2023-12-20 17:37:17,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=4.442666666666667
2023-12-20 17:37:18,120 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=152.91 vs. limit=5.553333333333334
2023-12-20 17:37:19,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=105.25 vs. limit=8.33
2023-12-20 17:37:22,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1173.3333333333333, ans=0.8589333333333333
2023-12-20 17:37:22,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1173.3333333333333, ans=0.156
2023-12-20 17:37:22,415 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=35.86 vs. limit=5.293333333333333
2023-12-20 17:37:24,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1173.3333333333333, ans=0.8589333333333333
2023-12-20 17:37:27,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1173.3333333333333, ans=0.445
2023-12-20 17:37:30,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=325.36 vs. limit=8.38
2023-12-20 17:37:31,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=238.31 vs. limit=8.43
2023-12-20 17:37:33,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1240.0, ans=0.28759999999999997
2023-12-20 17:37:33,470 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=114.42 vs. limit=7.965
2023-12-20 17:37:34,896 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=332.68 vs. limit=7.965
2023-12-20 17:37:38,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=162.86 vs. limit=7.965
2023-12-20 17:37:40,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=131.95 vs. limit=7.965
2023-12-20 17:37:42,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=64.16 vs. limit=5.653333333333333
2023-12-20 17:37:49,920 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.504e+01 2.720e+01 3.182e+01 1.335e+03, threshold=5.440e+01, percent-clipped=1.0
2023-12-20 17:37:54,275 INFO [train.py:886] (2/4) Epoch 4, batch 50, loss[loss=0.05111, audio_tagging_loss=0.05111, over 25000.00 frames. ], tot_loss[loss=0.05369, audio_tagging_loss=0.05369, over 1121668.45 frames. ], batch size: 100, lr: 2.77e-02, grad_scale: 4.0
2023-12-20 17:38:12,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=295.48 vs. limit=8.02
2023-12-20 17:38:19,535 INFO [train.py:886] (2/4) Epoch 5, batch 0, loss[loss=0.06715, audio_tagging_loss=0.06715, over 20425.00 frames. ], tot_loss[loss=0.06715, audio_tagging_loss=0.06715, over 20425.00 frames. ], batch size: 106, lr: 2.59e-02, grad_scale: 8.0
2023-12-20 17:38:19,536 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:38:39,889 INFO [train.py:917] (2/4) Epoch 5, validation: loss=0.05523, audio_tagging_loss=0.05523, over 3737520.00 frames. 
2023-12-20 17:38:39,890 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14778MB
2023-12-20 17:38:44,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=4.554666666666667
2023-12-20 17:38:47,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1386.6666666666667, ans=0.0688
2023-12-20 17:38:54,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1453.3333333333333, ans=0.431875
2023-12-20 17:39:00,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1453.3333333333333, ans=0.28546666666666665
2023-12-20 17:39:05,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=96.82 vs. limit=8.64
2023-12-20 17:39:10,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1520.0, ans=5.76
2023-12-20 17:39:14,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1520.0, ans=0.2228
2023-12-20 17:39:16,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=49.08 vs. limit=8.095
2023-12-20 17:39:21,446 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=458.13 vs. limit=8.095
2023-12-20 17:39:38,872 INFO [train.py:886] (2/4) Epoch 5, batch 50, loss[loss=0.05064, audio_tagging_loss=0.05064, over 25000.00 frames. ], tot_loss[loss=0.05248, audio_tagging_loss=0.05248, over 1117817.77 frames. ], batch size: 100, lr: 2.77e-02, grad_scale: 8.0
2023-12-20 17:39:39,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=78.15 vs. limit=5.86
2023-12-20 17:40:04,929 INFO [train.py:886] (2/4) Epoch 6, batch 0, loss[loss=0.04925, audio_tagging_loss=0.04925, over 25000.00 frames. ], tot_loss[loss=0.04925, audio_tagging_loss=0.04925, over 25000.00 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 16.0
2023-12-20 17:40:04,930 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:40:19,729 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.0755, 3.7526, 5.2326, 3.6878], device='cuda:2')
2023-12-20 17:40:25,816 INFO [train.py:917] (2/4) Epoch 6, validation: loss=0.05425, audio_tagging_loss=0.05425, over 3737520.00 frames. 
2023-12-20 17:40:25,817 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:40:26,280 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=81.13 vs. limit=8.8
2023-12-20 17:40:28,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1733.3333333333333, ans=0.08916666666666667
2023-12-20 17:40:29,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=22.31 vs. limit=5.433333333333334
2023-12-20 17:40:36,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=4.72
2023-12-20 17:40:37,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1800.0, ans=0.837
2023-12-20 17:40:52,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1866.6666666666667, ans=0.4125
2023-12-20 17:40:52,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1866.6666666666667, ans=0.8346666666666667
2023-12-20 17:40:53,040 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=319.38 vs. limit=8.2
2023-12-20 17:40:58,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1866.6666666666667, ans=0.2813333333333333
2023-12-20 17:41:03,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=122.10 vs. limit=8.95
2023-12-20 17:41:07,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1933.3333333333333, ans=0.409375
2023-12-20 17:41:07,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1933.3333333333333, ans=0.409375
2023-12-20 17:41:10,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1933.3333333333333, ans=0.05650000000000001
2023-12-20 17:41:10,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=17.73 vs. limit=5.483333333333333
2023-12-20 17:41:13,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=3.3
2023-12-20 17:41:14,959 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.556e+01 2.831e+01 3.472e+01 7.747e+01, threshold=5.662e+01, percent-clipped=6.0
2023-12-20 17:41:18,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2000.0, ans=0.8300000000000001
2023-12-20 17:41:20,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=64.62 vs. limit=9.0
2023-12-20 17:41:23,850 INFO [train.py:886] (2/4) Epoch 6, batch 50, loss[loss=0.04601, audio_tagging_loss=0.04601, over 25000.00 frames. ], tot_loss[loss=0.0512, audio_tagging_loss=0.0512, over 1124299.94 frames. ], batch size: 100, lr: 2.76e-02, grad_scale: 16.0
2023-12-20 17:41:49,202 INFO [train.py:886] (2/4) Epoch 7, batch 0, loss[loss=0.05184, audio_tagging_loss=0.05184, over 24101.00 frames. ], tot_loss[loss=0.05184, audio_tagging_loss=0.05184, over 24101.00 frames. ], batch size: 100, lr: 2.60e-02, grad_scale: 32.0
2023-12-20 17:41:49,202 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:42:09,822 INFO [train.py:917] (2/4) Epoch 7, validation: loss=0.05269, audio_tagging_loss=0.05269, over 3737520.00 frames. 
2023-12-20 17:42:09,823 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:42:11,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2080.0, ans=0.40249999999999997
2023-12-20 17:42:13,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=39.36 vs. limit=8.28
2023-12-20 17:42:13,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=286.41 vs. limit=8.28
2023-12-20 17:42:21,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2146.6666666666665, ans=0.399375
2023-12-20 17:42:26,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.94 vs. limit=8.305
2023-12-20 17:42:30,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=4.429333333333333
2023-12-20 17:42:35,984 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=25.24 vs. limit=9.16
2023-12-20 17:42:37,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2213.3333333333335, ans=0.39625
2023-12-20 17:43:07,588 INFO [train.py:886] (2/4) Epoch 7, batch 50, loss[loss=0.04344, audio_tagging_loss=0.04344, over 25000.00 frames. ], tot_loss[loss=0.05087, audio_tagging_loss=0.05087, over 1122403.67 frames. ], batch size: 100, lr: 2.76e-02, grad_scale: 1.0
2023-12-20 17:43:08,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=33.27 vs. limit=8.405
2023-12-20 17:43:32,848 INFO [train.py:886] (2/4) Epoch 8, batch 0, loss[loss=0.05077, audio_tagging_loss=0.05077, over 24170.00 frames. ], tot_loss[loss=0.05077, audio_tagging_loss=0.05077, over 24170.00 frames. ], batch size: 100, lr: 2.60e-02, grad_scale: 2.0
2023-12-20 17:43:32,848 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:43:47,796 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8993, 4.3654, 3.4870, 3.3905], device='cuda:2')
2023-12-20 17:43:53,651 INFO [train.py:917] (2/4) Epoch 8, validation: loss=0.05155, audio_tagging_loss=0.05155, over 3737520.00 frames. 
2023-12-20 17:43:53,652 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:44:23,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=40.97 vs. limit=9.42
2023-12-20 17:44:23,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=5.024
2023-12-20 17:44:37,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2626.6666666666665, ans=0.5
2023-12-20 17:44:42,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2693.3333333333335, ans=0.09899999999999999
2023-12-20 17:44:43,338 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 3.487e+01 4.265e+01 5.657e+01 4.687e+02, threshold=8.530e+01, percent-clipped=24.0
2023-12-20 17:44:43,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2693.3333333333335, ans=0.04158333333333333
2023-12-20 17:44:48,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=26.61 vs. limit=8.51
2023-12-20 17:44:50,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2760.0, ans=0.8034
2023-12-20 17:44:51,056 INFO [train.py:886] (2/4) Epoch 8, batch 50, loss[loss=0.04868, audio_tagging_loss=0.04868, over 25000.00 frames. ], tot_loss[loss=0.04903, audio_tagging_loss=0.04903, over 1126572.41 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 2.0
2023-12-20 17:45:09,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.07 vs. limit=6.386666666666667
2023-12-20 17:45:16,350 INFO [train.py:886] (2/4) Epoch 9, batch 0, loss[loss=0.0511, audio_tagging_loss=0.0511, over 24103.00 frames. ], tot_loss[loss=0.0511, audio_tagging_loss=0.0511, over 24103.00 frames. ], batch size: 100, lr: 2.61e-02, grad_scale: 4.0
2023-12-20 17:45:16,350 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:45:37,427 INFO [train.py:917] (2/4) Epoch 9, validation: loss=0.04977, audio_tagging_loss=0.04977, over 3737520.00 frames. 
2023-12-20 17:45:37,428 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:45:37,767 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=92.44 vs. limit=9.58
2023-12-20 17:45:38,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=35.85 vs. limit=8.54
2023-12-20 17:45:52,080 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=167.20 vs. limit=8.565
2023-12-20 17:45:55,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=25.80 vs. limit=8.565
2023-12-20 17:46:04,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2906.6666666666665, ans=0.36375
2023-12-20 17:46:06,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=135.02 vs. limit=8.59
2023-12-20 17:46:10,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=93.55 vs. limit=8.615
2023-12-20 17:46:14,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=18.12 vs. limit=8.615
2023-12-20 17:46:15,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2973.3333333333335, ans=0.360625
2023-12-20 17:46:19,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=31.90 vs. limit=9.73
2023-12-20 17:46:23,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3040.0, ans=5.76
2023-12-20 17:46:24,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=121.48 vs. limit=8.64
2023-12-20 17:46:30,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=25.05 vs. limit=6.52
2023-12-20 17:46:32,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3106.6666666666665, ans=0.7912666666666667
2023-12-20 17:46:33,270 INFO [train.py:886] (2/4) Epoch 9, batch 50, loss[loss=0.04414, audio_tagging_loss=0.04414, over 25000.00 frames. ], tot_loss[loss=0.04714, audio_tagging_loss=0.04714, over 1123255.65 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 4.0
2023-12-20 17:46:33,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=32.84 vs. limit=8.665
2023-12-20 17:46:52,658 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=5.78
2023-12-20 17:46:59,483 INFO [train.py:886] (2/4) Epoch 10, batch 0, loss[loss=0.04603, audio_tagging_loss=0.04603, over 24103.00 frames. ], tot_loss[loss=0.04603, audio_tagging_loss=0.04603, over 24103.00 frames. ], batch size: 100, lr: 2.62e-02, grad_scale: 8.0
2023-12-20 17:46:59,484 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:47:20,692 INFO [train.py:917] (2/4) Epoch 10, validation: loss=0.04858, audio_tagging_loss=0.04858, over 3737520.00 frames. 
2023-12-20 17:47:20,693 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:47:21,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=25.96 vs. limit=8.67
2023-12-20 17:47:21,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=5.248
2023-12-20 17:47:31,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3186.6666666666665, ans=0.07966666666666668
2023-12-20 17:47:37,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3186.6666666666665, ans=0.35062499999999996
2023-12-20 17:47:45,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3253.3333333333335, ans=0.34750000000000003
2023-12-20 17:47:48,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3253.3333333333335, ans=0.07799999999999999
2023-12-20 17:47:49,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=31.91 vs. limit=8.72
2023-12-20 17:47:58,344 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.51 vs. limit=9.99
2023-12-20 17:48:01,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3320.0, ans=0.0755
2023-12-20 17:48:03,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.44 vs. limit=6.66
2023-12-20 17:48:04,134 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 3.726e+01 4.484e+01 5.424e+01 1.858e+02, threshold=8.969e+01, percent-clipped=3.0
2023-12-20 17:48:15,997 INFO [train.py:886] (2/4) Epoch 10, batch 50, loss[loss=0.04358, audio_tagging_loss=0.04358, over 25000.00 frames. ], tot_loss[loss=0.0462, audio_tagging_loss=0.0462, over 1119906.38 frames. ], batch size: 100, lr: 2.71e-02, grad_scale: 8.0
2023-12-20 17:48:40,825 INFO [train.py:886] (2/4) Epoch 11, batch 0, loss[loss=0.04723, audio_tagging_loss=0.04723, over 24078.00 frames. ], tot_loss[loss=0.04723, audio_tagging_loss=0.04723, over 24078.00 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 16.0
2023-12-20 17:48:40,826 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:48:53,636 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3948, 2.3390, 3.4762, 2.4205], device='cuda:2')
2023-12-20 17:49:01,994 INFO [train.py:917] (2/4) Epoch 11, validation: loss=0.04728, audio_tagging_loss=0.04728, over 3737520.00 frames. 
2023-12-20 17:49:01,995 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:49:05,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3466.6666666666665, ans=0.7846666666666666
2023-12-20 17:49:10,845 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.75 vs. limit=10.1
2023-12-20 17:49:13,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=5.413333333333333
2023-12-20 17:49:22,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3533.3333333333335, ans=0.334375
2023-12-20 17:49:24,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.46 vs. limit=6.766666666666667
2023-12-20 17:49:28,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. limit=6.8
2023-12-20 17:49:31,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3600.0, ans=0.33125
2023-12-20 17:49:32,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=8.85
2023-12-20 17:49:33,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=43.37 vs. limit=8.85
2023-12-20 17:49:34,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.32 vs. limit=10.2
2023-12-20 17:49:34,814 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=33.69 vs. limit=10.2
2023-12-20 17:49:40,594 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.99 vs. limit=10.25
2023-12-20 17:49:42,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3666.6666666666665, ans=0.06249999999999997
2023-12-20 17:49:43,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=62.52 vs. limit=8.875
2023-12-20 17:49:46,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=5.493333333333333
2023-12-20 17:49:48,262 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=31.55 vs. limit=8.9
2023-12-20 17:49:49,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=8.9
2023-12-20 17:49:53,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=62.67 vs. limit=8.9
2023-12-20 17:49:56,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3733.3333333333335, ans=0.05999999999999997
2023-12-20 17:49:57,433 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.452e+00
2023-12-20 17:49:57,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3800.0, ans=0.014499999999999985
2023-12-20 17:49:57,531 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=42.27 vs. limit=8.925
2023-12-20 17:49:58,312 INFO [train.py:886] (2/4) Epoch 11, batch 50, loss[loss=0.04255, audio_tagging_loss=0.04255, over 25000.00 frames. ], tot_loss[loss=0.04557, audio_tagging_loss=0.04557, over 1117498.70 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 16.0
2023-12-20 17:49:58,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=8.925
2023-12-20 17:50:23,186 INFO [train.py:886] (2/4) Epoch 12, batch 0, loss[loss=0.0448, audio_tagging_loss=0.0448, over 24138.00 frames. ], tot_loss[loss=0.0448, audio_tagging_loss=0.0448, over 24138.00 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 32.0
2023-12-20 17:50:23,187 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:50:36,041 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9587, 1.7990, 1.8133, 1.8953], device='cuda:2')
2023-12-20 17:50:44,477 INFO [train.py:917] (2/4) Epoch 12, validation: loss=0.04619, audio_tagging_loss=0.04619, over 3737520.00 frames. 
2023-12-20 17:50:44,478 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:50:51,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.73 vs. limit=5.953333333333333
2023-12-20 17:50:51,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=8.93
2023-12-20 17:50:57,718 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=71.88 vs. limit=8.955
2023-12-20 17:50:57,757 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.99 vs. limit=6.9399999999999995
2023-12-20 17:51:00,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=8.955
2023-12-20 17:51:06,694 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=8.955
2023-12-20 17:51:07,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3946.6666666666665, ans=0.2592
2023-12-20 17:51:10,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3946.6666666666665, ans=0.05199999999999999
2023-12-20 17:51:11,498 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.21 vs. limit=10.46
2023-12-20 17:51:24,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4013.3333333333335, ans=0.311875
2023-12-20 17:51:25,063 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.849e+01 4.841e+01 5.572e+01 8.770e+01, threshold=9.682e+01, percent-clipped=0.0
2023-12-20 17:51:26,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=27.57 vs. limit=9.004999999999999
2023-12-20 17:51:27,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4013.3333333333335, ans=0.311875
2023-12-20 17:51:33,074 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.21 vs. limit=7.04
2023-12-20 17:51:35,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4080.0, ans=9.03
2023-12-20 17:51:40,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4146.666666666667, ans=0.04938888888888889
2023-12-20 17:51:40,951 INFO [train.py:886] (2/4) Epoch 12, batch 50, loss[loss=0.04178, audio_tagging_loss=0.04178, over 25000.00 frames. ], tot_loss[loss=0.04376, audio_tagging_loss=0.04376, over 1120004.19 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 32.0
2023-12-20 17:52:04,705 INFO [train.py:886] (2/4) Epoch 13, batch 0, loss[loss=0.03956, audio_tagging_loss=0.03956, over 25000.00 frames. ], tot_loss[loss=0.03956, audio_tagging_loss=0.03956, over 25000.00 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 32.0
2023-12-20 17:52:04,705 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:52:12,866 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.9063, 1.1653, 1.8288, 1.8131], device='cuda:2')
2023-12-20 17:52:25,607 INFO [train.py:917] (2/4) Epoch 13, validation: loss=0.04525, audio_tagging_loss=0.04525, over 3737520.00 frames. 
2023-12-20 17:52:25,608 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:52:25,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4160.0, ans=0.2584
2023-12-20 17:52:25,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4160.0, ans=0.07400000000000001
2023-12-20 17:52:28,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=17.39 vs. limit=9.06
2023-12-20 17:52:29,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4160.0, ans=0.0
2023-12-20 17:52:29,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.32 vs. limit=10.620000000000001
2023-12-20 17:52:36,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4226.666666666667, ans=0.009950724637681159
2023-12-20 17:52:39,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=29.03 vs. limit=10.67
2023-12-20 17:52:41,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.78 vs. limit=10.67
2023-12-20 17:52:42,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=33.46 vs. limit=9.085
2023-12-20 17:52:42,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.62 vs. limit=5.0
2023-12-20 17:52:49,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4293.333333333333, ans=0.07
2023-12-20 17:52:50,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4293.333333333333, ans=0.04949747468305833
2023-12-20 17:52:59,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=9.135
2023-12-20 17:53:04,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.15 vs. limit=9.135
2023-12-20 17:53:11,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=51.27 vs. limit=9.16
2023-12-20 17:53:15,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4426.666666666667, ans=0.009907246376811594
2023-12-20 17:53:19,091 INFO [train.py:886] (2/4) Epoch 13, batch 50, loss[loss=0.04299, audio_tagging_loss=0.04299, over 25000.00 frames. ], tot_loss[loss=0.04317, audio_tagging_loss=0.04317, over 1121045.90 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 32.0
2023-12-20 17:53:43,854 INFO [train.py:886] (2/4) Epoch 14, batch 0, loss[loss=0.04288, audio_tagging_loss=0.04288, over 25000.00 frames. ], tot_loss[loss=0.04288, audio_tagging_loss=0.04288, over 25000.00 frames. ], batch size: 100, lr: 2.29e-02, grad_scale: 32.0
2023-12-20 17:53:43,854 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:53:55,626 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9270, 1.7806, 2.1339, 1.9149], device='cuda:2')
2023-12-20 17:54:05,165 INFO [train.py:917] (2/4) Epoch 14, validation: loss=0.04503, audio_tagging_loss=0.04503, over 3737520.00 frames. 
2023-12-20 17:54:05,166 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:54:11,065 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=10.879999999999999
2023-12-20 17:54:26,446 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=9.24
2023-12-20 17:54:29,279 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.038e+01
2023-12-20 17:54:31,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4640.0, ans=0.2825
2023-12-20 17:54:31,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=9.24
2023-12-20 17:54:32,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.14 vs. limit=7.32
2023-12-20 17:54:35,687 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.44 vs. limit=10.98
2023-12-20 17:54:37,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4706.666666666667, ans=0.27937500000000004
2023-12-20 17:54:38,362 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 4.195e+01 5.214e+01 6.348e+01 1.962e+02, threshold=1.043e+02, percent-clipped=5.0
2023-12-20 17:54:40,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=9.265
2023-12-20 17:54:57,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4840.0, ans=0.273125
2023-12-20 17:54:58,020 INFO [train.py:886] (2/4) Epoch 14, batch 50, loss[loss=0.03772, audio_tagging_loss=0.03772, over 25000.00 frames. ], tot_loss[loss=0.04195, audio_tagging_loss=0.04195, over 1123574.72 frames. ], batch size: 100, lr: 2.29e-02, grad_scale: 32.0
2023-12-20 17:55:22,498 INFO [train.py:886] (2/4) Epoch 15, batch 0, loss[loss=0.04001, audio_tagging_loss=0.04001, over 25000.00 frames. ], tot_loss[loss=0.04001, audio_tagging_loss=0.04001, over 25000.00 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 32.0
2023-12-20 17:55:22,498 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:55:43,372 INFO [train.py:917] (2/4) Epoch 15, validation: loss=0.04452, audio_tagging_loss=0.04452, over 3737520.00 frames. 
2023-12-20 17:55:43,373 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:55:44,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4853.333333333333, ans=0.20146666666666668
2023-12-20 17:55:44,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4853.333333333333, ans=9.32
2023-12-20 17:55:47,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=36.73 vs. limit=9.32
2023-12-20 17:55:49,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4853.333333333333, ans=8.033333333333333
2023-12-20 17:55:49,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=27.66 vs. limit=9.32
2023-12-20 17:55:58,425 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=11.19
2023-12-20 17:56:01,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.60 vs. limit=11.19
2023-12-20 17:56:16,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=27.61 vs. limit=9.395
2023-12-20 17:56:18,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=9.395
2023-12-20 17:56:23,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5053.333333333333, ans=0.045611111111111116
2023-12-20 17:56:35,386 INFO [train.py:886] (2/4) Epoch 15, batch 50, loss[loss=0.04041, audio_tagging_loss=0.04041, over 25000.00 frames. ], tot_loss[loss=0.04165, audio_tagging_loss=0.04165, over 1115568.63 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 32.0
2023-12-20 17:57:00,247 INFO [train.py:886] (2/4) Epoch 16, batch 0, loss[loss=0.0395, audio_tagging_loss=0.0395, over 25000.00 frames. ], tot_loss[loss=0.0395, audio_tagging_loss=0.0395, over 25000.00 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 32.0
2023-12-20 17:57:00,248 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:57:12,226 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.6515, 2.9810, 2.7109, 3.1295], device='cuda:2')
2023-12-20 17:57:21,257 INFO [train.py:917] (2/4) Epoch 16, validation: loss=0.04383, audio_tagging_loss=0.04383, over 3737520.00 frames. 
2023-12-20 17:57:21,257 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:57:22,530 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.040e+02
2023-12-20 17:57:26,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5200.0, ans=0.25625
2023-12-20 17:57:27,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=30.31 vs. limit=9.45
2023-12-20 17:57:27,437 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=6.08
2023-12-20 17:57:31,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=9.475
2023-12-20 17:57:38,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=9.475
2023-12-20 17:57:44,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=9.5
2023-12-20 17:57:49,876 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.797e+01 3.933e+01 4.813e+01 5.766e+01 2.623e+02, threshold=9.626e+01, percent-clipped=4.0
2023-12-20 17:57:56,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=5400.0, ans=0.7110000000000001
2023-12-20 17:58:12,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5466.666666666667, ans=0.24375000000000002
2023-12-20 17:58:13,998 INFO [train.py:886] (2/4) Epoch 16, batch 50, loss[loss=0.04058, audio_tagging_loss=0.04058, over 25000.00 frames. ], tot_loss[loss=0.04029, audio_tagging_loss=0.04029, over 1124080.10 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 32.0
2023-12-20 17:58:14,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5533.333333333333, ans=0.24062499999999998
2023-12-20 17:58:38,072 INFO [train.py:886] (2/4) Epoch 17, batch 0, loss[loss=0.04294, audio_tagging_loss=0.04294, over 24156.00 frames. ], tot_loss[loss=0.04294, audio_tagging_loss=0.04294, over 24156.00 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 32.0
2023-12-20 17:58:38,072 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:58:46,300 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.6665, 3.0104, 2.7217, 2.8777], device='cuda:2')
2023-12-20 17:58:59,163 INFO [train.py:917] (2/4) Epoch 17, validation: loss=0.04362, audio_tagging_loss=0.04362, over 3737520.00 frames. 
2023-12-20 17:58:59,164 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:59:00,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=9.58
2023-12-20 17:59:03,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=9.58
2023-12-20 17:59:12,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=9.605
2023-12-20 17:59:13,898 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=26.48 vs. limit=9.605
2023-12-20 17:59:16,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5613.333333333333, ans=0.07
2023-12-20 17:59:17,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5613.333333333333, ans=0.7035333333333333
2023-12-20 17:59:18,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=6.272
2023-12-20 17:59:32,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5746.666666666667, ans=0.042722222222222224
2023-12-20 17:59:37,551 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.72 vs. limit=11.809999999999999
2023-12-20 17:59:38,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=11.809999999999999
2023-12-20 17:59:43,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=18.49 vs. limit=9.68
2023-12-20 17:59:47,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=6.48 vs. limit=6.325333333333333
2023-12-20 17:59:49,926 INFO [train.py:886] (2/4) Epoch 17, batch 50, loss[loss=0.03614, audio_tagging_loss=0.03614, over 25000.00 frames. ], tot_loss[loss=0.03988, audio_tagging_loss=0.03988, over 1120588.04 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 32.0
2023-12-20 18:00:14,311 INFO [train.py:886] (2/4) Epoch 18, batch 0, loss[loss=0.03981, audio_tagging_loss=0.03981, over 24110.00 frames. ], tot_loss[loss=0.03981, audio_tagging_loss=0.03981, over 24110.00 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 32.0
2023-12-20 18:00:14,312 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:00:35,060 INFO [train.py:917] (2/4) Epoch 18, validation: loss=0.04342, audio_tagging_loss=0.04342, over 3737520.00 frames. 
2023-12-20 18:00:35,060 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:00:45,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5960.0, ans=0.2404
2023-12-20 18:00:47,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=9.735
2023-12-20 18:00:58,725 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.069e+01 3.667e+01 4.319e+01 5.687e+01 1.553e+02, threshold=8.639e+01, percent-clipped=3.0
2023-12-20 18:00:59,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=12.02
2023-12-20 18:01:01,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=6026.666666666667, ans=0.21750000000000003
2023-12-20 18:01:14,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.23 vs. limit=12.07
2023-12-20 18:01:18,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=6160.0, ans=0.21125
2023-12-20 18:01:25,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=9.835
2023-12-20 18:01:25,750 INFO [train.py:886] (2/4) Epoch 18, batch 50, loss[loss=0.03439, audio_tagging_loss=0.03439, over 25000.00 frames. ], tot_loss[loss=0.03833, audio_tagging_loss=0.03833, over 1127202.33 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 32.0
2023-12-20 18:01:50,821 INFO [train.py:886] (2/4) Epoch 19, batch 0, loss[loss=0.03398, audio_tagging_loss=0.03398, over 25000.00 frames. ], tot_loss[loss=0.03398, audio_tagging_loss=0.03398, over 25000.00 frames. ], batch size: 100, lr: 1.96e-02, grad_scale: 32.0
2023-12-20 18:01:50,821 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:02:11,829 INFO [train.py:917] (2/4) Epoch 19, validation: loss=0.04287, audio_tagging_loss=0.04287, over 3737520.00 frames. 
2023-12-20 18:02:11,830 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:02:12,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=6240.0, ans=0.20750000000000002
2023-12-20 18:02:22,842 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=12.23
2023-12-20 18:02:28,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=6306.666666666667, ans=0.20437499999999997
2023-12-20 18:02:43,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=9.915
2023-12-20 18:02:45,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=6440.0, ans=0.198125
2023-12-20 18:02:45,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=6440.0, ans=0.6746000000000001
2023-12-20 18:02:48,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=6440.0, ans=0.029875000000000002
2023-12-20 18:02:52,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.61 vs. limit=9.94
2023-12-20 18:02:55,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=9.94
2023-12-20 18:03:00,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=6506.666666666667, ans=0.195
2023-12-20 18:03:00,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=9.94
2023-12-20 18:03:02,036 INFO [train.py:886] (2/4) Epoch 19, batch 50, loss[loss=0.03557, audio_tagging_loss=0.03557, over 25000.00 frames. ], tot_loss[loss=0.0378, audio_tagging_loss=0.0378, over 1123929.37 frames. ], batch size: 100, lr: 1.96e-02, grad_scale: 32.0
2023-12-20 18:03:26,294 INFO [train.py:886] (2/4) Epoch 20, batch 0, loss[loss=0.03504, audio_tagging_loss=0.03504, over 25000.00 frames. ], tot_loss[loss=0.03504, audio_tagging_loss=0.03504, over 25000.00 frames. ], batch size: 100, lr: 1.91e-02, grad_scale: 32.0
2023-12-20 18:03:26,295 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:03:47,095 INFO [train.py:917] (2/4) Epoch 20, validation: loss=0.0429, audio_tagging_loss=0.0429, over 3737520.00 frames. 
2023-12-20 18:03:47,095 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:03:48,714 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=9.97
2023-12-20 18:03:58,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=6653.333333333333, ans=0.188125
2023-12-20 18:04:06,504 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.851e+01 3.799e+01 4.551e+01 5.624e+01 1.513e+02, threshold=9.102e+01, percent-clipped=5.0
2023-12-20 18:04:15,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.55 vs. limit=12.54
2023-12-20 18:04:28,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.81 vs. limit=6.713333333333333
2023-12-20 18:04:37,012 INFO [train.py:886] (2/4) Epoch 20, batch 50, loss[loss=0.03413, audio_tagging_loss=0.03413, over 25000.00 frames. ], tot_loss[loss=0.03747, audio_tagging_loss=0.03747, over 1118978.42 frames. ], batch size: 100, lr: 1.91e-02, grad_scale: 32.0
2023-12-20 18:04:59,859 INFO [train.py:886] (2/4) Epoch 21, batch 0, loss[loss=0.04612, audio_tagging_loss=0.04612, over 20094.00 frames. ], tot_loss[loss=0.04612, audio_tagging_loss=0.04612, over 20094.00 frames. ], batch size: 106, lr: 1.86e-02, grad_scale: 32.0
2023-12-20 18:04:59,860 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:05:20,818 INFO [train.py:917] (2/4) Epoch 21, validation: loss=0.0427, audio_tagging_loss=0.0427, over 3737520.00 frames. 
2023-12-20 18:05:20,819 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:05:32,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=7000.0, ans=0.655
2023-12-20 18:05:48,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=10.15
2023-12-20 18:05:48,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=12.8
2023-12-20 18:05:50,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=7066.666666666667, ans=0.17933333333333334
2023-12-20 18:05:54,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=7133.333333333333, ans=0.307
2023-12-20 18:05:54,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=7133.333333333333, ans=0.0
2023-12-20 18:06:03,488 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.507e+01
2023-12-20 18:06:10,789 INFO [train.py:886] (2/4) Epoch 21, batch 50, loss[loss=0.03124, audio_tagging_loss=0.03124, over 25000.00 frames. ], tot_loss[loss=0.03702, audio_tagging_loss=0.03702, over 1110930.85 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 32.0
2023-12-20 18:06:29,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=12.96
2023-12-20 18:06:34,956 INFO [train.py:886] (2/4) Epoch 22, batch 0, loss[loss=0.03299, audio_tagging_loss=0.03299, over 25000.00 frames. ], tot_loss[loss=0.03299, audio_tagging_loss=0.03299, over 25000.00 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 32.0
2023-12-20 18:06:34,957 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:06:49,008 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.5037, 1.4958, 1.3105, 1.2949], device='cuda:2')
2023-12-20 18:06:55,944 INFO [train.py:917] (2/4) Epoch 22, validation: loss=0.04259, audio_tagging_loss=0.04259, over 3737520.00 frames. 
2023-12-20 18:06:55,945 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:06:59,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=10.23
2023-12-20 18:07:04,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=7280.0, ans=10.23
2023-12-20 18:07:06,514 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=10.254999999999999
2023-12-20 18:07:10,812 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.833e+01 3.757e+01 4.513e+01 5.428e+01 2.125e+02, threshold=9.026e+01, percent-clipped=5.0
2023-12-20 18:07:12,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=10.254999999999999
2023-12-20 18:07:16,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=7413.333333333333, ans=0.15250000000000002
2023-12-20 18:07:21,835 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=10.28
2023-12-20 18:07:27,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=7480.0, ans=0.009243478260869565
2023-12-20 18:07:28,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=7480.0, ans=0.14937499999999998
2023-12-20 18:07:39,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=7546.666666666667, ans=0.035222222222222224
2023-12-20 18:07:44,531 INFO [train.py:886] (2/4) Epoch 22, batch 50, loss[loss=0.03313, audio_tagging_loss=0.03313, over 25000.00 frames. ], tot_loss[loss=0.03545, audio_tagging_loss=0.03545, over 1119124.88 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 32.0
2023-12-20 18:08:02,962 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=13.219999999999999
2023-12-20 18:08:08,666 INFO [train.py:886] (2/4) Epoch 23, batch 0, loss[loss=0.04406, audio_tagging_loss=0.04406, over 21057.00 frames. ], tot_loss[loss=0.04406, audio_tagging_loss=0.04406, over 21057.00 frames. ], batch size: 106, lr: 1.77e-02, grad_scale: 32.0
2023-12-20 18:08:08,667 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:08:30,057 INFO [train.py:917] (2/4) Epoch 23, validation: loss=0.04291, audio_tagging_loss=0.04291, over 3737520.00 frames. 
2023-12-20 18:08:30,058 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:08:30,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=10.36
2023-12-20 18:08:31,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=7626.666666666667, ans=0.14250000000000002
2023-12-20 18:08:31,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=10.36
2023-12-20 18:08:34,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=10.36
2023-12-20 18:08:45,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=7693.333333333333, ans=0.13937500000000003
2023-12-20 18:08:46,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=10.385
2023-12-20 18:08:56,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=10.41
2023-12-20 18:09:17,955 INFO [train.py:886] (2/4) Epoch 23, batch 50, loss[loss=0.03305, audio_tagging_loss=0.03305, over 25000.00 frames. ], tot_loss[loss=0.03516, audio_tagging_loss=0.03516, over 1116803.91 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 32.0
2023-12-20 18:09:40,310 INFO [train.py:886] (2/4) Epoch 24, batch 0, loss[loss=0.04133, audio_tagging_loss=0.04133, over 21728.00 frames. ], tot_loss[loss=0.04133, audio_tagging_loss=0.04133, over 21728.00 frames. ], batch size: 106, lr: 1.73e-02, grad_scale: 32.0
2023-12-20 18:09:40,310 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:10:01,276 INFO [train.py:917] (2/4) Epoch 24, validation: loss=0.04248, audio_tagging_loss=0.04248, over 3737520.00 frames. 
2023-12-20 18:10:01,277 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:10:06,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=7973.333333333333, ans=0.17026666666666668
2023-12-20 18:10:12,545 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.651e+01 4.128e+01 4.777e+01 1.617e+02, threshold=8.255e+01, percent-clipped=1.0
2023-12-20 18:10:14,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=8040.0, ans=0.125
2023-12-20 18:10:16,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=8040.0, ans=10.515
2023-12-20 18:10:19,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=8040.0, ans=0.125
2023-12-20 18:10:38,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8173.333333333333, ans=0.21826666666666666
2023-12-20 18:10:43,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8240.0, ans=0.21760000000000002
2023-12-20 18:10:49,639 INFO [train.py:886] (2/4) Epoch 24, batch 50, loss[loss=0.03189, audio_tagging_loss=0.03189, over 25000.00 frames. ], tot_loss[loss=0.03405, audio_tagging_loss=0.03405, over 1119075.99 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 32.0
2023-12-20 18:11:13,612 INFO [train.py:886] (2/4) Epoch 25, batch 0, loss[loss=0.0338, audio_tagging_loss=0.0338, over 25000.00 frames. ], tot_loss[loss=0.0338, audio_tagging_loss=0.0338, over 25000.00 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 32.0
2023-12-20 18:11:13,613 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:11:34,705 INFO [train.py:917] (2/4) Epoch 25, validation: loss=0.04257, audio_tagging_loss=0.04257, over 3737520.00 frames. 
2023-12-20 18:11:34,705 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:11:37,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=10.620000000000001
2023-12-20 18:11:47,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=10.645
2023-12-20 18:11:55,445 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=10.67
2023-12-20 18:12:00,768 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=6.559e+00
2023-12-20 18:12:04,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=7.4079999999999995
2023-12-20 18:12:06,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=8520.0, ans=0.6018
2023-12-20 18:12:10,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8520.0, ans=0.2148
2023-12-20 18:12:10,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=10.695
2023-12-20 18:12:11,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=13.89
2023-12-20 18:12:22,266 INFO [train.py:886] (2/4) Epoch 25, batch 50, loss[loss=0.03228, audio_tagging_loss=0.03228, over 25000.00 frames. ], tot_loss[loss=0.03326, audio_tagging_loss=0.03326, over 1123209.09 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 32.0
2023-12-20 18:12:45,044 INFO [train.py:886] (2/4) Epoch 26, batch 0, loss[loss=0.04138, audio_tagging_loss=0.04138, over 20177.00 frames. ], tot_loss[loss=0.04138, audio_tagging_loss=0.04138, over 20177.00 frames. ], batch size: 106, lr: 1.66e-02, grad_scale: 32.0
2023-12-20 18:12:45,044 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:13:05,884 INFO [train.py:917] (2/4) Epoch 26, validation: loss=0.04241, audio_tagging_loss=0.04241, over 3737520.00 frames. 
2023-12-20 18:13:05,885 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:13:11,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=22.46 vs. limit=10.75
2023-12-20 18:13:12,404 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.673e+01 4.044e+01 4.675e+01 8.607e+01, threshold=8.088e+01, percent-clipped=1.0
2023-12-20 18:13:12,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=8666.666666666666, ans=0.5966666666666667
2023-12-20 18:13:15,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8733.333333333334, ans=0.21266666666666667
2023-12-20 18:13:16,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8733.333333333334, ans=0.21266666666666667
2023-12-20 18:13:16,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=8733.333333333334, ans=10.775
2023-12-20 18:13:20,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=8733.333333333334, ans=0.5943333333333334
2023-12-20 18:13:23,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8800.0, ans=0.212
2023-12-20 18:13:31,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=10.8
2023-12-20 18:13:41,604 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=14.15
2023-12-20 18:13:44,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=8933.333333333334, ans=0.125
2023-12-20 18:13:47,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=27.82 vs. limit=10.85
2023-12-20 18:13:52,997 INFO [train.py:886] (2/4) Epoch 26, batch 50, loss[loss=0.02956, audio_tagging_loss=0.02956, over 25000.00 frames. ], tot_loss[loss=0.03224, audio_tagging_loss=0.03224, over 1119526.98 frames. ], batch size: 100, lr: 1.66e-02, grad_scale: 32.0
2023-12-20 18:14:18,299 INFO [train.py:886] (2/4) Epoch 27, batch 0, loss[loss=0.0303, audio_tagging_loss=0.0303, over 25000.00 frames. ], tot_loss[loss=0.0303, audio_tagging_loss=0.0303, over 25000.00 frames. ], batch size: 100, lr: 1.63e-02, grad_scale: 32.0
2023-12-20 18:14:18,300 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:14:31,139 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3577, 1.7678, 2.1146, 2.2565], device='cuda:2')
2023-12-20 18:14:37,636 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5915, 3.0487, 2.5976, 2.5353], device='cuda:2')
2023-12-20 18:14:39,325 INFO [train.py:917] (2/4) Epoch 27, validation: loss=0.04294, audio_tagging_loss=0.04294, over 3737520.00 frames. 
2023-12-20 18:14:39,326 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:14:44,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=27.24 vs. limit=10.879999999999999
2023-12-20 18:14:46,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=10.879999999999999
2023-12-20 18:14:49,979 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.70 vs. limit=14.309999999999999
2023-12-20 18:14:51,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=31.65 vs. limit=10.905
2023-12-20 18:15:09,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=9213.333333333334, ans=0.5775333333333333
2023-12-20 18:15:26,761 INFO [train.py:886] (2/4) Epoch 27, batch 50, loss[loss=0.02842, audio_tagging_loss=0.02842, over 25000.00 frames. ], tot_loss[loss=0.03164, audio_tagging_loss=0.03164, over 1117606.62 frames. ], batch size: 100, lr: 1.63e-02, grad_scale: 32.0
2023-12-20 18:15:48,260 INFO [train.py:886] (2/4) Epoch 28, batch 0, loss[loss=0.0301, audio_tagging_loss=0.0301, over 25000.00 frames. ], tot_loss[loss=0.0301, audio_tagging_loss=0.0301, over 25000.00 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-20 18:15:48,261 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:16:09,705 INFO [train.py:917] (2/4) Epoch 28, validation: loss=0.04282, audio_tagging_loss=0.04282, over 3737520.00 frames. 
2023-12-20 18:16:09,705 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:16:12,511 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.970e+01 4.630e+01 5.343e+01 9.281e+01, threshold=9.260e+01, percent-clipped=1.0
2023-12-20 18:16:24,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9426.666666666666, ans=0.20573333333333332
2023-12-20 18:16:32,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=9493.333333333334, ans=0.027111111111111114
2023-12-20 18:16:32,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9493.333333333334, ans=0.125
2023-12-20 18:16:38,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=9560.0, ans=0.008791304347826087
2023-12-20 18:16:54,790 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.45 vs. limit=7.406666666666666
2023-12-20 18:16:56,950 INFO [train.py:886] (2/4) Epoch 28, batch 50, loss[loss=0.02563, audio_tagging_loss=0.02563, over 25000.00 frames. ], tot_loss[loss=0.03101, audio_tagging_loss=0.03101, over 1120311.90 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-20 18:17:19,793 INFO [train.py:886] (2/4) Epoch 29, batch 0, loss[loss=0.03977, audio_tagging_loss=0.03977, over 20634.00 frames. ], tot_loss[loss=0.03977, audio_tagging_loss=0.03977, over 20634.00 frames. ], batch size: 106, lr: 1.57e-02, grad_scale: 32.0
2023-12-20 18:17:19,793 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:17:30,680 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5305, 2.3478, 2.4033, 2.3839], device='cuda:2')
2023-12-20 18:17:31,974 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8318, 1.6548, 1.3662, 1.6184, 1.7442, 1.6820, 1.5322, 1.6898], device='cuda:2')
2023-12-20 18:17:40,753 INFO [train.py:917] (2/4) Epoch 29, validation: loss=0.04276, audio_tagging_loss=0.04276, over 3737520.00 frames. 
2023-12-20 18:17:40,754 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:17:42,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=11.14
2023-12-20 18:18:06,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=11.19
2023-12-20 18:18:07,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=9840.0, ans=0.5556000000000001
2023-12-20 18:18:11,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.93 vs. limit=14.93
2023-12-20 18:18:29,134 INFO [train.py:886] (2/4) Epoch 29, batch 50, loss[loss=0.02734, audio_tagging_loss=0.02734, over 25000.00 frames. ], tot_loss[loss=0.02999, audio_tagging_loss=0.02999, over 1118289.13 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0
2023-12-20 18:18:29,998 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 4.177e+01 4.600e+01 5.564e+01 7.757e+01, threshold=9.200e+01, percent-clipped=0.0
2023-12-20 18:18:46,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=10053.333333333334, ans=0.09899494936611666
2023-12-20 18:18:51,720 INFO [train.py:886] (2/4) Epoch 30, batch 0, loss[loss=0.03279, audio_tagging_loss=0.03279, over 24101.00 frames. ], tot_loss[loss=0.03279, audio_tagging_loss=0.03279, over 24101.00 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 32.0
2023-12-20 18:18:51,721 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:18:59,713 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8432, 2.2270, 2.3182, 2.7913], device='cuda:2')
2023-12-20 18:19:01,818 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7919, 2.4071, 2.4041, 2.8809], device='cuda:2')
2023-12-20 18:19:12,593 INFO [train.py:917] (2/4) Epoch 30, validation: loss=0.04346, audio_tagging_loss=0.04346, over 3737520.00 frames. 
2023-12-20 18:19:12,593 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:19:28,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=10120.0, ans=0.125
2023-12-20 18:19:29,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=10120.0, ans=0.125
2023-12-20 18:19:31,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=10186.666666666666, ans=0.125
2023-12-20 18:19:40,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=11.345
2023-12-20 18:19:54,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.89 vs. limit=10.16
2023-12-20 18:19:57,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.24
2023-12-20 18:19:59,939 INFO [train.py:886] (2/4) Epoch 30, batch 50, loss[loss=0.02758, audio_tagging_loss=0.02758, over 25000.00 frames. ], tot_loss[loss=0.02922, audio_tagging_loss=0.02922, over 1121142.49 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 32.0
2023-12-20 18:20:22,405 INFO [train.py:886] (2/4) Epoch 31, batch 0, loss[loss=0.02425, audio_tagging_loss=0.02425, over 25000.00 frames. ], tot_loss[loss=0.02425, audio_tagging_loss=0.02425, over 25000.00 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 32.0
2023-12-20 18:20:22,406 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:20:32,842 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.1955, 1.7944, 1.6471, 1.8199, 1.8954, 1.8125, 1.6294, 1.8020], device='cuda:2')
2023-12-20 18:20:33,457 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.4324, 2.3833, 2.5484, 2.2773], device='cuda:2')
2023-12-20 18:20:43,502 INFO [train.py:917] (2/4) Epoch 31, validation: loss=0.04363, audio_tagging_loss=0.04363, over 3737520.00 frames. 
2023-12-20 18:20:43,503 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:20:50,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=10400.0, ans=0.125
2023-12-20 18:21:01,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=11.425
2023-12-20 18:21:14,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10600.0, ans=0.194
2023-12-20 18:21:17,286 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=11.475
2023-12-20 18:21:19,805 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.177e-01
2023-12-20 18:21:20,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=10600.0, ans=0.125
2023-12-20 18:21:29,122 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.385e+01 4.278e+01 4.904e+01 5.799e+01 1.168e+02, threshold=9.808e+01, percent-clipped=2.0
2023-12-20 18:21:30,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=10666.666666666666, ans=10.0
2023-12-20 18:21:31,839 INFO [train.py:886] (2/4) Epoch 31, batch 50, loss[loss=0.02422, audio_tagging_loss=0.02422, over 25000.00 frames. ], tot_loss[loss=0.02842, audio_tagging_loss=0.02842, over 1117091.33 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 32.0
2023-12-20 18:21:32,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=10733.333333333334, ans=0.5243333333333333
2023-12-20 18:21:54,500 INFO [train.py:886] (2/4) Epoch 32, batch 0, loss[loss=0.03717, audio_tagging_loss=0.03717, over 21316.00 frames. ], tot_loss[loss=0.03717, audio_tagging_loss=0.03717, over 21316.00 frames. ], batch size: 106, lr: 1.49e-02, grad_scale: 32.0
2023-12-20 18:21:54,500 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:22:10,447 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5819, 2.4984, 2.6485, 2.4092], device='cuda:2')
2023-12-20 18:22:15,977 INFO [train.py:917] (2/4) Epoch 32, validation: loss=0.04494, audio_tagging_loss=0.04494, over 3737520.00 frames. 
2023-12-20 18:22:15,977 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:22:16,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=10746.666666666666, ans=0.125
2023-12-20 18:22:20,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.56
2023-12-20 18:22:20,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=10746.666666666666, ans=0.021888888888888892
2023-12-20 18:22:24,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=10813.333333333334, ans=0.125
2023-12-20 18:22:25,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10813.333333333334, ans=0.125
2023-12-20 18:22:37,831 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=11.58
2023-12-20 18:22:38,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10880.0, ans=0.0
2023-12-20 18:22:40,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10880.0, ans=0.19119999999999998
2023-12-20 18:22:46,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=10946.666666666666, ans=0.125
2023-12-20 18:22:56,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=11013.333333333334, ans=0.125
2023-12-20 18:22:58,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=11013.333333333334, ans=0.020777777777777773
2023-12-20 18:23:02,816 INFO [train.py:886] (2/4) Epoch 32, batch 50, loss[loss=0.02819, audio_tagging_loss=0.02819, over 25000.00 frames. ], tot_loss[loss=0.02718, audio_tagging_loss=0.02718, over 1121418.55 frames. ], batch size: 100, lr: 1.49e-02, grad_scale: 32.0
2023-12-20 18:23:23,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11093.333333333334, ans=0.18906666666666666
2023-12-20 18:23:25,183 INFO [train.py:886] (2/4) Epoch 33, batch 0, loss[loss=0.03206, audio_tagging_loss=0.03206, over 21614.00 frames. ], tot_loss[loss=0.03206, audio_tagging_loss=0.03206, over 21614.00 frames. ], batch size: 106, lr: 1.47e-02, grad_scale: 32.0
2023-12-20 18:23:25,184 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:23:46,120 INFO [train.py:917] (2/4) Epoch 33, validation: loss=0.0459, audio_tagging_loss=0.0459, over 3737520.00 frames. 
2023-12-20 18:23:46,121 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:23:49,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=11093.333333333334, ans=0.125
2023-12-20 18:23:59,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=11.684999999999999
2023-12-20 18:24:20,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=11293.333333333334, ans=0.125
2023-12-20 18:24:25,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11360.0, ans=0.18639999999999998
2023-12-20 18:24:25,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=11360.0, ans=0.019333333333333338
2023-12-20 18:24:26,653 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.339e+01 4.449e+01 5.027e+01 5.967e+01 1.050e+02, threshold=1.005e+02, percent-clipped=1.0
2023-12-20 18:24:32,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=11426.666666666666, ans=0.125
2023-12-20 18:24:33,022 INFO [train.py:886] (2/4) Epoch 33, batch 50, loss[loss=0.02471, audio_tagging_loss=0.02471, over 25000.00 frames. ], tot_loss[loss=0.02623, audio_tagging_loss=0.02623, over 1116458.53 frames. ], batch size: 100, lr: 1.47e-02, grad_scale: 32.0
2023-12-20 18:24:54,841 INFO [train.py:886] (2/4) Epoch 34, batch 0, loss[loss=0.02526, audio_tagging_loss=0.02526, over 25000.00 frames. ], tot_loss[loss=0.02526, audio_tagging_loss=0.02526, over 25000.00 frames. ], batch size: 100, lr: 1.44e-02, grad_scale: 32.0
2023-12-20 18:24:54,841 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:25:16,061 INFO [train.py:917] (2/4) Epoch 34, validation: loss=0.0463, audio_tagging_loss=0.0463, over 3737520.00 frames. 
2023-12-20 18:25:16,062 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:25:29,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=11506.666666666666, ans=0.018722222222222223
2023-12-20 18:25:31,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11506.666666666666, ans=0.18493333333333334
2023-12-20 18:25:51,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=11640.0, ans=0.00833913043478261
2023-12-20 18:25:56,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.25 vs. limit=11.89
2023-12-20 18:26:02,681 INFO [train.py:886] (2/4) Epoch 34, batch 50, loss[loss=0.02265, audio_tagging_loss=0.02265, over 25000.00 frames. ], tot_loss[loss=0.02531, audio_tagging_loss=0.02531, over 1120059.27 frames. ], batch size: 100, lr: 1.44e-02, grad_scale: 32.0
2023-12-20 18:26:24,394 INFO [train.py:886] (2/4) Epoch 35, batch 0, loss[loss=0.02218, audio_tagging_loss=0.02218, over 25000.00 frames. ], tot_loss[loss=0.02218, audio_tagging_loss=0.02218, over 25000.00 frames. ], batch size: 100, lr: 1.42e-02, grad_scale: 32.0
2023-12-20 18:26:24,395 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:26:43,536 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0184, 2.5182, 2.4109, 2.8270], device='cuda:2')
2023-12-20 18:26:45,180 INFO [train.py:917] (2/4) Epoch 35, validation: loss=0.04736, audio_tagging_loss=0.04736, over 3737520.00 frames. 
2023-12-20 18:26:45,181 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:27:13,873 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.961e-01
2023-12-20 18:27:20,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=11.995000000000001
2023-12-20 18:27:23,466 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.764e+01 4.533e+01 5.198e+01 5.955e+01 1.043e+02, threshold=1.040e+02, percent-clipped=1.0
2023-12-20 18:27:23,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=12.02
2023-12-20 18:27:32,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=12120.0, ans=0.125
2023-12-20 18:27:33,754 INFO [train.py:886] (2/4) Epoch 35, batch 50, loss[loss=0.02357, audio_tagging_loss=0.02357, over 25000.00 frames. ], tot_loss[loss=0.02389, audio_tagging_loss=0.02389, over 1124135.09 frames. ], batch size: 100, lr: 1.42e-02, grad_scale: 32.0
2023-12-20 18:27:55,038 INFO [train.py:886] (2/4) Epoch 36, batch 0, loss[loss=0.03018, audio_tagging_loss=0.03018, over 20513.00 frames. ], tot_loss[loss=0.03018, audio_tagging_loss=0.03018, over 20513.00 frames. ], batch size: 106, lr: 1.40e-02, grad_scale: 32.0
2023-12-20 18:27:55,038 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:28:11,987 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0909, 4.4174, 4.4397, 4.3308], device='cuda:2')
2023-12-20 18:28:16,073 INFO [train.py:917] (2/4) Epoch 36, validation: loss=0.04841, audio_tagging_loss=0.04841, over 3737520.00 frames. 
2023-12-20 18:28:16,073 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:28:16,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.58 vs. limit=8.033333333333333
2023-12-20 18:28:19,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=12133.333333333334, ans=0.008231884057971015
2023-12-20 18:28:41,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=8.066666666666666
2023-12-20 18:29:03,192 INFO [train.py:886] (2/4) Epoch 36, batch 50, loss[loss=0.02135, audio_tagging_loss=0.02135, over 25000.00 frames. ], tot_loss[loss=0.02365, audio_tagging_loss=0.02365, over 1119716.98 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0
2023-12-20 18:29:24,449 INFO [train.py:886] (2/4) Epoch 37, batch 0, loss[loss=0.02786, audio_tagging_loss=0.02786, over 20498.00 frames. ], tot_loss[loss=0.02786, audio_tagging_loss=0.02786, over 20498.00 frames. ], batch size: 106, lr: 1.38e-02, grad_scale: 32.0
2023-12-20 18:29:24,450 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:29:34,311 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0687, 4.7948, 4.6310, 4.4593], device='cuda:2')
2023-12-20 18:29:45,679 INFO [train.py:917] (2/4) Epoch 37, validation: loss=0.04928, audio_tagging_loss=0.04928, over 3737520.00 frames. 
2023-12-20 18:29:45,680 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:29:46,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=12480.0, ans=0.4632
2023-12-20 18:29:49,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=12480.0, ans=0.125
2023-12-20 18:29:58,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=12546.666666666666, ans=0.125
2023-12-20 18:30:00,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=12546.666666666666, ans=0.008142028985507246
2023-12-20 18:30:02,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=12546.666666666666, ans=0.17453333333333335
2023-12-20 18:30:12,054 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=22.46 vs. limit=12.23
2023-12-20 18:30:19,005 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.554e+01 4.732e+01 5.545e+01 6.466e+01 1.044e+02, threshold=1.109e+02, percent-clipped=1.0
2023-12-20 18:30:26,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=12746.666666666666, ans=0.125
2023-12-20 18:30:32,823 INFO [train.py:886] (2/4) Epoch 37, batch 50, loss[loss=0.0178, audio_tagging_loss=0.0178, over 25000.00 frames. ], tot_loss[loss=0.02257, audio_tagging_loss=0.02257, over 1116382.77 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0
2023-12-20 18:30:55,794 INFO [train.py:886] (2/4) Epoch 38, batch 0, loss[loss=0.02753, audio_tagging_loss=0.02753, over 21206.00 frames. ], tot_loss[loss=0.02753, audio_tagging_loss=0.02753, over 21206.00 frames. ], batch size: 106, lr: 1.36e-02, grad_scale: 32.0
2023-12-20 18:30:55,794 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:31:16,994 INFO [train.py:917] (2/4) Epoch 38, validation: loss=0.04916, audio_tagging_loss=0.04916, over 3737520.00 frames. 
2023-12-20 18:31:16,994 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:31:20,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12826.666666666666, ans=0.125
2023-12-20 18:31:21,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=12826.666666666666, ans=0.125
2023-12-20 18:31:25,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=12826.666666666666, ans=0.125
2023-12-20 18:31:35,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=12960.0, ans=0.125
2023-12-20 18:31:39,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=12960.0, ans=0.09899494936611666
2023-12-20 18:31:41,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=12960.0, ans=0.04949747468305833
2023-12-20 18:32:04,832 INFO [train.py:886] (2/4) Epoch 38, batch 50, loss[loss=0.01835, audio_tagging_loss=0.01835, over 25000.00 frames. ], tot_loss[loss=0.02232, audio_tagging_loss=0.02232, over 1109728.35 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 32.0
2023-12-20 18:32:26,418 INFO [train.py:886] (2/4) Epoch 39, batch 0, loss[loss=0.02121, audio_tagging_loss=0.02121, over 24125.00 frames. ], tot_loss[loss=0.02121, audio_tagging_loss=0.02121, over 24125.00 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0
2023-12-20 18:32:26,419 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:32:46,472 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.6173, 1.9408, 1.7376, 2.2254, 2.0347, 2.0442, 1.8605, 2.0572], device='cuda:2')
2023-12-20 18:32:47,549 INFO [train.py:917] (2/4) Epoch 39, validation: loss=0.05058, audio_tagging_loss=0.05058, over 3737520.00 frames. 
2023-12-20 18:32:47,550 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:33:00,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=13240.0, ans=8.31
2023-12-20 18:33:01,723 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=9.296
2023-12-20 18:33:17,347 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.971e+01 5.139e+01 5.911e+01 6.986e+01 1.449e+02, threshold=1.182e+02, percent-clipped=3.0
2023-12-20 18:33:17,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=13373.333333333334, ans=0.125
2023-12-20 18:33:20,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=13373.333333333334, ans=0.010944444444444444
2023-12-20 18:33:22,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=13373.333333333334, ans=0.125
2023-12-20 18:33:29,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=12.54
2023-12-20 18:33:35,368 INFO [train.py:886] (2/4) Epoch 39, batch 50, loss[loss=0.0191, audio_tagging_loss=0.0191, over 25000.00 frames. ], tot_loss[loss=0.02097, audio_tagging_loss=0.02097, over 1117713.56 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0
2023-12-20 18:33:57,923 INFO [train.py:886] (2/4) Epoch 40, batch 0, loss[loss=0.02404, audio_tagging_loss=0.02404, over 24092.00 frames. ], tot_loss[loss=0.02404, audio_tagging_loss=0.02404, over 24092.00 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 32.0
2023-12-20 18:33:57,924 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:34:19,043 INFO [train.py:917] (2/4) Epoch 40, validation: loss=0.05208, audio_tagging_loss=0.05208, over 3737520.00 frames. 
2023-12-20 18:34:19,043 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:34:21,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=13520.0, ans=0.1648
2023-12-20 18:34:23,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=12.57
2023-12-20 18:34:41,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.59 vs. limit=12.620000000000001
2023-12-20 18:34:58,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=13786.666666666666, ans=0.4174666666666667
2023-12-20 18:35:06,527 INFO [train.py:886] (2/4) Epoch 40, batch 50, loss[loss=0.01745, audio_tagging_loss=0.01745, over 25000.00 frames. ], tot_loss[loss=0.01985, audio_tagging_loss=0.01985, over 1124183.14 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 32.0
2023-12-20 18:35:29,528 INFO [train.py:886] (2/4) Epoch 41, batch 0, loss[loss=0.02016, audio_tagging_loss=0.02016, over 25000.00 frames. ], tot_loss[loss=0.02016, audio_tagging_loss=0.02016, over 25000.00 frames. ], batch size: 100, lr: 1.30e-02, grad_scale: 32.0
2023-12-20 18:35:29,528 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:35:48,061 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3372, 3.0105, 3.3948, 3.0879], device='cuda:2')
2023-12-20 18:35:50,405 INFO [train.py:917] (2/4) Epoch 41, validation: loss=0.05259, audio_tagging_loss=0.05259, over 3737520.00 frames. 
2023-12-20 18:35:50,406 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:36:02,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=13933.333333333334, ans=0.008611111111111104
2023-12-20 18:36:04,368 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=12.725
2023-12-20 18:36:16,659 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.775e+01 5.160e+01 5.694e+01 6.780e+01 1.124e+02, threshold=1.139e+02, percent-clipped=0.0
2023-12-20 18:36:23,512 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=12.775
2023-12-20 18:36:31,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=14133.333333333334, ans=0.15866666666666665
2023-12-20 18:36:33,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=14133.333333333334, ans=0.125
2023-12-20 18:36:37,896 INFO [train.py:886] (2/4) Epoch 41, batch 50, loss[loss=0.01783, audio_tagging_loss=0.01783, over 25000.00 frames. ], tot_loss[loss=0.01954, audio_tagging_loss=0.01954, over 1118676.14 frames. ], batch size: 100, lr: 1.30e-02, grad_scale: 32.0
2023-12-20 18:37:00,638 INFO [train.py:886] (2/4) Epoch 42, batch 0, loss[loss=0.02661, audio_tagging_loss=0.02661, over 19785.00 frames. ], tot_loss[loss=0.02661, audio_tagging_loss=0.02661, over 19785.00 frames. ], batch size: 106, lr: 1.29e-02, grad_scale: 32.0
2023-12-20 18:37:00,639 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:37:21,716 INFO [train.py:917] (2/4) Epoch 42, validation: loss=0.0541, audio_tagging_loss=0.0541, over 3737520.00 frames. 
2023-12-20 18:37:21,717 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:37:27,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=14213.333333333334, ans=0.007779710144927536
2023-12-20 18:37:34,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=14280.0, ans=0.125
2023-12-20 18:37:37,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=12.855
2023-12-20 18:38:07,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.59 vs. limit=5.172000000000001
2023-12-20 18:38:09,808 INFO [train.py:886] (2/4) Epoch 42, batch 50, loss[loss=0.0145, audio_tagging_loss=0.0145, over 25000.00 frames. ], tot_loss[loss=0.01847, audio_tagging_loss=0.01847, over 1113558.36 frames. ], batch size: 100, lr: 1.29e-02, grad_scale: 32.0
2023-12-20 18:38:32,308 INFO [train.py:886] (2/4) Epoch 43, batch 0, loss[loss=0.02318, audio_tagging_loss=0.02318, over 21325.00 frames. ], tot_loss[loss=0.02318, audio_tagging_loss=0.02318, over 21325.00 frames. ], batch size: 106, lr: 1.27e-02, grad_scale: 32.0
2023-12-20 18:38:32,309 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:38:40,419 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4399, 2.6202, 2.8722, 2.6384], device='cuda:2')
2023-12-20 18:38:53,027 INFO [train.py:917] (2/4) Epoch 43, validation: loss=0.05602, audio_tagging_loss=0.05602, over 3737520.00 frames. 
2023-12-20 18:38:53,027 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:39:03,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=14626.666666666666, ans=0.38806666666666667
2023-12-20 18:39:16,028 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.316e+01 5.471e+01 6.063e+01 6.688e+01 1.130e+02, threshold=1.213e+02, percent-clipped=0.0
2023-12-20 18:39:19,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=14693.333333333334, ans=0.125
2023-12-20 18:39:29,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=14760.0, ans=0.007660869565217391
2023-12-20 18:39:31,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=14826.666666666666, ans=0.125
2023-12-20 18:39:38,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=13.059999999999999
2023-12-20 18:39:41,479 INFO [train.py:886] (2/4) Epoch 43, batch 50, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01771, audio_tagging_loss=0.01771, over 1120921.25 frames. ], batch size: 100, lr: 1.27e-02, grad_scale: 32.0
2023-12-20 18:40:04,353 INFO [train.py:886] (2/4) Epoch 44, batch 0, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-20 18:40:04,354 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:40:25,321 INFO [train.py:917] (2/4) Epoch 44, validation: loss=0.05682, audio_tagging_loss=0.05682, over 3737520.00 frames. 
2023-12-20 18:40:25,322 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:40:40,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14973.333333333334, ans=0.15026666666666666
2023-12-20 18:40:52,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=15040.0, ans=0.125
2023-12-20 18:40:53,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=5.266
2023-12-20 18:40:57,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=15106.666666666666, ans=0.125
2023-12-20 18:41:12,867 INFO [train.py:886] (2/4) Epoch 44, batch 50, loss[loss=0.0152, audio_tagging_loss=0.0152, over 25000.00 frames. ], tot_loss[loss=0.01714, audio_tagging_loss=0.01714, over 1121136.66 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-20 18:41:35,895 INFO [train.py:886] (2/4) Epoch 45, batch 0, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-20 18:41:35,896 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:41:56,899 INFO [train.py:917] (2/4) Epoch 45, validation: loss=0.05811, audio_tagging_loss=0.05811, over 3737520.00 frames. 
2023-12-20 18:41:56,900 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:42:15,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=5.298
2023-12-20 18:42:15,214 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.876e+01 5.082e+01 5.625e+01 6.615e+01 1.122e+02, threshold=1.125e+02, percent-clipped=0.0
2023-12-20 18:42:25,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=13.295
2023-12-20 18:42:26,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=15453.333333333334, ans=0.125
2023-12-20 18:42:30,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15453.333333333334, ans=0.14546666666666666
2023-12-20 18:42:35,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=15520.0, ans=0.007495652173913044
2023-12-20 18:42:38,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.23 vs. limit=13.32
2023-12-20 18:42:44,425 INFO [train.py:886] (2/4) Epoch 45, batch 50, loss[loss=0.0147, audio_tagging_loss=0.0147, over 25000.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 1124023.84 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 64.0
2023-12-20 18:43:02,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=15600.0, ans=0.0
2023-12-20 18:43:02,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=13.35
2023-12-20 18:43:06,816 INFO [train.py:886] (2/4) Epoch 46, batch 0, loss[loss=0.01788, audio_tagging_loss=0.01788, over 24133.00 frames. ], tot_loss[loss=0.01788, audio_tagging_loss=0.01788, over 24133.00 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-20 18:43:06,816 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:43:18,177 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5517, 2.9490, 2.9990, 2.8908], device='cuda:2')
2023-12-20 18:43:27,876 INFO [train.py:917] (2/4) Epoch 46, validation: loss=0.05956, audio_tagging_loss=0.05956, over 3737520.00 frames. 
2023-12-20 18:43:27,876 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:43:37,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=15666.666666666666, ans=10.0
2023-12-20 18:43:38,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=15666.666666666666, ans=0.125
2023-12-20 18:43:40,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=15666.666666666666, ans=0.3516666666666667
2023-12-20 18:43:49,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=15733.333333333334, ans=0.0
2023-12-20 18:43:53,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=15733.333333333334, ans=0.007449275362318841
2023-12-20 18:43:58,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=15800.0, ans=0.0
2023-12-20 18:44:15,173 INFO [train.py:886] (2/4) Epoch 46, batch 50, loss[loss=0.01339, audio_tagging_loss=0.01339, over 25000.00 frames. ], tot_loss[loss=0.01494, audio_tagging_loss=0.01494, over 1123908.16 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-20 18:44:38,147 INFO [train.py:886] (2/4) Epoch 47, batch 0, loss[loss=0.01721, audio_tagging_loss=0.01721, over 20990.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 20990.00 frames. ], batch size: 106, lr: 1.21e-02, grad_scale: 64.0
2023-12-20 18:44:38,148 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:44:48,564 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4234, 3.0464, 3.5152, 3.1036], device='cuda:2')
2023-12-20 18:44:59,320 INFO [train.py:917] (2/4) Epoch 47, validation: loss=0.06125, audio_tagging_loss=0.06125, over 3737520.00 frames. 
2023-12-20 18:44:59,320 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:45:08,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=16013.333333333334, ans=0.33953333333333335
2023-12-20 18:45:12,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=16013.333333333334, ans=0.035
2023-12-20 18:45:14,000 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.428e+01 5.199e+01 5.973e+01 6.776e+01 1.435e+02, threshold=1.195e+02, percent-clipped=1.0
2023-12-20 18:45:26,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=13.530000000000001
2023-12-20 18:45:46,336 INFO [train.py:886] (2/4) Epoch 47, batch 50, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 1117903.36 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-20 18:46:04,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=16293.333333333334, ans=0.125
2023-12-20 18:46:08,724 INFO [train.py:886] (2/4) Epoch 48, batch 0, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24086.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 24086.00 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-20 18:46:08,725 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:46:29,402 INFO [train.py:917] (2/4) Epoch 48, validation: loss=0.06238, audio_tagging_loss=0.06238, over 3737520.00 frames. 
2023-12-20 18:46:29,403 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:46:39,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=16360.0, ans=0.00731304347826087
2023-12-20 18:46:46,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=13.635
2023-12-20 18:46:47,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=13.635
2023-12-20 18:46:50,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=10.570666666666668
2023-12-20 18:46:53,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=16426.666666666668, ans=0.125
2023-12-20 18:46:55,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=16426.666666666668, ans=0.125
2023-12-20 18:47:10,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=16560.0, ans=0.0
2023-12-20 18:47:14,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16560.0, ans=0.13440000000000002
2023-12-20 18:47:16,770 INFO [train.py:886] (2/4) Epoch 48, batch 50, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 1127183.65 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-20 18:47:37,825 INFO [train.py:886] (2/4) Epoch 49, batch 0, loss[loss=0.01802, audio_tagging_loss=0.01802, over 20499.00 frames. ], tot_loss[loss=0.01802, audio_tagging_loss=0.01802, over 20499.00 frames. ], batch size: 106, lr: 1.18e-02, grad_scale: 64.0
2023-12-20 18:47:37,825 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:47:58,811 INFO [train.py:917] (2/4) Epoch 49, validation: loss=0.06394, audio_tagging_loss=0.06394, over 3737520.00 frames. 
2023-12-20 18:47:58,811 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:48:09,451 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.348e+01 5.324e+01 6.019e+01 6.956e+01 1.317e+02, threshold=1.204e+02, percent-clipped=1.0
2023-12-20 18:48:09,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=16706.666666666668, ans=0.125
2023-12-20 18:48:09,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=16706.666666666668, ans=0.125
2023-12-20 18:48:24,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=13.79
2023-12-20 18:48:28,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=16840.0, ans=0.125
2023-12-20 18:48:41,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=16906.666666666668, ans=0.0
2023-12-20 18:48:45,782 INFO [train.py:886] (2/4) Epoch 49, batch 50, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 1116344.42 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0
2023-12-20 18:49:07,494 INFO [train.py:886] (2/4) Epoch 50, batch 0, loss[loss=0.01496, audio_tagging_loss=0.01496, over 24131.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 24131.00 frames. ], batch size: 100, lr: 1.17e-02, grad_scale: 64.0
2023-12-20 18:49:07,495 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:49:17,438 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1588, 2.8040, 2.7916, 2.7895], device='cuda:2')
2023-12-20 18:49:28,222 INFO [train.py:917] (2/4) Epoch 50, validation: loss=0.06678, audio_tagging_loss=0.06678, over 3737520.00 frames. 
2023-12-20 18:49:28,223 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:49:33,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=16986.666666666668, ans=0.0
2023-12-20 18:49:44,732 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.049e-02
2023-12-20 18:49:59,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=17186.666666666668, ans=0.125
2023-12-20 18:50:06,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=17253.333333333332, ans=0.29613333333333347
2023-12-20 18:50:07,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=17253.333333333332, ans=0.007118840579710146
2023-12-20 18:50:15,466 INFO [train.py:886] (2/4) Epoch 50, batch 50, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 1119154.56 frames. ], batch size: 100, lr: 1.17e-02, grad_scale: 32.0
2023-12-20 18:50:15,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17320.0, ans=0.0
2023-12-20 18:50:18,099 INFO [train.py:1099] (2/4) Done!