2023-12-20 17:30:48,711 INFO [train.py:953] (0/4) Training started
2023-12-20 17:30:48,726 INFO [train.py:963] (0/4) Device: cuda:0
2023-12-20 17:30:48,726 INFO [train.py:965] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '0.0.0+unknown.version', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_tagging', 'icefall-git-sha1': 'bd01c212-clean', 'icefall-git-date': 'Tue Dec 19 17:20:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_audio_tagging', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_at/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-7-1218101249-5bcbfb5567-jsftr', 'IP address': '10.177.6.147'}, 'world_size': 4, 'master_port': 13455, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_as_full'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'num_events': 527, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2023-12-20 17:30:48,727 INFO [train.py:967] (0/4) About to create model
2023-12-20 17:30:54,311 INFO [train.py:971] (0/4) Number of model parameters: 64264454
2023-12-20 17:30:57,177 INFO [train.py:986] (0/4) Using DDP
2023-12-20 17:30:57,436 INFO [at_datamodule.py:398] (0/4) About to get the audioset cuts for KD.
2023-12-20 17:30:57,497 INFO [at_datamodule.py:223] (0/4) Enable MUSAN
2023-12-20 17:30:57,497 INFO [at_datamodule.py:224] (0/4) About to get Musan cuts
2023-12-20 17:30:59,882 INFO [at_datamodule.py:248] (0/4) Enable SpecAugment
2023-12-20 17:30:59,882 INFO [at_datamodule.py:249] (0/4) Time warp factor: 80
2023-12-20 17:30:59,883 INFO [at_datamodule.py:259] (0/4) Num frame mask: 10
2023-12-20 17:30:59,883 INFO [at_datamodule.py:272] (0/4) About to create train dataset
2023-12-20 17:30:59,883 INFO [at_datamodule.py:299] (0/4) Using DynamicBucketingSampler.
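For context on the 'lr:' values printed in the batch lines below (e.g. 2.25e-02 at Epoch 1, batch 0): with 'base_lr': 0.045, 'lr_batches': 7500 and 'lr_epochs': 3.5 from the config above, the learning rates are consistent with icefall's Eden-style schedule. A minimal sketch, assuming the usual Eden form; the warmup_batches value is an assumption, not read from this log:

    # Sketch of an Eden-style LR schedule (assumed form; warmup_batches=500 is hypothetical).
    def eden_lr(batch: float, epoch: float, base_lr: float = 0.045,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5,
                warmup_batches: float = 500.0) -> float:
        warmup = 0.5 + 0.5 * min(batch / warmup_batches, 1.0)   # ramps 0.5 -> 1.0
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor * warmup

    print(eden_lr(batch=0, epoch=1))  # ~2.2e-02, close to the logged 'lr: 2.25e-02'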
2023-12-20 17:31:01,401 INFO [at_datamodule.py:315] (0/4) About to create train dataloader
2023-12-20 17:31:01,401 INFO [at_datamodule.py:410] (0/4) About to get test-other cuts
2023-12-20 17:31:01,434 INFO [at_datamodule.py:346] (0/4) About to create dev dataset
2023-12-20 17:31:01,877 INFO [at_datamodule.py:363] (0/4) About to create dev dataloader
2023-12-20 17:31:25,020 INFO [train.py:886] (0/4) Epoch 1, batch 0, loss[loss=1.846, audio_tagging_loss=1.846, over 24114.00 frames. ], tot_loss[loss=1.846, audio_tagging_loss=1.846, over 24114.00 frames. ], batch size: 100, lr: 2.25e-02, grad_scale: 2.0
2023-12-20 17:31:25,022 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:31:46,190 INFO [train.py:917] (0/4) Epoch 1, validation: loss=1.716, audio_tagging_loss=1.716, over 3737520.00 frames.
2023-12-20 17:31:46,191 INFO [train.py:918] (0/4) Maximum memory allocated so far is 13125MB
2023-12-20 17:31:48,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=0.0, ans=0.2
2023-12-20 17:31:52,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=25.50 vs. limit=7.5
2023-12-20 17:31:54,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=0.0, ans=0.5
2023-12-20 17:31:56,789 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+02 8.568e+02 1.002e+03 1.369e+03 1.715e+03, threshold=4.006e+03, percent-clipped=0.0
2023-12-20 17:32:00,463 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=182.17 vs. limit=7.525
2023-12-20 17:32:02,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=66.66666666666667, ans=0.1975
2023-12-20 17:32:02,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=375.29 vs. limit=7.55
2023-12-20 17:32:02,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.16 vs. limit=7.525
2023-12-20 17:32:07,425 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 3.256e+02 7.044e+02 1.161e+03 1.783e+03, threshold=2.818e+03, percent-clipped=0.0
2023-12-20 17:32:11,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=372.17 vs. limit=7.55
2023-12-20 17:32:19,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=141.81 vs. limit=5.1
2023-12-20 17:32:24,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=222.04 vs. limit=7.575
2023-12-20 17:32:30,897 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 1.290e+02 2.793e+02 8.337e+02 1.783e+03, threshold=1.117e+03, percent-clipped=0.0
2023-12-20 17:32:33,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=76.80 vs. limit=4.1066666666666665
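The 'Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=...' warnings above report the optimizer's gradient clipping: five quantiles (min, 25%, 50%, 75%, max) of recent per-batch gradient norms, plus a threshold that in most of the later entries is exactly Clipping_scale times the printed median. A minimal sketch of that idea, assuming a running buffer of norms; the exact bookkeeping in icefall's optim.py differs, as the early entries above show:

    import torch

    def clip_to_median(grad: torch.Tensor, recent_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> torch.Tensor:
        # Quantiles of recent per-batch gradient norms (min, 25%, 50%, 75%, max),
        # as printed in the WARNING lines.
        q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]      # scale the running median
        norm = grad.norm()
        if norm > threshold:                   # such batches feed 'percent-clipped'
            grad = grad * (threshold / norm)
        return grad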
2023-12-20 17:32:34,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=266.6666666666667, ans=0.4875
2023-12-20 17:32:37,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=266.6666666666667, ans=0.4875
2023-12-20 17:32:40,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=104.51 vs. limit=4.1066666666666665
2023-12-20 17:32:42,073 INFO [train.py:886] (0/4) Epoch 1, batch 50, loss[loss=0.0563, audio_tagging_loss=0.0563, over 25000.00 frames. ], tot_loss[loss=0.3024, audio_tagging_loss=0.3024, over 1119216.04 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 2.0
2023-12-20 17:32:44,905 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-1.pt
2023-12-20 17:33:07,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=134.76 vs. limit=7.76
2023-12-20 17:33:07,710 INFO [train.py:886] (0/4) Epoch 2, batch 0, loss[loss=0.06494, audio_tagging_loss=0.06494, over 20658.00 frames. ], tot_loss[loss=0.06494, audio_tagging_loss=0.06494, over 20658.00 frames. ], batch size: 106, lr: 2.44e-02, grad_scale: 4.0
2023-12-20 17:33:07,711 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:33:15,867 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0459, 5.9177, 5.8962, 6.1470], device='cuda:0')
2023-12-20 17:33:23,156 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.8086, 4.8140, 4.8172, 4.8175], device='cuda:0')
2023-12-20 17:33:28,178 INFO [train.py:917] (0/4) Epoch 2, validation: loss=0.0597, audio_tagging_loss=0.0597, over 3737520.00 frames.
2023-12-20 17:33:28,178 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14622MB
2023-12-20 17:33:41,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=356.15 vs. limit=7.655
2023-12-20 17:33:43,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=64.51 vs. limit=7.81
2023-12-20 17:33:44,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=413.3333333333333, ans=0.1845
2023-12-20 17:33:48,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=56.89 vs. limit=7.655
2023-12-20 17:33:51,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=240.85 vs. limit=7.68
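The recurring 'Whitening: name=..., metric=X vs. limit=Y' lines track how far a module's activations are from having a white (isotropic) covariance; when the metric exceeds the scheduled limit, a corrective penalty is applied. A sketch of one natural such metric, mean(lambda^2)/mean(lambda)^2 over the covariance eigenvalues, which equals 1.0 exactly for white features; this is my reading of the diagnostic, not code copied from scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one group.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]                  # channel covariance
        eigs = torch.linalg.eigvalsh(cov)               # real eigenvalues (cov is symmetric)
        return (eigs ** 2).mean() / eigs.mean() ** 2    # 1.0 iff covariance is isotropic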
2023-12-20 17:33:53,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=480.0, ans=0.0892
2023-12-20 17:33:53,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=480.0, ans=0.4775
2023-12-20 17:33:54,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480.0, ans=0.29519999999999996
2023-12-20 17:34:00,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=279.87 vs. limit=7.68
2023-12-20 17:34:02,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=122.93 vs. limit=5.24
2023-12-20 17:34:03,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=199.22 vs. limit=7.705
2023-12-20 17:34:07,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=228.83 vs. limit=7.705
2023-12-20 17:34:09,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=47.97 vs. limit=5.136666666666667
2023-12-20 17:34:23,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=21.44 vs. limit=5.153333333333333
2023-12-20 17:34:25,459 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.968e+01 6.154e+01 2.791e+02 2.019e+03, threshold=1.231e+02, percent-clipped=1.0
2023-12-20 17:34:26,577 INFO [train.py:886] (0/4) Epoch 2, batch 50, loss[loss=0.05245, audio_tagging_loss=0.05245, over 25000.00 frames. ], tot_loss[loss=0.05852, audio_tagging_loss=0.05852, over 1119858.14 frames. ], batch size: 100, lr: 2.66e-02, grad_scale: 2.0
2023-12-20 17:34:29,330 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-2.pt
2023-12-20 17:34:51,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=693.3333333333334, ans=0.8757333333333334
2023-12-20 17:34:51,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=297.64 vs. limit=7.76
2023-12-20 17:34:52,018 INFO [train.py:886] (0/4) Epoch 3, batch 0, loss[loss=0.06851, audio_tagging_loss=0.06851, over 21308.00 frames. ], tot_loss[loss=0.06851, audio_tagging_loss=0.06851, over 21308.00 frames. ], batch size: 106, lr: 2.54e-02, grad_scale: 4.0
2023-12-20 17:34:52,019 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:35:12,453 INFO [train.py:917] (0/4) Epoch 3, validation: loss=0.05878, audio_tagging_loss=0.05878, over 3737520.00 frames.
2023-12-20 17:35:12,454 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14622MB
2023-12-20 17:35:12,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=254.15 vs. limit=7.76
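The 'ScheduledFloat: name=..., batch_count=..., ans=...' lines print hyperparameters that are piecewise-linearly interpolated in batch_count. For example, the conv_skip_rate values above (0.1975 at batch_count 66.67, 0.1845 at 413.33) match a linear ramp from 0.2 at batch 0 to 0.05 at batch 4000; a minimal sketch with those breakpoints, which are inferred from the logged values rather than read from the model code:

    # Breakpoints inferred from the conv_skip_rate values logged above.
    SCHEDULE = [(0.0, 0.2), (4000.0, 0.05)]

    def scheduled_float(batch_count: float, schedule=SCHEDULE) -> float:
        (x0, y0), (x1, y1) = schedule[0], schedule[-1]
        if batch_count <= x0:
            return y0
        if batch_count >= x1:
            return y1
        t = (batch_count - x0) / (x1 - x0)     # linear interpolation
        return y0 + t * (y1 - y0)

    print(scheduled_float(66.66666666666667))   # 0.1975, as logged
    print(scheduled_float(413.3333333333333))   # 0.1845, as logged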
2023-12-20 17:35:13,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=693.3333333333334, ans=0.17400000000000002
2023-12-20 17:35:17,761 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-20 17:35:22,067 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-20 17:35:29,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=760.0, ans=0.464375
2023-12-20 17:35:29,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=760.0, ans=0.1715
2023-12-20 17:35:32,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=437.39 vs. limit=7.785
2023-12-20 17:35:32,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=82.81 vs. limit=7.785
2023-12-20 17:35:33,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=760.0, ans=0.1715
2023-12-20 17:35:35,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=358.74 vs. limit=7.785
2023-12-20 17:35:37,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=18.85 vs. limit=7.81
2023-12-20 17:35:39,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=175.27 vs. limit=5.413333333333333
2023-12-20 17:35:39,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=128.24 vs. limit=7.81
2023-12-20 17:35:42,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=152.24 vs. limit=8.12
2023-12-20 17:35:47,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=893.3333333333334, ans=0.8687333333333334
2023-12-20 17:35:55,313 WARNING [optim.py:500] (0/4) Scaling gradients by 0.09217905253171921, model_norm_threshold=123.07855224609375
2023-12-20 17:35:55,465 WARNING [optim.py:572] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.614e+05, grad_sumsq=6.752e+08, orig_rms_sq=1.276e-03
2023-12-20 17:35:56,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=893.3333333333334, ans=0.3883333333333333
2023-12-20 17:36:00,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.97 vs. limit=5.24
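The 'Scaling gradients by 0.0921..., model_norm_threshold=123.078...' warning above is the whole-model clipping step: when the global gradient norm exceeds the threshold, all gradients are multiplied by threshold/norm (a factor of 0.0922 here implies a gradient norm of roughly 1335). The companion 'Parameter dominating tot_sumsq' warning names the parameter contributing most to that norm (encoder_embed.conv.7.weight, proportion 0.48). A minimal sketch of this kind of global clipping, not icefall's exact code:

    import torch

    def scale_gradients(params, model_norm_threshold: float) -> float:
        # Global gradient norm over all parameters.
        tot_sumsq = sum((p.grad ** 2).sum() for p in params if p.grad is not None)
        norm = tot_sumsq.sqrt()
        scale = (model_norm_threshold / (norm + 1e-20)).clamp(max=1.0)
        if scale < 1.0:                        # logged as 'Scaling gradients by ...'
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(scale)
        return float(scale)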
2023-12-20 17:36:04,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=960.0, ans=0.455
2023-12-20 17:36:06,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=165.02 vs. limit=5.48
2023-12-20 17:36:11,143 INFO [train.py:886] (0/4) Epoch 3, batch 50, loss[loss=0.05835, audio_tagging_loss=0.05835, over 25000.00 frames. ], tot_loss[loss=0.05546, audio_tagging_loss=0.05546, over 1120015.71 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 4.0
2023-12-20 17:36:13,902 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-3.pt
2023-12-20 17:36:35,799 INFO [train.py:886] (0/4) Epoch 4, batch 0, loss[loss=0.06113, audio_tagging_loss=0.06113, over 20961.00 frames. ], tot_loss[loss=0.06113, audio_tagging_loss=0.06113, over 20961.00 frames. ], batch size: 106, lr: 2.58e-02, grad_scale: 8.0
2023-12-20 17:36:35,801 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:36:54,989 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3530, 4.9626, 4.0798, 4.6962], device='cuda:0')
2023-12-20 17:36:55,852 INFO [train.py:917] (0/4) Epoch 4, validation: loss=0.05673, audio_tagging_loss=0.05673, over 3737520.00 frames.
2023-12-20 17:36:55,853 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:37:08,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=187.10 vs. limit=7.915
2023-12-20 17:37:10,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=94.81 vs. limit=5.553333333333334
2023-12-20 17:37:11,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=157.36 vs. limit=7.915
2023-12-20 17:37:15,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.61 vs. limit=4.442666666666667
2023-12-20 17:37:17,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=4.442666666666667
2023-12-20 17:37:18,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=162.59 vs. limit=5.553333333333334
2023-12-20 17:37:19,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=135.48 vs. limit=8.33
2023-12-20 17:37:22,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1173.3333333333333, ans=0.156
2023-12-20 17:37:22,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=25.39 vs. limit=5.293333333333333
2023-12-20 17:37:23,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=316.81 vs. limit=7.94
2023-12-20 17:37:27,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1173.3333333333333, ans=0.0736
2023-12-20 17:37:34,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1240.0, ans=7.965
2023-12-20 17:37:41,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1240.0, ans=0.035
2023-12-20 17:37:45,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1306.6666666666667, ans=0.43875
2023-12-20 17:37:49,920 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.504e+01 2.720e+01 3.182e+01 1.335e+03, threshold=5.440e+01, percent-clipped=1.0
2023-12-20 17:37:54,274 INFO [train.py:886] (0/4) Epoch 4, batch 50, loss[loss=0.05203, audio_tagging_loss=0.05203, over 25000.00 frames. ], tot_loss[loss=0.05516, audio_tagging_loss=0.05516, over 1123376.65 frames. ], batch size: 100, lr: 2.77e-02, grad_scale: 4.0
2023-12-20 17:37:56,995 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-4.pt
2023-12-20 17:38:18,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=264.22 vs. limit=8.02
2023-12-20 17:38:19,538 INFO [train.py:886] (0/4) Epoch 5, batch 0, loss[loss=0.06368, audio_tagging_loss=0.06368, over 21362.00 frames. ], tot_loss[loss=0.06368, audio_tagging_loss=0.06368, over 21362.00 frames. ], batch size: 106, lr: 2.59e-02, grad_scale: 8.0
2023-12-20 17:38:19,540 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:38:39,896 INFO [train.py:917] (0/4) Epoch 5, validation: loss=0.05523, audio_tagging_loss=0.05523, over 3737520.00 frames.
2023-12-20 17:38:39,897 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:38:46,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=134.78 vs. limit=8.02
2023-12-20 17:38:54,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=369.58 vs. limit=8.045
2023-12-20 17:38:58,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=116.91 vs. limit=8.045
2023-12-20 17:38:59,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=243.47 vs. limit=8.59
2023-12-20 17:39:00,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1453.3333333333333, ans=0.431875
2023-12-20 17:39:00,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=104.32 vs. limit=8.045
2023-12-20 17:39:10,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=121.34 vs. limit=8.07
2023-12-20 17:39:10,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1520.0, ans=0.42875
2023-12-20 17:39:12,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=125.53 vs. limit=8.07
2023-12-20 17:39:20,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=343.50 vs. limit=8.095
2023-12-20 17:39:27,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=321.96 vs. limit=8.12
2023-12-20 17:39:32,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1653.3333333333333, ans=0.08966666666666667
2023-12-20 17:39:38,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=384.28 vs. limit=8.145
2023-12-20 17:39:38,872 INFO [train.py:886] (0/4) Epoch 5, batch 50, loss[loss=0.05015, audio_tagging_loss=0.05015, over 25000.00 frames. ], tot_loss[loss=0.0522, audio_tagging_loss=0.0522, over 1120165.06 frames. ], batch size: 100, lr: 2.77e-02, grad_scale: 8.0
2023-12-20 17:39:41,630 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-5.pt
2023-12-20 17:40:04,929 INFO [train.py:886] (0/4) Epoch 6, batch 0, loss[loss=0.04905, audio_tagging_loss=0.04905, over 25000.00 frames. ], tot_loss[loss=0.04905, audio_tagging_loss=0.04905, over 25000.00 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 16.0
2023-12-20 17:40:04,930 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:40:21,300 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.6505, 3.6097, 2.4391, 3.9169, 3.9509, 3.7664, 2.8636, 3.9525], device='cuda:0')
2023-12-20 17:40:25,820 INFO [train.py:917] (0/4) Epoch 6, validation: loss=0.05425, audio_tagging_loss=0.05425, over 3737520.00 frames.
2023-12-20 17:40:25,821 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:40:28,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1733.3333333333333, ans=0.061000000000000006
2023-12-20 17:40:29,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=206.57 vs. limit=8.15
2023-12-20 17:40:37,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1800.0, ans=0.415625
2023-12-20 17:40:38,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1800.0, ans=0.415625
2023-12-20 17:40:43,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1800.0, ans=0.415625
2023-12-20 17:40:44,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=80.01 vs. limit=8.85
2023-12-20 17:40:46,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=318.44 vs. limit=8.175
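The 'attn_weights_entropy = tensor([...])' diagnostics print, for one attention module, the entropy of its attention distributions, one value per head: the encoders.3 layer above shows 8 values, matching its 8 heads in 'num_heads': '4,4,4,8,4,4', while the 4-head layers show 4. A sketch of how such a per-head entropy can be computed; an assumption about the diagnostic, not code lifted from zipformer.py:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, query_len, key_len); each row is a softmax distribution.
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)   # entropy per query
        return ent.mean(dim=-1)                            # averaged over queries

    attn = torch.softmax(torch.randn(4, 10, 10), dim=-1)
    print(attn_weights_entropy(attn))   # 4 values, like the tensors logged above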
2023-12-20 17:40:52,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1866.6666666666667, ans=0.4125
2023-12-20 17:40:52,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1866.6666666666667, ans=0.8346666666666667
2023-12-20 17:40:58,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.25 vs. limit=5.933333333333334
2023-12-20 17:40:59,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=56.35 vs. limit=8.2
2023-12-20 17:41:05,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=4.773333333333333
2023-12-20 17:41:07,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1933.3333333333333, ans=0.1275
2023-12-20 17:41:08,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1933.3333333333333, ans=0.409375
2023-12-20 17:41:08,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=191.29 vs. limit=8.225
2023-12-20 17:41:09,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=166.39 vs. limit=8.95
2023-12-20 17:41:10,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1933.3333333333333, ans=0.409375
2023-12-20 17:41:12,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.97 vs. limit=9.0
2023-12-20 17:41:14,959 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.556e+01 2.831e+01 3.472e+01 7.747e+01, threshold=5.662e+01, percent-clipped=6.0
2023-12-20 17:41:17,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=77.93 vs. limit=8.25
2023-12-20 17:41:23,851 INFO [train.py:886] (0/4) Epoch 6, batch 50, loss[loss=0.04726, audio_tagging_loss=0.04726, over 25000.00 frames. ], tot_loss[loss=0.05161, audio_tagging_loss=0.05161, over 1114079.16 frames. ], batch size: 100, lr: 2.76e-02, grad_scale: 16.0
2023-12-20 17:41:26,653 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-6.pt
2023-12-20 17:41:49,201 INFO [train.py:886] (0/4) Epoch 7, batch 0, loss[loss=0.0465, audio_tagging_loss=0.0465, over 25000.00 frames. ], tot_loss[loss=0.0465, audio_tagging_loss=0.0465, over 25000.00 frames. ], batch size: 100, lr: 2.60e-02, grad_scale: 32.0
2023-12-20 17:41:49,202 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:42:09,826 INFO [train.py:917] (0/4) Epoch 7, validation: loss=0.05269, audio_tagging_loss=0.05269, over 3737520.00 frames.
2023-12-20 17:42:09,827 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:42:11,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2080.0, ans=0.40249999999999997
2023-12-20 17:42:14,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=69.94 vs. limit=8.28
2023-12-20 17:42:14,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=184.77 vs. limit=8.28
2023-12-20 17:42:14,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=30.31 vs. limit=9.06
2023-12-20 17:42:21,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2146.6666666666665, ans=0.399375
2023-12-20 17:42:32,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.39 vs. limit=5.536666666666667
2023-12-20 17:42:37,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2213.3333333333335, ans=0.27786666666666665
2023-12-20 17:42:44,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2280.0, ans=0.393125
2023-12-20 17:42:51,024 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.293e-01
2023-12-20 17:42:57,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2346.6666666666665, ans=0.39
2023-12-20 17:42:59,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=39.37 vs. limit=8.38
2023-12-20 17:43:02,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2346.6666666666665, ans=0.0472
2023-12-20 17:43:02,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=39.01 vs. limit=8.38
2023-12-20 17:43:06,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=28.29 vs. limit=8.405
2023-12-20 17:43:07,590 INFO [train.py:886] (0/4) Epoch 7, batch 50, loss[loss=0.04816, audio_tagging_loss=0.04816, over 25000.00 frames. ], tot_loss[loss=0.05076, audio_tagging_loss=0.05076, over 1120624.72 frames. ], batch size: 100, lr: 2.76e-02, grad_scale: 1.0
2023-12-20 17:43:07,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2413.3333333333335, ans=0.38687499999999997
2023-12-20 17:43:10,446 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-7.pt
2023-12-20 17:43:32,849 INFO [train.py:886] (0/4) Epoch 8, batch 0, loss[loss=0.05033, audio_tagging_loss=0.05033, over 24053.00 frames. ], tot_loss[loss=0.05033, audio_tagging_loss=0.05033, over 24053.00 frames. ], batch size: 100, lr: 2.60e-02, grad_scale: 2.0
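The 'grad_scale' figure in the batch lines is the fp16 loss scale ('use_fp16': True). It grows by doubling after stretches of overflow-free steps (2.0 at Epoch 1, batch 0 up to 32.0 by Epoch 7, batch 0) and is halved on overflow, consistent with the drop to 1.0 at Epoch 7, batch 50 above. A minimal sketch of the standard torch.cuda.amp pattern that behaves this way; the growth interval and factors are torch defaults, assumed rather than read from train.py, and the batch keys are placeholders:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    def train_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # skips the update on inf/nan gradients
        scaler.update()                 # doubles or halves the scale
        return scaler.get_scale()       # the 'grad_scale' figure in the log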
2023-12-20 17:43:32,853 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:43:47,996 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.9145, 4.3862, 3.5900, 3.4260], device='cuda:0')
2023-12-20 17:43:53,654 INFO [train.py:917] (0/4) Epoch 8, validation: loss=0.05155, audio_tagging_loss=0.05155, over 3737520.00 frames.
2023-12-20 17:43:53,654 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:44:01,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=95.47 vs. limit=9.32
2023-12-20 17:44:05,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=176.46 vs. limit=8.435
2023-12-20 17:44:11,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=15.20 vs. limit=4.997333333333334
2023-12-20 17:44:12,707 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=54.13 vs. limit=8.435
2023-12-20 17:44:13,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=4.997333333333334
2023-12-20 17:44:13,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=224.90 vs. limit=8.435
2023-12-20 17:44:17,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=47.65 vs. limit=8.46
2023-12-20 17:44:29,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=91.72 vs. limit=8.485
2023-12-20 17:44:35,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=41.22 vs. limit=8.485
2023-12-20 17:44:42,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2693.3333333333335, ans=0.09899999999999999
2023-12-20 17:44:43,339 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 3.487e+01 4.265e+01 5.657e+01 4.687e+02, threshold=8.530e+01, percent-clipped=24.0
2023-12-20 17:44:50,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2760.0, ans=0.04949747468305833
2023-12-20 17:44:50,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=37.21 vs. limit=9.57
2023-12-20 17:44:51,064 INFO [train.py:886] (0/4) Epoch 8, batch 50, loss[loss=0.04467, audio_tagging_loss=0.04467, over 25000.00 frames. ], tot_loss[loss=0.04926, audio_tagging_loss=0.04926, over 1119056.28 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 2.0
2023-12-20 17:44:51,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=83.87 vs. limit=8.535
2023-12-20 17:44:53,693 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-8.pt
2023-12-20 17:45:16,131 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.27 vs. limit=6.386666666666667
2023-12-20 17:45:16,354 INFO [train.py:886] (0/4) Epoch 9, batch 0, loss[loss=0.05675, audio_tagging_loss=0.05675, over 21325.00 frames. ], tot_loss[loss=0.05675, audio_tagging_loss=0.05675, over 21325.00 frames. ], batch size: 106, lr: 2.61e-02, grad_scale: 4.0
2023-12-20 17:45:16,356 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:45:37,429 INFO [train.py:917] (0/4) Epoch 9, validation: loss=0.04977, audio_tagging_loss=0.04977, over 3737520.00 frames.
2023-12-20 17:45:37,429 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:45:37,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=98.60 vs. limit=9.58
2023-12-20 17:45:39,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2773.3333333333335, ans=0.37
2023-12-20 17:45:50,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=62.94 vs. limit=9.629999999999999
2023-12-20 17:45:52,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2840.0, ans=0.366875
2023-12-20 17:46:10,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=23.04 vs. limit=8.615
2023-12-20 17:46:15,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=26.08 vs. limit=8.615
2023-12-20 17:46:17,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=20.40 vs. limit=8.615
2023-12-20 17:46:23,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=48.11 vs. limit=9.78
2023-12-20 17:46:23,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3040.0, ans=0.35750000000000004
2023-12-20 17:46:27,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=5.216
2023-12-20 17:46:27,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.07 vs. limit=5.76
2023-12-20 17:46:27,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=8.64
2023-12-20 17:46:33,271 INFO [train.py:886] (0/4) Epoch 9, batch 50, loss[loss=0.04694, audio_tagging_loss=0.04694, over 25000.00 frames. ], tot_loss[loss=0.04785, audio_tagging_loss=0.04785, over 1121763.65 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 4.0
2023-12-20 17:46:36,042 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-9.pt
2023-12-20 17:46:59,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.84 vs. limit=5.78
2023-12-20 17:46:59,482 INFO [train.py:886] (0/4) Epoch 10, batch 0, loss[loss=0.0477, audio_tagging_loss=0.0477, over 24083.00 frames. ], tot_loss[loss=0.0477, audio_tagging_loss=0.0477, over 24083.00 frames. ], batch size: 100, lr: 2.62e-02, grad_scale: 8.0
2023-12-20 17:46:59,483 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:47:20,695 INFO [train.py:917] (0/4) Epoch 10, validation: loss=0.04858, audio_tagging_loss=0.04858, over 3737520.00 frames.
2023-12-20 17:47:20,696 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:47:21,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.76 vs. limit=6.5600000000000005
2023-12-20 17:47:21,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=91.98 vs. limit=8.67
2023-12-20 17:47:31,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3186.6666666666665, ans=0.016746666666666667
2023-12-20 17:47:36,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3186.6666666666665, ans=0.35062499999999996
2023-12-20 17:47:37,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3186.6666666666665, ans=0.26813333333333333
2023-12-20 17:47:44,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3253.3333333333335, ans=0.34750000000000003
2023-12-20 17:47:44,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=5.301333333333334
2023-12-20 17:47:45,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=31.19 vs. limit=8.72
2023-12-20 17:47:45,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=21.39 vs. limit=8.72
2023-12-20 17:47:51,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.01 vs. limit=9.94
2023-12-20 17:47:53,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=31.59 vs. limit=9.99
2023-12-20 17:47:55,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=16.55 vs. limit=8.745000000000001
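One 'epoch-N.pt' checkpoint is written per epoch above ('keep_last_k': 30 of them retained, per the config), and icefall recipes commonly average the parameters of several late checkpoints before evaluation. A minimal sketch of that averaging, assuming plain state_dicts; icefall's checkpoint.py actually nests the model weights under a 'model' key, so this is illustrative only:

    import torch

    def average_checkpoints(paths):
        # Elementwise average of parameters across saved state_dicts.
        avg = None
        for path in paths:
            state = torch.load(path, map_location="cpu")
            if avg is None:
                avg = {k: v.clone().float() for k, v in state.items()}
            else:
                for k in avg:
                    avg[k] += state[k].float()
        return {k: v / len(paths) for k, v in avg.items()}

    # e.g. average_checkpoints([f"zipformer/exp_at_as_full/epoch-{i}.pt" for i in (7, 8, 9)])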
2023-12-20 17:48:04,134 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 3.726e+01 4.484e+01 5.424e+01 1.858e+02, threshold=8.969e+01, percent-clipped=3.0
2023-12-20 17:48:04,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=30.85 vs. limit=10.04
2023-12-20 17:48:15,993 INFO [train.py:886] (0/4) Epoch 10, batch 50, loss[loss=0.04248, audio_tagging_loss=0.04248, over 25000.00 frames. ], tot_loss[loss=0.04669, audio_tagging_loss=0.04669, over 1120130.79 frames. ], batch size: 100, lr: 2.71e-02, grad_scale: 8.0
2023-12-20 17:48:18,701 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-10.pt
2023-12-20 17:48:40,822 INFO [train.py:886] (0/4) Epoch 11, batch 0, loss[loss=0.04616, audio_tagging_loss=0.04616, over 25000.00 frames. ], tot_loss[loss=0.04616, audio_tagging_loss=0.04616, over 25000.00 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 16.0
2023-12-20 17:48:40,823 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:48:53,138 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3516, 2.5020, 2.8772, 2.5648], device='cuda:0')
2023-12-20 17:49:01,999 INFO [train.py:917] (0/4) Epoch 11, validation: loss=0.04728, audio_tagging_loss=0.04728, over 3737520.00 frames.
2023-12-20 17:49:02,000 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:49:05,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3466.6666666666665, ans=0.7846666666666666
2023-12-20 17:49:07,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=58.61 vs. limit=8.8
2023-12-20 17:49:09,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=8.8
2023-12-20 17:49:23,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3533.3333333333335, ans=0.334375
2023-12-20 17:49:25,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. limit=5.9
2023-12-20 17:49:26,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=5.4399999999999995
2023-12-20 17:49:30,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3600.0, ans=0.264
2023-12-20 17:49:30,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=10.2
2023-12-20 17:49:30,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=8.85
2023-12-20 17:49:30,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=29.32 vs. limit=10.2
2023-12-20 17:49:31,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3600.0, ans=0.264
2023-12-20 17:49:32,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=56.68 vs. limit=8.85
2023-12-20 17:49:36,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3666.6666666666665, ans=0.2633333333333333
2023-12-20 17:49:43,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=67.80 vs. limit=8.875
2023-12-20 17:49:44,422 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=4.323e+01
2023-12-20 17:49:49,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=27.34 vs. limit=10.3
2023-12-20 17:49:52,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=46.24 vs. limit=8.9
2023-12-20 17:49:52,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.93 vs. limit=10.3
2023-12-20 17:49:58,308 INFO [train.py:886] (0/4) Epoch 11, batch 50, loss[loss=0.04229, audio_tagging_loss=0.04229, over 25000.00 frames. ], tot_loss[loss=0.04417, audio_tagging_loss=0.04417, over 1124488.66 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 16.0
2023-12-20 17:49:58,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3800.0, ans=0.16485
2023-12-20 17:49:58,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3800.0, ans=0.321875
2023-12-20 17:50:00,995 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-11.pt
2023-12-20 17:50:23,180 INFO [train.py:886] (0/4) Epoch 12, batch 0, loss[loss=0.04487, audio_tagging_loss=0.04487, over 24112.00 frames. ], tot_loss[loss=0.04487, audio_tagging_loss=0.04487, over 24112.00 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 32.0
2023-12-20 17:50:23,182 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:50:36,077 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.1437, 1.5026, 1.3238, 1.3460], device='cuda:0')
2023-12-20 17:50:44,481 INFO [train.py:917] (0/4) Epoch 12, validation: loss=0.04619, audio_tagging_loss=0.04619, over 3737520.00 frames.
2023-12-20 17:50:44,482 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:50:51,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=29.25 vs. limit=10.36
2023-12-20 17:50:51,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=8.93
2023-12-20 17:51:00,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=8.955
2023-12-20 17:51:03,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=8.955
2023-12-20 17:51:05,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.90 vs. limit=10.41
2023-12-20 17:51:07,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=27.37 vs. limit=10.46
2023-12-20 17:51:10,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3946.6666666666665, ans=0.2592
2023-12-20 17:51:11,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=8.98
2023-12-20 17:51:11,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=21.44 vs. limit=8.98
2023-12-20 17:51:15,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=10.46
2023-12-20 17:51:18,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=10.46
2023-12-20 17:51:23,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4013.3333333333335, ans=0.311875
2023-12-20 17:51:25,063 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.849e+01 4.841e+01 5.572e+01 8.770e+01, threshold=9.682e+01, percent-clipped=0.0
2023-12-20 17:51:26,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=39.49 vs. limit=9.004999999999999
2023-12-20 17:51:33,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=9.03
2023-12-20 17:51:35,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4080.0, ans=0.2592
2023-12-20 17:51:39,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=23.75 vs. limit=9.03
2023-12-20 17:51:40,928 INFO [train.py:886] (0/4) Epoch 12, batch 50, loss[loss=0.04265, audio_tagging_loss=0.04265, over 25000.00 frames. ], tot_loss[loss=0.04446, audio_tagging_loss=0.04446, over 1116434.67 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 32.0
2023-12-20 17:51:41,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4146.666666666667, ans=0.04938888888888889
2023-12-20 17:51:43,965 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-12.pt
2023-12-20 17:52:04,718 INFO [train.py:886] (0/4) Epoch 13, batch 0, loss[loss=0.04885, audio_tagging_loss=0.04885, over 24122.00 frames. ], tot_loss[loss=0.04885, audio_tagging_loss=0.04885, over 24122.00 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 32.0
2023-12-20 17:52:04,720 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:52:12,949 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.8749, 1.3593, 1.7172, 1.7075], device='cuda:0')
2023-12-20 17:52:25,611 INFO [train.py:917] (0/4) Epoch 13, validation: loss=0.04525, audio_tagging_loss=0.04525, over 3737520.00 frames.
2023-12-20 17:52:25,611 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:52:25,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4160.0, ans=0.2584
2023-12-20 17:52:25,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4160.0, ans=0.07400000000000001
2023-12-20 17:52:30,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=9.06
2023-12-20 17:52:40,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=9.085
2023-12-20 17:52:50,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4293.333333333333, ans=0.29874999999999996
2023-12-20 17:52:53,066 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.388e+01
2023-12-20 17:52:54,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=21.28 vs. limit=9.11
2023-12-20 17:52:56,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.23 vs. limit=10.719999999999999
2023-12-20 17:52:57,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4360.0, ans=0.295625
2023-12-20 17:53:00,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4360.0, ans=0.295625
2023-12-20 17:53:00,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=37.06 vs. limit=9.135
2023-12-20 17:53:04,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4360.0, ans=0.7474000000000001
2023-12-20 17:53:07,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=9.135
2023-12-20 17:53:14,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4426.666666666667, ans=0.2557333333333333
2023-12-20 17:53:19,101 INFO [train.py:886] (0/4) Epoch 13, batch 50, loss[loss=0.03964, audio_tagging_loss=0.03964, over 25000.00 frames. ], tot_loss[loss=0.0433, audio_tagging_loss=0.0433, over 1117859.90 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 32.0
2023-12-20 17:53:21,826 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-13.pt
2023-12-20 17:53:43,852 INFO [train.py:886] (0/4) Epoch 14, batch 0, loss[loss=0.04065, audio_tagging_loss=0.04065, over 24116.00 frames. ], tot_loss[loss=0.04065, audio_tagging_loss=0.04065, over 24116.00 frames. ], batch size: 100, lr: 2.29e-02, grad_scale: 32.0
2023-12-20 17:53:43,854 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 17:53:54,807 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6294, 2.4012, 2.7488, 2.5912], device='cuda:0')
2023-12-20 17:54:05,170 INFO [train.py:917] (0/4) Epoch 14, validation: loss=0.04503, audio_tagging_loss=0.04503, over 3737520.00 frames.
2023-12-20 17:54:05,170 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 17:54:10,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.97 vs. limit=9.19
2023-12-20 17:54:13,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=9.19
2023-12-20 17:54:16,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.83 vs. limit=7.286666666666667
2023-12-20 17:54:28,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.80 vs. limit=10.98
2023-12-20 17:54:31,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=9.24
2023-12-20 17:54:38,364 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 4.195e+01 5.214e+01 6.348e+01 1.962e+02, threshold=1.043e+02, percent-clipped=5.0
2023-12-20 17:54:42,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=9.265
2023-12-20 17:54:43,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=9.265
2023-12-20 17:54:48,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=9.29
2023-12-20 17:54:56,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=9.29
2023-12-20 17:54:57,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.44 vs. limit=7.42
2023-12-20 17:54:58,021 INFO [train.py:886] (0/4) Epoch 14, batch 50, loss[loss=0.04071, audio_tagging_loss=0.04071, over 25000.00 frames. ], tot_loss[loss=0.04191, audio_tagging_loss=0.04191, over 1118283.33 frames. ], batch size: 100, lr: 2.29e-02, grad_scale: 32.0
2023-12-20 17:55:00,640 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-14.pt
2023-12-20 17:55:22,494 INFO [train.py:886] (0/4) Epoch 15, batch 0, loss[loss=0.05047, audio_tagging_loss=0.05047, over 21790.00 frames. ], tot_loss[loss=0.05047, audio_tagging_loss=0.05047, over 21790.00 frames. ], batch size: 106, lr: 2.21e-02, grad_scale: 32.0
], tot_loss[loss=0.05047, audio_tagging_loss=0.05047, over 21790.00 frames. ], batch size: 106, lr: 2.21e-02, grad_scale: 32.0 2023-12-20 17:55:22,495 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 17:55:43,378 INFO [train.py:917] (0/4) Epoch 15, validation: loss=0.04452, audio_tagging_loss=0.04452, over 3737520.00 frames. 2023-12-20 17:55:43,379 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 17:55:44,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4853.333333333333, ans=0.27249999999999996 2023-12-20 17:55:44,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=19.10 vs. limit=9.32 2023-12-20 17:55:49,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4853.333333333333, ans=0.27249999999999996 2023-12-20 17:55:50,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=5.941333333333333 2023-12-20 17:55:50,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4853.333333333333, ans=0.27249999999999996 2023-12-20 17:55:56,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=53.77 vs. limit=9.345 2023-12-20 17:55:56,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=5.968 2023-12-20 17:56:02,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4920.0, ans=0.2508 2023-12-20 17:56:17,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5053.333333333333, ans=0.24946666666666667 2023-12-20 17:56:22,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=5053.333333333333, ans=11.29 2023-12-20 17:56:30,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.32 vs. limit=11.34 2023-12-20 17:56:34,927 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.58 vs. limit=6.296666666666667 2023-12-20 17:56:35,388 INFO [train.py:886] (0/4) Epoch 15, batch 50, loss[loss=0.04205, audio_tagging_loss=0.04205, over 25000.00 frames. ], tot_loss[loss=0.04143, audio_tagging_loss=0.04143, over 1116288.45 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 32.0 2023-12-20 17:56:38,274 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-15.pt 2023-12-20 17:57:00,246 INFO [train.py:886] (0/4) Epoch 16, batch 0, loss[loss=0.04969, audio_tagging_loss=0.04969, over 20608.00 frames. ], tot_loss[loss=0.04969, audio_tagging_loss=0.04969, over 20608.00 frames. 
], batch size: 106, lr: 2.14e-02, grad_scale: 32.0 2023-12-20 17:57:00,247 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 17:57:12,306 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.7230, 1.3218, 1.6328, 1.7881], device='cuda:0') 2023-12-20 17:57:21,261 INFO [train.py:917] (0/4) Epoch 16, validation: loss=0.04383, audio_tagging_loss=0.04383, over 3737520.00 frames. 2023-12-20 17:57:21,262 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 17:57:24,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.55 vs. limit=9.45 2023-12-20 17:57:28,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5200.0, ans=0.25625 2023-12-20 17:57:29,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=5200.0, ans=0.00973913043478261 2023-12-20 17:57:33,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=6.106666666666667 2023-12-20 17:57:47,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=11.5 2023-12-20 17:57:49,874 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.797e+01 3.933e+01 4.813e+01 5.766e+01 2.623e+02, threshold=9.626e+01, percent-clipped=4.0 2023-12-20 17:57:56,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5400.0, ans=0.246 2023-12-20 17:58:06,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5466.666666666667, ans=0.7086666666666667 2023-12-20 17:58:08,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.72 vs. limit=6.366666666666667 2023-12-20 17:58:14,015 INFO [train.py:886] (0/4) Epoch 16, batch 50, loss[loss=0.03492, audio_tagging_loss=0.03492, over 25000.00 frames. ], tot_loss[loss=0.04099, audio_tagging_loss=0.04099, over 1109953.14 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 32.0 2023-12-20 17:58:14,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=11.65 2023-12-20 17:58:16,687 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-16.pt 2023-12-20 17:58:38,071 INFO [train.py:886] (0/4) Epoch 17, batch 0, loss[loss=0.04403, audio_tagging_loss=0.04403, over 24122.00 frames. ], tot_loss[loss=0.04403, audio_tagging_loss=0.04403, over 24122.00 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 32.0 2023-12-20 17:58:38,073 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 17:58:46,350 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7383, 3.0430, 2.4516, 2.8299], device='cuda:0') 2023-12-20 17:58:59,168 INFO [train.py:917] (0/4) Epoch 17, validation: loss=0.04362, audio_tagging_loss=0.04362, over 3737520.00 frames. 
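A note on the scaling.py:213 lines throughout this log: each one reports the current value (ans) of a ScheduledFloat hyperparameter -- a dropout probability, skip rate, or balancer limit that is annealed as a function of batch_count. The logged dropout_p values fall on a straight line from 0.3 at batch 0 toward 0.1 at batch 20000 (e.g. ans=0.2584 at batch_count=4160.0, ans=0.2508 at batch_count=4920.0), so a piecewise-linear schedule over batch count is sketched below. The class name and the (0, 0.3) -> (20000, 0.1) breakpoints are illustrative assumptions, not icefall's exact implementation.

# Toy batch-count-driven schedule in the spirit of the ScheduledFloat
# values printed above (illustrative sketch only).
class PiecewiseLinearSchedule:
    def __init__(self, *points):
        # points: (batch_count, value) breakpoints, kept sorted by batch_count.
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)

# Assumed breakpoints: dropout decaying from 0.3 at batch 0 to 0.1 at batch 20000.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value_at(4160.0))   # ~0.2584, matching the value logged above
print(dropout_p.value_at(4920.0))   # ~0.2508

Differently-named quantities (bypass scale_min, balancer prob, skip rates) follow their own breakpoints, which is why many schedules are logged with distinct ans values at the same batch_count.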
2023-12-20 17:58:59,169 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 17:59:12,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=9.605 2023-12-20 17:59:17,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5613.333333333333, ans=0.0 2023-12-20 17:59:17,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5613.333333333333, ans=0.8061333333333334 2023-12-20 17:59:18,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5680.0, ans=0.23375 2023-12-20 17:59:19,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.08 vs. limit=7.84 2023-12-20 17:59:32,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5746.666666666667, ans=0.042722222222222224 2023-12-20 17:59:35,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5746.666666666667, ans=0.0 2023-12-20 17:59:47,230 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=3.654e+01 2023-12-20 17:59:49,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5880.0, ans=0.224375 2023-12-20 17:59:49,923 INFO [train.py:886] (0/4) Epoch 17, batch 50, loss[loss=0.03751, audio_tagging_loss=0.03751, over 25000.00 frames. ], tot_loss[loss=0.03974, audio_tagging_loss=0.03974, over 1123241.45 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 32.0 2023-12-20 17:59:52,540 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-17.pt 2023-12-20 18:00:14,303 INFO [train.py:886] (0/4) Epoch 18, batch 0, loss[loss=0.03726, audio_tagging_loss=0.03726, over 25000.00 frames. ], tot_loss[loss=0.03726, audio_tagging_loss=0.03726, over 25000.00 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 32.0 2023-12-20 18:00:14,305 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:00:35,065 INFO [train.py:917] (0/4) Epoch 18, validation: loss=0.04342, audio_tagging_loss=0.04342, over 3737520.00 frames. 2023-12-20 18:00:35,065 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:00:40,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=9.71 2023-12-20 18:00:47,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=24.40 vs. limit=9.735 2023-12-20 18:00:48,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=11.969999999999999 2023-12-20 18:00:49,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=27.98 vs. 
limit=11.969999999999999 2023-12-20 18:00:54,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=6026.666666666667, ans=0.0 2023-12-20 18:00:55,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=12.02 2023-12-20 18:00:58,722 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.069e+01 3.667e+01 4.319e+01 5.687e+01 1.553e+02, threshold=8.639e+01, percent-clipped=3.0 2023-12-20 18:01:08,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=6093.333333333333, ans=0.04127777777777778 2023-12-20 18:01:17,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=6160.0, ans=0.21125 2023-12-20 18:01:18,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=6160.0, ans=0.21125 2023-12-20 18:01:21,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=6160.0, ans=0.009530434782608696 2023-12-20 18:01:25,785 INFO [train.py:886] (0/4) Epoch 18, batch 50, loss[loss=0.03991, audio_tagging_loss=0.03991, over 25000.00 frames. ], tot_loss[loss=0.03911, audio_tagging_loss=0.03911, over 1114378.37 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 32.0 2023-12-20 18:01:28,561 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-18.pt 2023-12-20 18:01:50,813 INFO [train.py:886] (0/4) Epoch 19, batch 0, loss[loss=0.04695, audio_tagging_loss=0.04695, over 20901.00 frames. ], tot_loss[loss=0.04695, audio_tagging_loss=0.04695, over 20901.00 frames. ], batch size: 106, lr: 1.96e-02, grad_scale: 32.0 2023-12-20 18:01:50,815 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:02:11,831 INFO [train.py:917] (0/4) Epoch 19, validation: loss=0.04287, audio_tagging_loss=0.04287, over 3737520.00 frames. 2023-12-20 18:02:11,832 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:02:12,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=6240.0, ans=0.20750000000000002 2023-12-20 18:02:15,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=6240.0, ans=0.20750000000000002 2023-12-20 18:02:28,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=19.62 vs. limit=9.865 2023-12-20 18:02:45,661 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=1.984e+01 2023-12-20 18:02:45,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.51 vs. 
limit=12.33 2023-12-20 18:02:47,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=6440.0, ans=0.198125 2023-12-20 18:02:59,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=6506.666666666667, ans=0.03955555555555555 2023-12-20 18:03:02,031 INFO [train.py:886] (0/4) Epoch 19, batch 50, loss[loss=0.03495, audio_tagging_loss=0.03495, over 25000.00 frames. ], tot_loss[loss=0.03775, audio_tagging_loss=0.03775, over 1120317.84 frames. ], batch size: 100, lr: 1.96e-02, grad_scale: 32.0 2023-12-20 18:03:04,603 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-19.pt 2023-12-20 18:03:26,287 INFO [train.py:886] (0/4) Epoch 20, batch 0, loss[loss=0.03682, audio_tagging_loss=0.03682, over 25000.00 frames. ], tot_loss[loss=0.03682, audio_tagging_loss=0.03682, over 25000.00 frames. ], batch size: 100, lr: 1.91e-02, grad_scale: 32.0 2023-12-20 18:03:26,289 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:03:47,099 INFO [train.py:917] (0/4) Epoch 20, validation: loss=0.0429, audio_tagging_loss=0.0429, over 3737520.00 frames. 2023-12-20 18:03:47,100 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:03:48,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=25.19 vs. limit=9.97 2023-12-20 18:03:52,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=6586.666666666667, ans=0.00943768115942029 2023-12-20 18:03:53,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=30.16 vs. limit=9.97 2023-12-20 18:03:53,075 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.56 vs. limit=12.440000000000001 2023-12-20 18:03:59,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=21.68 vs. limit=9.995000000000001 2023-12-20 18:03:59,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=6653.333333333333, ans=0.03894444444444445 2023-12-20 18:04:00,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.66 vs. 
limit=6.661333333333333 2023-12-20 18:04:06,500 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.851e+01 3.799e+01 4.551e+01 5.624e+01 1.513e+02, threshold=9.102e+01, percent-clipped=5.0 2023-12-20 18:04:19,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=6786.666666666667, ans=0.09899494936611666 2023-12-20 18:04:27,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=6853.333333333333, ans=0.17875000000000002 2023-12-20 18:04:29,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6853.333333333333, ans=0.23146666666666665 2023-12-20 18:04:30,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=6853.333333333333, ans=0.17875000000000002 2023-12-20 18:04:32,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=6853.333333333333, ans=0.6601333333333333 2023-12-20 18:04:37,011 INFO [train.py:886] (0/4) Epoch 20, batch 50, loss[loss=0.03471, audio_tagging_loss=0.03471, over 25000.00 frames. ], tot_loss[loss=0.03688, audio_tagging_loss=0.03688, over 1124482.34 frames. ], batch size: 100, lr: 1.91e-02, grad_scale: 32.0 2023-12-20 18:04:37,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=4.038 2023-12-20 18:04:39,877 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-20.pt 2023-12-20 18:04:59,859 INFO [train.py:886] (0/4) Epoch 21, batch 0, loss[loss=0.03603, audio_tagging_loss=0.03603, over 25000.00 frames. ], tot_loss[loss=0.03603, audio_tagging_loss=0.03603, over 25000.00 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 32.0 2023-12-20 18:04:59,861 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:05:20,819 INFO [train.py:917] (0/4) Epoch 21, validation: loss=0.0427, audio_tagging_loss=0.0427, over 3737520.00 frames. 2023-12-20 18:05:20,820 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:05:25,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=12.7 2023-12-20 18:05:32,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=7000.0, ans=4.05 2023-12-20 18:05:36,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=10.125 2023-12-20 18:05:50,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=10.15 2023-12-20 18:05:52,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=7133.333333333333, ans=0.009318840579710145 2023-12-20 18:05:54,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=7133.333333333333, ans=0.07 2023-12-20 18:06:10,789 INFO [train.py:886] (0/4) Epoch 21, batch 50, loss[loss=0.03496, audio_tagging_loss=0.03496, over 25000.00 frames. 
], tot_loss[loss=0.03611, audio_tagging_loss=0.03611, over 1127122.05 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 32.0 2023-12-20 18:06:11,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=12.95 2023-12-20 18:06:13,418 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-21.pt 2023-12-20 18:06:34,955 INFO [train.py:886] (0/4) Epoch 22, batch 0, loss[loss=0.05059, audio_tagging_loss=0.05059, over 19779.00 frames. ], tot_loss[loss=0.05059, audio_tagging_loss=0.05059, over 19779.00 frames. ], batch size: 106, lr: 1.82e-02, grad_scale: 32.0 2023-12-20 18:06:34,957 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:06:47,982 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0006, 2.0298, 2.2017, 2.2315], device='cuda:0') 2023-12-20 18:06:55,947 INFO [train.py:917] (0/4) Epoch 22, validation: loss=0.04259, audio_tagging_loss=0.04259, over 3737520.00 frames. 2023-12-20 18:06:55,948 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:07:01,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=7280.0, ans=0.15875 2023-12-20 18:07:04,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=7280.0, ans=0.15875 2023-12-20 18:07:05,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=10.254999999999999 2023-12-20 18:07:10,812 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.833e+01 3.757e+01 4.513e+01 5.428e+01 2.125e+02, threshold=9.026e+01, percent-clipped=5.0 2023-12-20 18:07:14,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=7413.333333333333, ans=0.2 2023-12-20 18:07:30,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.69 vs. limit=6.992 2023-12-20 18:07:32,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.63 vs. limit=13.11 2023-12-20 18:07:44,528 INFO [train.py:886] (0/4) Epoch 22, batch 50, loss[loss=0.03275, audio_tagging_loss=0.03275, over 25000.00 frames. ], tot_loss[loss=0.03583, audio_tagging_loss=0.03583, over 1122352.45 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 32.0 2023-12-20 18:07:47,125 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-22.pt 2023-12-20 18:08:08,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=13.219999999999999 2023-12-20 18:08:08,666 INFO [train.py:886] (0/4) Epoch 23, batch 0, loss[loss=0.04293, audio_tagging_loss=0.04293, over 21112.00 frames. ], tot_loss[loss=0.04293, audio_tagging_loss=0.04293, over 21112.00 frames. ], batch size: 106, lr: 1.77e-02, grad_scale: 32.0 2023-12-20 18:08:08,667 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:08:30,062 INFO [train.py:917] (0/4) Epoch 23, validation: loss=0.04291, audio_tagging_loss=0.04291, over 3737520.00 frames. 
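The optim.py:484 WARNING lines (such as the one above with threshold=9.026e+01) summarize the recent distribution of per-batch gradient norms as five quantiles (min, 25%, median, 75%, max), an adaptive clipping threshold, and the percentage of recent batches whose gradients were clipped. A rough sketch of collecting such statistics follows; deriving the threshold as clipping_scale times the median is an assumption chosen for illustration, not necessarily the optimizer's exact rule.

import torch

def grad_norm_stats(norms: torch.Tensor, clipping_scale: float = 2.0):
    # norms: 1-D tensor of recent per-batch gradient norms.
    quartiles = [
        norms.min().item(),
        norms.quantile(0.25).item(),
        norms.median().item(),
        norms.quantile(0.75).item(),
        norms.max().item(),
    ]
    # Assumed rule: clip anything above clipping_scale times the median norm.
    threshold = clipping_scale * norms.median().item()
    percent_clipped = 100.0 * (norms > threshold).float().mean().item()
    return quartiles, threshold, percent_clipped

norms = torch.tensor([28.0, 35.0, 41.0, 47.0, 55.0, 160.0])
q, thr, pc = grad_norm_stats(norms)
print(f"grad-norm quartiles {q}, threshold={thr:.3e}, percent-clipped={pc:.1f}")

Note how the logged maxima (e.g. 2.125e+02 against a 9.026e+01 threshold) show occasional outlier batches being clipped while the quartiles stay an order of magnitude lower.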
2023-12-20 18:08:30,063 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:08:32,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=7626.666666666667, ans=0.14250000000000002 2023-12-20 18:08:45,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=7693.333333333333, ans=0.07 2023-12-20 18:08:57,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=13.32 2023-12-20 18:09:02,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=7826.666666666667, ans=0.09899494936611666 2023-12-20 18:09:15,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=29.25 vs. limit=10.46 2023-12-20 18:09:17,982 INFO [train.py:886] (0/4) Epoch 23, batch 50, loss[loss=0.03649, audio_tagging_loss=0.03649, over 25000.00 frames. ], tot_loss[loss=0.03537, audio_tagging_loss=0.03537, over 1115886.29 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 32.0 2023-12-20 18:09:18,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=10.485 2023-12-20 18:09:20,567 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-23.pt 2023-12-20 18:09:40,316 INFO [train.py:886] (0/4) Epoch 24, batch 0, loss[loss=0.03913, audio_tagging_loss=0.03913, over 24193.00 frames. ], tot_loss[loss=0.03913, audio_tagging_loss=0.03913, over 24193.00 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 32.0 2023-12-20 18:09:40,317 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:10:01,280 INFO [train.py:917] (0/4) Epoch 24, validation: loss=0.04248, audio_tagging_loss=0.04248, over 3737520.00 frames. 2023-12-20 18:10:01,281 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:10:11,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=10.515 2023-12-20 18:10:12,545 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.651e+01 4.128e+01 4.777e+01 1.617e+02, threshold=8.255e+01, percent-clipped=1.0 2023-12-20 18:10:13,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=10.515 2023-12-20 18:10:15,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=13.530000000000001 2023-12-20 18:10:17,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=8040.0, ans=0.009121739130434783 2023-12-20 18:10:31,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=13.629999999999999 2023-12-20 18:10:31,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. 
limit=10.565 2023-12-20 18:10:36,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=8173.333333333333, ans=0.009092753623188406 2023-12-20 18:10:41,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=19.58 vs. limit=10.59 2023-12-20 18:10:49,625 INFO [train.py:886] (0/4) Epoch 24, batch 50, loss[loss=0.03405, audio_tagging_loss=0.03405, over 25000.00 frames. ], tot_loss[loss=0.03398, audio_tagging_loss=0.03398, over 1122316.42 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 32.0 2023-12-20 18:10:52,225 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-24.pt 2023-12-20 18:11:13,605 INFO [train.py:886] (0/4) Epoch 25, batch 0, loss[loss=0.03215, audio_tagging_loss=0.03215, over 25000.00 frames. ], tot_loss[loss=0.03215, audio_tagging_loss=0.03215, over 25000.00 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 32.0 2023-12-20 18:11:13,606 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:11:34,709 INFO [train.py:917] (0/4) Epoch 25, validation: loss=0.04257, audio_tagging_loss=0.04257, over 3737520.00 frames. 2023-12-20 18:11:34,710 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:11:35,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=13.74 2023-12-20 18:11:56,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=13.84 2023-12-20 18:12:02,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=8520.0, ans=0.2148 2023-12-20 18:12:10,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=8520.0, ans=10.695 2023-12-20 18:12:14,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=8586.666666666666, ans=0.5994666666666667 2023-12-20 18:12:14,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.19 vs. limit=9.293333333333333 2023-12-20 18:12:21,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=4.288 2023-12-20 18:12:21,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=10.745000000000001 2023-12-20 18:12:22,264 INFO [train.py:886] (0/4) Epoch 25, batch 50, loss[loss=0.03301, audio_tagging_loss=0.03301, over 25000.00 frames. ], tot_loss[loss=0.03319, audio_tagging_loss=0.03319, over 1120648.67 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 32.0 2023-12-20 18:12:25,016 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-25.pt 2023-12-20 18:12:45,042 INFO [train.py:886] (0/4) Epoch 26, batch 0, loss[loss=0.03708, audio_tagging_loss=0.03708, over 24123.00 frames. ], tot_loss[loss=0.03708, audio_tagging_loss=0.03708, over 24123.00 frames. 
], batch size: 100, lr: 1.66e-02, grad_scale: 32.0 2023-12-20 18:12:45,043 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:13:05,888 INFO [train.py:917] (0/4) Epoch 26, validation: loss=0.04241, audio_tagging_loss=0.04241, over 3737520.00 frames. 2023-12-20 18:13:05,888 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:13:08,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=8666.666666666666, ans=0.125 2023-12-20 18:13:12,404 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.673e+01 4.044e+01 4.675e+01 8.607e+01, threshold=8.088e+01, percent-clipped=1.0 2023-12-20 18:13:16,290 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=6.547e+00 2023-12-20 18:13:17,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8733.333333333334, ans=0.21266666666666667 2023-12-20 18:13:17,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=8733.333333333334, ans=0.07 2023-12-20 18:13:25,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=10.8 2023-12-20 18:13:30,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=8800.0, ans=0.008956521739130436 2023-12-20 18:13:52,997 INFO [train.py:886] (0/4) Epoch 26, batch 50, loss[loss=0.02947, audio_tagging_loss=0.02947, over 25000.00 frames. ], tot_loss[loss=0.03251, audio_tagging_loss=0.03251, over 1123279.65 frames. ], batch size: 100, lr: 1.66e-02, grad_scale: 32.0 2023-12-20 18:13:55,861 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-26.pt 2023-12-20 18:14:18,297 INFO [train.py:886] (0/4) Epoch 27, batch 0, loss[loss=0.03139, audio_tagging_loss=0.03139, over 25000.00 frames. ], tot_loss[loss=0.03139, audio_tagging_loss=0.03139, over 25000.00 frames. ], batch size: 100, lr: 1.63e-02, grad_scale: 32.0 2023-12-20 18:14:18,298 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:14:31,211 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3703, 2.2539, 2.2031, 2.4177], device='cuda:0') 2023-12-20 18:14:37,108 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6120, 2.2912, 2.3872, 2.9274], device='cuda:0') 2023-12-20 18:14:39,333 INFO [train.py:917] (0/4) Epoch 27, validation: loss=0.04294, audio_tagging_loss=0.04294, over 3737520.00 frames. 2023-12-20 18:14:39,333 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:14:56,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.91 vs. 
limit=7.632 2023-12-20 18:15:04,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=9146.666666666666, ans=9.573333333333334 2023-12-20 18:15:06,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=9146.666666666666, ans=0.04949747468305833 2023-12-20 18:15:06,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=10.93 2023-12-20 18:15:09,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=9213.333333333334, ans=0.008866666666666667 2023-12-20 18:15:13,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.78 vs. limit=7.303333333333334 2023-12-20 18:15:19,504 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.444e+00 2023-12-20 18:15:26,770 INFO [train.py:886] (0/4) Epoch 27, batch 50, loss[loss=0.02979, audio_tagging_loss=0.02979, over 25000.00 frames. ], tot_loss[loss=0.03181, audio_tagging_loss=0.03181, over 1119626.36 frames. ], batch size: 100, lr: 1.63e-02, grad_scale: 32.0 2023-12-20 18:15:29,313 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-27.pt 2023-12-20 18:15:48,260 INFO [train.py:886] (0/4) Epoch 28, batch 0, loss[loss=0.02678, audio_tagging_loss=0.02678, over 25000.00 frames. ], tot_loss[loss=0.02678, audio_tagging_loss=0.02678, over 25000.00 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-20 18:15:48,261 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:16:09,708 INFO [train.py:917] (0/4) Epoch 28, validation: loss=0.04282, audio_tagging_loss=0.04282, over 3737520.00 frames. 2023-12-20 18:16:09,709 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:16:12,511 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.970e+01 4.630e+01 5.343e+01 9.281e+01, threshold=9.260e+01, percent-clipped=1.0 2023-12-20 18:16:32,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=9493.333333333334, ans=0.5677333333333334 2023-12-20 18:16:33,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=9493.333333333334, ans=0.125 2023-12-20 18:16:33,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=11.06 2023-12-20 18:16:49,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=9626.666666666666, ans=0.0 2023-12-20 18:16:56,951 INFO [train.py:886] (0/4) Epoch 28, batch 50, loss[loss=0.03239, audio_tagging_loss=0.03239, over 25000.00 frames. ], tot_loss[loss=0.03105, audio_tagging_loss=0.03105, over 1120200.06 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-20 18:16:59,636 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-28.pt 2023-12-20 18:17:19,793 INFO [train.py:886] (0/4) Epoch 29, batch 0, loss[loss=0.04024, audio_tagging_loss=0.04024, over 21184.00 frames. 
], tot_loss[loss=0.04024, audio_tagging_loss=0.04024, over 21184.00 frames. ], batch size: 106, lr: 1.57e-02, grad_scale: 32.0 2023-12-20 18:17:19,794 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:17:30,691 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3502, 2.1428, 2.1497, 2.2633], device='cuda:0') 2023-12-20 18:17:32,010 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8934, 1.6188, 1.4361, 1.7435, 1.7700, 1.8323, 1.6686, 1.6692], device='cuda:0') 2023-12-20 18:17:40,756 INFO [train.py:917] (0/4) Epoch 29, validation: loss=0.04276, audio_tagging_loss=0.04276, over 3737520.00 frames. 2023-12-20 18:17:40,756 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:17:40,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=9706.666666666666, ans=0.125 2023-12-20 18:17:46,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=11.14 2023-12-20 18:17:51,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=9773.333333333334, ans=0.125 2023-12-20 18:18:06,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.31 vs. limit=9.92 2023-12-20 18:18:29,139 INFO [train.py:886] (0/4) Epoch 29, batch 50, loss[loss=0.02937, audio_tagging_loss=0.02937, over 25000.00 frames. ], tot_loss[loss=0.03019, audio_tagging_loss=0.03019, over 1113979.63 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0 2023-12-20 18:18:29,998 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 4.177e+01 4.600e+01 5.564e+01 7.757e+01, threshold=9.200e+01, percent-clipped=0.0 2023-12-20 18:18:31,794 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-29.pt 2023-12-20 18:18:51,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=10053.333333333334, ans=11.27 2023-12-20 18:18:51,716 INFO [train.py:886] (0/4) Epoch 30, batch 0, loss[loss=0.02997, audio_tagging_loss=0.02997, over 25000.00 frames. ], tot_loss[loss=0.02997, audio_tagging_loss=0.02997, over 25000.00 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 32.0 2023-12-20 18:18:51,718 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:19:02,198 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9173, 3.2308, 2.6870, 2.7931], device='cuda:0') 2023-12-20 18:19:04,238 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.8203, 2.9910, 2.5508, 2.7330], device='cuda:0') 2023-12-20 18:19:12,597 INFO [train.py:917] (0/4) Epoch 30, validation: loss=0.04346, audio_tagging_loss=0.04346, over 3737520.00 frames. 
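The zipformer.py:1858 lines print attn_weights_entropy, the entropy of a layer's self-attention weights with one value per head: low entropy means a head concentrates its mass on a few positions, high entropy means it attends broadly, and a collapse toward zero would flag a degenerate head. A small sketch of that diagnostic, assuming weights of shape (num_heads, query_len, key_len) normalized over the key axis (the shape convention here is an assumption):

import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, query_len, key_len), rows summing to 1.
    eps = 1.0e-20
    ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    # Average over queries to get one entropy value per head.
    return ent.mean(dim=-1)

w = torch.softmax(torch.randn(4, 10, 10), dim=-1)
print(attn_weights_entropy(w))  # one value per head, like the tensors logged above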
2023-12-20 18:19:12,597 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:19:17,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=10053.333333333334, ans=0.008684057971014493 2023-12-20 18:19:32,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=10186.666666666666, ans=0.125 2023-12-20 18:19:51,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.31 vs. limit=7.58 2023-12-20 18:19:59,926 INFO [train.py:886] (0/4) Epoch 30, batch 50, loss[loss=0.03089, audio_tagging_loss=0.03089, over 25000.00 frames. ], tot_loss[loss=0.02924, audio_tagging_loss=0.02924, over 1119814.92 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 32.0 2023-12-20 18:20:00,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=10386.666666666666, ans=0.00861159420289855 2023-12-20 18:20:02,523 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-30.pt 2023-12-20 18:20:22,368 INFO [train.py:886] (0/4) Epoch 31, batch 0, loss[loss=0.02519, audio_tagging_loss=0.02519, over 25000.00 frames. ], tot_loss[loss=0.02519, audio_tagging_loss=0.02519, over 25000.00 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 32.0 2023-12-20 18:20:22,369 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:20:32,853 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.0413, 1.7855, 1.6880, 1.7358, 1.6589, 1.6005, 1.7256, 1.6501], device='cuda:0') 2023-12-20 18:20:33,467 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.9786, 2.4411, 2.6853, 2.5094], device='cuda:0') 2023-12-20 18:20:43,506 INFO [train.py:917] (0/4) Epoch 31, validation: loss=0.04363, audio_tagging_loss=0.04363, over 3737520.00 frames. 
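The scaling.py:1022 Whitening lines compare a per-module metric against a scheduled limit (the "metric=... vs. limit=..." pairs). The metric is a covariance-based measure of how "white" a module's activations are: roughly 1 when the channel covariance is a multiple of the identity, and growing as channels become correlated or unequally scaled; a penalty is applied only when the metric exceeds the limit. A sketch in that spirit follows; it approximates the idea and is not icefall's exact code.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns a value near 1 when each
    # group's covariance is close to a multiple of the identity, and a
    # much larger value when channels are strongly correlated.
    num_channels = x.shape[-1]
    cpg = num_channels // num_groups                      # channels per group
    xg = x.reshape(-1, num_groups, cpg).transpose(0, 1)   # (groups, frames, cpg)
    covar = torch.matmul(xg.transpose(1, 2), xg)          # (groups, cpg, cpg)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
    mean_sq = (covar ** 2).sum() / (num_groups * cpg)
    return mean_sq / (mean_diag ** 2 + 1.0e-20)

x = torch.randn(1000, 256)
print(whitening_metric(x, num_groups=1))  # slightly above 1 for white random input

This is why metrics far above their limit (e.g. metric=53.77 vs. limit=9.345 earlier in the log) are the interesting events: they mark modules whose activations have drifted far from decorrelated.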
2023-12-20 18:20:43,506 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:20:50,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=10400.0, ans=0.125 2023-12-20 18:21:12,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=10600.0, ans=0.5290000000000001 2023-12-20 18:21:16,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=10600.0, ans=0.125 2023-12-20 18:21:17,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=10600.0, ans=0.125 2023-12-20 18:21:21,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=10666.666666666666, ans=0.022222222222222227 2023-12-20 18:21:27,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=10666.666666666666, ans=0.125 2023-12-20 18:21:29,123 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.385e+01 4.278e+01 4.904e+01 5.799e+01 1.168e+02, threshold=9.808e+01, percent-clipped=2.0 2023-12-20 18:21:31,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=10733.333333333334, ans=0.125 2023-12-20 18:21:31,828 INFO [train.py:886] (0/4) Epoch 31, batch 50, loss[loss=0.02927, audio_tagging_loss=0.02927, over 25000.00 frames. ], tot_loss[loss=0.02757, audio_tagging_loss=0.02757, over 1120930.87 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 32.0 2023-12-20 18:21:34,400 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-31.pt 2023-12-20 18:21:54,495 INFO [train.py:886] (0/4) Epoch 32, batch 0, loss[loss=0.03373, audio_tagging_loss=0.03373, over 21045.00 frames. ], tot_loss[loss=0.03373, audio_tagging_loss=0.03373, over 21045.00 frames. ], batch size: 106, lr: 1.49e-02, grad_scale: 32.0 2023-12-20 18:21:54,496 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:22:11,646 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.1590, 1.9765, 1.8990, 2.0357, 1.9304, 1.9888, 1.7659, 1.8450], device='cuda:0') 2023-12-20 18:22:15,981 INFO [train.py:917] (0/4) Epoch 32, validation: loss=0.04494, audio_tagging_loss=0.04494, over 3737520.00 frames. 
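The per-epoch lr values in the log (2.38e-02 at epoch 13 decaying to 1.32e-02 by epoch 40) follow a schedule that shrinks with both batch count and epoch. Below is a sketch of an Eden-style rule with that shape; the base_lr=0.045, lr_batches=7500, and lr_epochs=3.5 constants are assumptions chosen for illustration, and warmup plus the run's exact batch/epoch bookkeeping are omitted, so it only lands in the same ballpark as the logged values.

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden-style decay in batch and epoch (after icefall's Eden docstring);
    # warmup is omitted, so this only approximates the logged lr values.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

for epoch in (13, 22, 31, 40):
    # batch=0 isolates the epoch factor; the batch factor shifts these slightly.
    print(epoch, round(eden_lr(0.045, batch=0, epoch=epoch), 4))
# -> roughly 0.0229, 0.0178, 0.0151, 0.0133 vs. the logged
#    2.38e-02, 1.82e-02, 1.52e-02, 1.32e-02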
2023-12-20 18:22:15,981 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:22:40,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=10880.0, ans=0.125 2023-12-20 18:22:41,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=10880.0, ans=0.125 2023-12-20 18:22:50,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=10946.666666666666, ans=0.02105555555555556 2023-12-20 18:22:57,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=11013.333333333334, ans=0.020777777777777773 2023-12-20 18:22:59,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=11013.333333333334, ans=0.020777777777777773 2023-12-20 18:22:59,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=11013.333333333334, ans=0.125 2023-12-20 18:23:02,778 INFO [train.py:886] (0/4) Epoch 32, batch 50, loss[loss=0.02527, audio_tagging_loss=0.02527, over 25000.00 frames. ], tot_loss[loss=0.02677, audio_tagging_loss=0.02677, over 1118193.29 frames. ], batch size: 100, lr: 1.49e-02, grad_scale: 32.0 2023-12-20 18:23:05,536 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-32.pt 2023-12-20 18:23:24,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11093.333333333334, ans=0.18906666666666666 2023-12-20 18:23:25,183 INFO [train.py:886] (0/4) Epoch 33, batch 0, loss[loss=0.03955, audio_tagging_loss=0.03955, over 19902.00 frames. ], tot_loss[loss=0.03955, audio_tagging_loss=0.03955, over 19902.00 frames. ], batch size: 106, lr: 1.47e-02, grad_scale: 32.0 2023-12-20 18:23:25,184 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:23:46,127 INFO [train.py:917] (0/4) Epoch 33, validation: loss=0.0459, audio_tagging_loss=0.0459, over 3737520.00 frames. 2023-12-20 18:23:46,128 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:24:00,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=11.684999999999999 2023-12-20 18:24:01,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=11160.0, ans=0.008443478260869565 2023-12-20 18:24:02,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=11160.0, ans=0.5094000000000001 2023-12-20 18:24:14,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.38 vs. 
limit=8.517333333333333 2023-12-20 18:24:18,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=11293.333333333334, ans=0.019611111111111107 2023-12-20 18:24:23,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=11360.0, ans=0.09492000000000002 2023-12-20 18:24:24,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=11360.0, ans=0.125 2023-12-20 18:24:25,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.76 vs. limit=11.76 2023-12-20 18:24:26,653 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.339e+01 4.449e+01 5.027e+01 5.967e+01 1.050e+02, threshold=1.005e+02, percent-clipped=1.0 2023-12-20 18:24:30,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=11360.0, ans=0.09899494936611666 2023-12-20 18:24:33,022 INFO [train.py:886] (0/4) Epoch 33, batch 50, loss[loss=0.02483, audio_tagging_loss=0.02483, over 25000.00 frames. ], tot_loss[loss=0.02621, audio_tagging_loss=0.02621, over 1116264.57 frames. ], batch size: 100, lr: 1.47e-02, grad_scale: 32.0 2023-12-20 18:24:35,600 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-33.pt 2023-12-20 18:24:54,840 INFO [train.py:886] (0/4) Epoch 34, batch 0, loss[loss=0.02383, audio_tagging_loss=0.02383, over 25000.00 frames. ], tot_loss[loss=0.02383, audio_tagging_loss=0.02383, over 25000.00 frames. ], batch size: 100, lr: 1.44e-02, grad_scale: 32.0 2023-12-20 18:24:54,841 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:25:16,066 INFO [train.py:917] (0/4) Epoch 34, validation: loss=0.0463, audio_tagging_loss=0.0463, over 3737520.00 frames. 2023-12-20 18:25:16,067 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:25:18,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=11440.0, ans=0.49960000000000004 2023-12-20 18:25:20,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.67 vs. limit=7.859999999999999 2023-12-20 18:25:28,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=11.815000000000001 2023-12-20 18:25:30,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=11506.666666666666, ans=0.07 2023-12-20 18:25:32,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11506.666666666666, ans=0.18493333333333334 2023-12-20 18:25:55,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=11.89 2023-12-20 18:26:02,682 INFO [train.py:886] (0/4) Epoch 34, batch 50, loss[loss=0.02391, audio_tagging_loss=0.02391, over 25000.00 frames. ], tot_loss[loss=0.02526, audio_tagging_loss=0.02526, over 1126473.49 frames. 
], batch size: 100, lr: 1.44e-02, grad_scale: 32.0 2023-12-20 18:26:05,334 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-34.pt 2023-12-20 18:26:24,402 INFO [train.py:886] (0/4) Epoch 35, batch 0, loss[loss=0.02487, audio_tagging_loss=0.02487, over 25000.00 frames. ], tot_loss[loss=0.02487, audio_tagging_loss=0.02487, over 25000.00 frames. ], batch size: 100, lr: 1.42e-02, grad_scale: 32.0 2023-12-20 18:26:24,403 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:26:43,807 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.4431, 1.8780, 1.7333, 2.1452, 2.0297, 1.9219, 1.8128, 1.9415], device='cuda:0') 2023-12-20 18:26:45,181 INFO [train.py:917] (0/4) Epoch 35, validation: loss=0.04736, audio_tagging_loss=0.04736, over 3737520.00 frames. 2023-12-20 18:26:45,182 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:26:52,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.26 vs. limit=16.34 2023-12-20 18:27:08,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.76 vs. limit=5.0 2023-12-20 18:27:14,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=11986.666666666666, ans=0.2 2023-12-20 18:27:17,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=11986.666666666666, ans=0.8698666666666667 2023-12-20 18:27:20,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.43 vs. limit=16.490000000000002 2023-12-20 18:27:23,466 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.764e+01 4.533e+01 5.198e+01 5.955e+01 1.043e+02, threshold=1.040e+02, percent-clipped=1.0 2023-12-20 18:27:23,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=12.02 2023-12-20 18:27:33,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=12120.0, ans=0.07 2023-12-20 18:27:33,755 INFO [train.py:886] (0/4) Epoch 35, batch 50, loss[loss=0.02525, audio_tagging_loss=0.02525, over 25000.00 frames. ], tot_loss[loss=0.02462, audio_tagging_loss=0.02462, over 1115336.65 frames. ], batch size: 100, lr: 1.42e-02, grad_scale: 32.0 2023-12-20 18:27:36,405 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-35.pt 2023-12-20 18:27:55,036 INFO [train.py:886] (0/4) Epoch 36, batch 0, loss[loss=0.02167, audio_tagging_loss=0.02167, over 25000.00 frames. ], tot_loss[loss=0.02167, audio_tagging_loss=0.02167, over 25000.00 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-20 18:27:55,037 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:28:11,326 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.6698, 2.6626, 2.7448, 2.4475], device='cuda:0') 2023-12-20 18:28:16,077 INFO [train.py:917] (0/4) Epoch 36, validation: loss=0.04841, audio_tagging_loss=0.04841, over 3737520.00 frames. 
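Each epoch above ends with checkpoint.py:75 writing epoch-N.pt into the experiment directory zipformer/exp_at_as_full. A minimal sketch of that pattern, assuming a plain torch.save of model, optimizer, and scheduler state dicts; everything beyond the filename convention visible in the log is an assumption.

from pathlib import Path
import torch

def save_checkpoint(exp_dir: Path, epoch: int, model,
                    optimizer=None, scheduler=None) -> Path:
    # Writes epoch-<N>.pt as in the log; in a DDP run only rank 0 would
    # do this, matching the single (0/4) save line per epoch above.
    exp_dir.mkdir(parents=True, exist_ok=True)
    checkpoint = {
        "epoch": epoch,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict() if optimizer is not None else None,
        "scheduler": scheduler.state_dict() if scheduler is not None else None,
    }
    filename = exp_dir / f"epoch-{epoch}.pt"
    torch.save(checkpoint, filename)
    return filename

model = torch.nn.Linear(80, 527)  # hypothetical stand-in for the real model
print(save_checkpoint(Path("zipformer/exp_at_as_full"), 35, model))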
2023-12-20 18:28:16,077 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB 2023-12-20 18:28:16,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=12.05 2023-12-20 18:28:19,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12133.333333333334, ans=0.125 2023-12-20 18:28:19,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=12133.333333333334, ans=0.47533333333333333 2023-12-20 18:28:25,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=12200.0, ans=0.025 2023-12-20 18:28:25,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=12200.0, ans=0.008217391304347826 2023-12-20 18:28:28,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=12200.0, ans=0.015833333333333338 2023-12-20 18:28:29,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=12200.0, ans=0.125 2023-12-20 18:29:03,194 INFO [train.py:886] (0/4) Epoch 36, batch 50, loss[loss=0.02136, audio_tagging_loss=0.02136, over 25000.00 frames. ], tot_loss[loss=0.02372, audio_tagging_loss=0.02372, over 1124068.56 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-20 18:29:05,833 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-36.pt 2023-12-20 18:29:24,448 INFO [train.py:886] (0/4) Epoch 37, batch 0, loss[loss=0.02597, audio_tagging_loss=0.02597, over 24113.00 frames. ], tot_loss[loss=0.02597, audio_tagging_loss=0.02597, over 24113.00 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-20 18:29:24,449 INFO [train.py:909] (0/4) Computing validation loss 2023-12-20 18:29:34,375 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4701, 3.0234, 3.9278, 3.3843], device='cuda:0') 2023-12-20 18:29:45,682 INFO [train.py:917] (0/4) Epoch 37, validation: loss=0.04928, audio_tagging_loss=0.04928, over 3737520.00 frames. 
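The single audio_tagging_loss term reported for every batch and validation pass is what a multi-label audio tagger typically optimizes: an independent sigmoid plus binary cross-entropy per event class, since one clip can carry several labels at once. A sketch under that assumption follows; the 527-class shape and the mean reduction are illustrative assumptions, not the recipe's exact code.

import torch
import torch.nn.functional as F

def audio_tagging_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits, targets: (batch, num_events); targets are multi-hot in {0, 1}.
    # One independent sigmoid per class handles overlapping event labels.
    return F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")

logits = torch.randn(100, 527)                    # e.g. a batch of 100 clips
targets = (torch.rand(100, 527) < 0.01).float()   # sparse multi-hot labels
print(audio_tagging_loss(logits, targets))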
2023-12-20 18:29:45,682 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:29:46,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=12480.0, ans=0.4632
2023-12-20 18:29:49,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=12480.0, ans=0.125
2023-12-20 18:29:58,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=12546.666666666666, ans=0.125
2023-12-20 18:30:00,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=12546.666666666666, ans=0.125
2023-12-20 18:30:08,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=12613.333333333334, ans=0.45853333333333335
2023-12-20 18:30:12,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=12613.333333333334, ans=0.125
2023-12-20 18:30:19,001 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.554e+01 4.732e+01 5.545e+01 6.466e+01 1.044e+02, threshold=1.109e+02, percent-clipped=1.0
2023-12-20 18:30:27,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=12746.666666666666, ans=0.4538666666666667
2023-12-20 18:30:29,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=12746.666666666666, ans=0.4538666666666667
2023-12-20 18:30:30,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12746.666666666666, ans=0.17253333333333334
2023-12-20 18:30:32,822 INFO [train.py:886] (0/4) Epoch 37, batch 50, loss[loss=0.02143, audio_tagging_loss=0.02143, over 25000.00 frames. ], tot_loss[loss=0.02291, audio_tagging_loss=0.02291, over 1119963.71 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0
2023-12-20 18:30:35,404 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-37.pt
2023-12-20 18:30:55,796 INFO [train.py:886] (0/4) Epoch 38, batch 0, loss[loss=0.01933, audio_tagging_loss=0.01933, over 25000.00 frames. ], tot_loss[loss=0.01933, audio_tagging_loss=0.01933, over 25000.00 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 32.0
2023-12-20 18:30:55,798 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:31:16,998 INFO [train.py:917] (0/4) Epoch 38, validation: loss=0.04916, audio_tagging_loss=0.04916, over 3737520.00 frames.
2023-12-20 18:31:16,998 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:31:20,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=12826.666666666666, ans=0.125
2023-12-20 18:31:21,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=12826.666666666666, ans=0.125
2023-12-20 18:31:22,927 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.56 vs. limit=17.119999999999997
2023-12-20 18:31:25,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12826.666666666666, ans=0.17173333333333335
2023-12-20 18:31:29,480 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=9.157333333333334
2023-12-20 18:31:31,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=12893.333333333334, ans=0.125
2023-12-20 18:31:41,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12960.0, ans=0.1704
2023-12-20 18:31:58,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.12 vs. limit=17.32
2023-12-20 18:31:59,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.36 vs. limit=12.41
2023-12-20 18:32:04,832 INFO [train.py:886] (0/4) Epoch 38, batch 50, loss[loss=0.02317, audio_tagging_loss=0.02317, over 25000.00 frames. ], tot_loss[loss=0.02151, audio_tagging_loss=0.02151, over 1121650.27 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 32.0
2023-12-20 18:32:05,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.86 vs. limit=5.0
2023-12-20 18:32:07,556 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-38.pt
2023-12-20 18:32:26,412 INFO [train.py:886] (0/4) Epoch 39, batch 0, loss[loss=0.01958, audio_tagging_loss=0.01958, over 25000.00 frames. ], tot_loss[loss=0.01958, audio_tagging_loss=0.01958, over 25000.00 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0
2023-12-20 18:32:26,413 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:32:46,054 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.6049, 2.0022, 1.8330, 2.2014, 2.0728, 2.0848, 2.0333, 2.0218], device='cuda:0')
2023-12-20 18:32:47,553 INFO [train.py:917] (0/4) Epoch 39, validation: loss=0.05058, audio_tagging_loss=0.05058, over 3737520.00 frames.
2023-12-20 18:32:47,553 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:32:58,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=13240.0, ans=0.1676
2023-12-20 18:33:06,420 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=17.48
2023-12-20 18:33:13,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=13306.666666666666, ans=0.125
2023-12-20 18:33:17,347 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.971e+01 5.139e+01 5.911e+01 6.986e+01 1.449e+02, threshold=1.182e+02, percent-clipped=3.0
2023-12-20 18:33:22,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=13373.333333333334, ans=0.00796231884057971
2023-12-20 18:33:26,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=13440.0, ans=0.007947826086956522
2023-12-20 18:33:33,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=18.44 vs. limit=12.54
2023-12-20 18:33:34,115 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=5.016
2023-12-20 18:33:35,338 INFO [train.py:886] (0/4) Epoch 39, batch 50, loss[loss=0.02023, audio_tagging_loss=0.02023, over 25000.00 frames. ], tot_loss[loss=0.02128, audio_tagging_loss=0.02128, over 1123679.49 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0
2023-12-20 18:33:38,012 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-39.pt
2023-12-20 18:33:57,923 INFO [train.py:886] (0/4) Epoch 40, batch 0, loss[loss=0.02848, audio_tagging_loss=0.02848, over 21121.00 frames. ], tot_loss[loss=0.02848, audio_tagging_loss=0.02848, over 21121.00 frames. ], batch size: 106, lr: 1.32e-02, grad_scale: 32.0
2023-12-20 18:33:57,925 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:34:19,046 INFO [train.py:917] (0/4) Epoch 40, validation: loss=0.05208, audio_tagging_loss=0.05208, over 3737520.00 frames.
2023-12-20 18:34:19,046 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:34:21,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=13520.0, ans=0.125
2023-12-20 18:34:27,739 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.77 vs. limit=9.408000000000001
2023-12-20 18:34:36,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.54 vs. limit=11.793333333333333
2023-12-20 18:34:48,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=13720.0, ans=0.125
2023-12-20 18:34:48,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=13720.0, ans=0.009500000000000001
2023-12-20 18:34:54,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=13720.0, ans=0.125
2023-12-20 18:35:06,526 INFO [train.py:886] (0/4) Epoch 40, batch 50, loss[loss=0.02043, audio_tagging_loss=0.02043, over 25000.00 frames. ], tot_loss[loss=0.02035, audio_tagging_loss=0.02035, over 1117882.24 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 32.0
2023-12-20 18:35:09,119 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-40.pt
2023-12-20 18:35:29,525 INFO [train.py:886] (0/4) Epoch 41, batch 0, loss[loss=0.0203, audio_tagging_loss=0.0203, over 24098.00 frames. ], tot_loss[loss=0.0203, audio_tagging_loss=0.0203, over 24098.00 frames. ], batch size: 100, lr: 1.30e-02, grad_scale: 32.0
2023-12-20 18:35:29,526 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:35:47,359 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8632, 2.3514, 2.2810, 2.4426], device='cuda:0')
2023-12-20 18:35:50,410 INFO [train.py:917] (0/4) Epoch 41, validation: loss=0.05259, audio_tagging_loss=0.05259, over 3737520.00 frames.
2023-12-20 18:35:50,411 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:36:04,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.82 vs. limit=17.95
2023-12-20 18:36:16,660 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.775e+01 5.160e+01 5.694e+01 6.780e+01 1.124e+02, threshold=1.139e+02, percent-clipped=0.0
2023-12-20 18:36:16,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=14000.0, ans=0.10999999999999999
2023-12-20 18:36:24,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.39 vs. limit=8.516666666666666
2023-12-20 18:36:28,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14133.333333333334, ans=0.15866666666666665
2023-12-20 18:36:30,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=14133.333333333334, ans=0.007797101449275362
2023-12-20 18:36:37,878 INFO [train.py:886] (0/4) Epoch 41, batch 50, loss[loss=0.01933, audio_tagging_loss=0.01933, over 25000.00 frames. ], tot_loss[loss=0.01949, audio_tagging_loss=0.01949, over 1119914.65 frames. ], batch size: 100, lr: 1.30e-02, grad_scale: 32.0
2023-12-20 18:36:40,533 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-41.pt
2023-12-20 18:37:00,621 INFO [train.py:886] (0/4) Epoch 42, batch 0, loss[loss=0.02289, audio_tagging_loss=0.02289, over 24125.00 frames. ], tot_loss[loss=0.02289, audio_tagging_loss=0.02289, over 24125.00 frames. ], batch size: 100, lr: 1.29e-02, grad_scale: 32.0
2023-12-20 18:37:00,623 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:37:21,714 INFO [train.py:917] (0/4) Epoch 42, validation: loss=0.0541, audio_tagging_loss=0.0541, over 3737520.00 frames.
2023-12-20 18:37:21,715 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:37:22,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14213.333333333334, ans=0.15786666666666666
2023-12-20 18:37:34,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=14280.0, ans=0.125
2023-12-20 18:37:41,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=14346.666666666666, ans=0.0
2023-12-20 18:38:08,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=14480.0, ans=0.125
2023-12-20 18:38:09,810 INFO [train.py:886] (0/4) Epoch 42, batch 50, loss[loss=0.017, audio_tagging_loss=0.017, over 25000.00 frames. ], tot_loss[loss=0.01845, audio_tagging_loss=0.01845, over 1123125.59 frames. ], batch size: 100, lr: 1.29e-02, grad_scale: 32.0
2023-12-20 18:38:12,328 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-42.pt
2023-12-20 18:38:32,314 INFO [train.py:886] (0/4) Epoch 43, batch 0, loss[loss=0.01685, audio_tagging_loss=0.01685, over 25000.00 frames. ], tot_loss[loss=0.01685, audio_tagging_loss=0.01685, over 25000.00 frames. ], batch size: 100, lr: 1.27e-02, grad_scale: 32.0
2023-12-20 18:38:32,316 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:38:40,408 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.2689, 2.6713, 2.5553, 2.9498], device='cuda:0')
2023-12-20 18:38:53,030 INFO [train.py:917] (0/4) Epoch 43, validation: loss=0.05602, audio_tagging_loss=0.05602, over 3737520.00 frames.
2023-12-20 18:38:53,031 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:38:53,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=14560.0, ans=0.006000000000000005
2023-12-20 18:39:02,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14626.666666666666, ans=0.15373333333333333
2023-12-20 18:39:03,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=14626.666666666666, ans=0.38806666666666667
2023-12-20 18:39:16,029 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 4.316e+01 5.471e+01 6.063e+01 6.688e+01 1.130e+02, threshold=1.213e+02, percent-clipped=0.0
2023-12-20 18:39:20,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=14693.333333333334, ans=0.15306666666666666
2023-12-20 18:39:32,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=14826.666666666666, ans=0.004888888888888887
2023-12-20 18:39:41,479 INFO [train.py:886] (0/4) Epoch 43, batch 50, loss[loss=0.01795, audio_tagging_loss=0.01795, over 25000.00 frames. ], tot_loss[loss=0.01781, audio_tagging_loss=0.01781, over 1122395.49 frames. ], batch size: 100, lr: 1.27e-02, grad_scale: 32.0
2023-12-20 18:39:44,103 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-43.pt
2023-12-20 18:40:04,352 INFO [train.py:886] (0/4) Epoch 44, batch 0, loss[loss=0.0209, audio_tagging_loss=0.0209, over 21062.00 frames. ], tot_loss[loss=0.0209, audio_tagging_loss=0.0209, over 21062.00 frames. ], batch size: 106, lr: 1.25e-02, grad_scale: 32.0
2023-12-20 18:40:04,353 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:40:25,327 INFO [train.py:917] (0/4) Epoch 44, validation: loss=0.05682, audio_tagging_loss=0.05682, over 3737520.00 frames.
2023-12-20 18:40:25,327 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:40:31,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=14906.666666666666, ans=0.125
2023-12-20 18:40:40,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=14973.333333333334, ans=0.007614492753623189
2023-12-20 18:40:57,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=15106.666666666666, ans=0.125
2023-12-20 18:41:10,309 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.622e-02
2023-12-20 18:41:12,868 INFO [train.py:886] (0/4) Epoch 44, batch 50, loss[loss=0.0179, audio_tagging_loss=0.0179, over 25000.00 frames. ], tot_loss[loss=0.01685, audio_tagging_loss=0.01685, over 1119757.27 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-20 18:41:15,503 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-44.pt
2023-12-20 18:41:35,895 INFO [train.py:886] (0/4) Epoch 45, batch 0, loss[loss=0.02128, audio_tagging_loss=0.02128, over 24092.00 frames. ], tot_loss[loss=0.02128, audio_tagging_loss=0.02128, over 24092.00 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-20 18:41:35,897 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:41:56,905 INFO [train.py:917] (0/4) Epoch 45, validation: loss=0.05811, audio_tagging_loss=0.05811, over 3737520.00 frames.
2023-12-20 18:41:56,905 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:42:04,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=15253.333333333334, ans=0.125
2023-12-20 18:42:04,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=13.219999999999999
2023-12-20 18:42:15,214 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.876e+01 5.082e+01 5.625e+01 6.615e+01 1.122e+02, threshold=1.125e+02, percent-clipped=0.0
2023-12-20 18:42:25,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=15453.333333333334, ans=0.3591333333333333
2023-12-20 18:42:28,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15453.333333333334, ans=0.14546666666666666
2023-12-20 18:42:29,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=15453.333333333334, ans=0.125
2023-12-20 18:42:34,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.62 vs. limit=12.726666666666667
2023-12-20 18:42:44,429 INFO [train.py:886] (0/4) Epoch 45, batch 50, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 1123708.74 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 64.0
2023-12-20 18:42:47,040 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-45.pt
2023-12-20 18:43:06,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=15600.0, ans=0.0
2023-12-20 18:43:06,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=13.35
2023-12-20 18:43:06,809 INFO [train.py:886] (0/4) Epoch 46, batch 0, loss[loss=0.01656, audio_tagging_loss=0.01656, over 24094.00 frames. ], tot_loss[loss=0.01656, audio_tagging_loss=0.01656, over 24094.00 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-20 18:43:06,810 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:43:18,172 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4531, 2.8162, 2.9612, 2.9419], device='cuda:0')
2023-12-20 18:43:27,879 INFO [train.py:917] (0/4) Epoch 46, validation: loss=0.05956, audio_tagging_loss=0.05956, over 3737520.00 frames.
2023-12-20 18:43:27,880 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:43:36,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=15666.666666666666, ans=10.0
2023-12-20 18:43:38,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=15666.666666666666, ans=0.0
2023-12-20 18:43:39,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=15666.666666666666, ans=0.007463768115942029
2023-12-20 18:43:39,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=15666.666666666666, ans=0.04949747468305833
2023-12-20 18:43:40,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15666.666666666666, ans=0.14333333333333334
2023-12-20 18:43:48,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=15733.333333333334, ans=0.125
2023-12-20 18:43:53,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=19.3
2023-12-20 18:43:53,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15733.333333333334, ans=0.14266666666666666
2023-12-20 18:43:53,724 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.738e-02
2023-12-20 18:43:57,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.64 vs. limit=12.9
2023-12-20 18:44:07,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.70 vs. limit=19.4
2023-12-20 18:44:15,194 INFO [train.py:886] (0/4) Epoch 46, batch 50, loss[loss=0.01494, audio_tagging_loss=0.01494, over 25000.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 1111902.59 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-20 18:44:17,721 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-46.pt
2023-12-20 18:44:38,146 INFO [train.py:886] (0/4) Epoch 47, batch 0, loss[loss=0.01749, audio_tagging_loss=0.01749, over 20786.00 frames. ], tot_loss[loss=0.01749, audio_tagging_loss=0.01749, over 20786.00 frames. ], batch size: 106, lr: 1.21e-02, grad_scale: 64.0
2023-12-20 18:44:38,148 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:44:48,562 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4168, 2.9901, 3.4349, 3.0834], device='cuda:0')
2023-12-20 18:44:59,324 INFO [train.py:917] (0/4) Epoch 47, validation: loss=0.06125, audio_tagging_loss=0.06125, over 3737520.00 frames.
2023-12-20 18:44:59,330 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:45:14,000 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 4.428e+01 5.199e+01 5.973e+01 6.776e+01 1.435e+02, threshold=1.195e+02, percent-clipped=1.0
2023-12-20 18:45:32,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=16146.666666666666, ans=0.0
2023-12-20 18:45:37,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=16213.333333333334, ans=13.58
2023-12-20 18:45:46,336 INFO [train.py:886] (0/4) Epoch 47, batch 50, loss[loss=0.01721, audio_tagging_loss=0.01721, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 1120089.74 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-20 18:45:49,034 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-47.pt
2023-12-20 18:46:08,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=16293.333333333334, ans=0.125
2023-12-20 18:46:08,732 INFO [train.py:886] (0/4) Epoch 48, batch 0, loss[loss=0.01592, audio_tagging_loss=0.01592, over 24140.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 24140.00 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-20 18:46:08,737 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:46:29,406 INFO [train.py:917] (0/4) Epoch 48, validation: loss=0.06238, audio_tagging_loss=0.06238, over 3737520.00 frames.
2023-12-20 18:46:29,406 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:46:45,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=16360.0, ans=0.0
2023-12-20 18:47:08,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.35 vs. limit=13.71
2023-12-20 18:47:16,769 INFO [train.py:886] (0/4) Epoch 48, batch 50, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 1111251.54 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-20 18:47:19,299 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-48.pt
2023-12-20 18:47:37,825 INFO [train.py:886] (0/4) Epoch 49, batch 0, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0
2023-12-20 18:47:37,826 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:47:58,815 INFO [train.py:917] (0/4) Epoch 49, validation: loss=0.06394, audio_tagging_loss=0.06394, over 3737520.00 frames.
2023-12-20 18:47:58,816 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:48:06,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=16640.0, ans=0.007252173913043478
2023-12-20 18:48:09,447 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 4.348e+01 5.324e+01 6.019e+01 6.956e+01 1.317e+02, threshold=1.204e+02, percent-clipped=1.0
2023-12-20 18:48:09,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=16706.666666666668, ans=0.07
2023-12-20 18:48:09,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=16706.666666666668, ans=0.0
2023-12-20 18:48:27,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=16840.0, ans=0.3106
2023-12-20 18:48:41,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=16906.666666666668, ans=0.13093333333333332
2023-12-20 18:48:45,782 INFO [train.py:886] (0/4) Epoch 49, batch 50, loss[loss=0.01555, audio_tagging_loss=0.01555, over 25000.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 1120693.80 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0
2023-12-20 18:48:48,277 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-49.pt
2023-12-20 18:49:07,492 INFO [train.py:886] (0/4) Epoch 50, batch 0, loss[loss=0.01624, audio_tagging_loss=0.01624, over 21452.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 21452.00 frames. ], batch size: 106, lr: 1.17e-02, grad_scale: 64.0
2023-12-20 18:49:07,494 INFO [train.py:909] (0/4) Computing validation loss
2023-12-20 18:49:17,457 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1026, 2.5460, 2.6543, 2.4564], device='cuda:0')
2023-12-20 18:49:28,226 INFO [train.py:917] (0/4) Epoch 50, validation: loss=0.06678, audio_tagging_loss=0.06678, over 3737520.00 frames.
2023-12-20 18:49:28,226 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14790MB
2023-12-20 18:49:33,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=16986.666666666668, ans=0.0
2023-12-20 18:49:58,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=17186.666666666668, ans=0.04949747468305833
2023-12-20 18:50:04,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=17186.666666666668, ans=0.0
2023-12-20 18:50:13,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=17253.333333333332, ans=0.0
2023-12-20 18:50:15,461 INFO [train.py:886] (0/4) Epoch 50, batch 50, loss[loss=0.01225, audio_tagging_loss=0.01225, over 25000.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 1120277.15 frames. ], batch size: 100, lr: 1.17e-02, grad_scale: 32.0
2023-12-20 18:50:18,670 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-50.pt
2023-12-20 18:50:22,488 INFO [train.py:1099] (0/4) Done!