2023-12-21 10:53:09,281 INFO [train.py:953] (0/4) Training started
2023-12-21 10:53:09,285 INFO [train.py:963] (0/4) Device: cuda:0
2023-12-21 10:53:09,286 INFO [train.py:965] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '0.0.0+unknown.version', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_tagging', 'icefall-git-sha1': 'bd01c212-dirty', 'icefall-git-date': 'Tue Dec 19 17:20:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_audio_tagging', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_at/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-1207150844-f49d8c4f4-c49d5', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 13455, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_as_full'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'num_events': 527, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2023-12-21 10:53:09,286 INFO [train.py:967] (0/4) About to create model
2023-12-21 10:53:14,394 INFO [train.py:971] (0/4) Number of model parameters: 64264454
2023-12-21 10:53:17,119 INFO [train.py:986] (0/4) Using DDP
2023-12-21 10:53:17,373 INFO [at_datamodule.py:398] (0/4) About to get the audioset cuts for KD.
2023-12-21 10:53:17,450 INFO [at_datamodule.py:223] (0/4) Enable MUSAN
2023-12-21 10:53:17,450 INFO [at_datamodule.py:224] (0/4) About to get Musan cuts
2023-12-21 10:53:19,865 INFO [at_datamodule.py:248] (0/4) Enable SpecAugment
2023-12-21 10:53:19,865 INFO [at_datamodule.py:249] (0/4) Time warp factor: 80
2023-12-21 10:53:19,865 INFO [at_datamodule.py:259] (0/4) Num frame mask: 10
2023-12-21 10:53:19,865 INFO [at_datamodule.py:272] (0/4) About to create train dataset
2023-12-21 10:53:19,866 INFO [at_datamodule.py:299] (0/4) Using DynamicBucketingSampler.
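
The datamodule lines above come from lhotse's dynamic bucketing sampler and SpecAugment transform. As a rough sketch of how this setup is assembled (the manifest filename is hypothetical; parameter values mirror the config dict logged above, and the real at_datamodule.py additionally wires in MUSAN mixing and other options):

    # Sketch under assumptions: mirrors max_duration=1000, num_buckets=30,
    # time_warp_factor=80 and num_frame_masks=10 from the logged config.
    from lhotse import CutSet
    from lhotse.dataset import DynamicBucketingSampler, SpecAugment

    cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")  # hypothetical manifest name

    sampler = DynamicBucketingSampler(
        cuts,
        max_duration=1000,  # max seconds of audio per batch
        num_buckets=30,
        shuffle=True,
        drop_last=True,
    )

    spec_augment = SpecAugment(
        time_warp_factor=80,
        num_frame_masks=10,
    )
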
2023-12-21 10:53:21,881 INFO [at_datamodule.py:315] (0/4) About to create train dataloader
2023-12-21 10:53:21,882 INFO [at_datamodule.py:410] (0/4) About to get test-other cuts
2023-12-21 10:53:21,886 INFO [at_datamodule.py:346] (0/4) About to create dev dataset
2023-12-21 10:53:22,343 INFO [at_datamodule.py:363] (0/4) About to create dev dataloader
2023-12-21 10:53:49,464 INFO [train.py:886] (0/4) Epoch 1, batch 0, loss[loss=1.851, audio_tagging_loss=1.851, over 24026.00 frames. ], tot_loss[loss=1.851, audio_tagging_loss=1.851, over 24026.00 frames. ], batch size: 100, lr: 2.25e-02, grad_scale: 2.0
2023-12-21 10:53:49,465 INFO [train.py:909] (0/4) Computing validation loss
2023-12-21 10:54:14,725 INFO [train.py:917] (0/4) Epoch 1, validation: loss=1.716, audio_tagging_loss=1.716, over 3737520.00 frames.
2023-12-21 10:54:14,726 INFO [train.py:918] (0/4) Maximum memory allocated so far is 13114MB
2023-12-21 10:54:18,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=7.5
2023-12-21 10:54:23,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=7.5
2023-12-21 10:54:25,383 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+02 8.487e+02 9.999e+02 1.363e+03 1.706e+03, threshold=4.000e+03, percent-clipped=0.0
2023-12-21 10:54:25,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=66.66666666666667, ans=0.24933333333333332
2023-12-21 10:54:28,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=66.66666666666667, ans=5.041666666666667
2023-12-21 10:54:29,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.94 vs. limit=7.525
2023-12-21 10:54:29,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=66.66666666666667, ans=0.496875
2023-12-21 10:54:36,994 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.136e+01 2.542e+02 7.819e+02 1.187e+03 1.783e+03, threshold=3.128e+03, percent-clipped=0.0
2023-12-21 10:54:37,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=511.85 vs. limit=7.6
2023-12-21 10:54:39,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=133.33333333333334, ans=0.5
2023-12-21 10:54:41,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=511.72 vs. limit=7.6
2023-12-21 10:54:51,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=374.24 vs. limit=7.575
2023-12-21 10:54:52,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=383.89 vs. limit=7.65
2023-12-21 10:55:01,179 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.709e+01 9.013e+01 2.542e+02 8.019e+02 1.783e+03, threshold=1.017e+03, percent-clipped=0.0
2023-12-21 10:55:03,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=266.6666666666667, ans=0.29733333333333334
2023-12-21 10:55:04,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=207.77 vs. limit=7.6
2023-12-21 10:55:10,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=252.43 vs. limit=7.6
2023-12-21 10:55:13,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=41.64 vs. limit=7.625
2023-12-21 10:55:13,424 INFO [train.py:886] (0/4) Epoch 1, batch 50, loss[loss=0.0458, audio_tagging_loss=0.0458, over 25000.00 frames. ], tot_loss[loss=0.2952, audio_tagging_loss=0.2952, over 1118148.87 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 2.0
2023-12-21 10:55:22,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=276.05 vs. limit=7.625
2023-12-21 10:55:24,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=275.68 vs. limit=7.65
2023-12-21 10:55:25,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=400.0, ans=0.48125
2023-12-21 10:55:25,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=400.0, ans=0.296
2023-12-21 10:55:29,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=373.30 vs. limit=7.8
2023-12-21 10:55:33,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=275.43 vs. limit=7.65
2023-12-21 10:55:34,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=25.11 vs. limit=7.8
2023-12-21 10:55:36,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=4.16
2023-12-21 10:55:37,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=221.49 vs. limit=7.675
2023-12-21 10:55:40,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=466.6666666666667, ans=0.1825
2023-12-21 10:55:41,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466.6666666666667, ans=0.29533333333333334
2023-12-21 10:55:43,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.75 vs. limit=7.85
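
The optim.py:484 warnings above print the quartiles of recently observed gradient norms and the clipping threshold derived from them; when a gradient's norm exceeds the threshold it is scaled down rather than hard-clipped (hence the separate "Scaling gradients by ..." warnings later in this log). A minimal sketch of the idea, not icefall's exact ScaledAdam code; the median-based threshold rule here is an assumption suggested by the logged quartiles:

    import torch

    def clip_by_recent_stats(params, recent_norms, clipping_scale=2.0):
        # Global gradient norm over all parameters.
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.sqrt(sum((g.detach() ** 2).sum() for g in grads))
        recent_norms.append(norm.item())  # recent_norms: a list/deque of past norms
        # Threshold = clipping_scale * median of recently observed norms (assumed rule).
        threshold = clipping_scale * torch.tensor(recent_norms).median()
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)  # scale the gradient, do not zero it
        return norm
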
2023-12-21 10:55:45,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=11.29 vs. limit=5.116666666666666
2023-12-21 10:55:47,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=338.83 vs. limit=5.233333333333333
2023-12-21 10:55:50,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=533.3333333333334, ans=0.8813333333333333
2023-12-21 10:55:51,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=181.63 vs. limit=7.7
2023-12-21 10:55:52,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=533.3333333333334, ans=0.43333333333333335
2023-12-21 10:55:53,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=354.23 vs. limit=7.9
2023-12-21 10:56:04,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=286.91 vs. limit=7.725
2023-12-21 10:56:05,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=600.0, ans=0.425
2023-12-21 10:56:09,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=43.29 vs. limit=5.15
2023-12-21 10:56:12,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=3.09
2023-12-21 10:56:13,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=136.55 vs. limit=7.725
2023-12-21 10:56:15,147 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.696e+01 2.804e+01 4.984e+01 1.709e+02 1.783e+03, threshold=9.968e+01, percent-clipped=0.0
2023-12-21 10:56:15,176 INFO [train.py:886] (0/4) Epoch 1, batch 100, loss[loss=0.03291, audio_tagging_loss=0.03291, over 25000.00 frames. ], tot_loss[loss=0.1534, audio_tagging_loss=0.1534, over 1963683.14 frames. ], batch size: 100, lr: 2.70e-02, grad_scale: 4.0
2023-12-21 10:56:20,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=82.47 vs. limit=5.166666666666667
2023-12-21 10:56:24,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=234.79 vs. limit=5.333333333333333
2023-12-21 10:56:29,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=733.3333333333334, ans=0.465625
2023-12-21 10:56:31,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=733.3333333333334, ans=0.5
2023-12-21 10:56:31,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=361.63 vs. limit=8.05
2023-12-21 10:56:37,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=183.11 vs. limit=7.775
2023-12-21 10:56:38,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=4.32
2023-12-21 10:56:41,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=284.28 vs. limit=7.8
2023-12-21 10:56:42,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=24.70 vs. limit=7.8
2023-12-21 10:56:43,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=302.56 vs. limit=7.8
2023-12-21 10:56:54,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=233.31 vs. limit=7.825
2023-12-21 10:57:03,551 WARNING [optim.py:500] (0/4) Scaling gradients by 0.07864928245544434, model_norm_threshold=99.68033599853516
2023-12-21 10:57:03,696 WARNING [optim.py:572] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.45, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.166e+05, grad_sumsq=5.753e+08, orig_rms_sq=1.246e-03
2023-12-21 10:57:08,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=933.3333333333334, ans=0.45625
2023-12-21 10:57:10,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=200.49 vs. limit=8.2
2023-12-21 10:57:15,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=178.99 vs. limit=7.875
2023-12-21 10:57:16,111 INFO [train.py:886] (0/4) Epoch 1, batch 150, loss[loss=0.02982, audio_tagging_loss=0.02982, over 24750.00 frames. ], tot_loss[loss=0.1046, audio_tagging_loss=0.1046, over 2629225.21 frames. ], batch size: 99, lr: 2.93e-02, grad_scale: 2.0
2023-12-21 10:57:27,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=48.56 vs. limit=7.875
2023-12-21 10:57:28,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=422.72 vs. limit=7.9
2023-12-21 10:57:29,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1066.6666666666667, ans=0.45
2023-12-21 10:57:31,311 WARNING [optim.py:500] (0/4) Scaling gradients by 0.0951763167977333, model_norm_threshold=99.68033599853516
2023-12-21 10:57:31,456 WARNING [optim.py:572] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.44, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.792e+05, grad_sumsq=3.739e+08, orig_rms_sq=1.282e-03
2023-12-21 10:57:34,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=16.72 vs. limit=4.426666666666667
2023-12-21 10:57:41,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=16.23 vs. limit=4.453333333333333
2023-12-21 10:57:41,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=4.453333333333333
2023-12-21 10:57:47,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=81.46 vs. limit=7.925
2023-12-21 10:57:56,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1200.0, ans=0.44375
2023-12-21 10:58:03,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=140.89 vs. limit=8.4
2023-12-21 10:58:03,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=174.10 vs. limit=8.4
2023-12-21 10:58:11,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=29.05 vs. limit=8.45
2023-12-21 10:58:13,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1266.6666666666667, ans=5.316666666666666
2023-12-21 10:58:19,088 INFO [train.py:886] (0/4) Epoch 1, batch 200, loss[loss=0.03024, audio_tagging_loss=0.03024, over 25000.00 frames. ], tot_loss[loss=0.07973, audio_tagging_loss=0.07973, over 3147317.77 frames. ], batch size: 100, lr: 3.15e-02, grad_scale: 4.0
2023-12-21 10:58:20,186 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.593e+01 2.983e+01 3.603e+01 1.267e+03, threshold=5.966e+01, percent-clipped=10.0
2023-12-21 10:58:20,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1333.3333333333333, ans=5.833333333333333
2023-12-21 10:58:30,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1400.0, ans=0.434375
2023-12-21 10:58:31,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1400.0, ans=0.325
2023-12-21 10:58:33,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=130.96 vs. limit=8.025
2023-12-21 10:58:43,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1466.6666666666667, ans=0.09083333333333334
2023-12-21 10:58:48,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=19.41 vs. limit=5.366666666666667
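
The lr values printed at each train.py:886 line follow icefall's Eden schedule with the logged base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, including a warmup factor that starts at 0.5 (hence lr: 2.25e-02 at batch 0). A sketch of the formula as I read it; warmup_batches=500 is inferred from the logged values, and the epoch counter is assumed to start at 0:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5,
                warmup_batches=500.0):
        # Batch- and epoch-wise power-law decay, times a linear warmup from 0.5 to 1.
        decay = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25 * \
                ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        warmup = 1.0 if batch >= warmup_batches else 0.5 + 0.5 * batch / warmup_batches
        return base_lr * decay * warmup

    print(eden_lr(0.045, batch=0, epoch=0))    # 0.0225, matching "lr: 2.25e-02" at batch 0
    print(eden_lr(0.045, batch=250, epoch=0))  # ~0.0337, close to "lr: 3.38e-02" at batch 250
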
2023-12-21 10:58:49,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1466.6666666666667, ans=0.43125
2023-12-21 10:58:51,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1466.6666666666667, ans=5.366666666666667
2023-12-21 10:58:54,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1533.3333333333333, ans=5.958333333333333
2023-12-21 10:59:00,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=46.06 vs. limit=8.65
2023-12-21 10:59:05,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=139.99 vs. limit=8.075
2023-12-21 10:59:05,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.22 vs. limit=8.075
2023-12-21 10:59:08,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=26.56 vs. limit=8.1
2023-12-21 10:59:22,004 INFO [train.py:886] (0/4) Epoch 1, batch 250, loss[loss=0.03272, audio_tagging_loss=0.03272, over 25000.00 frames. ], tot_loss[loss=0.06471, audio_tagging_loss=0.06471, over 3553213.18 frames. ], batch size: 100, lr: 3.38e-02, grad_scale: 2.0
2023-12-21 10:59:22,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=69.91 vs. limit=5.833333333333333
2023-12-21 10:59:29,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1666.6666666666667, ans=0.2833333333333333
2023-12-21 10:59:30,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1666.6666666666667, ans=0.1375
2023-12-21 10:59:31,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=207.15 vs. limit=8.125
2023-12-21 10:59:37,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=17.75 vs. limit=5.433333333333334
2023-12-21 10:59:37,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=47.21 vs. limit=8.15
2023-12-21 10:59:38,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=38.70 vs. limit=8.15
2023-12-21 10:59:39,697 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 10:59:44,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=144.93 vs. limit=5.866666666666666
2023-12-21 10:59:46,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1800.0, ans=8.85
2023-12-21 10:59:47,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1800.0, ans=0.837
2023-12-21 10:59:49,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=47.53 vs. limit=4.72
2023-12-21 11:00:01,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1866.6666666666667, ans=0.2813333333333333
2023-12-21 11:00:04,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1866.6666666666667, ans=0.4125
2023-12-21 11:00:18,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=162.04 vs. limit=8.95
2023-12-21 11:00:20,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=45.15 vs. limit=5.966666666666667
2023-12-21 11:00:23,449 INFO [train.py:886] (0/4) Epoch 1, batch 300, loss[loss=0.03361, audio_tagging_loss=0.03361, over 25000.00 frames. ], tot_loss[loss=0.05501, audio_tagging_loss=0.05501, over 3865190.36 frames. ], batch size: 100, lr: 3.60e-02, grad_scale: 4.0
2023-12-21 11:00:25,743 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.506e+01 2.969e+01 4.379e+01 2.139e+02, threshold=5.939e+01, percent-clipped=11.0
2023-12-21 11:00:34,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=173.00 vs. limit=8.25
2023-12-21 11:00:39,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2066.6666666666665, ans=0.403125
2023-12-21 11:00:40,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2066.6666666666665, ans=0.22933333333333333
2023-12-21 11:00:41,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=40.41 vs. limit=6.033333333333333
2023-12-21 11:00:44,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=170.05 vs. limit=8.275
2023-12-21 11:00:44,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=16.43 vs. limit=5.516666666666667
2023-12-21 11:00:47,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=27.17 vs. limit=8.275
2023-12-21 11:00:48,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2133.3333333333335, ans=0.4
2023-12-21 11:00:49,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=3.32
2023-12-21 11:00:56,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=4.8533333333333335
2023-12-21 11:00:57,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2133.3333333333335, ans=0.043333333333333335
2023-12-21 11:00:59,332 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=54.08 vs. limit=8.3
2023-12-21 11:01:04,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2200.0, ans=0.1175
2023-12-21 11:01:05,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2200.0, ans=0.22499999999999998
2023-12-21 11:01:16,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=5.566666666666666
2023-12-21 11:01:19,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=30.42 vs. limit=8.35
2023-12-21 11:01:23,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=4.453333333333333
2023-12-21 11:01:25,855 INFO [train.py:886] (0/4) Epoch 1, batch 350, loss[loss=0.02376, audio_tagging_loss=0.02376, over 24750.00 frames. ], tot_loss[loss=0.04835, audio_tagging_loss=0.04835, over 4106824.32 frames. ], batch size: 99, lr: 3.83e-02, grad_scale: 4.0
2023-12-21 11:01:27,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=255.85 vs. limit=8.375
2023-12-21 11:01:30,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=41.35 vs. limit=9.25
2023-12-21 11:01:33,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=98.48 vs. limit=8.375
2023-12-21 11:01:34,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2333.3333333333335, ans=0.0475
2023-12-21 11:01:34,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=4.933333333333334
2023-12-21 11:01:35,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.11 vs. limit=5.583333333333333
2023-12-21 11:01:37,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=28.40 vs. limit=8.4
2023-12-21 11:01:39,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=41.00 vs. limit=9.3
2023-12-21 11:01:39,613 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=51.20 vs. limit=9.3
2023-12-21 11:01:42,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2400.0, ans=0.3875
2023-12-21 11:01:44,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=28.36 vs. limit=9.3
2023-12-21 11:01:50,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2466.6666666666665, ans=0.384375
2023-12-21 11:01:56,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=31.33 vs. limit=8.425
2023-12-21 11:02:02,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2533.3333333333335, ans=0.105
2023-12-21 11:02:09,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.55 vs. limit=5.633333333333334
2023-12-21 11:02:12,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=41.18 vs. limit=6.266666666666667
2023-12-21 11:02:15,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=71.73 vs. limit=8.475
2023-12-21 11:02:16,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=9.45
2023-12-21 11:02:20,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=241.08 vs. limit=8.475
2023-12-21 11:02:20,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2600.0, ans=5.65
2023-12-21 11:02:20,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=23.76 vs. limit=8.475
2023-12-21 11:02:23,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=37.18 vs. limit=8.475
2023-12-21 11:02:24,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=30.41 vs. limit=6.3
2023-12-21 11:02:28,559 INFO [train.py:886] (0/4) Epoch 1, batch 400, loss[loss=0.02717, audio_tagging_loss=0.02717, over 25000.00 frames. ], tot_loss[loss=0.04302, audio_tagging_loss=0.04302, over 4289814.06 frames. ], batch size: 100, lr: 4.05e-02, grad_scale: 8.0
2023-12-21 11:02:29,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2666.6666666666665, ans=0.2733333333333333
2023-12-21 11:02:30,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=35.85 vs. limit=9.5
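
use_fp16 is true in the config, and the grad_scale field in the train.py:886 lines doubles from 2.0 to 4.0 and 8.0 (and later 16.0): this is dynamic fp16 loss scaling growing after stretches of overflow-free steps. A sketch with PyTorch's stock GradScaler for illustration (icefall manages the scale inside its own loop; init_scale and growth_interval here are assumed values):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0,       # illustrative; matches the first logged grad_scale
        growth_factor=2.0,    # doubles the scale...
        growth_interval=200,  # ...after this many overflow-free steps (assumed value)
    )

    def fp16_step(model, optimizer, features, targets, criterion):
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skips the update if inf/nan gradients were found
        scaler.update()         # grows or backs off the scale
        return loss.detach()
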
2023-12-21 11:02:30,850 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.972e+01 3.451e+01 4.422e+01 2.511e+02, threshold=6.902e+01, percent-clipped=7.0
2023-12-21 11:02:32,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2666.6666666666665, ans=0.24
2023-12-21 11:02:35,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=79.02 vs. limit=8.5
2023-12-21 11:02:47,473 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=22.79 vs. limit=8.525
2023-12-21 11:02:52,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=26.19 vs. limit=6.4
2023-12-21 11:03:06,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=201.70 vs. limit=8.575
2023-12-21 11:03:07,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=60.18 vs. limit=8.575
2023-12-21 11:03:09,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=125.20 vs. limit=8.575
2023-12-21 11:03:13,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2866.6666666666665, ans=0.365625
2023-12-21 11:03:13,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2866.6666666666665, ans=0.035
2023-12-21 11:03:23,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=73.44 vs. limit=8.6
2023-12-21 11:03:23,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=17.07 vs. limit=8.6
2023-12-21 11:03:29,117 INFO [train.py:886] (0/4) Epoch 1, batch 450, loss[loss=0.0273, audio_tagging_loss=0.0273, over 24750.00 frames. ], tot_loss[loss=0.03911, audio_tagging_loss=0.03911, over 4436864.67 frames. ], batch size: 99, lr: 4.28e-02, grad_scale: 8.0
2023-12-21 11:03:33,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3000.0, ans=0.0875
2023-12-21 11:03:43,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3066.6666666666665, ans=0.35625
2023-12-21 11:03:51,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3066.6666666666665, ans=0.35625
2023-12-21 11:04:13,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3200.0, ans=0.35
2023-12-21 11:04:14,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=87.45 vs. limit=8.7
2023-12-21 11:04:14,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=16.35 vs. limit=8.7
2023-12-21 11:04:20,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=8.725
2023-12-21 11:04:22,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=75.15 vs. limit=8.725
2023-12-21 11:04:27,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3266.6666666666665, ans=0.346875
2023-12-21 11:04:28,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=31.57 vs. limit=8.725
2023-12-21 11:04:29,729 INFO [train.py:886] (0/4) Epoch 1, batch 500, loss[loss=0.02571, audio_tagging_loss=0.02571, over 24750.00 frames. ], tot_loss[loss=0.03576, audio_tagging_loss=0.03576, over 4556460.99 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0
2023-12-21 11:04:30,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.09 vs. limit=5.333333333333334
2023-12-21 11:04:31,971 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.916e+01 3.389e+01 4.125e+01 8.969e+01, threshold=6.779e+01, percent-clipped=3.0
2023-12-21 11:04:32,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.03 vs. limit=10.0
2023-12-21 11:04:34,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=33.28 vs. limit=8.75
2023-12-21 11:04:36,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3333.3333333333335, ans=0.07
2023-12-21 11:04:39,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=5.333333333333334
2023-12-21 11:04:39,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=10.0
2023-12-21 11:04:43,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3400.0, ans=0.340625
2023-12-21 11:04:45,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=8.775
2023-12-21 11:04:50,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=22.62 vs. limit=6.7
2023-12-21 11:04:51,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=21.59 vs. limit=8.775
2023-12-21 11:04:57,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.63 vs. limit=10.1
2023-12-21 11:05:01,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3466.6666666666665, ans=0.07
2023-12-21 11:05:06,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=92.47 vs. limit=8.825
2023-12-21 11:05:09,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3533.3333333333335, ans=0.334375
2023-12-21 11:05:12,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=34.40 vs. limit=10.15
2023-12-21 11:05:17,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=10.2
2023-12-21 11:05:21,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=8.85
2023-12-21 11:05:31,200 INFO [train.py:886] (0/4) Epoch 1, batch 550, loss[loss=0.02208, audio_tagging_loss=0.02208, over 24750.00 frames. ], tot_loss[loss=0.03339, audio_tagging_loss=0.03339, over 4646160.31 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0
2023-12-21 11:05:35,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3666.6666666666665, ans=0.2633333333333333
2023-12-21 11:05:38,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=10.25
2023-12-21 11:05:41,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=10.3
2023-12-21 11:05:42,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=58.26 vs. limit=8.9
2023-12-21 11:05:49,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3733.3333333333335, ans=0.26266666666666666
2023-12-21 11:06:00,256 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=61.73 vs. limit=8.925
2023-12-21 11:06:00,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=8.925
2023-12-21 11:06:06,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.17 vs. limit=10.4
2023-12-21 11:06:12,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=61.08 vs. limit=8.95
2023-12-21 11:06:18,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.52 vs. limit=10.45
2023-12-21 11:06:23,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.48 vs. limit=5.983333333333333
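
The single audio_tagging_loss tracked throughout is a multi-label objective over the num_events=527 AudioSet classes; the usual choice for this task, and presumably what is logged here, is binary cross-entropy on the logits against multi-hot targets. A minimal illustration with random tensors (shapes mirror the logged batch size of 100):

    import torch
    import torch.nn.functional as F

    batch_size, num_events = 100, 527
    logits = torch.randn(batch_size, num_events)                     # model outputs
    targets = torch.randint(0, 2, (batch_size, num_events)).float()  # multi-hot labels
    loss = F.binary_cross_entropy_with_logits(logits, targets)
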
limit=5.983333333333333 2023-12-21 11:06:29,516 INFO [train.py:886] (0/4) Epoch 1, batch 600, loss[loss=0.02578, audio_tagging_loss=0.02578, over 24750.00 frames. ], tot_loss[loss=0.03173, audio_tagging_loss=0.03173, over 4716806.42 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0 2023-12-21 11:06:31,737 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 3.660e+01 5.036e+01 7.229e+01 1.228e+02, threshold=1.007e+02, percent-clipped=27.0 2023-12-21 11:06:33,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=19.59 vs. limit=7.0 2023-12-21 11:06:36,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=54.78 vs. limit=9.0 2023-12-21 11:06:38,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=55.80 vs. limit=9.0 2023-12-21 11:06:54,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=20.09 vs. limit=7.066666666666666 2023-12-21 11:06:55,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4133.333333333333, ans=0.04944444444444445 2023-12-21 11:06:59,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4133.333333333333, ans=0.009971014492753623 2023-12-21 11:07:03,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.76 vs. limit=10.65 2023-12-21 11:07:07,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.76 vs. limit=10.65 2023-12-21 11:07:11,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4200.0, ans=0.035 2023-12-21 11:07:16,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.18 vs. limit=9.1 2023-12-21 11:07:26,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=9.1 2023-12-21 11:07:27,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=5.733333333333333 2023-12-21 11:07:27,739 INFO [train.py:886] (0/4) Epoch 1, batch 650, loss[loss=0.02138, audio_tagging_loss=0.02138, over 24750.00 frames. ], tot_loss[loss=0.03039, audio_tagging_loss=0.03039, over 4765351.85 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0 2023-12-21 11:07:34,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=47.36 vs. limit=9.125 2023-12-21 11:07:35,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4333.333333333333, ans=0.25666666666666665 2023-12-21 11:07:44,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.30 vs. 
limit=9.15 2023-12-21 11:07:59,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4466.666666666667, ans=0.04805555555555556 2023-12-21 11:08:00,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4466.666666666667, ans=0.7436666666666667 2023-12-21 11:08:04,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4533.333333333333, ans=0.2875 2023-12-21 11:08:05,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=10.9 2023-12-21 11:08:08,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4533.333333333333, ans=0.009884057971014493 2023-12-21 11:08:10,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=36.72 vs. limit=9.2 2023-12-21 11:08:15,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4600.0, ans=0.284375 2023-12-21 11:08:18,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4600.0, ans=0.284375 2023-12-21 11:08:18,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=9.225 2023-12-21 11:08:23,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=9.225 2023-12-21 11:08:24,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.34 vs. limit=10.95 2023-12-21 11:08:26,814 INFO [train.py:886] (0/4) Epoch 1, batch 700, loss[loss=0.02704, audio_tagging_loss=0.02704, over 24750.00 frames. ], tot_loss[loss=0.02914, audio_tagging_loss=0.02914, over 4804044.82 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0 2023-12-21 11:08:28,959 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.878e+01 4.940e+01 6.035e+01 7.817e+01 1.849e+02, threshold=1.207e+02, percent-clipped=12.0 2023-12-21 11:08:31,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=38.89 vs. limit=11.0 2023-12-21 11:08:38,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4733.333333333333, ans=0.278125 2023-12-21 11:08:40,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=4733.333333333333, ans=9.275 2023-12-21 11:08:46,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. 
limit=9.275 2023-12-21 11:08:48,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4800.0, ans=0.275 2023-12-21 11:08:50,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4800.0, ans=0.272 2023-12-21 11:08:54,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4800.0, ans=0.275 2023-12-21 11:08:54,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=9.3 2023-12-21 11:09:01,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=35.83 vs. limit=9.325 2023-12-21 11:09:04,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4866.666666666667, ans=0.00981159420289855 2023-12-21 11:09:13,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4933.333333333333, ans=0.26875 2023-12-21 11:09:21,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=38.41 vs. limit=11.25 2023-12-21 11:09:22,046 INFO [train.py:886] (0/4) Epoch 1, batch 750, loss[loss=0.02056, audio_tagging_loss=0.02056, over 24750.00 frames. ], tot_loss[loss=0.02786, audio_tagging_loss=0.02786, over 4836891.52 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0 2023-12-21 11:09:28,224 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.067e+00 2023-12-21 11:09:33,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5066.666666666667, ans=0.2625 2023-12-21 11:09:39,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5066.666666666667, ans=0.009768115942028985 2023-12-21 11:09:39,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=6.026666666666667 2023-12-21 11:09:52,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=5133.333333333333, ans=11.35 2023-12-21 11:09:52,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5133.333333333333, ans=0.259375 2023-12-21 11:10:02,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5200.0, ans=0.25625 2023-12-21 11:10:11,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.68 vs. limit=11.45 2023-12-21 11:10:12,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.79 vs. limit=6.316666666666666 2023-12-21 11:10:19,869 INFO [train.py:886] (0/4) Epoch 1, batch 800, loss[loss=0.02337, audio_tagging_loss=0.02337, over 25000.00 frames. ], tot_loss[loss=0.02704, audio_tagging_loss=0.02704, over 4864514.31 frames. 
], batch size: 100, lr: 4.49e-02, grad_scale: 16.0 2023-12-21 11:10:21,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.27 vs. limit=11.5 2023-12-21 11:10:22,010 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.496e+01 3.313e+01 4.173e+01 5.276e+01 1.022e+02, threshold=8.346e+01, percent-clipped=0.0 2023-12-21 11:10:22,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.59 vs. limit=6.333333333333333 2023-12-21 11:10:27,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.00 vs. limit=6.333333333333333 2023-12-21 11:10:29,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5400.0, ans=0.246875 2023-12-21 11:10:31,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=9.525 2023-12-21 11:10:36,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5400.0, ans=0.246875 2023-12-21 11:10:46,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5466.666666666667, ans=0.24375000000000002 2023-12-21 11:10:50,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.07 vs. limit=11.6 2023-12-21 11:11:00,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.79 vs. limit=7.766666666666667 2023-12-21 11:11:06,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5600.0, ans=0.043333333333333335 2023-12-21 11:11:08,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=30.42 vs. limit=9.6 2023-12-21 11:11:16,881 INFO [train.py:886] (0/4) Epoch 1, batch 850, loss[loss=0.02448, audio_tagging_loss=0.02448, over 25000.00 frames. ], tot_loss[loss=0.02642, audio_tagging_loss=0.02642, over 4885511.91 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 16.0 2023-12-21 11:11:19,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5666.666666666667, ans=0.7016666666666667 2023-12-21 11:11:19,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5666.666666666667, ans=0.234375 2023-12-21 11:11:26,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5666.666666666667, ans=0.234375 2023-12-21 11:11:28,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5733.333333333333, ans=0.23125 2023-12-21 11:12:00,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. 
limit=9.7 2023-12-21 11:12:06,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.97 vs. limit=11.95 2023-12-21 11:12:09,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=5933.333333333333, ans=0.8093333333333333 2023-12-21 11:12:12,827 INFO [train.py:886] (0/4) Epoch 1, batch 900, loss[loss=0.02619, audio_tagging_loss=0.02619, over 24750.00 frames. ], tot_loss[loss=0.02601, audio_tagging_loss=0.02601, over 4900511.96 frames. ], batch size: 99, lr: 4.48e-02, grad_scale: 16.0 2023-12-21 11:12:14,799 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 3.238e+01 4.010e+01 4.970e+01 2.854e+02, threshold=8.021e+01, percent-clipped=5.0 2023-12-21 11:12:46,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=9.825 2023-12-21 11:12:48,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=9.825 2023-12-21 11:12:52,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=6.48 2023-12-21 11:13:09,784 INFO [train.py:886] (0/4) Epoch 1, batch 950, loss[loss=0.0242, audio_tagging_loss=0.0242, over 23955.00 frames. ], tot_loss[loss=0.02573, audio_tagging_loss=0.02573, over 4909892.63 frames. ], batch size: 100, lr: 4.48e-02, grad_scale: 16.0 2023-12-21 11:13:16,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=6333.333333333333, ans=0.07 2023-12-21 11:13:26,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=6400.0, ans=0.2 2023-12-21 11:13:29,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=6400.0, ans=0.2 2023-12-21 11:13:35,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.57 vs. limit=6.616666666666667 2023-12-21 11:13:41,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=9.925 2023-12-21 11:13:48,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=6533.333333333333, ans=0.04949747468305833 2023-12-21 11:13:52,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=12.45 2023-12-21 11:13:59,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=9.975 2023-12-21 11:13:59,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=9.975 2023-12-21 11:14:04,971 INFO [train.py:886] (0/4) Epoch 1, batch 1000, loss[loss=0.02001, audio_tagging_loss=0.02001, over 25000.00 frames. ], tot_loss[loss=0.02524, audio_tagging_loss=0.02524, over 4916130.52 frames. 
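Each Clipping_scale warning above prints five order statistics (min, 25%, median, 75%, max) of recent gradient norms; in this section the threshold always equals 2.0 times the logged median, i.e. clipping_scale * median (e.g. 2.0 * 3.323e+01 = 6.647e+01 just below). A sketch of that bookkeeping over a sliding window of norms; icefall's ScaledAdam maintains this differently in detail, so treat the window and exact rule as assumptions.

import torch

class GradNormClipper:
    # Assumed rule: threshold = clipping_scale * median of recent grad norms.
    def __init__(self, clipping_scale: float = 2.0, history: int = 1024):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []      # recent global grad norms
        self.clipped = 0
        self.steps = 0

    def __call__(self, params) -> None:
        # params: list of parameters whose grads are already populated
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms = (self.norms + [norm])[-self.history:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale the median
        self.steps += 1
        if norm > threshold:
            self.clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
              f"{' '.join(f'{v:.3e}' for v in q.tolist())}, "
              f"threshold={threshold:.3e}, "
              f"percent-clipped={100.0 * self.clipped / self.steps:.1f}")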
], batch size: 100, lr: 4.48e-02, grad_scale: 16.0 2023-12-21 11:14:07,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=6666.666666666667, ans=0.1875 2023-12-21 11:14:07,685 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.858e+01 3.323e+01 3.988e+01 7.077e+01, threshold=6.647e+01, percent-clipped=0.0 2023-12-21 11:14:12,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=6.666666666666667 2023-12-21 11:14:17,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=6733.333333333333, ans=0.009405797101449275 2023-12-21 11:14:22,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=10.025 2023-12-21 11:14:24,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=10.025 2023-12-21 11:14:25,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.62 vs. limit=10.05 2023-12-21 11:14:30,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.87 vs. limit=12.6 2023-12-21 11:14:34,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=6800.0, ans=0.009391304347826087 2023-12-21 11:14:41,064 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.184e+01 2023-12-21 11:14:43,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=6866.666666666667, ans=0.0 2023-12-21 11:14:46,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=6866.666666666667, ans=0.0093768115942029 2023-12-21 11:14:47,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=6866.666666666667, ans=0.17812499999999998 2023-12-21 11:14:50,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=29.02 vs. limit=10.1 2023-12-21 11:14:57,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=6933.333333333333, ans=0.0 2023-12-21 11:14:59,652 INFO [train.py:886] (0/4) Epoch 1, batch 1050, loss[loss=0.02231, audio_tagging_loss=0.02231, over 23963.00 frames. ], tot_loss[loss=0.02474, audio_tagging_loss=0.02474, over 4923367.18 frames. 
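The Whitening lines above fire when a module's activation covariance drifts too far from isotropic: a metric of 1.0 means an identity-like covariance, and the log reports cases where the metric exceeds the scheduled limit. One plausible form of the metric is d * tr(C^2) / tr(C)^2, which is >= 1 with equality exactly when all eigenvalues of the covariance C are equal; this form is an assumption, and scaling.py's exact computation may differ.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into groups.
    # Assumed metric: d * tr(C^2) / tr(C)^2 per group, averaged over groups.
    n, c = x.shape
    assert c % num_groups == 0
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, d)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n                   # (g, d, d)
    d = cov.shape[-1]
    tr_c2 = (cov * cov).sum(dim=(1, 2))
    tr_c = cov.diagonal(dim1=1, dim2=2).sum(dim=1)
    return (d * tr_c2 / tr_c.clamp(min=1e-20) ** 2).mean().item()

print(whitening_metric(torch.randn(1000, 384)))  # ~1.4: near-white input
print(whitening_metric(torch.randn(1000, 1) * torch.ones(1, 384)))  # 384: fully correlated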
], batch size: 100, lr: 4.48e-02, grad_scale: 16.0 2023-12-21 11:15:00,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7000.0, ans=0.22999999999999998 2023-12-21 11:15:03,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=7000.0, ans=0.171875 2023-12-21 11:15:09,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=7066.666666666667, ans=0.037222222222222226 2023-12-21 11:15:16,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=7066.666666666667, ans=0.306 2023-12-21 11:15:23,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=7133.333333333333, ans=0.16562500000000002 2023-12-21 11:15:27,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7133.333333333333, ans=0.22866666666666666 2023-12-21 11:15:44,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=10.225 2023-12-21 11:15:46,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=7266.666666666667, ans=0.13278333333333334 2023-12-21 11:15:46,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.68 vs. limit=5.0 2023-12-21 11:15:50,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=10.225 2023-12-21 11:15:54,677 INFO [train.py:886] (0/4) Epoch 1, batch 1100, loss[loss=0.02041, audio_tagging_loss=0.02041, over 25000.00 frames. ], tot_loss[loss=0.02422, audio_tagging_loss=0.02422, over 4929137.83 frames. ], batch size: 100, lr: 4.48e-02, grad_scale: 16.0 2023-12-21 11:15:56,648 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.590e+01 3.009e+01 3.352e+01 1.810e+02, threshold=6.019e+01, percent-clipped=1.0 2023-12-21 11:15:59,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=10.25 2023-12-21 11:16:04,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=7400.0, ans=0.641 2023-12-21 11:16:06,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.68 vs. limit=8.7 2023-12-21 11:16:10,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=10.275 2023-12-21 11:16:14,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.64 vs. limit=6.85 2023-12-21 11:16:17,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=25.78 vs. 
limit=10.3 2023-12-21 11:16:27,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=10.325 2023-12-21 11:16:31,866 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.24 vs. limit=10.325 2023-12-21 11:16:33,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=13.15 2023-12-21 11:16:34,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=7533.333333333333, ans=0.6363333333333334 2023-12-21 11:16:35,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=7533.333333333333, ans=0.14687499999999998 2023-12-21 11:16:37,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=10.35 2023-12-21 11:16:40,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=7600.0, ans=0.14375 2023-12-21 11:16:43,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=7600.0, ans=0.14375 2023-12-21 11:16:48,212 INFO [train.py:886] (0/4) Epoch 1, batch 1150, loss[loss=0.02195, audio_tagging_loss=0.02195, over 25000.00 frames. ], tot_loss[loss=0.02407, audio_tagging_loss=0.02407, over 4933844.15 frames. ], batch size: 100, lr: 4.47e-02, grad_scale: 16.0 2023-12-21 11:16:56,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=7666.666666666667, ans=0.140625 2023-12-21 11:16:57,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.39 vs. limit=10.375 2023-12-21 11:16:58,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=21.62 vs. limit=10.375 2023-12-21 11:17:16,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=7.12 2023-12-21 11:17:20,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=7866.666666666667, ans=0.033888888888888885 2023-12-21 11:17:28,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=7866.666666666667, ans=0.22133333333333333 2023-12-21 11:17:43,356 INFO [train.py:886] (0/4) Epoch 1, batch 1200, loss[loss=0.02587, audio_tagging_loss=0.02587, over 24951.00 frames. ], tot_loss[loss=0.02385, audio_tagging_loss=0.02385, over 4935818.63 frames. 
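The learning rate in these batch lines decays smoothly (4.49e-02 down to 4.47e-02 so far) rather than in steps. A sketch of the Eden-style schedule these recipes use, as I recall it; the formula, the base_lr of 0.045, and the fractional-epoch bookkeeping are all assumptions chosen to be consistent with the logged lr values, and optim.py is authoritative.

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Assumed Eden formula: two inverse-fourth-root decay factors, one in
    # batches and one in (possibly fractional) epochs.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Early in epoch 1 both factors are close to 1, so lr stays near base_lr:
print(f"{eden_lr(0.045, batch=1200, epoch=0.2):.4f}")  # ~0.0447, cf. "lr: 4.47e-02" above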
], batch size: 100, lr: 4.47e-02, grad_scale: 32.0 2023-12-21 11:17:45,262 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.423e+01 2.660e+01 3.185e+01 5.087e+01, threshold=5.319e+01, percent-clipped=0.0 2023-12-21 11:17:49,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=8000.0, ans=0.125 2023-12-21 11:17:53,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=8066.666666666667, ans=10.525 2023-12-21 11:17:55,288 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.587e+00 2023-12-21 11:17:55,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.88 vs. limit=9.033333333333333 2023-12-21 11:18:23,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=46.93 vs. limit=10.575 2023-12-21 11:18:37,381 INFO [train.py:886] (0/4) Epoch 1, batch 1250, loss[loss=0.01925, audio_tagging_loss=0.01925, over 24750.00 frames. ], tot_loss[loss=0.02369, audio_tagging_loss=0.02369, over 4931819.26 frames. ], batch size: 99, lr: 4.47e-02, grad_scale: 32.0 2023-12-21 11:18:44,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.15 vs. limit=9.166666666666668 2023-12-21 11:18:55,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=46.00 vs. limit=10.65 2023-12-21 11:19:00,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=13.85 2023-12-21 11:19:03,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=8466.666666666666, ans=0.125 2023-12-21 11:19:05,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=10.675 2023-12-21 11:19:13,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=10.7 2023-12-21 11:19:30,114 INFO [train.py:886] (0/4) Epoch 1, batch 1300, loss[loss=0.02259, audio_tagging_loss=0.02259, over 24750.00 frames. ], tot_loss[loss=0.02348, audio_tagging_loss=0.02348, over 4931116.87 frames. ], batch size: 99, lr: 4.47e-02, grad_scale: 32.0 2023-12-21 11:19:30,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.73 vs. limit=10.75 2023-12-21 11:19:32,767 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.283e+01 2.585e+01 3.133e+01 4.200e+01, threshold=5.169e+01, percent-clipped=0.0 2023-12-21 11:19:37,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.53 vs. 
limit=7.166666666666666 2023-12-21 11:19:40,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=8666.666666666666, ans=0.03055555555555556 2023-12-21 11:19:44,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=8733.333333333334, ans=0.030277777777777775 2023-12-21 11:20:04,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=8866.666666666666, ans=0.125 2023-12-21 11:20:06,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=8866.666666666666, ans=0.125 2023-12-21 11:20:14,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=8933.333333333334, ans=0.008927536231884059 2023-12-21 11:20:21,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=8933.333333333334, ans=0.21066666666666667 2023-12-21 11:20:24,098 INFO [train.py:886] (0/4) Epoch 1, batch 1350, loss[loss=0.01966, audio_tagging_loss=0.01966, over 25000.00 frames. ], tot_loss[loss=0.02316, audio_tagging_loss=0.02316, over 4934530.23 frames. ], batch size: 100, lr: 4.46e-02, grad_scale: 32.0 2023-12-21 11:20:29,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.01 vs. limit=14.25 2023-12-21 11:20:33,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=10.875 2023-12-21 11:20:47,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=10.925 2023-12-21 11:20:50,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=10.925 2023-12-21 11:20:50,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=7.283333333333333 2023-12-21 11:20:58,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=9200.0, ans=0.028333333333333335 2023-12-21 11:21:00,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=9200.0, ans=0.125 2023-12-21 11:21:04,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=9200.0, ans=0.028333333333333335 2023-12-21 11:21:08,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=9266.666666666666, ans=0.02805555555555556 2023-12-21 11:21:09,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=14.45 2023-12-21 11:21:14,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.61 vs. 
limit=14.45 2023-12-21 11:21:16,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=9266.666666666666, ans=0.5756666666666668 2023-12-21 11:21:18,823 INFO [train.py:886] (0/4) Epoch 1, batch 1400, loss[loss=0.02008, audio_tagging_loss=0.02008, over 25000.00 frames. ], tot_loss[loss=0.02274, audio_tagging_loss=0.02274, over 4943477.87 frames. ], batch size: 100, lr: 4.46e-02, grad_scale: 32.0 2023-12-21 11:21:20,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=9333.333333333334, ans=0.0 2023-12-21 11:21:20,784 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.157e+01 2.468e+01 2.846e+01 4.252e+01, threshold=4.936e+01, percent-clipped=0.0 2023-12-21 11:21:22,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=9333.333333333334, ans=9.666666666666668 2023-12-21 11:21:24,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=9333.333333333334, ans=0.125 2023-12-21 11:21:26,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=9333.333333333334, ans=0.125 2023-12-21 11:21:28,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.55 vs. limit=14.55 2023-12-21 11:21:31,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=9400.0, ans=0.027500000000000004 2023-12-21 11:21:32,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=11.025 2023-12-21 11:21:34,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=11.025 2023-12-21 11:21:34,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=11.025 2023-12-21 11:21:51,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=9533.333333333334, ans=0.125 2023-12-21 11:22:05,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=11.1 2023-12-21 11:22:05,939 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.600e+01 2023-12-21 11:22:11,312 INFO [train.py:886] (0/4) Epoch 1, batch 1450, loss[loss=0.0231, audio_tagging_loss=0.0231, over 24904.00 frames. ], tot_loss[loss=0.02238, audio_tagging_loss=0.02238, over 4947706.82 frames. ], batch size: 100, lr: 4.46e-02, grad_scale: 32.0 2023-12-21 11:22:12,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=9666.666666666666, ans=0.125 2023-12-21 11:22:25,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.06 vs. 
limit=14.8 2023-12-21 11:22:38,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=9800.0, ans=0.202 2023-12-21 11:22:41,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=11.175 2023-12-21 11:22:49,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.98 vs. limit=14.9 2023-12-21 11:22:57,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=9933.333333333334, ans=0.15066666666666667 2023-12-21 11:22:59,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=9933.333333333334, ans=0.025277777777777777 2023-12-21 11:23:00,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.85 vs. limit=14.95 2023-12-21 11:23:04,952 INFO [train.py:886] (0/4) Epoch 1, batch 1500, loss[loss=0.02463, audio_tagging_loss=0.02463, over 24750.00 frames. ], tot_loss[loss=0.02223, audio_tagging_loss=0.02223, over 4951009.60 frames. ], batch size: 99, lr: 4.46e-02, grad_scale: 32.0 2023-12-21 11:23:06,866 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.537e+01 2.171e+01 2.518e+01 2.989e+01 4.492e+01, threshold=5.036e+01, percent-clipped=0.0 2023-12-21 11:23:08,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10000.0, ans=0.2 2023-12-21 11:23:11,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=10000.0, ans=0.125 2023-12-21 11:23:17,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=10066.666666666666, ans=0.5476666666666667 2023-12-21 11:23:20,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=10066.666666666666, ans=0.5476666666666667 2023-12-21 11:23:23,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=10066.666666666666, ans=0.125 2023-12-21 11:23:33,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=8.053333333333335 2023-12-21 11:23:39,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.80 vs. limit=15.15 2023-12-21 11:23:43,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=11.325 2023-12-21 11:23:43,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.53 vs. limit=15.15 2023-12-21 11:23:55,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10333.333333333334, ans=0.19666666666666666 2023-12-21 11:23:57,365 INFO [train.py:886] (0/4) Epoch 1, batch 1550, loss[loss=0.02448, audio_tagging_loss=0.02448, over 24750.00 frames. 
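grad_scale in the batch lines has doubled from 16.0 to 32.0 by this point; with fp16 training this is the dynamic loss scale, which grows after a stretch of overflow-free steps and is halved when an overflow is detected. A minimal PyTorch AMP step showing that mechanism; the model, optimizer, and growth_interval here are illustrative, not this recipe's.

import torch

model = torch.nn.Linear(80, 527).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=1000)

def train_step(features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        logits = model(features)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, targets)
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # unscales grads, skips the step on inf/nan
    scaler.update()                 # grow or back off the scale
    return loss.detach(), scaler.get_scale()  # get_scale() is the logged grad_scale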
], tot_loss[loss=0.0222, audio_tagging_loss=0.0222, over 4947427.28 frames. ], batch size: 99, lr: 4.45e-02, grad_scale: 32.0 2023-12-21 11:24:00,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=10333.333333333334, ans=0.14666666666666667 2023-12-21 11:24:01,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=10333.333333333334, ans=0.05 2023-12-21 11:24:03,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=10333.333333333334, ans=0.8533333333333333 2023-12-21 11:24:07,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=10400.0, ans=0.125 2023-12-21 11:24:07,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.51 vs. limit=15.3 2023-12-21 11:24:10,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=25.93 vs. limit=11.4 2023-12-21 11:24:18,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=10466.666666666666, ans=6.093333333333334 2023-12-21 11:24:19,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.55 vs. limit=15.35 2023-12-21 11:24:24,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.57 vs. limit=15.35 2023-12-21 11:24:45,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=10600.0, ans=0.008565217391304348 2023-12-21 11:24:47,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=11.475 2023-12-21 11:24:49,627 INFO [train.py:886] (0/4) Epoch 1, batch 1600, loss[loss=0.01806, audio_tagging_loss=0.01806, over 24750.00 frames. ], tot_loss[loss=0.02216, audio_tagging_loss=0.02216, over 4943897.82 frames. ], batch size: 99, lr: 4.45e-02, grad_scale: 32.0 2023-12-21 11:24:50,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.84 vs. limit=7.666666666666666 2023-12-21 11:24:51,541 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.281e+01 2.645e+01 2.930e+01 5.191e+01, threshold=5.289e+01, percent-clipped=1.0 2023-12-21 11:24:55,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.66 vs. 
limit=15.5 2023-12-21 11:24:55,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=10666.666666666666, ans=0.125 2023-12-21 11:24:58,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=10666.666666666666, ans=0.00855072463768116 2023-12-21 11:25:03,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=10733.333333333334, ans=0.035 2023-12-21 11:25:03,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=10733.333333333334, ans=0.02194444444444444 2023-12-21 11:25:09,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=10733.333333333334, ans=0.02194444444444444 2023-12-21 11:25:29,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=23.40 vs. limit=11.575 2023-12-21 11:25:42,707 INFO [train.py:886] (0/4) Epoch 1, batch 1650, loss[loss=0.021, audio_tagging_loss=0.021, over 25000.00 frames. ], tot_loss[loss=0.02191, audio_tagging_loss=0.02191, over 4947590.87 frames. ], batch size: 100, lr: 4.45e-02, grad_scale: 32.0 2023-12-21 11:25:53,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.42 vs. limit=15.8 2023-12-21 11:25:57,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=11.65 2023-12-21 11:26:01,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=11133.333333333334, ans=0.125 2023-12-21 11:26:04,590 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=1.736e+00 2023-12-21 11:26:12,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=11.675 2023-12-21 11:26:24,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=11.725 2023-12-21 11:26:25,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=11266.666666666666, ans=0.125 2023-12-21 11:26:30,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=11266.666666666666, ans=0.008420289855072463 2023-12-21 11:26:30,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11266.666666666666, ans=0.18733333333333335 2023-12-21 11:26:33,611 INFO [train.py:886] (0/4) Epoch 1, batch 1700, loss[loss=0.0194, audio_tagging_loss=0.0194, over 24750.00 frames. ], tot_loss[loss=0.0218, audio_tagging_loss=0.0218, over 4954003.66 frames. 
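Each batch line reports the current batch's loss over its own frames plus tot_loss over a frame count that creeps toward roughly 5M. That is consistent with a decayed, frame-weighted aggregate whose effective window is set by a reset interval, in the spirit of icefall's MetricsTracker; the exact mechanism below is an assumption. With ~25000 frames per batch and an interval of 200, the frame count saturates near 25000 * 200 = 5.0e6, matching the counts above.

class RunningLoss:
    # Assumed bookkeeping: decay the accumulator each batch, then add the new
    # batch's frame-weighted loss; the reported value is loss_sum / frames.
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0   # decayed sum of (loss * frames)
        self.frames = 0.0     # decayed frame count

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames

tracker = RunningLoss()
for step in range(2000):
    tot_loss = tracker.update(batch_loss=0.02, batch_frames=25000.0)
# tracker.frames -> ~5.0e6, the same order as the frame counts logged above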
], batch size: 99, lr: 4.44e-02, grad_scale: 32.0 2023-12-21 11:26:37,006 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.242e+01 2.541e+01 2.981e+01 4.448e+01, threshold=5.082e+01, percent-clipped=0.0 2023-12-21 11:26:58,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11466.666666666666, ans=0.18533333333333335 2023-12-21 11:27:12,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. limit=11.825 2023-12-21 11:27:15,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=11.825 2023-12-21 11:27:26,578 INFO [train.py:886] (0/4) Epoch 1, batch 1750, loss[loss=0.01935, audio_tagging_loss=0.01935, over 25000.00 frames. ], tot_loss[loss=0.02154, audio_tagging_loss=0.02154, over 4955008.60 frames. ], batch size: 100, lr: 4.44e-02, grad_scale: 32.0 2023-12-21 11:27:27,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=7.916666666666666 2023-12-21 11:27:32,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=11.875 2023-12-21 11:27:33,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=8.666666666666666 2023-12-21 11:27:45,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=11733.333333333334, ans=0.0 2023-12-21 11:27:45,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=11733.333333333334, ans=0.125 2023-12-21 11:28:16,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=11933.333333333334, ans=0.0 2023-12-21 11:28:19,019 INFO [train.py:886] (0/4) Epoch 1, batch 1800, loss[loss=0.02275, audio_tagging_loss=0.02275, over 25000.00 frames. ], tot_loss[loss=0.02139, audio_tagging_loss=0.02139, over 4959443.71 frames. 
], batch size: 100, lr: 4.44e-02, grad_scale: 32.0 2023-12-21 11:28:20,988 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.137e+01 2.486e+01 2.786e+01 3.987e+01, threshold=4.972e+01, percent-clipped=0.0 2023-12-21 11:28:21,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=12000.0, ans=0.07 2023-12-21 11:28:29,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=12066.666666666666, ans=0.47766666666666674 2023-12-21 11:28:36,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=12066.666666666666, ans=0.125 2023-12-21 11:28:40,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=12133.333333333334, ans=0.04949747468305833 2023-12-21 11:28:45,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.31 vs. limit=12.05 2023-12-21 11:28:48,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12133.333333333334, ans=0.17866666666666667 2023-12-21 11:28:55,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=8.879999999999999 2023-12-21 11:29:03,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=16.7 2023-12-21 11:29:10,293 INFO [train.py:886] (0/4) Epoch 1, batch 1850, loss[loss=0.02025, audio_tagging_loss=0.02025, over 25000.00 frames. ], tot_loss[loss=0.02143, audio_tagging_loss=0.02143, over 4955809.60 frames. ], batch size: 100, lr: 4.43e-02, grad_scale: 32.0 2023-12-21 11:29:10,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=12333.333333333334, ans=0.00818840579710145 2023-12-21 11:29:18,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.32 vs. limit=12.125 2023-12-21 11:29:32,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=12.175 2023-12-21 11:29:35,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=12466.666666666666, ans=0.008159420289855073 2023-12-21 11:29:38,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=16.85 2023-12-21 11:30:03,671 INFO [train.py:886] (0/4) Epoch 1, batch 1900, loss[loss=0.01764, audio_tagging_loss=0.01764, over 24750.00 frames. ], tot_loss[loss=0.02155, audio_tagging_loss=0.02155, over 4941990.74 frames. 
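The various *_skip_rate entries above (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate) are scheduled toward zero and control how often a sub-module is stochastically bypassed early in training. A simplified sketch of that mechanism; the real zipformer bypass also uses learned per-channel scales, so treat this as a cartoon.

import torch

def maybe_skip(module: torch.nn.Module, x: torch.Tensor,
               skip_rate: float, training: bool) -> torch.Tensor:
    # With probability skip_rate during training, bypass the module entirely;
    # otherwise apply it residually. skip_rate is annealed toward 0.
    if training and torch.rand(()) < skip_rate:
        return x
    return x + module(x)

layer_ff = torch.nn.Sequential(torch.nn.Linear(256, 1024), torch.nn.SiLU(),
                               torch.nn.Linear(1024, 256))
x = torch.randn(10, 256)
y = maybe_skip(layer_ff, x, skip_rate=0.03, training=True)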
], batch size: 99, lr: 4.43e-02, grad_scale: 32.0 2023-12-21 11:30:05,588 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.668e+01 2.185e+01 2.601e+01 3.008e+01 6.428e+01, threshold=5.202e+01, percent-clipped=3.0 2023-12-21 11:30:25,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=12800.0, ans=0.00808695652173913 2023-12-21 11:30:30,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=12800.0, ans=0.125 2023-12-21 11:30:34,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=12800.0, ans=0.1 2023-12-21 11:30:39,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=12.325 2023-12-21 11:30:39,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=12866.666666666666, ans=0.17133333333333334 2023-12-21 11:30:41,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=12866.666666666666, ans=0.125 2023-12-21 11:30:56,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=13000.0, ans=0.012500000000000004 2023-12-21 11:30:57,338 INFO [train.py:886] (0/4) Epoch 1, batch 1950, loss[loss=0.02014, audio_tagging_loss=0.02014, over 25000.00 frames. ], tot_loss[loss=0.02131, audio_tagging_loss=0.02131, over 4942887.76 frames. ], batch size: 100, lr: 4.43e-02, grad_scale: 32.0 2023-12-21 11:30:57,548 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.106e+01 2023-12-21 11:31:18,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.82 vs. limit=17.35 2023-12-21 11:31:41,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.88 vs. limit=17.45 2023-12-21 11:31:41,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=13266.666666666666, ans=0.4356666666666667 2023-12-21 11:31:42,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=12.475 2023-12-21 11:31:46,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=13266.666666666666, ans=0.007985507246376812 2023-12-21 11:31:49,734 INFO [train.py:886] (0/4) Epoch 1, batch 2000, loss[loss=0.02267, audio_tagging_loss=0.02267, over 24750.00 frames. ], tot_loss[loss=0.02112, audio_tagging_loss=0.02112, over 4943999.45 frames. 
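The WithLoss lines attach an auxiliary penalty to attention weights and report its accumulated value as loss-sum. One way to implement such a hook is an identity in the forward pass whose backward adds the gradient of a side loss; the sketch below uses a generic squared-magnitude penalty, since the actual penalty applied to self_attn_weights in scaling.py is not known to me.

import torch

class WithAuxLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale: float, name: str):
        ctx.save_for_backward(x)
        ctx.scale, ctx.name = scale, name
        return x  # identity in the forward direction

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():
            x = x.detach().requires_grad_(True)
            aux = (x ** 2).sum()              # illustrative penalty, not icefall's
            (aux_grad,) = torch.autograd.grad(aux, x)
        print(f"WithLoss: name={ctx.name}, loss-sum={aux.item():.3e}")
        return grad_output + ctx.scale * aux_grad, None, None

y = WithAuxLoss.apply(torch.randn(4, 8, requires_grad=True), 1e-4,
                      "encoder.layers.0.self_attn_weights")
y.sum().backward()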
], batch size: 99, lr: 4.42e-02, grad_scale: 32.0 2023-12-21 11:31:51,642 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.518e+01 2.224e+01 2.549e+01 2.855e+01 5.920e+01, threshold=5.098e+01, percent-clipped=1.0 2023-12-21 11:32:04,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=13400.0, ans=0.16599999999999998 2023-12-21 11:32:05,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=13400.0, ans=0.007956521739130435 2023-12-21 11:32:09,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=13400.0, ans=0.125 2023-12-21 11:32:16,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=26.07 vs. limit=17.6 2023-12-21 11:32:19,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=13466.666666666666, ans=0.125 2023-12-21 11:32:31,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=12.575 2023-12-21 11:32:44,299 INFO [train.py:886] (0/4) Epoch 1, batch 2050, loss[loss=0.01948, audio_tagging_loss=0.01948, over 25000.00 frames. ], tot_loss[loss=0.02096, audio_tagging_loss=0.02096, over 4944270.86 frames. ], batch size: 100, lr: 4.42e-02, grad_scale: 32.0 2023-12-21 11:32:44,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.72 vs. limit=12.625 2023-12-21 11:32:52,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.01 vs. limit=11.833333333333332 2023-12-21 11:32:54,383 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.673e+00 2023-12-21 11:32:56,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=17.8 2023-12-21 11:32:58,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=12.65 2023-12-21 11:33:09,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13800.0, ans=0.162 2023-12-21 11:33:10,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=12.675 2023-12-21 11:33:12,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=12.675 2023-12-21 11:33:14,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=13800.0, ans=0.41700000000000004 2023-12-21 11:33:17,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. 
limit=12.7 2023-12-21 11:33:23,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=13866.666666666666, ans=0.125 2023-12-21 11:33:37,328 INFO [train.py:886] (0/4) Epoch 1, batch 2100, loss[loss=0.02428, audio_tagging_loss=0.02428, over 25000.00 frames. ], tot_loss[loss=0.02084, audio_tagging_loss=0.02084, over 4949789.28 frames. ], batch size: 100, lr: 4.42e-02, grad_scale: 32.0 2023-12-21 11:33:39,943 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.625e+01 2.173e+01 2.502e+01 2.826e+01 4.812e+01, threshold=5.003e+01, percent-clipped=0.0 2023-12-21 11:33:42,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=12.75 2023-12-21 11:33:48,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=14066.666666666666, ans=0.125 2023-12-21 11:33:49,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=14066.666666666666, ans=0.008055555555555559 2023-12-21 11:33:59,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.89 vs. limit=5.12 2023-12-21 11:34:02,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=14133.333333333334, ans=0.125 2023-12-21 11:34:30,103 INFO [train.py:886] (0/4) Epoch 1, batch 2150, loss[loss=0.02962, audio_tagging_loss=0.02962, over 24942.00 frames. ], tot_loss[loss=0.02091, audio_tagging_loss=0.02091, over 4947983.43 frames. ], batch size: 100, lr: 4.41e-02, grad_scale: 32.0 2023-12-21 11:34:36,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14333.333333333334, ans=0.15666666666666668 2023-12-21 11:34:36,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=14333.333333333334, ans=10.0 2023-12-21 11:34:54,112 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.804e+00 2023-12-21 11:34:58,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14466.666666666666, ans=0.15533333333333335 2023-12-21 11:34:58,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.27 vs. limit=18.35 2023-12-21 11:35:00,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=14466.666666666666, ans=0.025 2023-12-21 11:35:01,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=14533.333333333334, ans=0.125 2023-12-21 11:35:07,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.58 vs. 
limit=9.813333333333333 2023-12-21 11:35:09,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14533.333333333334, ans=0.15466666666666667 2023-12-21 11:35:23,985 INFO [train.py:886] (0/4) Epoch 1, batch 2200, loss[loss=0.01937, audio_tagging_loss=0.01937, over 24750.00 frames. ], tot_loss[loss=0.02098, audio_tagging_loss=0.02098, over 4942074.46 frames. ], batch size: 99, lr: 4.41e-02, grad_scale: 32.0 2023-12-21 11:35:25,957 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.344e+01 2.656e+01 2.983e+01 4.042e+01, threshold=5.311e+01, percent-clipped=0.0 2023-12-21 11:35:27,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=5.2 2023-12-21 11:35:34,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=13.025 2023-12-21 11:35:36,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14733.333333333334, ans=0.15266666666666667 2023-12-21 11:35:49,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=13.05 2023-12-21 11:35:56,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=18.65 2023-12-21 11:35:59,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14866.666666666666, ans=0.15133333333333335 2023-12-21 11:36:05,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=14933.333333333334, ans=0.15066666666666667 2023-12-21 11:36:16,360 INFO [train.py:886] (0/4) Epoch 1, batch 2250, loss[loss=0.02366, audio_tagging_loss=0.02366, over 24750.00 frames. ], tot_loss[loss=0.0209, audio_tagging_loss=0.0209, over 4936132.47 frames. ], batch size: 99, lr: 4.40e-02, grad_scale: 64.0 2023-12-21 11:36:41,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=15133.333333333334, ans=0.007579710144927536 2023-12-21 11:36:43,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=15133.333333333334, ans=0.125 2023-12-21 11:36:43,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=15133.333333333334, ans=0.125 2023-12-21 11:36:49,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=15200.0, ans=0.04949747468305833 2023-12-21 11:36:50,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=18.9 2023-12-21 11:37:04,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.80 vs. limit=5.29 2023-12-21 11:37:08,448 INFO [train.py:886] (0/4) Epoch 1, batch 2300, loss[loss=0.0217, audio_tagging_loss=0.0217, over 25000.00 frames. 
], tot_loss[loss=0.02066, audio_tagging_loss=0.02066, over 4937070.80 frames. ], batch size: 100, lr: 4.40e-02, grad_scale: 64.0 2023-12-21 11:37:10,372 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.341e+01 2.591e+01 2.957e+01 4.107e+01, threshold=5.182e+01, percent-clipped=0.0 2023-12-21 11:37:17,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=13.25 2023-12-21 11:37:29,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=19.1 2023-12-21 11:37:37,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=15466.666666666666, ans=0.3586666666666667 2023-12-21 11:38:01,169 INFO [train.py:886] (0/4) Epoch 1, batch 2350, loss[loss=0.01936, audio_tagging_loss=0.01936, over 25000.00 frames. ], tot_loss[loss=0.02043, audio_tagging_loss=0.02043, over 4947256.86 frames. ], batch size: 100, lr: 4.40e-02, grad_scale: 64.0 2023-12-21 11:38:25,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=10.32 2023-12-21 11:38:26,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=15800.0, ans=0.347 2023-12-21 11:38:30,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=15800.0, ans=0.437 2023-12-21 11:38:53,114 INFO [train.py:886] (0/4) Epoch 1, batch 2400, loss[loss=0.02179, audio_tagging_loss=0.02179, over 25000.00 frames. ], tot_loss[loss=0.02038, audio_tagging_loss=0.02038, over 4952172.65 frames. ], batch size: 100, lr: 4.39e-02, grad_scale: 64.0 2023-12-21 11:38:54,990 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.350e+01 2.627e+01 2.967e+01 3.953e+01, threshold=5.253e+01, percent-clipped=0.0 2023-12-21 11:39:16,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16133.333333333334, ans=0.13866666666666666 2023-12-21 11:39:40,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=16266.666666666666, ans=0.04949747468305833 2023-12-21 11:39:40,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=16266.666666666666, ans=0.125 2023-12-21 11:39:44,698 INFO [train.py:886] (0/4) Epoch 1, batch 2450, loss[loss=0.01849, audio_tagging_loss=0.01849, over 25000.00 frames. ], tot_loss[loss=0.02031, audio_tagging_loss=0.02031, over 4960765.09 frames. 
], batch size: 100, lr: 4.39e-02, grad_scale: 64.0 2023-12-21 11:39:49,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=16333.333333333334, ans=10.0 2023-12-21 11:39:54,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16400.0, ans=0.136 2023-12-21 11:40:05,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=16466.666666666668, ans=0.007289855072463768 2023-12-21 11:40:11,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=16466.666666666668, ans=0.125 2023-12-21 11:40:12,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=13.675 2023-12-21 11:40:26,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=16600.0, ans=0.31900000000000006 2023-12-21 11:40:36,789 INFO [train.py:886] (0/4) Epoch 1, batch 2500, loss[loss=0.02225, audio_tagging_loss=0.02225, over 24750.00 frames. ], tot_loss[loss=0.02042, audio_tagging_loss=0.02042, over 4960648.86 frames. ], batch size: 99, lr: 4.38e-02, grad_scale: 64.0 2023-12-21 11:40:38,713 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.472e+01 2.667e+01 3.044e+01 4.269e+01, threshold=5.334e+01, percent-clipped=0.0 2023-12-21 11:40:43,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=16666.666666666668, ans=0.125 2023-12-21 11:40:45,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=16733.333333333332, ans=0.0 2023-12-21 11:40:49,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=16733.333333333332, ans=0.125 2023-12-21 11:40:55,283 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 11:41:02,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=16800.0, ans=0.0072173913043478265 2023-12-21 11:41:07,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16866.666666666668, ans=0.1313333333333333 2023-12-21 11:41:11,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=13.825000000000001 2023-12-21 11:41:21,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=16933.333333333332, ans=0.125 2023-12-21 11:41:27,283 INFO [train.py:886] (0/4) Epoch 1, batch 2550, loss[loss=0.02693, audio_tagging_loss=0.02693, over 21905.00 frames. ], tot_loss[loss=0.02044, audio_tagging_loss=0.02044, over 4953466.74 frames. ], batch size: 107, lr: 4.38e-02, grad_scale: 64.0 2023-12-21 11:41:30,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.59 vs. 
limit=9.25 2023-12-21 11:41:42,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=23.88 vs. limit=13.9 2023-12-21 11:41:46,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.80 vs. limit=20.3 2023-12-21 11:41:54,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=17133.333333333332, ans=0.125 2023-12-21 11:41:56,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=5.57 2023-12-21 11:41:58,342 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.086e+00 2023-12-21 11:42:02,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=17200.0, ans=0.035 2023-12-21 11:42:05,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=17200.0, ans=0.007130434782608696 2023-12-21 11:42:10,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=10.906666666666666 2023-12-21 11:42:13,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=17266.666666666668, ans=0.125 2023-12-21 11:42:21,182 INFO [train.py:886] (0/4) Epoch 1, batch 2600, loss[loss=0.01787, audio_tagging_loss=0.01787, over 23997.00 frames. ], tot_loss[loss=0.0203, audio_tagging_loss=0.0203, over 4948053.09 frames. ], batch size: 100, lr: 4.37e-02, grad_scale: 64.0 2023-12-21 11:42:23,106 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.528e+01 2.807e+01 3.292e+01 4.352e+01, threshold=5.614e+01, percent-clipped=0.0 2023-12-21 11:42:48,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=14.05 2023-12-21 11:43:02,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=17600.0, ans=0.125 2023-12-21 11:43:06,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=17600.0, ans=0.05 2023-12-21 11:43:12,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=17666.666666666668, ans=0.007028985507246377 2023-12-21 11:43:13,591 INFO [train.py:886] (0/4) Epoch 1, batch 2650, loss[loss=0.01833, audio_tagging_loss=0.01833, over 25000.00 frames. ], tot_loss[loss=0.02012, audio_tagging_loss=0.02012, over 4951174.57 frames. ], batch size: 100, lr: 4.37e-02, grad_scale: 64.0 2023-12-21 11:43:13,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=17666.666666666668, ans=0.125 2023-12-21 11:43:17,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.82 vs. 
limit=20.75 2023-12-21 11:43:18,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=17666.666666666668, ans=0.125 2023-12-21 11:43:19,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=17666.666666666668, ans=0.125 2023-12-21 11:43:37,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=17800.0, ans=0.125 2023-12-21 11:43:49,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.28 vs. limit=14.2 2023-12-21 11:43:50,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=17866.666666666668, ans=0.0 2023-12-21 11:43:51,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=17866.666666666668, ans=0.0 2023-12-21 11:43:53,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=17866.666666666668, ans=0.2746666666666667 2023-12-21 11:43:54,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.56 vs. limit=20.9 2023-12-21 11:43:54,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.81 vs. limit=20.95 2023-12-21 11:43:55,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=17933.333333333332, ans=0.0 2023-12-21 11:44:05,197 INFO [train.py:886] (0/4) Epoch 1, batch 2700, loss[loss=0.02174, audio_tagging_loss=0.02174, over 22075.00 frames. ], tot_loss[loss=0.01999, audio_tagging_loss=0.01999, over 4955209.85 frames. ], batch size: 107, lr: 4.36e-02, grad_scale: 64.0 2023-12-21 11:44:05,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.99 vs. limit=14.25 2023-12-21 11:44:07,123 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.548e+01 2.795e+01 3.093e+01 4.851e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 11:44:12,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=14.25 2023-12-21 11:44:27,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=18133.333333333332, ans=0.0 2023-12-21 11:44:36,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.50 vs. limit=9.55 2023-12-21 11:44:42,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=18200.0, ans=0.0 2023-12-21 11:44:42,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=18200.0, ans=0.125 2023-12-21 11:44:58,041 INFO [train.py:886] (0/4) Epoch 1, batch 2750, loss[loss=0.02005, audio_tagging_loss=0.02005, over 24909.00 frames. ], tot_loss[loss=0.01982, audio_tagging_loss=0.01982, over 4958198.87 frames. 
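
The scaling.py:213 records log ScheduledFloat parameters: module constants (dropout rates, skip rates, balancer limits) that are interpolated as a function of the global batch count rather than fixed. The dropout entries in the surrounding records are consistent with a linear ramp from 0.3 at batch 0 to 0.1 at batch 20000: ans=0.1313333333333333 at batch_count=16866.66... above and ans=0.112 at batch_count=18800 a little further below both fall exactly on that line. The helper here reproduces the interpolation; it is an illustrative stand-in, not the ScheduledFloat class itself, and the (0, 0.3) -> (20000, 0.1) breakpoints are inferred from the logged values.

    def scheduled_float(batch_count, points):
        # Piecewise-linear interpolation over sorted (batch, value)
        # breakpoints, clamped at both ends; a stand-in for the
        # ScheduledFloat values printed in the log.
        b0, v0 = points[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in points[1:]:
            if batch_count <= b1:
                frac = (batch_count - b0) / (b1 - b0)
                return v0 + frac * (v1 - v0)
            b0, v0 = b1, v1
        return v0

    dropout_schedule = [(0.0, 0.3), (20000.0, 0.1)]
    print(scheduled_float(18800.0, dropout_schedule))  # -> 0.112, as logged
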
], batch size: 100, lr: 4.36e-02, grad_scale: 64.0 2023-12-21 11:44:58,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=18333.333333333332, ans=0.0 2023-12-21 11:45:39,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=18600.0, ans=0.125 2023-12-21 11:45:47,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=18600.0, ans=0.125 2023-12-21 11:45:49,226 INFO [train.py:886] (0/4) Epoch 1, batch 2800, loss[loss=0.02744, audio_tagging_loss=0.02744, over 24950.00 frames. ], tot_loss[loss=0.02002, audio_tagging_loss=0.02002, over 4956355.18 frames. ], batch size: 100, lr: 4.36e-02, grad_scale: 64.0 2023-12-21 11:45:51,130 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.719e+01 3.067e+01 3.329e+01 4.208e+01, threshold=6.133e+01, percent-clipped=0.0 2023-12-21 11:46:13,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=18800.0, ans=0.0 2023-12-21 11:46:15,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18800.0, ans=0.11200000000000002 2023-12-21 11:46:18,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18800.0, ans=0.11200000000000002 2023-12-21 11:46:18,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=18800.0, ans=0.242 2023-12-21 11:46:23,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=18866.666666666668, ans=0.2396666666666667 2023-12-21 11:46:35,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.40 vs. limit=21.7 2023-12-21 11:46:41,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=19000.0, ans=0.125 2023-12-21 11:46:41,864 INFO [train.py:886] (0/4) Epoch 1, batch 2850, loss[loss=0.02085, audio_tagging_loss=0.02085, over 24750.00 frames. ], tot_loss[loss=0.02, audio_tagging_loss=0.02, over 4948938.53 frames. ], batch size: 99, lr: 4.35e-02, grad_scale: 64.0 2023-12-21 11:46:48,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19000.0, ans=0.11000000000000001 2023-12-21 11:46:51,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.14 vs. 
limit=21.8 2023-12-21 11:47:07,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19133.333333333332, ans=0.10866666666666669 2023-12-21 11:47:23,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=19266.666666666668, ans=0.125 2023-12-21 11:47:24,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=19266.666666666668, ans=0.22566666666666668 2023-12-21 11:47:25,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19266.666666666668, ans=0.10733333333333334 2023-12-21 11:47:26,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=19266.666666666668, ans=0.22566666666666668 2023-12-21 11:47:30,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=19266.666666666668, ans=0.125 2023-12-21 11:47:35,406 INFO [train.py:886] (0/4) Epoch 1, batch 2900, loss[loss=0.01884, audio_tagging_loss=0.01884, over 24750.00 frames. ], tot_loss[loss=0.01988, audio_tagging_loss=0.01988, over 4946506.64 frames. ], batch size: 99, lr: 4.35e-02, grad_scale: 64.0 2023-12-21 11:47:37,294 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.573e+01 2.906e+01 3.283e+01 4.730e+01, threshold=5.812e+01, percent-clipped=0.0 2023-12-21 11:47:38,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=14.75 2023-12-21 11:47:39,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=19333.333333333332, ans=0.125 2023-12-21 11:47:41,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=19333.333333333332, ans=0.125 2023-12-21 11:47:42,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=19333.333333333332, ans=0.0 2023-12-21 11:47:56,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=19466.666666666668, ans=0.1 2023-12-21 11:48:01,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=19466.666666666668, ans=0.125 2023-12-21 11:48:02,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=19466.666666666668, ans=0.07 2023-12-21 11:48:02,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=14.8 2023-12-21 11:48:20,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=19600.0, ans=0.0 2023-12-21 11:48:22,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=19600.0, ans=0.21399999999999997 2023-12-21 11:48:23,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.30 vs. 
limit=14.85 2023-12-21 11:48:26,191 INFO [train.py:886] (0/4) Epoch 1, batch 2950, loss[loss=0.0226, audio_tagging_loss=0.0226, over 25000.00 frames. ], tot_loss[loss=0.01975, audio_tagging_loss=0.01975, over 4940746.06 frames. ], batch size: 100, lr: 4.34e-02, grad_scale: 64.0 2023-12-21 11:48:44,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=11.893333333333333 2023-12-21 11:49:00,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=19866.666666666668, ans=0.125 2023-12-21 11:49:20,066 INFO [train.py:886] (0/4) Epoch 1, batch 3000, loss[loss=0.01814, audio_tagging_loss=0.01814, over 24750.00 frames. ], tot_loss[loss=0.01965, audio_tagging_loss=0.01965, over 4947322.60 frames. ], batch size: 99, lr: 4.34e-02, grad_scale: 64.0 2023-12-21 11:49:20,068 INFO [train.py:909] (0/4) Computing validation loss 2023-12-21 11:49:45,414 INFO [train.py:917] (0/4) Epoch 1, validation: loss=0.04441, audio_tagging_loss=0.04441, over 3737520.00 frames. 2023-12-21 11:49:45,415 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-21 11:49:47,302 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.623e+01 2.967e+01 3.286e+01 5.413e+01, threshold=5.933e+01, percent-clipped=0.0 2023-12-21 11:49:54,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=20066.666666666668, ans=0.0 2023-12-21 11:49:58,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=20066.666666666668, ans=0.125 2023-12-21 11:49:59,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.91 vs. limit=10.0 2023-12-21 11:50:18,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=20200.0, ans=0.1 2023-12-21 11:50:23,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=20200.0, ans=0.125 2023-12-21 11:50:32,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=20266.666666666668, ans=0.2 2023-12-21 11:50:35,809 INFO [train.py:886] (0/4) Epoch 1, batch 3050, loss[loss=0.01627, audio_tagging_loss=0.01627, over 22074.00 frames. ], tot_loss[loss=0.01958, audio_tagging_loss=0.01958, over 4945109.51 frames. ], batch size: 107, lr: 4.33e-02, grad_scale: 64.0 2023-12-21 11:51:08,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=20533.333333333332, ans=0.125 2023-12-21 11:51:09,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-12-21 11:51:11,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=20533.333333333332, ans=0.0 2023-12-21 11:51:29,132 INFO [train.py:886] (0/4) Epoch 1, batch 3100, loss[loss=0.01997, audio_tagging_loss=0.01997, over 25000.00 frames. ], tot_loss[loss=0.01953, audio_tagging_loss=0.01953, over 4945212.09 frames. 
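
At batch 3000 the trainer pauses, computes the validation loss over the full 3737520-frame dev set, and reports peak GPU memory (14759MB). Since audio_tagging_loss always equals the total loss in these records, tagging is the sole training objective; a plausible minimal form is multi-label binary cross-entropy over clip-level logits for the 527 AudioSet event classes, sketched below. The reduction and any label smoothing in the actual recipe are assumptions here.

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits, labels):
        # Multi-label BCE over the 527 AudioSet classes; the exact
        # reduction used by the recipe is an assumption.
        return F.binary_cross_entropy_with_logits(logits, labels, reduction="mean")

    logits = torch.randn(100, 527)                    # one batch of ~100 cuts
    labels = torch.zeros(100, 527).bernoulli_(0.02)   # sparse multi-hot targets
    print(audio_tagging_loss(logits, labels))
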
], batch size: 100, lr: 4.33e-02, grad_scale: 64.0 2023-12-21 11:51:31,048 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.608e+01 2.817e+01 3.164e+01 4.242e+01, threshold=5.634e+01, percent-clipped=0.0 2023-12-21 11:51:34,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=20666.666666666668, ans=0.035 2023-12-21 11:51:37,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=20733.333333333332, ans=0.0 2023-12-21 11:51:38,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.06 vs. limit=22.5 2023-12-21 11:51:57,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=20800.0, ans=0.2 2023-12-21 11:51:59,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=20866.666666666668, ans=0.125 2023-12-21 11:52:20,861 INFO [train.py:886] (0/4) Epoch 1, batch 3150, loss[loss=0.01931, audio_tagging_loss=0.01931, over 24750.00 frames. ], tot_loss[loss=0.0197, audio_tagging_loss=0.0197, over 4940788.84 frames. ], batch size: 99, lr: 4.32e-02, grad_scale: 64.0 2023-12-21 11:52:23,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=21000.0, ans=0.05 2023-12-21 11:52:25,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21000.0, ans=0.1 2023-12-21 11:52:32,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.95 vs. limit=10.0 2023-12-21 11:52:36,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=21066.666666666668, ans=0.125 2023-12-21 11:52:37,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=21066.666666666668, ans=0.1 2023-12-21 11:52:38,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21066.666666666668, ans=0.1 2023-12-21 11:52:42,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.89 vs. limit=10.0 2023-12-21 11:52:46,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2023-12-21 11:52:53,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=21200.0, ans=0.125 2023-12-21 11:53:05,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=21266.666666666668, ans=0.0 2023-12-21 11:53:12,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=21333.333333333332, ans=0.125 2023-12-21 11:53:13,087 INFO [train.py:886] (0/4) Epoch 1, batch 3200, loss[loss=0.01976, audio_tagging_loss=0.01976, over 24750.00 frames. ], tot_loss[loss=0.01959, audio_tagging_loss=0.01959, over 4932107.71 frames. 
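
The lr field decays smoothly from 4.40e-02 near batch 2350 to 4.16e-02 by batch 4550. icefall's Zipformer recipes normally drive this with the Eden scheduler, whose decay is controlled by an lr_batches and an lr_epochs constant; the form below is the commonly used one and should be read as an assumed reconstruction with the usual defaults plugged in, not a quotation of this recipe.

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Assumed Eden schedule: two inverse-fourth-root decay terms, one
        # in batches and one in (fractional) epochs. The constants are
        # icefall's usual defaults, not values confirmed from this run.
        batch_term = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_term = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_term * epoch_term

    # with a base LR of 0.045 this lands near 4.3e-02 around batch 3000 of
    # epoch 1, the same ballpark as the lr values printed in these records
    print(eden_lr(0.045, 3000, 1.0))
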
], batch size: 99, lr: 4.32e-02, grad_scale: 64.0 2023-12-21 11:53:14,992 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.755e+01 2.973e+01 3.408e+01 4.303e+01, threshold=5.945e+01, percent-clipped=0.0 2023-12-21 11:53:22,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=21400.0, ans=0.125 2023-12-21 11:53:24,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=21400.0, ans=0.006217391304347826 2023-12-21 11:53:33,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.73 vs. limit=15.0 2023-12-21 11:53:40,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21466.666666666668, ans=0.1 2023-12-21 11:53:45,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=21533.333333333332, ans=0.07 2023-12-21 11:53:47,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.45 vs. limit=15.0 2023-12-21 11:54:01,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=21600.0, ans=0.125 2023-12-21 11:54:01,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=31.93 vs. limit=22.5 2023-12-21 11:54:05,832 INFO [train.py:886] (0/4) Epoch 1, batch 3250, loss[loss=0.02286, audio_tagging_loss=0.02286, over 24750.00 frames. ], tot_loss[loss=0.01953, audio_tagging_loss=0.01953, over 4933343.43 frames. ], batch size: 99, lr: 4.31e-02, grad_scale: 64.0 2023-12-21 11:54:21,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.16 vs. limit=22.5 2023-12-21 11:54:27,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=21800.0, ans=0.125 2023-12-21 11:54:44,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=21866.666666666668, ans=0.5 2023-12-21 11:54:46,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21933.333333333332, ans=0.1 2023-12-21 11:54:56,780 INFO [train.py:886] (0/4) Epoch 1, batch 3300, loss[loss=0.02039, audio_tagging_loss=0.02039, over 25000.00 frames. ], tot_loss[loss=0.01954, audio_tagging_loss=0.01954, over 4932667.34 frames. 
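
The Whitening records from scaling.py:1022 fire when a module's output covariance drifts too far from isotropic: each prints an anisotropy metric against a limit, and the limit is itself scheduled upward as training stabilizes (13.25 and 19.1 in earlier records here, 22.5 by this point). One reasonable metric of this kind, assumed rather than verified to match scaling.py, is the mean squared eigenvalue of the channel covariance divided by the squared mean eigenvalue:

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (frames, channels). Returns 1.0 for perfectly white features
        # and grows with eigenvalue spread. Assumed to mirror the metric
        # in the Whitening records, not copied from scaling.py.
        n, c = x.shape
        group = c // num_groups
        worst = 0.0
        for i in range(num_groups):
            xi = x[:, i * group:(i + 1) * group]
            xi = xi - xi.mean(dim=0, keepdim=True)
            cov = (xi.t() @ xi) / n
            eigs = torch.linalg.eigvalsh(cov).clamp(min=0.0)
            worst = max(worst, (eigs.pow(2).mean() / eigs.mean().pow(2)).item())
        return worst

    # white noise scores close to 1 (about 1.4 at this frame/channel ratio);
    # strongly correlated features score far higher
    print(whitening_metric(torch.randn(1000, 384)))
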
], batch size: 100, lr: 4.31e-02, grad_scale: 64.0 2023-12-21 11:54:56,980 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.056e+00 2023-12-21 11:54:59,349 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.622e+01 2.937e+01 3.224e+01 4.411e+01, threshold=5.874e+01, percent-clipped=0.0 2023-12-21 11:55:07,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=22066.666666666668, ans=0.0 2023-12-21 11:55:13,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=22066.666666666668, ans=0.0 2023-12-21 11:55:26,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=23.07 vs. limit=15.0 2023-12-21 11:55:29,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-12-21 11:55:50,059 INFO [train.py:886] (0/4) Epoch 1, batch 3350, loss[loss=0.0175, audio_tagging_loss=0.0175, over 24750.00 frames. ], tot_loss[loss=0.01943, audio_tagging_loss=0.01943, over 4938331.70 frames. ], batch size: 99, lr: 4.30e-02, grad_scale: 64.0 2023-12-21 11:55:50,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.56 vs. limit=22.5 2023-12-21 11:55:52,185 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=6.036e+00 2023-12-21 11:56:00,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=22400.0, ans=0.125 2023-12-21 11:56:09,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=22400.0, ans=0.125 2023-12-21 11:56:19,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=22466.666666666668, ans=0.0 2023-12-21 11:56:25,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.64 vs. limit=22.5 2023-12-21 11:56:33,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=22600.0, ans=0.005956521739130435 2023-12-21 11:56:43,131 INFO [train.py:886] (0/4) Epoch 1, batch 3400, loss[loss=0.02198, audio_tagging_loss=0.02198, over 25000.00 frames. ], tot_loss[loss=0.01955, audio_tagging_loss=0.01955, over 4946685.31 frames. ], batch size: 100, lr: 4.29e-02, grad_scale: 64.0 2023-12-21 11:56:45,035 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.691e+01 2.947e+01 3.309e+01 4.555e+01, threshold=5.894e+01, percent-clipped=0.0 2023-12-21 11:56:57,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.29 vs. 
limit=22.5 2023-12-21 11:57:17,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=22866.666666666668, ans=15.0 2023-12-21 11:57:18,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=22866.666666666668, ans=0.125 2023-12-21 11:57:21,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2023-12-21 11:57:29,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=22933.333333333332, ans=0.125 2023-12-21 11:57:33,810 INFO [train.py:886] (0/4) Epoch 1, batch 3450, loss[loss=0.02, audio_tagging_loss=0.02, over 24750.00 frames. ], tot_loss[loss=0.0196, audio_tagging_loss=0.0196, over 4944807.96 frames. ], batch size: 99, lr: 4.29e-02, grad_scale: 64.0 2023-12-21 11:57:42,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=23000.0, ans=0.07 2023-12-21 11:57:48,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0 2023-12-21 11:58:02,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=23133.333333333332, ans=0.0 2023-12-21 11:58:05,966 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.470e-01 2023-12-21 11:58:06,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.15 vs. limit=22.5 2023-12-21 11:58:06,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=23200.0, ans=0.125 2023-12-21 11:58:07,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=23200.0, ans=0.125 2023-12-21 11:58:13,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=23200.0, ans=0.2 2023-12-21 11:58:15,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.58 vs. limit=15.0 2023-12-21 11:58:23,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=23333.333333333332, ans=0.005797101449275363 2023-12-21 11:58:23,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.25 vs. limit=22.5 2023-12-21 11:58:24,317 INFO [train.py:886] (0/4) Epoch 1, batch 3500, loss[loss=0.02043, audio_tagging_loss=0.02043, over 24750.00 frames. ], tot_loss[loss=0.01961, audio_tagging_loss=0.01961, over 4939879.37 frames. ], batch size: 99, lr: 4.28e-02, grad_scale: 64.0 2023-12-21 11:58:25,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.88 vs. 
limit=15.0 2023-12-21 11:58:26,216 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.654e+01 2.914e+01 3.165e+01 4.933e+01, threshold=5.829e+01, percent-clipped=0.0 2023-12-21 11:58:28,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=23333.333333333332, ans=0.125 2023-12-21 11:58:33,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=23400.0, ans=0.005782608695652174 2023-12-21 11:58:39,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=23400.0, ans=0.1 2023-12-21 11:58:47,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=23466.666666666668, ans=0.035 2023-12-21 11:59:15,523 INFO [train.py:886] (0/4) Epoch 1, batch 3550, loss[loss=0.01644, audio_tagging_loss=0.01644, over 25000.00 frames. ], tot_loss[loss=0.01935, audio_tagging_loss=0.01935, over 4939385.31 frames. ], batch size: 100, lr: 4.28e-02, grad_scale: 64.0 2023-12-21 11:59:16,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=23666.666666666668, ans=0.1 2023-12-21 11:59:17,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23666.666666666668, ans=0.1 2023-12-21 11:59:32,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=23733.333333333332, ans=0.125 2023-12-21 11:59:39,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=23800.0, ans=0.125 2023-12-21 11:59:45,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=23866.666666666668, ans=0.0 2023-12-21 11:59:53,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2023-12-21 12:00:01,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23933.333333333332, ans=0.1 2023-12-21 12:00:05,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=24000.0, ans=0.125 2023-12-21 12:00:05,894 INFO [train.py:886] (0/4) Epoch 1, batch 3600, loss[loss=0.01891, audio_tagging_loss=0.01891, over 24750.00 frames. ], tot_loss[loss=0.01913, audio_tagging_loss=0.01913, over 4943351.87 frames. ], batch size: 99, lr: 4.27e-02, grad_scale: 64.0 2023-12-21 12:00:07,819 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.526e+01 2.849e+01 3.295e+01 5.645e+01, threshold=5.698e+01, percent-clipped=0.0 2023-12-21 12:00:10,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. 
limit=15.0 2023-12-21 12:00:24,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=24066.666666666668, ans=0.125 2023-12-21 12:00:32,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=24133.333333333332, ans=0.125 2023-12-21 12:00:39,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=24200.0, ans=0.125 2023-12-21 12:00:46,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=24266.666666666668, ans=0.125 2023-12-21 12:00:57,626 INFO [train.py:886] (0/4) Epoch 1, batch 3650, loss[loss=0.0189, audio_tagging_loss=0.0189, over 22322.00 frames. ], tot_loss[loss=0.01905, audio_tagging_loss=0.01905, over 4949222.40 frames. ], batch size: 107, lr: 4.27e-02, grad_scale: 64.0 2023-12-21 12:00:59,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=24333.333333333332, ans=0.125 2023-12-21 12:01:01,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=24333.333333333332, ans=0.125 2023-12-21 12:01:14,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=24400.0, ans=0.005565217391304348 2023-12-21 12:01:17,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=24466.666666666668, ans=0.125 2023-12-21 12:01:31,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=24533.333333333332, ans=0.0 2023-12-21 12:01:47,742 INFO [train.py:886] (0/4) Epoch 1, batch 3700, loss[loss=0.01931, audio_tagging_loss=0.01931, over 25000.00 frames. ], tot_loss[loss=0.01914, audio_tagging_loss=0.01914, over 4951644.35 frames. ], batch size: 100, lr: 4.26e-02, grad_scale: 64.0 2023-12-21 12:01:49,605 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.529e+01 2.856e+01 3.179e+01 4.127e+01, threshold=5.712e+01, percent-clipped=0.0 2023-12-21 12:01:50,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=24666.666666666668, ans=0.0 2023-12-21 12:01:59,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=15.0 2023-12-21 12:02:07,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=25.45 vs. limit=15.0 2023-12-21 12:02:18,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24866.666666666668, ans=0.1 2023-12-21 12:02:20,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=24866.666666666668, ans=0.125 2023-12-21 12:02:25,198 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.911e+00 2023-12-21 12:02:29,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.44 vs. 
limit=12.0 2023-12-21 12:02:32,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=24933.333333333332, ans=0.0 2023-12-21 12:02:33,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=24933.333333333332, ans=0.125 2023-12-21 12:02:33,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=24933.333333333332, ans=0.0054492753623188415 2023-12-21 12:02:37,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=20.04 vs. limit=15.0 2023-12-21 12:02:38,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.40 vs. limit=10.0 2023-12-21 12:02:38,534 INFO [train.py:886] (0/4) Epoch 1, batch 3750, loss[loss=0.01766, audio_tagging_loss=0.01766, over 24750.00 frames. ], tot_loss[loss=0.01926, audio_tagging_loss=0.01926, over 4950921.31 frames. ], batch size: 99, lr: 4.26e-02, grad_scale: 64.0 2023-12-21 12:02:56,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=25066.666666666668, ans=0.125 2023-12-21 12:03:07,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=25133.333333333332, ans=0.0 2023-12-21 12:03:09,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=25200.0, ans=0.125 2023-12-21 12:03:17,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=25200.0, ans=0.125 2023-12-21 12:03:30,816 INFO [train.py:886] (0/4) Epoch 1, batch 3800, loss[loss=0.01864, audio_tagging_loss=0.01864, over 24750.00 frames. ], tot_loss[loss=0.01933, audio_tagging_loss=0.01933, over 4951369.54 frames. ], batch size: 99, lr: 4.25e-02, grad_scale: 64.0 2023-12-21 12:03:32,663 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.585e+01 2.873e+01 3.239e+01 4.281e+01, threshold=5.745e+01, percent-clipped=0.0 2023-12-21 12:03:37,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=25333.333333333332, ans=0.2 2023-12-21 12:03:42,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=16.77 vs. limit=15.0 2023-12-21 12:04:12,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=25600.0, ans=0.0 2023-12-21 12:04:16,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.64 vs. limit=15.0 2023-12-21 12:04:20,855 INFO [train.py:886] (0/4) Epoch 1, batch 3850, loss[loss=0.018, audio_tagging_loss=0.018, over 25000.00 frames. ], tot_loss[loss=0.01927, audio_tagging_loss=0.01927, over 4950857.18 frames. ], batch size: 100, lr: 4.24e-02, grad_scale: 64.0 2023-12-21 12:04:42,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.56 vs. 
limit=8.0 2023-12-21 12:04:51,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25866.666666666668, ans=0.1 2023-12-21 12:04:54,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=25866.666666666668, ans=0.005246376811594203 2023-12-21 12:05:11,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=25933.333333333332, ans=0.0 2023-12-21 12:05:12,879 INFO [train.py:886] (0/4) Epoch 1, batch 3900, loss[loss=0.02052, audio_tagging_loss=0.02052, over 25000.00 frames. ], tot_loss[loss=0.01914, audio_tagging_loss=0.01914, over 4952600.20 frames. ], batch size: 100, lr: 4.24e-02, grad_scale: 64.0 2023-12-21 12:05:14,770 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.615e+01 2.835e+01 3.211e+01 6.050e+01, threshold=5.671e+01, percent-clipped=1.0 2023-12-21 12:05:14,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=26000.0, ans=0.125 2023-12-21 12:05:17,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=26000.0, ans=0.125 2023-12-21 12:05:31,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=12.0 2023-12-21 12:05:38,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26133.333333333332, ans=0.1 2023-12-21 12:05:50,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.62 vs. limit=15.0 2023-12-21 12:05:52,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=26200.0, ans=0.2 2023-12-21 12:05:52,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=26200.0, ans=0.07 2023-12-21 12:06:02,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26266.666666666668, ans=0.1 2023-12-21 12:06:05,265 INFO [train.py:886] (0/4) Epoch 1, batch 3950, loss[loss=0.02109, audio_tagging_loss=0.02109, over 21920.00 frames. ], tot_loss[loss=0.01905, audio_tagging_loss=0.01905, over 4953445.92 frames. ], batch size: 107, lr: 4.23e-02, grad_scale: 64.0 2023-12-21 12:06:07,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=26333.333333333332, ans=0.0 2023-12-21 12:06:08,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=26333.333333333332, ans=0.125 2023-12-21 12:06:22,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=26400.0, ans=0.125 2023-12-21 12:06:35,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.29 vs. 
limit=22.5 2023-12-21 12:06:37,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.16 vs. limit=10.0 2023-12-21 12:06:50,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=26600.0, ans=0.0 2023-12-21 12:06:50,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=26600.0, ans=0.125 2023-12-21 12:06:54,937 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-4000.pt 2023-12-21 12:06:56,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=26666.666666666668, ans=0.125 2023-12-21 12:06:57,681 INFO [train.py:886] (0/4) Epoch 1, batch 4000, loss[loss=0.02279, audio_tagging_loss=0.02279, over 25000.00 frames. ], tot_loss[loss=0.01903, audio_tagging_loss=0.01903, over 4955169.64 frames. ], batch size: 100, lr: 4.23e-02, grad_scale: 64.0 2023-12-21 12:06:59,521 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.146e+01 2.600e+01 2.855e+01 3.213e+01 4.653e+01, threshold=5.711e+01, percent-clipped=0.0 2023-12-21 12:07:04,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=26666.666666666668, ans=0.04949747468305833 2023-12-21 12:07:06,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26733.333333333332, ans=0.1 2023-12-21 12:07:10,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.82 vs. limit=6.0 2023-12-21 12:07:19,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=26800.0, ans=0.125 2023-12-21 12:07:24,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=26800.0, ans=0.005043478260869565 2023-12-21 12:07:25,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=26800.0, ans=0.005043478260869565 2023-12-21 12:07:25,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.43 vs. limit=15.0 2023-12-21 12:07:36,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=26866.666666666668, ans=0.1 2023-12-21 12:07:46,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=26933.333333333332, ans=0.125 2023-12-21 12:07:48,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=26933.333333333332, ans=0.0 2023-12-21 12:07:50,894 INFO [train.py:886] (0/4) Epoch 1, batch 4050, loss[loss=0.02043, audio_tagging_loss=0.02043, over 25000.00 frames. ], tot_loss[loss=0.01913, audio_tagging_loss=0.01913, over 4959199.47 frames. 
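
The checkpoint.py:75 record lands on a round batch index (checkpoint-4000.pt, written between the batch-3950 and batch-4000 records), which points to a save-every-N-batches rule. The helper below is a hypothetical reconstruction with N=4000; the saved keys and the handling of old checkpoints are assumptions, not icefall's checkpoint format.

    import torch
    from pathlib import Path

    def save_every_n(model, optimizer, batch_idx_train, exp_dir, every_n=4000):
        # Hypothetical batch-indexed checkpointing; the contents of real
        # icefall checkpoints are not reproduced here.
        if batch_idx_train == 0 or batch_idx_train % every_n:
            return None
        path = Path(exp_dir) / f"checkpoint-{batch_idx_train}.pt"
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train,
            },
            path,
        )
        return path
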
], batch size: 100, lr: 4.22e-02, grad_scale: 64.0 2023-12-21 12:07:54,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27000.0, ans=0.1 2023-12-21 12:08:00,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=27066.666666666668, ans=0.125 2023-12-21 12:08:06,383 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.089e+00 2023-12-21 12:08:08,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=27066.666666666668, ans=0.05 2023-12-21 12:08:10,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=27133.333333333332, ans=0.125 2023-12-21 12:08:15,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=27133.333333333332, ans=0.07 2023-12-21 12:08:25,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=27200.0, ans=0.09899494936611666 2023-12-21 12:08:26,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0 2023-12-21 12:08:29,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=27200.0, ans=0.125 2023-12-21 12:08:30,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=27200.0, ans=0.004956521739130435 2023-12-21 12:08:30,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=27266.666666666668, ans=0.125 2023-12-21 12:08:41,509 INFO [train.py:886] (0/4) Epoch 1, batch 4100, loss[loss=0.0212, audio_tagging_loss=0.0212, over 24750.00 frames. ], tot_loss[loss=0.01919, audio_tagging_loss=0.01919, over 4948006.03 frames. ], batch size: 99, lr: 4.22e-02, grad_scale: 64.0 2023-12-21 12:08:44,142 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.558e+01 2.802e+01 3.131e+01 4.356e+01, threshold=5.603e+01, percent-clipped=0.0 2023-12-21 12:09:01,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27400.0, ans=0.1 2023-12-21 12:09:23,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.32 vs. limit=15.0 2023-12-21 12:09:24,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=27600.0, ans=0.0 2023-12-21 12:09:30,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.18 vs. limit=22.5 2023-12-21 12:09:34,832 INFO [train.py:886] (0/4) Epoch 1, batch 4150, loss[loss=0.02048, audio_tagging_loss=0.02048, over 24750.00 frames. ], tot_loss[loss=0.01917, audio_tagging_loss=0.01917, over 4940290.37 frames. 
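
Batch sizes in these records hover at 99-107 cuts and roughly 22000-25000 frames because the sampler packs cuts by total audio duration rather than by count: AudioSet clips run about 10 s each, i.e. 1000 fbank frames at a 10 ms hop, or 250 frames after 4x subsampling, so a budget of about 1000 s of audio yields ~100 cuts and ~25000 subsampled frames per batch. The toy packer below shows only the counting; the real sampler (lhotse's DynamicBucketingSampler is the usual choice in these recipes) additionally buckets cuts by length and shuffles.

    def pack_by_duration(durations, max_duration=1000.0):
        # Toy stand-in for duration-based batching, not lhotse's sampler:
        # flush the current batch once adding the next cut would exceed
        # the duration budget.
        batches, current, total = [], [], 0.0
        for idx, dur in enumerate(durations):
            if current and total + dur > max_duration:
                batches.append(current)
                current, total = [], 0.0
            current.append(idx)
            total += dur
        if current:
            batches.append(current)
        return batches

    sizes = [len(b) for b in pack_by_duration([10.0] * 250)]
    print(sizes)  # -> [100, 100, 50]
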
], batch size: 99, lr: 4.21e-02, grad_scale: 64.0 2023-12-21 12:09:40,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.20 vs. limit=22.5 2023-12-21 12:09:42,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.08 vs. limit=22.5 2023-12-21 12:09:42,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=27666.666666666668, ans=0.125 2023-12-21 12:09:44,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.36 vs. limit=22.5 2023-12-21 12:09:47,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=27733.333333333332, ans=0.125 2023-12-21 12:09:50,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=27733.333333333332, ans=0.0 2023-12-21 12:09:55,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.76 vs. limit=22.5 2023-12-21 12:10:03,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.87 vs. limit=22.5 2023-12-21 12:10:09,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=27866.666666666668, ans=0.0 2023-12-21 12:10:11,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.80 vs. limit=15.0 2023-12-21 12:10:14,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=27866.666666666668, ans=0.125 2023-12-21 12:10:14,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=27933.333333333332, ans=0.125 2023-12-21 12:10:15,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=12.0 2023-12-21 12:10:24,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=27933.333333333332, ans=0.125 2023-12-21 12:10:24,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.97 vs. limit=10.0 2023-12-21 12:10:27,462 INFO [train.py:886] (0/4) Epoch 1, batch 4200, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.01889, audio_tagging_loss=0.01889, over 4945546.75 frames. ], batch size: 100, lr: 4.20e-02, grad_scale: 64.0 2023-12-21 12:10:29,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=28000.0, ans=0.125 2023-12-21 12:10:30,044 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.568e+01 2.812e+01 3.182e+01 3.944e+01, threshold=5.625e+01, percent-clipped=0.0 2023-12-21 12:10:43,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.95 vs. 
limit=15.0 2023-12-21 12:11:02,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=26.14 vs. limit=22.5 2023-12-21 12:11:02,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=28200.0, ans=0.125 2023-12-21 12:11:12,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.22 vs. limit=22.5 2023-12-21 12:11:13,081 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.228e+01 2023-12-21 12:11:14,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=28266.666666666668, ans=0.2 2023-12-21 12:11:16,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0 2023-12-21 12:11:18,581 INFO [train.py:886] (0/4) Epoch 1, batch 4250, loss[loss=0.01877, audio_tagging_loss=0.01877, over 22113.00 frames. ], tot_loss[loss=0.0188, audio_tagging_loss=0.0188, over 4947838.13 frames. ], batch size: 107, lr: 4.20e-02, grad_scale: 128.0 2023-12-21 12:11:31,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=28400.0, ans=0.125 2023-12-21 12:11:33,205 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.060e+01 2023-12-21 12:11:41,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=28466.666666666668, ans=0.125 2023-12-21 12:11:45,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=28466.666666666668, ans=0.0 2023-12-21 12:11:50,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.38 vs. limit=15.0 2023-12-21 12:11:51,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28533.333333333332, ans=0.1 2023-12-21 12:12:02,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.42 vs. limit=22.5 2023-12-21 12:12:07,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=28600.0, ans=0.125 2023-12-21 12:12:11,740 INFO [train.py:886] (0/4) Epoch 1, batch 4300, loss[loss=0.01524, audio_tagging_loss=0.01524, over 25000.00 frames. ], tot_loss[loss=0.01886, audio_tagging_loss=0.01886, over 4950148.29 frames. ], batch size: 100, lr: 4.19e-02, grad_scale: 128.0 2023-12-21 12:12:13,642 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.582e+01 2.869e+01 3.269e+01 4.965e+01, threshold=5.738e+01, percent-clipped=0.0 2023-12-21 12:12:17,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=28666.666666666668, ans=0.125 2023-12-21 12:12:17,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.58 vs. 
limit=12.0 2023-12-21 12:12:24,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=28733.333333333332, ans=0.2 2023-12-21 12:12:30,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=28800.0, ans=0.125 2023-12-21 12:12:35,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=28800.0, ans=0.125 2023-12-21 12:12:35,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=28800.0, ans=0.004608695652173913 2023-12-21 12:12:38,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=28800.0, ans=0.0 2023-12-21 12:12:56,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.41 vs. limit=15.0 2023-12-21 12:13:03,101 INFO [train.py:886] (0/4) Epoch 1, batch 4350, loss[loss=0.02081, audio_tagging_loss=0.02081, over 25000.00 frames. ], tot_loss[loss=0.01903, audio_tagging_loss=0.01903, over 4954648.42 frames. ], batch size: 100, lr: 4.19e-02, grad_scale: 128.0 2023-12-21 12:13:08,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.57 vs. limit=22.5 2023-12-21 12:13:11,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=29000.0, ans=0.0 2023-12-21 12:13:13,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=29066.666666666668, ans=0.125 2023-12-21 12:13:14,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=29066.666666666668, ans=0.0 2023-12-21 12:13:21,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.18 vs. limit=15.0 2023-12-21 12:13:26,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=29133.333333333332, ans=0.02 2023-12-21 12:13:34,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.15 vs. limit=22.5 2023-12-21 12:13:41,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=29200.0, ans=0.125 2023-12-21 12:13:55,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.42 vs. limit=15.0 2023-12-21 12:13:56,071 INFO [train.py:886] (0/4) Epoch 1, batch 4400, loss[loss=0.01679, audio_tagging_loss=0.01679, over 24750.00 frames. ], tot_loss[loss=0.01915, audio_tagging_loss=0.01915, over 4952823.72 frames. 
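
The grad_scale field doubles from 64.0 to 128.0 at batch 4250, the signature of dynamic loss scaling for fp16 training: the scale is multiplied up after a long streak of overflow-free steps and halved when gradients overflow. A generic PyTorch-style loop is sketched below; the starting scale and growth interval are illustrative, not this recipe's settings.

    import torch
    import torch.nn.functional as F

    model = torch.nn.Linear(8, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    # init_scale and growth_interval are illustrative, not recipe values
    scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=1000)

    for _ in range(5):
        x, y = torch.randn(16, 8), torch.randn(16, 1)
        opt.zero_grad()
        with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
            loss = F.mse_loss(model(x), y)
        scaler.scale(loss).backward()  # gradients carry the current scale
        scaler.step(opt)               # unscales, skips the step on inf/nan
        scaler.update()                # grows the scale after clean streaks
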
2023-12-21 12:13:57,946 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.596e+01 2.831e+01 3.124e+01 4.949e+01, threshold=5.662e+01, percent-clipped=0.0
2023-12-21 12:14:02,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=29333.333333333332, ans=0.125
2023-12-21 12:14:03,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.23 vs. limit=22.5
2023-12-21 12:14:06,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=29400.0, ans=0.125
2023-12-21 12:14:08,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=29400.0, ans=0.125
2023-12-21 12:14:12,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29400.0, ans=0.1
2023-12-21 12:14:18,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=29466.666666666668, ans=15.0
2023-12-21 12:14:22,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=29466.666666666668, ans=0.125
2023-12-21 12:14:30,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=29533.333333333332, ans=0.0044492753623188415
2023-12-21 12:14:30,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=15.0
2023-12-21 12:14:48,778 INFO [train.py:886] (0/4) Epoch 1, batch 4450, loss[loss=0.02024, audio_tagging_loss=0.02024, over 24750.00 frames. ], tot_loss[loss=0.01923, audio_tagging_loss=0.01923, over 4948791.38 frames. ], batch size: 99, lr: 4.17e-02, grad_scale: 128.0
2023-12-21 12:14:52,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=29666.666666666668, ans=0.1
2023-12-21 12:15:05,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=15.0
2023-12-21 12:15:28,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=29866.666666666668, ans=0.125
2023-12-21 12:15:32,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=17.78 vs. limit=15.0
2023-12-21 12:15:40,549 INFO [train.py:886] (0/4) Epoch 1, batch 4500, loss[loss=0.02195, audio_tagging_loss=0.02195, over 25000.00 frames. ], tot_loss[loss=0.01908, audio_tagging_loss=0.01908, over 4946528.84 frames. ], batch size: 100, lr: 4.17e-02, grad_scale: 128.0
2023-12-21 12:15:43,806 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.606e+01 2.897e+01 3.074e+01 4.883e+01, threshold=5.793e+01, percent-clipped=0.0
2023-12-21 12:15:51,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=30066.666666666668, ans=0.125
2023-12-21 12:15:58,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=30066.666666666668, ans=0.125
2023-12-21 12:15:59,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=30066.666666666668, ans=0.125
2023-12-21 12:16:10,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=30133.333333333332, ans=0.0
2023-12-21 12:16:10,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=30133.333333333332, ans=0.2
2023-12-21 12:16:13,412 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.303e+01
2023-12-21 12:16:31,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.33 vs. limit=15.0
2023-12-21 12:16:34,303 INFO [train.py:886] (0/4) Epoch 1, batch 4550, loss[loss=0.01923, audio_tagging_loss=0.01923, over 25000.00 frames. ], tot_loss[loss=0.01901, audio_tagging_loss=0.01901, over 4951961.49 frames. ], batch size: 100, lr: 4.16e-02, grad_scale: 128.0
2023-12-21 12:16:36,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=30333.333333333332, ans=0.05
2023-12-21 12:16:38,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=30333.333333333332, ans=0.0
2023-12-21 12:16:46,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.34 vs. limit=22.5
2023-12-21 12:16:46,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=30400.0, ans=0.5
2023-12-21 12:17:02,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=30466.666666666668, ans=12.0
2023-12-21 12:17:11,057 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.01 vs. limit=22.5
2023-12-21 12:17:12,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=30533.333333333332, ans=0.5
2023-12-21 12:17:15,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=30600.0, ans=0.125
2023-12-21 12:17:15,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.64 vs. limit=22.5
2023-12-21 12:17:17,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=12.0
2023-12-21 12:17:27,237 INFO [train.py:886] (0/4) Epoch 1, batch 4600, loss[loss=0.01935, audio_tagging_loss=0.01935, over 24750.00 frames. ], tot_loss[loss=0.01897, audio_tagging_loss=0.01897, over 4954890.63 frames. ], batch size: 99, lr: 4.15e-02, grad_scale: 128.0
2023-12-21 12:17:29,159 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.518e+01 2.768e+01 3.063e+01 4.476e+01, threshold=5.536e+01, percent-clipped=0.0
2023-12-21 12:17:33,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30666.666666666668, ans=0.125
2023-12-21 12:17:41,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=30733.333333333332, ans=0.125
2023-12-21 12:17:42,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=30733.333333333332, ans=0.125
2023-12-21 12:17:53,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=12.0
2023-12-21 12:17:54,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=30800.0, ans=0.0
2023-12-21 12:18:04,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=30866.666666666668, ans=0.05
2023-12-21 12:18:13,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=30933.333333333332, ans=0.125
2023-12-21 12:18:14,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.89 vs. limit=6.0
2023-12-21 12:18:18,730 INFO [train.py:886] (0/4) Epoch 1, batch 4650, loss[loss=0.01805, audio_tagging_loss=0.01805, over 25000.00 frames. ], tot_loss[loss=0.01899, audio_tagging_loss=0.01899, over 4955234.12 frames. ], batch size: 100, lr: 4.15e-02, grad_scale: 128.0
2023-12-21 12:18:43,950 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=9.562e-02
2023-12-21 12:18:48,823 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=22.5
2023-12-21 12:18:49,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=31200.0, ans=0.125
2023-12-21 12:19:01,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=31.05 vs. limit=22.5
2023-12-21 12:19:07,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=31266.666666666668, ans=0.07
2023-12-21 12:19:10,781 INFO [train.py:886] (0/4) Epoch 1, batch 4700, loss[loss=0.02071, audio_tagging_loss=0.02071, over 25000.00 frames. ], tot_loss[loss=0.01911, audio_tagging_loss=0.01911, over 4955692.24 frames. ], batch size: 100, lr: 4.14e-02, grad_scale: 128.0
2023-12-21 12:19:12,551 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.160e+01 2.530e+01 2.695e+01 2.950e+01 3.950e+01, threshold=5.391e+01, percent-clipped=0.0
2023-12-21 12:19:12,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=31333.333333333332, ans=0.004057971014492754
2023-12-21 12:19:17,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=31333.333333333332, ans=0.125
2023-12-21 12:19:31,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=31466.666666666668, ans=0.0
2023-12-21 12:19:35,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=31466.666666666668, ans=0.125
2023-12-21 12:19:38,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=31533.333333333332, ans=0.0
2023-12-21 12:19:56,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31666.666666666668, ans=0.1
2023-12-21 12:19:57,269 INFO [train.py:886] (0/4) Epoch 1, batch 4750, loss[loss=0.01977, audio_tagging_loss=0.01977, over 24750.00 frames. ], tot_loss[loss=0.01926, audio_tagging_loss=0.01926, over 4949218.20 frames. ], batch size: 99, lr: 4.14e-02, grad_scale: 128.0
2023-12-21 12:19:59,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.56 vs. limit=15.0
2023-12-21 12:20:13,656 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-1.pt
2023-12-21 12:20:36,406 INFO [train.py:886] (0/4) Epoch 2, batch 0, loss[loss=0.05702, audio_tagging_loss=0.05702, over 20540.00 frames. ], tot_loss[loss=0.05702, audio_tagging_loss=0.05702, over 20540.00 frames. ], batch size: 107, lr: 4.05e-02, grad_scale: 128.0
2023-12-21 12:20:36,408 INFO [train.py:909] (0/4) Computing validation loss
2023-12-21 12:20:59,083 INFO [train.py:917] (0/4) Epoch 2, validation: loss=0.0423, audio_tagging_loss=0.0423, over 3737520.00 frames.
2023-12-21 12:20:59,084 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-21 12:21:10,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0
2023-12-21 12:21:19,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=31906.666666666668, ans=0.125
2023-12-21 12:21:19,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=31906.666666666668, ans=0.2
2023-12-21 12:21:33,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=31973.333333333332, ans=0.125
2023-12-21 12:21:35,980 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.667e+01 2.944e+01 3.472e+01 1.120e+02, threshold=5.887e+01, percent-clipped=2.0
2023-12-21 12:21:49,442 INFO [train.py:886] (0/4) Epoch 2, batch 50, loss[loss=0.02552, audio_tagging_loss=0.02552, over 25000.00 frames. ], tot_loss[loss=0.03054, audio_tagging_loss=0.03054, over 1108122.52 frames. ], batch size: 100, lr: 4.05e-02, grad_scale: 128.0
2023-12-21 12:21:52,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=32106.666666666668, ans=0.125
2023-12-21 12:21:53,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=32106.666666666668, ans=0.125
2023-12-21 12:21:54,390 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.535e+01
2023-12-21 12:22:09,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=32173.333333333332, ans=0.003875362318840579
2023-12-21 12:22:15,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=32240.0, ans=0.125
2023-12-21 12:22:41,804 INFO [train.py:886] (0/4) Epoch 2, batch 100, loss[loss=0.0231, audio_tagging_loss=0.0231, over 25000.00 frames. ], tot_loss[loss=0.02646, audio_tagging_loss=0.02646, over 1960568.11 frames. ], batch size: 100, lr: 4.04e-02, grad_scale: 128.0
2023-12-21 12:22:41,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=32440.0, ans=0.125
2023-12-21 12:22:51,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=32506.666666666668, ans=0.125
2023-12-21 12:22:57,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=32506.666666666668, ans=0.0
2023-12-21 12:23:13,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=32640.0, ans=0.1
2023-12-21 12:23:14,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0
2023-12-21 12:23:18,386 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.835e+01 3.088e+01 3.489e+01 4.316e+01, threshold=6.177e+01, percent-clipped=0.0
2023-12-21 12:23:19,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=32640.0, ans=0.125
2023-12-21 12:23:31,825 INFO [train.py:886] (0/4) Epoch 2, batch 150, loss[loss=0.01885, audio_tagging_loss=0.01885, over 21821.00 frames. ], tot_loss[loss=0.02404, audio_tagging_loss=0.02404, over 2625422.01 frames. ], batch size: 107, lr: 4.04e-02, grad_scale: 128.0
2023-12-21 12:23:34,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=32773.333333333336, ans=15.0
2023-12-21 12:23:37,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=32773.333333333336, ans=0.1
2023-12-21 12:23:41,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=32840.0, ans=0.0
2023-12-21 12:23:48,510 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.812e+01
2023-12-21 12:24:05,034 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.170e+01
2023-12-21 12:24:07,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=32973.333333333336, ans=0.125
2023-12-21 12:24:08,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=32973.333333333336, ans=0.125
2023-12-21 12:24:23,406 INFO [train.py:886] (0/4) Epoch 2, batch 200, loss[loss=0.01867, audio_tagging_loss=0.01867, over 25000.00 frames. ], tot_loss[loss=0.02234, audio_tagging_loss=0.02234, over 3140565.05 frames. ], batch size: 100, lr: 4.03e-02, grad_scale: 128.0
2023-12-21 12:24:23,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=33106.666666666664, ans=0.5
2023-12-21 12:24:55,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=15.0
2023-12-21 12:24:59,198 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.541e+01 2.777e+01 3.081e+01 4.614e+01, threshold=5.554e+01, percent-clipped=0.0
2023-12-21 12:25:12,690 INFO [train.py:886] (0/4) Epoch 2, batch 250, loss[loss=0.01892, audio_tagging_loss=0.01892, over 25000.00 frames. ], tot_loss[loss=0.02138, audio_tagging_loss=0.02138, over 3542323.44 frames. ], batch size: 100, lr: 4.02e-02, grad_scale: 128.0
2023-12-21 12:25:19,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=33440.0, ans=0.125
2023-12-21 12:25:21,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.20 vs. limit=22.5
2023-12-21 12:25:21,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.39 vs. limit=6.0
2023-12-21 12:25:26,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.48 vs. limit=15.0
2023-12-21 12:25:41,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=33573.333333333336, ans=0.015
2023-12-21 12:25:43,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=33640.0, ans=0.2
2023-12-21 12:25:46,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=33640.0, ans=0.125
2023-12-21 12:25:53,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=12.0
2023-12-21 12:25:58,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=33706.666666666664, ans=0.003542028985507247
2023-12-21 12:26:00,579 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.238e+01
2023-12-21 12:26:05,331 INFO [train.py:886] (0/4) Epoch 2, batch 300, loss[loss=0.01863, audio_tagging_loss=0.01863, over 25000.00 frames. ], tot_loss[loss=0.02076, audio_tagging_loss=0.02076, over 3856202.53 frames. ], batch size: 100, lr: 4.02e-02, grad_scale: 128.0
2023-12-21 12:26:08,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=33773.333333333336, ans=0.125
2023-12-21 12:26:19,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=33840.0, ans=0.125
2023-12-21 12:26:28,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.29 vs. limit=15.0
2023-12-21 12:26:41,963 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.636e+01 2.849e+01 3.270e+01 4.493e+01, threshold=5.697e+01, percent-clipped=0.0
2023-12-21 12:26:44,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.08 vs. limit=15.0
2023-12-21 12:26:57,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.51 vs. limit=22.5
2023-12-21 12:26:58,347 INFO [train.py:886] (0/4) Epoch 2, batch 350, loss[loss=0.01795, audio_tagging_loss=0.01795, over 24750.00 frames. ], tot_loss[loss=0.0203, audio_tagging_loss=0.0203, over 4096388.40 frames. ], batch size: 99, lr: 4.01e-02, grad_scale: 128.0
2023-12-21 12:27:01,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=34106.666666666664, ans=0.1
2023-12-21 12:27:03,613 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.95 vs. limit=6.0
2023-12-21 12:27:03,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.32 vs. limit=22.5
2023-12-21 12:27:07,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=34173.333333333336, ans=0.02
2023-12-21 12:27:15,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=34173.333333333336, ans=0.1
2023-12-21 12:27:21,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0
2023-12-21 12:27:26,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=34240.0, ans=0.125
2023-12-21 12:27:32,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=34306.666666666664, ans=0.003411594202898551
2023-12-21 12:27:47,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=34373.333333333336, ans=0.125
2023-12-21 12:27:48,700 INFO [train.py:886] (0/4) Epoch 2, batch 400, loss[loss=0.01695, audio_tagging_loss=0.01695, over 24750.00 frames. ], tot_loss[loss=0.01978, audio_tagging_loss=0.01978, over 4279477.35 frames. ], batch size: 99, lr: 4.00e-02, grad_scale: 128.0
2023-12-21 12:28:01,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=34506.666666666664, ans=0.07
2023-12-21 12:28:05,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=34506.666666666664, ans=0.2
2023-12-21 12:28:17,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=34573.333333333336, ans=0.1
2023-12-21 12:28:21,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=34640.0, ans=0.2
2023-12-21 12:28:27,069 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.614e+01 2.832e+01 3.282e+01 4.627e+01, threshold=5.664e+01, percent-clipped=0.0
2023-12-21 12:28:32,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=34706.666666666664, ans=0.0
2023-12-21 12:28:40,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=34706.666666666664, ans=0.025
2023-12-21 12:28:42,750 INFO [train.py:886] (0/4) Epoch 2, batch 450, loss[loss=0.01676, audio_tagging_loss=0.01676, over 25000.00 frames. ], tot_loss[loss=0.01951, audio_tagging_loss=0.01951, over 4434427.90 frames. ], batch size: 100, lr: 4.00e-02, grad_scale: 128.0
2023-12-21 12:28:52,813 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=15.0
2023-12-21 12:29:02,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=34906.666666666664, ans=0.0
2023-12-21 12:29:11,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=34906.666666666664, ans=0.2
2023-12-21 12:29:19,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=34973.333333333336, ans=0.07
2023-12-21 12:29:32,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=35040.0, ans=0.125
2023-12-21 12:29:35,147 INFO [train.py:886] (0/4) Epoch 2, batch 500, loss[loss=0.01416, audio_tagging_loss=0.01416, over 25000.00 frames. ], tot_loss[loss=0.01919, audio_tagging_loss=0.01919, over 4548736.21 frames. ], batch size: 100, lr: 3.99e-02, grad_scale: 128.0
2023-12-21 12:29:37,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=35106.666666666664, ans=0.125
2023-12-21 12:29:50,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=35173.333333333336, ans=0.0032231884057971017
2023-12-21 12:30:05,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=35306.666666666664, ans=0.2
2023-12-21 12:30:12,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=35306.666666666664, ans=0.0
2023-12-21 12:30:13,254 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.483e+01 2.716e+01 2.937e+01 3.953e+01, threshold=5.433e+01, percent-clipped=0.0
2023-12-21 12:30:14,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.45 vs. limit=22.5
2023-12-21 12:30:22,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.73 vs. limit=22.5
2023-12-21 12:30:27,307 INFO [train.py:886] (0/4) Epoch 2, batch 550, loss[loss=0.02057, audio_tagging_loss=0.02057, over 25000.00 frames. ], tot_loss[loss=0.01901, audio_tagging_loss=0.01901, over 4635870.79 frames. ], batch size: 100, lr: 3.99e-02, grad_scale: 128.0
2023-12-21 12:30:27,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=35440.0, ans=0.125
2023-12-21 12:30:32,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=35440.0, ans=0.003165217391304347
2023-12-21 12:30:48,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=35573.333333333336, ans=0.0
2023-12-21 12:30:49,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=35573.333333333336, ans=0.125
2023-12-21 12:30:59,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=12.0
2023-12-21 12:31:05,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=20.75 vs. limit=15.0
2023-12-21 12:31:20,661 INFO [train.py:886] (0/4) Epoch 2, batch 600, loss[loss=0.01794, audio_tagging_loss=0.01794, over 22243.00 frames. ], tot_loss[loss=0.0191, audio_tagging_loss=0.0191, over 4702310.50 frames. ], batch size: 107, lr: 3.98e-02, grad_scale: 128.0
2023-12-21 12:31:26,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=35773.333333333336, ans=0.0
2023-12-21 12:31:27,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35773.333333333336, ans=0.1
2023-12-21 12:31:34,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.87 vs. limit=6.0
2023-12-21 12:31:37,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=35840.0, ans=0.125
2023-12-21 12:31:52,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=35973.333333333336, ans=0.125
2023-12-21 12:31:58,311 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.564e+01 2.794e+01 3.187e+01 4.110e+01, threshold=5.587e+01, percent-clipped=0.0
2023-12-21 12:32:01,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0
2023-12-21 12:32:02,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=15.0
2023-12-21 12:32:02,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=36040.0, ans=0.125
2023-12-21 12:32:12,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0
2023-12-21 12:32:12,557 INFO [train.py:886] (0/4) Epoch 2, batch 650, loss[loss=0.01871, audio_tagging_loss=0.01871, over 24750.00 frames. ], tot_loss[loss=0.01905, audio_tagging_loss=0.01905, over 4757613.27 frames. ], batch size: 99, lr: 3.97e-02, grad_scale: 128.0
2023-12-21 12:32:17,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=36106.666666666664, ans=0.2
2023-12-21 12:32:33,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=15.0
2023-12-21 12:32:34,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=26.61 vs. limit=22.5
2023-12-21 12:32:38,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=36240.0, ans=22.5
2023-12-21 12:32:57,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=15.0
2023-12-21 12:32:57,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=36373.333333333336, ans=0.125
2023-12-21 12:33:06,071 INFO [train.py:886] (0/4) Epoch 2, batch 700, loss[loss=0.01656, audio_tagging_loss=0.01656, over 24750.00 frames. ], tot_loss[loss=0.01893, audio_tagging_loss=0.01893, over 4797498.45 frames. ], batch size: 99, lr: 3.97e-02, grad_scale: 128.0
2023-12-21 12:33:08,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0
2023-12-21 12:33:12,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=36440.0, ans=0.0
2023-12-21 12:33:14,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=36506.666666666664, ans=0.125
2023-12-21 12:33:26,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=36573.333333333336, ans=0.125
2023-12-21 12:33:27,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=36573.333333333336, ans=0.125
2023-12-21 12:33:31,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=36573.333333333336, ans=0.125
2023-12-21 12:33:42,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.53 vs. limit=15.0
2023-12-21 12:33:43,496 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.539e+01 2.879e+01 3.158e+01 4.912e+01, threshold=5.759e+01, percent-clipped=0.0
2023-12-21 12:33:44,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=15.0
2023-12-21 12:33:59,164 INFO [train.py:886] (0/4) Epoch 2, batch 750, loss[loss=0.01686, audio_tagging_loss=0.01686, over 25000.00 frames. ], tot_loss[loss=0.01876, audio_tagging_loss=0.01876, over 4835343.84 frames. ], batch size: 100, lr: 3.96e-02, grad_scale: 128.0
2023-12-21 12:34:15,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36840.0, ans=0.1
2023-12-21 12:34:24,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=36906.666666666664, ans=0.0
2023-12-21 12:34:48,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=37040.0, ans=0.125
2023-12-21 12:34:51,117 INFO [train.py:886] (0/4) Epoch 2, batch 800, loss[loss=0.02097, audio_tagging_loss=0.02097, over 25000.00 frames. ], tot_loss[loss=0.01872, audio_tagging_loss=0.01872, over 4865017.56 frames. ], batch size: 100, lr: 3.95e-02, grad_scale: 128.0
2023-12-21 12:34:55,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=12.0
2023-12-21 12:35:04,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.18 vs. limit=15.0
2023-12-21 12:35:22,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=37306.666666666664, ans=0.125
2023-12-21 12:35:24,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.18 vs. limit=15.0
2023-12-21 12:35:25,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=37306.666666666664, ans=0.02
2023-12-21 12:35:29,692 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.588e+01 2.884e+01 3.147e+01 4.791e+01, threshold=5.768e+01, percent-clipped=0.0
2023-12-21 12:35:31,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.90 vs. limit=10.0
2023-12-21 12:35:44,792 INFO [train.py:886] (0/4) Epoch 2, batch 850, loss[loss=0.02115, audio_tagging_loss=0.02115, over 25000.00 frames. ], tot_loss[loss=0.01876, audio_tagging_loss=0.01876, over 4888685.75 frames. ], batch size: 100, lr: 3.95e-02, grad_scale: 128.0
2023-12-21 12:35:54,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.32 vs. limit=5.0
2023-12-21 12:36:12,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=37573.333333333336, ans=0.07
2023-12-21 12:36:13,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=37573.333333333336, ans=0.1
2023-12-21 12:36:16,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=37640.0, ans=0.0
2023-12-21 12:36:28,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.04 vs. limit=22.5
2023-12-21 12:36:37,459 INFO [train.py:886] (0/4) Epoch 2, batch 900, loss[loss=0.02223, audio_tagging_loss=0.02223, over 24750.00 frames. ], tot_loss[loss=0.01879, audio_tagging_loss=0.01879, over 4910099.69 frames. ], batch size: 99, lr: 3.94e-02, grad_scale: 128.0
2023-12-21 12:36:49,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=37840.0, ans=0.002643478260869565
2023-12-21 12:37:14,917 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.584e+01 2.867e+01 3.127e+01 3.908e+01, threshold=5.734e+01, percent-clipped=0.0
2023-12-21 12:37:15,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=37973.333333333336, ans=0.0026144927536231885
2023-12-21 12:37:21,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0
2023-12-21 12:37:28,445 INFO [train.py:886] (0/4) Epoch 2, batch 950, loss[loss=0.01746, audio_tagging_loss=0.01746, over 24750.00 frames. ], tot_loss[loss=0.01878, audio_tagging_loss=0.01878, over 4919574.99 frames. ], batch size: 99, lr: 3.94e-02, grad_scale: 128.0
2023-12-21 12:37:42,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=38173.333333333336, ans=0.2
2023-12-21 12:37:48,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=38173.333333333336, ans=0.125
2023-12-21 12:38:18,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=38373.333333333336, ans=0.125
2023-12-21 12:38:22,430 INFO [train.py:886] (0/4) Epoch 2, batch 1000, loss[loss=0.02434, audio_tagging_loss=0.02434, over 25000.00 frames. ], tot_loss[loss=0.01876, audio_tagging_loss=0.01876, over 4923006.35 frames. ], batch size: 100, lr: 3.93e-02, grad_scale: 128.0
2023-12-21 12:38:29,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.99 vs. limit=10.0
2023-12-21 12:38:41,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.70 vs. limit=12.0
2023-12-21 12:39:00,021 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.513e+01 2.801e+01 3.177e+01 4.242e+01, threshold=5.602e+01, percent-clipped=0.0
2023-12-21 12:39:00,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=38640.0, ans=0.125
2023-12-21 12:39:11,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=38706.666666666664, ans=0.125
2023-12-21 12:39:14,302 INFO [train.py:886] (0/4) Epoch 2, batch 1050, loss[loss=0.01864, audio_tagging_loss=0.01864, over 25000.00 frames. ], tot_loss[loss=0.01864, audio_tagging_loss=0.01864, over 4922891.13 frames. ], batch size: 100, lr: 3.92e-02, grad_scale: 128.0
2023-12-21 12:39:23,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=38773.333333333336, ans=0.09899494936611666
2023-12-21 12:39:27,552 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.001e+00
2023-12-21 12:39:29,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=38840.0, ans=0.0024260869565217386
2023-12-21 12:39:34,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=38840.0, ans=0.125
2023-12-21 12:39:39,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=15.0
2023-12-21 12:40:05,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=39040.0, ans=0.125
2023-12-21 12:40:06,873 INFO [train.py:886] (0/4) Epoch 2, batch 1100, loss[loss=0.01781, audio_tagging_loss=0.01781, over 25000.00 frames. ], tot_loss[loss=0.01861, audio_tagging_loss=0.01861, over 4928166.51 frames. ], batch size: 100, lr: 3.92e-02, grad_scale: 128.0
2023-12-21 12:40:12,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0
2023-12-21 12:40:13,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=39106.666666666664, ans=0.125
2023-12-21 12:40:17,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=39173.333333333336, ans=0.125
2023-12-21 12:40:22,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=39173.333333333336, ans=0.125
2023-12-21 12:40:23,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0
2023-12-21 12:40:28,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=39240.0, ans=0.125
2023-12-21 12:40:29,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=39240.0, ans=0.05
2023-12-21 12:40:30,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=39240.0, ans=0.125
2023-12-21 12:40:40,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.64 vs. limit=15.0
2023-12-21 12:40:43,540 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.542e+01 2.826e+01 3.168e+01 4.060e+01, threshold=5.651e+01, percent-clipped=0.0
2023-12-21 12:40:46,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.52 vs. limit=10.0
2023-12-21 12:40:51,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=39373.333333333336, ans=0.5
2023-12-21 12:40:59,808 INFO [train.py:886] (0/4) Epoch 2, batch 1150, loss[loss=0.02216, audio_tagging_loss=0.02216, over 25000.00 frames. ], tot_loss[loss=0.01855, audio_tagging_loss=0.01855, over 4933803.04 frames. ], batch size: 100, lr: 3.91e-02, grad_scale: 128.0
2023-12-21 12:41:01,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=39440.0, ans=0.125
2023-12-21 12:41:07,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0
2023-12-21 12:41:17,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=39506.666666666664, ans=0.0
2023-12-21 12:41:43,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.47 vs. limit=22.5
2023-12-21 12:41:48,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=39706.666666666664, ans=0.125
2023-12-21 12:41:50,037 INFO [train.py:886] (0/4) Epoch 2, batch 1200, loss[loss=0.02111, audio_tagging_loss=0.02111, over 24950.00 frames. ], tot_loss[loss=0.01853, audio_tagging_loss=0.01853, over 4941122.70 frames. ], batch size: 100, lr: 3.90e-02, grad_scale: 128.0
2023-12-21 12:42:00,261 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.699e+01
2023-12-21 12:42:19,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=39906.666666666664, ans=0.05
2023-12-21 12:42:26,678 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.581e+01 2.851e+01 3.035e+01 4.083e+01, threshold=5.702e+01, percent-clipped=0.0
2023-12-21 12:42:37,342 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.036e+00
2023-12-21 12:42:42,772 INFO [train.py:886] (0/4) Epoch 2, batch 1250, loss[loss=0.01533, audio_tagging_loss=0.01533, over 24750.00 frames. ], tot_loss[loss=0.01872, audio_tagging_loss=0.01872, over 4944545.31 frames. ], batch size: 99, lr: 3.90e-02, grad_scale: 128.0
2023-12-21 12:42:48,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=40106.666666666664, ans=0.125
2023-12-21 12:42:50,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=40106.666666666664, ans=0.125
2023-12-21 12:43:00,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40173.333333333336, ans=0.1
2023-12-21 12:43:16,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.99 vs. limit=15.0
2023-12-21 12:43:20,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=40306.666666666664, ans=0.0
2023-12-21 12:43:20,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=40306.666666666664, ans=0.1
2023-12-21 12:43:29,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.16 vs. limit=15.0
2023-12-21 12:43:34,314 INFO [train.py:886] (0/4) Epoch 2, batch 1300, loss[loss=0.01983, audio_tagging_loss=0.01983, over 24750.00 frames. ], tot_loss[loss=0.01875, audio_tagging_loss=0.01875, over 4945670.57 frames. ], batch size: 99, lr: 3.89e-02, grad_scale: 128.0
2023-12-21 12:43:46,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=40506.666666666664, ans=0.0
2023-12-21 12:43:51,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40506.666666666664, ans=0.1
2023-12-21 12:43:53,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=40573.333333333336, ans=0.0
2023-12-21 12:43:58,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=15.0
2023-12-21 12:44:00,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=40573.333333333336, ans=0.125
2023-12-21 12:44:07,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=40640.0, ans=0.0
2023-12-21 12:44:08,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=40640.0, ans=0.0
2023-12-21 12:44:10,395 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=17.75 vs. limit=15.0
2023-12-21 12:44:11,692 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+01 2.485e+01 2.836e+01 3.251e+01 4.235e+01, threshold=5.672e+01, percent-clipped=0.0
2023-12-21 12:44:15,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=40706.666666666664, ans=0.125
2023-12-21 12:44:20,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=40706.666666666664, ans=10.0
2023-12-21 12:44:25,323 INFO [train.py:886] (0/4) Epoch 2, batch 1350, loss[loss=0.01896, audio_tagging_loss=0.01896, over 25000.00 frames. ], tot_loss[loss=0.01856, audio_tagging_loss=0.01856, over 4948314.75 frames. ], batch size: 100, lr: 3.88e-02, grad_scale: 128.0
2023-12-21 12:44:46,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=40906.666666666664, ans=0.015
2023-12-21 12:44:53,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0
2023-12-21 12:44:55,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.39 vs. limit=15.0
2023-12-21 12:44:58,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0
2023-12-21 12:45:17,963 INFO [train.py:886] (0/4) Epoch 2, batch 1400, loss[loss=0.01592, audio_tagging_loss=0.01592, over 25000.00 frames. ], tot_loss[loss=0.01843, audio_tagging_loss=0.01843, over 4955468.00 frames. ], batch size: 100, lr: 3.88e-02, grad_scale: 128.0
2023-12-21 12:45:37,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=41240.0, ans=0.125
2023-12-21 12:45:43,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.35 vs. limit=10.0
2023-12-21 12:45:54,840 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.439e+01 2.675e+01 2.970e+01 3.748e+01, threshold=5.350e+01, percent-clipped=0.0
2023-12-21 12:46:00,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=41373.333333333336, ans=0.125
2023-12-21 12:46:08,309 INFO [train.py:886] (0/4) Epoch 2, batch 1450, loss[loss=0.01878, audio_tagging_loss=0.01878, over 25000.00 frames. ], tot_loss[loss=0.01841, audio_tagging_loss=0.01841, over 4954996.89 frames. ], batch size: 100, lr: 3.87e-02, grad_scale: 128.0
2023-12-21 12:46:26,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=41506.666666666664, ans=0.0018463768115942036
2023-12-21 12:46:41,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.26 vs. limit=10.0
2023-12-21 12:46:42,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.83 vs. limit=22.5
2023-12-21 12:46:43,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=41640.0, ans=0.125
2023-12-21 12:46:47,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=41640.0, ans=0.0
2023-12-21 12:46:56,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0
2023-12-21 12:47:01,208 INFO [train.py:886] (0/4) Epoch 2, batch 1500, loss[loss=0.01865, audio_tagging_loss=0.01865, over 25000.00 frames. ], tot_loss[loss=0.01842, audio_tagging_loss=0.01842, over 4953847.61 frames. ], batch size: 100, lr: 3.87e-02, grad_scale: 256.0
2023-12-21 12:47:10,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=41840.0, ans=0.95
2023-12-21 12:47:15,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0
2023-12-21 12:47:23,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=41906.666666666664, ans=0.125
2023-12-21 12:47:32,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=23.40 vs. limit=15.0
2023-12-21 12:47:37,916 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.550e+01 2.764e+01 3.124e+01 4.346e+01, threshold=5.529e+01, percent-clipped=0.0
2023-12-21 12:47:38,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=41973.333333333336, ans=0.125
2023-12-21 12:47:40,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=41973.333333333336, ans=0.125
2023-12-21 12:47:52,767 INFO [train.py:886] (0/4) Epoch 2, batch 1550, loss[loss=0.0167, audio_tagging_loss=0.0167, over 24750.00 frames. ], tot_loss[loss=0.01849, audio_tagging_loss=0.01849, over 4952217.16 frames. ], batch size: 99, lr: 3.86e-02, grad_scale: 256.0
2023-12-21 12:47:53,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0
2023-12-21 12:48:02,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=42173.333333333336, ans=0.2
2023-12-21 12:48:02,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=42173.333333333336, ans=0.125
2023-12-21 12:48:04,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42173.333333333336, ans=0.1
2023-12-21 12:48:15,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=42240.0, ans=0.125
2023-12-21 12:48:16,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=42240.0, ans=0.0016869565217391292
2023-12-21 12:48:26,480 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.622e+00
2023-12-21 12:48:43,465 INFO [train.py:886] (0/4) Epoch 2, batch 1600, loss[loss=0.0203, audio_tagging_loss=0.0203, over 24750.00 frames. ], tot_loss[loss=0.01861, audio_tagging_loss=0.01861, over 4944544.99 frames. ], batch size: 99, lr: 3.85e-02, grad_scale: 256.0
2023-12-21 12:48:46,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0
2023-12-21 12:48:50,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=42440.0, ans=0.1
2023-12-21 12:48:53,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.76 vs. limit=15.0
2023-12-21 12:48:56,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=42506.666666666664, ans=0.125
2023-12-21 12:49:21,553 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.625e+01 2.827e+01 3.147e+01 4.034e+01, threshold=5.654e+01, percent-clipped=0.0
2023-12-21 12:49:28,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=42706.666666666664, ans=0.0015855072463768112
2023-12-21 12:49:32,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=42706.666666666664, ans=0.125
2023-12-21 12:49:33,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.12 vs. limit=15.0
2023-12-21 12:49:36,913 INFO [train.py:886] (0/4) Epoch 2, batch 1650, loss[loss=0.01615, audio_tagging_loss=0.01615, over 24750.00 frames. ], tot_loss[loss=0.01848, audio_tagging_loss=0.01848, over 4942124.54 frames. ], batch size: 99, lr: 3.85e-02, grad_scale: 256.0
2023-12-21 12:49:41,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=42773.333333333336, ans=0.125
2023-12-21 12:49:43,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=20.44 vs. limit=15.0
limit=15.0 2023-12-21 12:49:59,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2023-12-21 12:50:00,813 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.58 vs. limit=10.0 2023-12-21 12:50:02,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.95 vs. limit=10.0 2023-12-21 12:50:11,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=42973.333333333336, ans=0.0 2023-12-21 12:50:29,835 INFO [train.py:886] (0/4) Epoch 2, batch 1700, loss[loss=0.02188, audio_tagging_loss=0.02188, over 24750.00 frames. ], tot_loss[loss=0.01834, audio_tagging_loss=0.01834, over 4945358.36 frames. ], batch size: 99, lr: 3.84e-02, grad_scale: 256.0 2023-12-21 12:50:30,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5 2023-12-21 12:50:34,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=43106.666666666664, ans=0.125 2023-12-21 12:50:46,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=43173.333333333336, ans=0.05 2023-12-21 12:50:47,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2023-12-21 12:50:50,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=43240.0, ans=0.09899494936611666 2023-12-21 12:50:51,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.80 vs. limit=10.0 2023-12-21 12:51:07,476 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.509e+01 2.799e+01 3.084e+01 4.189e+01, threshold=5.598e+01, percent-clipped=0.0 2023-12-21 12:51:13,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=43373.333333333336, ans=0.05 2023-12-21 12:51:19,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=43373.333333333336, ans=0.125 2023-12-21 12:51:21,603 INFO [train.py:886] (0/4) Epoch 2, batch 1750, loss[loss=0.01624, audio_tagging_loss=0.01624, over 24750.00 frames. ], tot_loss[loss=0.01825, audio_tagging_loss=0.01825, over 4944714.07 frames. ], batch size: 99, lr: 3.83e-02, grad_scale: 256.0 2023-12-21 12:51:25,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=43440.0, ans=0.0 2023-12-21 12:51:39,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.05 vs. 
limit=15.0 2023-12-21 12:51:42,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=43573.333333333336, ans=0.0 2023-12-21 12:51:44,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.68 vs. limit=22.5 2023-12-21 12:51:47,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.14 vs. limit=15.0 2023-12-21 12:51:55,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-12-21 12:51:56,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=43640.0, ans=0.0 2023-12-21 12:52:14,321 INFO [train.py:886] (0/4) Epoch 2, batch 1800, loss[loss=0.01887, audio_tagging_loss=0.01887, over 21149.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4945213.15 frames. ], batch size: 107, lr: 3.83e-02, grad_scale: 256.0 2023-12-21 12:52:22,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=43773.333333333336, ans=0.125 2023-12-21 12:52:30,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=43840.0, ans=0.125 2023-12-21 12:52:41,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=43906.666666666664, ans=0.125 2023-12-21 12:52:49,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=43973.333333333336, ans=0.0 2023-12-21 12:52:51,538 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.458e+01 2.715e+01 2.989e+01 4.266e+01, threshold=5.430e+01, percent-clipped=0.0 2023-12-21 12:53:05,766 INFO [train.py:886] (0/4) Epoch 2, batch 1850, loss[loss=0.02158, audio_tagging_loss=0.02158, over 24948.00 frames. ], tot_loss[loss=0.01846, audio_tagging_loss=0.01846, over 4952603.32 frames. ], batch size: 100, lr: 3.82e-02, grad_scale: 256.0 2023-12-21 12:53:19,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=44173.333333333336, ans=0.1 2023-12-21 12:53:20,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=44173.333333333336, ans=0.125 2023-12-21 12:53:21,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44173.333333333336, ans=0.1 2023-12-21 12:53:35,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=44240.0, ans=0.0012521739130434782 2023-12-21 12:53:56,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44373.333333333336, ans=0.1 2023-12-21 12:53:59,538 INFO [train.py:886] (0/4) Epoch 2, batch 1900, loss[loss=0.01698, audio_tagging_loss=0.01698, over 24750.00 frames. ], tot_loss[loss=0.0187, audio_tagging_loss=0.0187, over 4949308.14 frames. 
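The scaling.py:213 ScheduledFloat records that dominate this stretch of the log each report a module hyper-parameter (a skip rate, a balancer probability, a dropout p) whose current value (ans) is a function of batch_count. The values visibly anneal; for example the various ff2_skip_rate entries shrink from roughly 1.8e-03 here toward 0.0 later in the log. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints (the breakpoints below are illustrative, not taken from the recipe):

import bisect

class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count (illustrative sketch)."""
    def __init__(self, *points, default=0.0):
        # points: (batch_count, value) pairs
        self.points = sorted(points)
        self.default = default
        self.batch_count = 0.0

    def value(self):
        x, pts = self.batch_count, self.points
        if not pts:
            return self.default
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        i = bisect.bisect_right([p[0] for p in pts], x)
        (x0, y0), (x1, y1) = pts[i - 1], pts[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# e.g. a skip rate annealed from 0.5 to 0.0 over the first 50k batches
skip_rate = ScheduledFloatSketch((0.0, 0.5), (50000.0, 0.0))
skip_rate.batch_count = 41640.0
print(skip_rate.value())  # ~0.084 at this point in the schedule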
], batch size: 99, lr: 3.81e-02, grad_scale: 256.0 2023-12-21 12:54:02,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=44440.0, ans=0.1 2023-12-21 12:54:36,231 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.600e+01 2.818e+01 3.089e+01 5.483e+01, threshold=5.636e+01, percent-clipped=1.0 2023-12-21 12:54:52,153 INFO [train.py:886] (0/4) Epoch 2, batch 1950, loss[loss=0.01592, audio_tagging_loss=0.01592, over 25000.00 frames. ], tot_loss[loss=0.01855, audio_tagging_loss=0.01855, over 4950262.93 frames. ], batch size: 100, lr: 3.81e-02, grad_scale: 256.0 2023-12-21 12:54:53,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=44773.333333333336, ans=0.0011362318840579706 2023-12-21 12:55:13,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=44906.666666666664, ans=0.04949747468305833 2023-12-21 12:55:26,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44973.333333333336, ans=0.1 2023-12-21 12:55:32,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.77 vs. limit=12.0 2023-12-21 12:55:42,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.85 vs. limit=22.5 2023-12-21 12:55:44,286 INFO [train.py:886] (0/4) Epoch 2, batch 2000, loss[loss=0.01492, audio_tagging_loss=0.01492, over 24750.00 frames. ], tot_loss[loss=0.01848, audio_tagging_loss=0.01848, over 4949483.79 frames. ], batch size: 99, lr: 3.80e-02, grad_scale: 256.0 2023-12-21 12:55:59,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.91 vs. limit=22.5 2023-12-21 12:56:23,062 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.490e+01 2.748e+01 3.106e+01 5.965e+01, threshold=5.495e+01, percent-clipped=1.0 2023-12-21 12:56:28,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2023-12-21 12:56:30,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45373.333333333336, ans=0.1 2023-12-21 12:56:38,160 INFO [train.py:886] (0/4) Epoch 2, batch 2050, loss[loss=0.01724, audio_tagging_loss=0.01724, over 25000.00 frames. ], tot_loss[loss=0.01848, audio_tagging_loss=0.01848, over 4949227.78 frames. 
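The periodic optim.py:484 warnings summarize adaptive gradient clipping. The five numbers are the min/25%/median/75%/max of recent gradient norms, and in each warning the threshold equals Clipping_scale (2.0) times the median: in the warning above, 2.0 x 2.818e+01 = 5.636e+01, and percent-clipped=1.0 is likely the percentage of recent steps whose norm exceeded the threshold. A sketch of that bookkeeping with a sliding window of per-step global grad norms (the window size and quantile details are assumptions; per the log prefix, the real logic lives inside the optimizer in optim.py):

from collections import deque
import torch

class GradNormClipper:
    """Clip the global grad norm at clipping_scale * median of recent norms."""
    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total grad norms
        self.num_steps = 0
        self.num_clipped = 0

    def clip_(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params)).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)  # scale grads down to threshold
        return norm, threshold

    def quartiles(self):
        s = sorted(self.norms)
        n = len(s) - 1
        return [s[0], s[n // 4], s[n // 2], s[3 * n // 4], s[-1]]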
], batch size: 100, lr: 3.80e-02, grad_scale: 256.0 2023-12-21 12:56:50,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=45506.666666666664, ans=0.035 2023-12-21 12:56:52,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=45506.666666666664, ans=0.125 2023-12-21 12:56:56,469 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.638e+01 2023-12-21 12:57:05,559 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 12:57:10,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=45640.0, ans=10.0 2023-12-21 12:57:16,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=45640.0, ans=0.125 2023-12-21 12:57:25,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=45706.666666666664, ans=0.125 2023-12-21 12:57:26,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.37 vs. limit=22.5 2023-12-21 12:57:27,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.32 vs. limit=15.0 2023-12-21 12:57:27,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.79 vs. limit=10.0 2023-12-21 12:57:31,671 INFO [train.py:886] (0/4) Epoch 2, batch 2100, loss[loss=0.01795, audio_tagging_loss=0.01795, over 25000.00 frames. ], tot_loss[loss=0.01843, audio_tagging_loss=0.01843, over 4951674.53 frames. ], batch size: 100, lr: 3.79e-02, grad_scale: 256.0 2023-12-21 12:58:10,351 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.523e+01 2.813e+01 3.062e+01 4.027e+01, threshold=5.625e+01, percent-clipped=0.0 2023-12-21 12:58:23,689 INFO [train.py:886] (0/4) Epoch 2, batch 2150, loss[loss=0.01705, audio_tagging_loss=0.01705, over 25000.00 frames. ], tot_loss[loss=0.01845, audio_tagging_loss=0.01845, over 4951132.33 frames. ], batch size: 100, lr: 3.78e-02, grad_scale: 256.0 2023-12-21 12:58:43,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=15.0 2023-12-21 12:58:48,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=46240.0, ans=0.125 2023-12-21 12:58:50,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.88 vs. limit=22.5 2023-12-21 12:58:55,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=46306.666666666664, ans=0.125 2023-12-21 12:59:04,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=56.47 vs. 
limit=22.5 2023-12-21 12:59:09,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=46373.333333333336, ans=0.125 2023-12-21 12:59:16,521 INFO [train.py:886] (0/4) Epoch 2, batch 2200, loss[loss=0.01841, audio_tagging_loss=0.01841, over 24750.00 frames. ], tot_loss[loss=0.01868, audio_tagging_loss=0.01868, over 4942386.81 frames. ], batch size: 99, lr: 3.78e-02, grad_scale: 256.0 2023-12-21 12:59:31,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=46506.666666666664, ans=0.125 2023-12-21 12:59:39,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=46573.333333333336, ans=0.0007449275362318847 2023-12-21 12:59:39,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=46573.333333333336, ans=0.2 2023-12-21 12:59:46,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=46573.333333333336, ans=0.125 2023-12-21 12:59:52,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=12.0 2023-12-21 12:59:54,650 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.553e+01 2.739e+01 3.029e+01 4.205e+01, threshold=5.478e+01, percent-clipped=0.0 2023-12-21 12:59:57,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=46706.666666666664, ans=0.2 2023-12-21 13:00:05,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=46706.666666666664, ans=0.025 2023-12-21 13:00:09,440 INFO [train.py:886] (0/4) Epoch 2, batch 2250, loss[loss=0.01759, audio_tagging_loss=0.01759, over 24750.00 frames. ], tot_loss[loss=0.01864, audio_tagging_loss=0.01864, over 4943423.90 frames. ], batch size: 99, lr: 3.77e-02, grad_scale: 256.0 2023-12-21 13:00:14,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=46773.333333333336, ans=15.0 2023-12-21 13:00:20,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=46840.0, ans=0.125 2023-12-21 13:00:32,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.93 vs. limit=22.5 2023-12-21 13:00:38,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=46906.666666666664, ans=0.0006724637681159423 2023-12-21 13:00:50,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=46973.333333333336, ans=0.125 2023-12-21 13:00:50,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=46973.333333333336, ans=0.125 2023-12-21 13:01:01,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=47106.666666666664, ans=0.125 2023-12-21 13:01:01,794 INFO [train.py:886] (0/4) Epoch 2, batch 2300, loss[loss=0.01712, audio_tagging_loss=0.01712, over 25000.00 frames. 
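The scaling.py:1022 Whitening records each compare a module statistic (metric) against that module's limit. The metric measures how far the activations' per-group channel covariance is from a multiple of the identity: 1.0 means perfectly "white", larger means more correlated channels. Values both above and below the limit get logged (metric=56.47 vs. limit=22.5 just above, but also metric=5.66 vs. limit=6.0 further down), so these reports are periodic samples rather than violation alarms. A sketch of one plausible formulation of such a metric (the grouping and normalization are assumptions, not copied from scaling.py):

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (..., num_channels). Returns a scalar >= 1.0 that equals 1.0
    # exactly when each group's channel covariance is a multiple of I.
    x = x.reshape(-1, x.shape[-1])
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups            # channels per group
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)         # center per group
    cov = torch.matmul(x.transpose(1, 2), x)    # (num_groups, cpg, cpg)
    mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
    mean_sq = (cov ** 2).sum() / (num_groups * cpg)
    return mean_sq / (mean_diag ** 2 + 1e-20)

# nearly-white activations give a metric only slightly above 1.0:
torch.manual_seed(0)
print(whitening_metric(torch.randn(1000, 384), num_groups=1))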
], tot_loss[loss=0.01848, audio_tagging_loss=0.01848, over 4944914.46 frames. ], batch size: 100, lr: 3.76e-02, grad_scale: 256.0 2023-12-21 13:01:08,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.50 vs. limit=15.0 2023-12-21 13:01:23,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=47240.0, ans=0.2 2023-12-21 13:01:25,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=47240.0, ans=0.125 2023-12-21 13:01:38,827 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.499e+01 2.770e+01 3.074e+01 4.050e+01, threshold=5.539e+01, percent-clipped=0.0 2023-12-21 13:01:52,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=47440.0, ans=0.0 2023-12-21 13:01:54,318 INFO [train.py:886] (0/4) Epoch 2, batch 2350, loss[loss=0.01979, audio_tagging_loss=0.01979, over 25000.00 frames. ], tot_loss[loss=0.01837, audio_tagging_loss=0.01837, over 4948017.94 frames. ], batch size: 100, lr: 3.76e-02, grad_scale: 256.0 2023-12-21 13:02:00,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=47440.0, ans=0.0 2023-12-21 13:02:05,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=47506.666666666664, ans=0.125 2023-12-21 13:02:07,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.75 vs. limit=22.5 2023-12-21 13:02:09,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=47506.666666666664, ans=0.125 2023-12-21 13:02:10,863 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.608e+00 2023-12-21 13:02:22,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47573.333333333336, ans=0.1 2023-12-21 13:02:45,213 INFO [train.py:886] (0/4) Epoch 2, batch 2400, loss[loss=0.02027, audio_tagging_loss=0.02027, over 22096.00 frames. ], tot_loss[loss=0.01822, audio_tagging_loss=0.01822, over 4949908.29 frames. ], batch size: 107, lr: 3.75e-02, grad_scale: 256.0 2023-12-21 13:03:08,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.30 vs. limit=10.0 2023-12-21 13:03:12,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=47906.666666666664, ans=0.05 2023-12-21 13:03:13,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=47906.666666666664, ans=0.125 2023-12-21 13:03:21,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=13.02 vs. 
limit=12.0 2023-12-21 13:03:22,328 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.458e+01 2.728e+01 3.033e+01 4.100e+01, threshold=5.456e+01, percent-clipped=0.0 2023-12-21 13:03:25,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=47973.333333333336, ans=0.07 2023-12-21 13:03:28,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=12.0 2023-12-21 13:03:37,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=48106.666666666664, ans=0.125 2023-12-21 13:03:37,944 INFO [train.py:886] (0/4) Epoch 2, batch 2450, loss[loss=0.01946, audio_tagging_loss=0.01946, over 25000.00 frames. ], tot_loss[loss=0.01844, audio_tagging_loss=0.01844, over 4951083.74 frames. ], batch size: 100, lr: 3.75e-02, grad_scale: 256.0 2023-12-21 13:03:40,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=48106.666666666664, ans=0.07 2023-12-21 13:03:52,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.28 vs. limit=15.0 2023-12-21 13:03:53,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.98 vs. limit=15.0 2023-12-21 13:03:59,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=48240.0, ans=0.0 2023-12-21 13:04:02,734 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 13:04:21,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48373.333333333336, ans=0.1 2023-12-21 13:04:30,303 INFO [train.py:886] (0/4) Epoch 2, batch 2500, loss[loss=0.0185, audio_tagging_loss=0.0185, over 24750.00 frames. ], tot_loss[loss=0.01859, audio_tagging_loss=0.01859, over 4949782.70 frames. ], batch size: 99, lr: 3.74e-02, grad_scale: 256.0 2023-12-21 13:05:08,424 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.182e+01 2.552e+01 2.788e+01 3.039e+01 3.953e+01, threshold=5.575e+01, percent-clipped=0.0 2023-12-21 13:05:21,978 INFO [train.py:886] (0/4) Epoch 2, batch 2550, loss[loss=0.01959, audio_tagging_loss=0.01959, over 23995.00 frames. ], tot_loss[loss=0.01873, audio_tagging_loss=0.01873, over 4939442.33 frames. ], batch size: 100, lr: 3.73e-02, grad_scale: 256.0 2023-12-21 13:05:22,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.73 vs. limit=22.5 2023-12-21 13:05:24,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.17 vs. 
limit=15.0 2023-12-21 13:05:36,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=48840.0, ans=0.125 2023-12-21 13:05:37,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=48840.0, ans=0.00025217391304347726 2023-12-21 13:05:39,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=20.55 vs. limit=15.0 2023-12-21 13:05:41,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.81 vs. limit=22.5 2023-12-21 13:05:42,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.22 vs. limit=15.0 2023-12-21 13:05:43,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2023-12-21 13:05:49,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=48906.666666666664, ans=0.00023768115942028947 2023-12-21 13:05:49,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=48906.666666666664, ans=0.125 2023-12-21 13:05:50,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.60 vs. limit=10.0 2023-12-21 13:05:51,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48906.666666666664, ans=0.1 2023-12-21 13:06:16,099 INFO [train.py:886] (0/4) Epoch 2, batch 2600, loss[loss=0.01662, audio_tagging_loss=0.01662, over 25000.00 frames. ], tot_loss[loss=0.01854, audio_tagging_loss=0.01854, over 4937759.58 frames. ], batch size: 100, lr: 3.73e-02, grad_scale: 256.0 2023-12-21 13:06:35,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=49240.0, ans=0.0 2023-12-21 13:06:40,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. limit=6.0 2023-12-21 13:06:41,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. 
limit=15.0 2023-12-21 13:06:45,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=49240.0, ans=0.0 2023-12-21 13:06:47,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=49306.666666666664, ans=0.0 2023-12-21 13:06:53,173 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.505e+01 2.770e+01 3.074e+01 4.443e+01, threshold=5.539e+01, percent-clipped=0.0 2023-12-21 13:06:54,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=49306.666666666664, ans=0.2 2023-12-21 13:06:58,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49373.333333333336, ans=0.1 2023-12-21 13:07:07,307 INFO [train.py:886] (0/4) Epoch 2, batch 2650, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.01845, audio_tagging_loss=0.01845, over 4939731.88 frames. ], batch size: 100, lr: 3.72e-02, grad_scale: 256.0 2023-12-21 13:07:13,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=49440.0, ans=0.125 2023-12-21 13:07:20,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=49506.666666666664, ans=0.0 2023-12-21 13:07:20,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=49506.666666666664, ans=0.125 2023-12-21 13:07:30,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=49573.333333333336, ans=0.0 2023-12-21 13:07:39,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=12.0 2023-12-21 13:07:46,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=49640.0, ans=0.125 2023-12-21 13:07:49,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49706.666666666664, ans=0.1 2023-12-21 13:07:52,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=49706.666666666664, ans=0.125 2023-12-21 13:08:00,849 INFO [train.py:886] (0/4) Epoch 2, batch 2700, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.01838, audio_tagging_loss=0.01838, over 4944260.07 frames. ], batch size: 100, lr: 3.71e-02, grad_scale: 256.0 2023-12-21 13:08:03,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=49773.333333333336, ans=0.5 2023-12-21 13:08:07,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.62 vs. 
limit=12.0 2023-12-21 13:08:10,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=49840.0, ans=3.478260869565174e-05 2023-12-21 13:08:25,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=49906.666666666664, ans=2.0289855072463947e-05 2023-12-21 13:08:38,470 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.555e+01 2.825e+01 3.141e+01 4.056e+01, threshold=5.650e+01, percent-clipped=0.0 2023-12-21 13:08:53,326 INFO [train.py:886] (0/4) Epoch 2, batch 2750, loss[loss=0.01991, audio_tagging_loss=0.01991, over 25000.00 frames. ], tot_loss[loss=0.01834, audio_tagging_loss=0.01834, over 4951168.20 frames. ], batch size: 100, lr: 3.71e-02, grad_scale: 256.0 2023-12-21 13:09:09,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=50173.333333333336, ans=0.125 2023-12-21 13:09:40,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.24 vs. limit=22.5 2023-12-21 13:09:42,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=50373.333333333336, ans=0.0 2023-12-21 13:09:45,136 INFO [train.py:886] (0/4) Epoch 2, batch 2800, loss[loss=0.02039, audio_tagging_loss=0.02039, over 24750.00 frames. ], tot_loss[loss=0.01847, audio_tagging_loss=0.01847, over 4950027.09 frames. ], batch size: 99, lr: 3.70e-02, grad_scale: 256.0 2023-12-21 13:09:54,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=50440.0, ans=0.09899494936611666 2023-12-21 13:09:58,128 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.961e+00 2023-12-21 13:10:08,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=50573.333333333336, ans=0.04949747468305833 2023-12-21 13:10:08,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-21 13:10:16,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.12 vs. limit=15.0 2023-12-21 13:10:23,042 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.547e+01 2.749e+01 3.005e+01 4.633e+01, threshold=5.497e+01, percent-clipped=0.0 2023-12-21 13:10:35,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=50706.666666666664, ans=0.125 2023-12-21 13:10:36,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=50706.666666666664, ans=0.0 2023-12-21 13:10:38,450 INFO [train.py:886] (0/4) Epoch 2, batch 2850, loss[loss=0.01853, audio_tagging_loss=0.01853, over 24750.00 frames. ], tot_loss[loss=0.01855, audio_tagging_loss=0.01855, over 4941214.68 frames. 
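Each train.py:886 record carries two losses: loss[...] for the current batch and tot_loss[...] as a running aggregate. The fractional frame counts on the aggregate (e.g. "over 4941214.68 frames." just above, produced from batches of whole frames) imply that older batches are down-weighted rather than simply summed. One mechanism consistent with those numbers is an exponentially decayed sum of (loss x frames, frames); the decay constant below is a guess for illustration, not read from train.py:

class RunningLoss:
    """Exponentially decayed aggregate of (loss_sum, frames) pairs."""
    def __init__(self, decay=0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        # batch_loss is the mean loss over batch_frames frames
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # the reported tot_loss

tracker = RunningLoss()
for loss, frames in [(0.0190, 25000), (0.0183, 24750), (0.0185, 25000)]:
    tot = tracker.update(loss, frames)
print(f"tot_loss[loss={tot:.5f}, over {tracker.frames:.2f} frames.]")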
], batch size: 99, lr: 3.70e-02, grad_scale: 256.0 2023-12-21 13:10:39,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=50773.333333333336, ans=0.125 2023-12-21 13:10:43,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=50773.333333333336, ans=0.125 2023-12-21 13:10:50,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=50840.0, ans=0.0 2023-12-21 13:10:51,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=50840.0, ans=0.0 2023-12-21 13:10:59,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=50906.666666666664, ans=0.125 2023-12-21 13:11:01,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=50906.666666666664, ans=0.2 2023-12-21 13:11:08,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.56 vs. limit=15.0 2023-12-21 13:11:22,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=51040.0, ans=0.0 2023-12-21 13:11:30,668 INFO [train.py:886] (0/4) Epoch 2, batch 2900, loss[loss=0.02206, audio_tagging_loss=0.02206, over 25000.00 frames. ], tot_loss[loss=0.01841, audio_tagging_loss=0.01841, over 4940935.98 frames. ], batch size: 100, lr: 3.69e-02, grad_scale: 256.0 2023-12-21 13:11:36,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=51106.666666666664, ans=0.09899494936611666 2023-12-21 13:11:38,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=51106.666666666664, ans=0.125 2023-12-21 13:12:08,844 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.551e+01 2.842e+01 3.147e+01 4.281e+01, threshold=5.684e+01, percent-clipped=0.0 2023-12-21 13:12:22,987 INFO [train.py:886] (0/4) Epoch 2, batch 2950, loss[loss=0.01852, audio_tagging_loss=0.01852, over 25000.00 frames. ], tot_loss[loss=0.01823, audio_tagging_loss=0.01823, over 4940390.98 frames. ], batch size: 100, lr: 3.68e-02, grad_scale: 256.0 2023-12-21 13:12:38,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=51506.666666666664, ans=0.125 2023-12-21 13:12:43,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.83 vs. limit=22.5 2023-12-21 13:12:52,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=51573.333333333336, ans=0.0 2023-12-21 13:13:10,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.96 vs. limit=22.5 2023-12-21 13:13:15,801 INFO [train.py:886] (0/4) Epoch 2, batch 3000, loss[loss=0.01939, audio_tagging_loss=0.01939, over 25000.00 frames. ], tot_loss[loss=0.01823, audio_tagging_loss=0.01823, over 4945317.10 frames. 
], batch size: 100, lr: 3.68e-02, grad_scale: 256.0 2023-12-21 13:13:15,802 INFO [train.py:909] (0/4) Computing validation loss 2023-12-21 13:13:26,258 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4514, 2.1646, 2.2419, 1.8589], device='cuda:0') 2023-12-21 13:13:38,880 INFO [train.py:917] (0/4) Epoch 2, validation: loss=0.04373, audio_tagging_loss=0.04373, over 3737520.00 frames. 2023-12-21 13:13:38,880 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-21 13:13:49,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.11 vs. limit=15.0 2023-12-21 13:14:11,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=51973.333333333336, ans=0.2 2023-12-21 13:14:16,990 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.493e+01 2.750e+01 3.115e+01 4.237e+01, threshold=5.501e+01, percent-clipped=0.0 2023-12-21 13:14:18,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=51973.333333333336, ans=0.125 2023-12-21 13:14:25,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=52040.0, ans=0.125 2023-12-21 13:14:31,123 INFO [train.py:886] (0/4) Epoch 2, batch 3050, loss[loss=0.01836, audio_tagging_loss=0.01836, over 24750.00 frames. ], tot_loss[loss=0.01826, audio_tagging_loss=0.01826, over 4946660.21 frames. ], batch size: 99, lr: 3.67e-02, grad_scale: 256.0 2023-12-21 13:14:35,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=52106.666666666664, ans=0.125 2023-12-21 13:14:46,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=52173.333333333336, ans=0.125 2023-12-21 13:14:55,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=52240.0, ans=0.0 2023-12-21 13:15:16,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=52373.333333333336, ans=0.0 2023-12-21 13:15:18,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.52 vs. limit=10.0 2023-12-21 13:15:24,008 INFO [train.py:886] (0/4) Epoch 2, batch 3100, loss[loss=0.01759, audio_tagging_loss=0.01759, over 25000.00 frames. ], tot_loss[loss=0.01833, audio_tagging_loss=0.01833, over 4948603.78 frames. ], batch size: 100, lr: 3.67e-02, grad_scale: 256.0 2023-12-21 13:15:24,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.27 vs. limit=22.5 2023-12-21 13:15:35,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52506.666666666664, ans=0.1 2023-12-21 13:15:36,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=23.39 vs. 
limit=22.5 2023-12-21 13:15:56,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=52640.0, ans=0.1 2023-12-21 13:15:59,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.04 vs. limit=10.0 2023-12-21 13:16:01,675 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.616e+01 2.830e+01 3.122e+01 4.076e+01, threshold=5.659e+01, percent-clipped=0.0 2023-12-21 13:16:15,753 INFO [train.py:886] (0/4) Epoch 2, batch 3150, loss[loss=0.01934, audio_tagging_loss=0.01934, over 24750.00 frames. ], tot_loss[loss=0.01852, audio_tagging_loss=0.01852, over 4950497.46 frames. ], batch size: 99, lr: 3.66e-02, grad_scale: 256.0 2023-12-21 13:16:17,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=52773.333333333336, ans=0.0 2023-12-21 13:16:18,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.83 vs. limit=22.5 2023-12-21 13:17:04,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=53040.0, ans=0.0 2023-12-21 13:17:08,692 INFO [train.py:886] (0/4) Epoch 2, batch 3200, loss[loss=0.01766, audio_tagging_loss=0.01766, over 22860.00 frames. ], tot_loss[loss=0.01842, audio_tagging_loss=0.01842, over 4949306.32 frames. ], batch size: 107, lr: 3.65e-02, grad_scale: 256.0 2023-12-21 13:17:29,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=53173.333333333336, ans=0.125 2023-12-21 13:17:43,462 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-8000.pt 2023-12-21 13:17:48,108 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.552e+01 2.741e+01 3.134e+01 4.308e+01, threshold=5.481e+01, percent-clipped=0.0 2023-12-21 13:17:48,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=53306.666666666664, ans=0.125 2023-12-21 13:18:01,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=53373.333333333336, ans=0.125 2023-12-21 13:18:04,656 INFO [train.py:886] (0/4) Epoch 2, batch 3250, loss[loss=0.01599, audio_tagging_loss=0.01599, over 25000.00 frames. ], tot_loss[loss=0.01832, audio_tagging_loss=0.01832, over 4950518.49 frames. ], batch size: 100, lr: 3.65e-02, grad_scale: 256.0 2023-12-21 13:18:23,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=53573.333333333336, ans=0.125 2023-12-21 13:18:44,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=53640.0, ans=0.125 2023-12-21 13:18:55,715 INFO [train.py:886] (0/4) Epoch 2, batch 3300, loss[loss=0.0188, audio_tagging_loss=0.0188, over 24750.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 4956133.92 frames. ], batch size: 99, lr: 3.64e-02, grad_scale: 256.0 2023-12-21 13:18:58,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.83 vs. 
limit=22.5 2023-12-21 13:19:03,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=53773.333333333336, ans=0.125 2023-12-21 13:19:04,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=15.0 2023-12-21 13:19:06,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=53840.0, ans=0.125 2023-12-21 13:19:16,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=53840.0, ans=0.125 2023-12-21 13:19:31,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=53973.333333333336, ans=15.0 2023-12-21 13:19:35,085 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.487e+01 2.709e+01 2.952e+01 3.963e+01, threshold=5.419e+01, percent-clipped=0.0 2023-12-21 13:19:36,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=53973.333333333336, ans=0.125 2023-12-21 13:19:50,316 INFO [train.py:886] (0/4) Epoch 2, batch 3350, loss[loss=0.02049, audio_tagging_loss=0.02049, over 25000.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4963424.33 frames. ], batch size: 100, lr: 3.64e-02, grad_scale: 256.0 2023-12-21 13:19:54,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.08 vs. limit=15.0 2023-12-21 13:20:02,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2023-12-21 13:20:20,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2023-12-21 13:20:31,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54373.333333333336, ans=0.1 2023-12-21 13:20:37,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=54373.333333333336, ans=0.125 2023-12-21 13:20:41,755 INFO [train.py:886] (0/4) Epoch 2, batch 3400, loss[loss=0.01493, audio_tagging_loss=0.01493, over 25000.00 frames. ], tot_loss[loss=0.0184, audio_tagging_loss=0.0184, over 4966224.58 frames. ], batch size: 100, lr: 3.63e-02, grad_scale: 256.0 2023-12-21 13:21:20,897 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.559e+01 2.794e+01 3.054e+01 3.708e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 13:21:26,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.19 vs. limit=22.5 2023-12-21 13:21:34,414 INFO [train.py:886] (0/4) Epoch 2, batch 3450, loss[loss=0.01844, audio_tagging_loss=0.01844, over 24750.00 frames. ], tot_loss[loss=0.01845, audio_tagging_loss=0.01845, over 4955568.71 frames. 
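grad_scale in the batch records moves in powers of two: 128.0 at the top of this excerpt, 256.0 from around batch 1500, and 512.0 from batch 3500 just below. That is the signature of dynamic loss scaling for fp16 training: the scale doubles after a long run of overflow-free steps and is cut back when gradients overflow. A sketch with PyTorch's stock GradScaler (these are real torch.cuda.amp.GradScaler arguments, but icefall wraps its own scaler variant, and the growth_interval value here is assumed):

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=128.0,      # matches the scale at the start of this excerpt
    growth_factor=2.0,     # double the scale ...
    backoff_factor=0.5,    # ... or halve it on inf/nan gradients
    growth_interval=2000,  # after this many clean steps (assumed value)
)

# inside the train loop:
# with torch.cuda.amp.autocast():
#     loss = model(batch)
# scaler.scale(loss).backward()
# scaler.step(optimizer)    # skips the step on overflow
# scaler.update()           # adjusts the scale
# print(scaler.get_scale()) # the grad_scale reported in the log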
], batch size: 99, lr: 3.62e-02, grad_scale: 256.0 2023-12-21 13:21:35,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=54773.333333333336, ans=0.125 2023-12-21 13:21:51,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.64 vs. limit=15.0 2023-12-21 13:21:56,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=54906.666666666664, ans=0.125 2023-12-21 13:22:28,217 INFO [train.py:886] (0/4) Epoch 2, batch 3500, loss[loss=0.01538, audio_tagging_loss=0.01538, over 24750.00 frames. ], tot_loss[loss=0.01837, audio_tagging_loss=0.01837, over 4949380.08 frames. ], batch size: 99, lr: 3.62e-02, grad_scale: 512.0 2023-12-21 13:22:30,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.42 vs. limit=10.0 2023-12-21 13:22:31,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=55106.666666666664, ans=0.1 2023-12-21 13:23:05,440 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.571e+01 2.817e+01 3.195e+01 5.368e+01, threshold=5.635e+01, percent-clipped=0.0 2023-12-21 13:23:10,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=55373.333333333336, ans=0.2 2023-12-21 13:23:19,473 INFO [train.py:886] (0/4) Epoch 2, batch 3550, loss[loss=0.01578, audio_tagging_loss=0.01578, over 25000.00 frames. ], tot_loss[loss=0.01823, audio_tagging_loss=0.01823, over 4946931.19 frames. ], batch size: 100, lr: 3.61e-02, grad_scale: 512.0 2023-12-21 13:23:29,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55440.0, ans=0.1 2023-12-21 13:23:39,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=55573.333333333336, ans=0.125 2023-12-21 13:23:45,655 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 13:23:45,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=55573.333333333336, ans=0.07 2023-12-21 13:24:11,731 INFO [train.py:886] (0/4) Epoch 2, batch 3600, loss[loss=0.01601, audio_tagging_loss=0.01601, over 25000.00 frames. ], tot_loss[loss=0.01821, audio_tagging_loss=0.01821, over 4946610.50 frames. ], batch size: 100, lr: 3.61e-02, grad_scale: 512.0 2023-12-21 13:24:13,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=15.0 2023-12-21 13:24:14,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=55773.333333333336, ans=0.125 2023-12-21 13:24:16,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.09 vs. 
limit=10.0 2023-12-21 13:24:18,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=55773.333333333336, ans=0.0 2023-12-21 13:24:42,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=55973.333333333336, ans=0.0 2023-12-21 13:24:47,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.22 vs. limit=22.5 2023-12-21 13:24:50,574 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.558e+01 2.810e+01 3.070e+01 4.011e+01, threshold=5.620e+01, percent-clipped=0.0 2023-12-21 13:25:04,353 INFO [train.py:886] (0/4) Epoch 2, batch 3650, loss[loss=0.01822, audio_tagging_loss=0.01822, over 25000.00 frames. ], tot_loss[loss=0.0181, audio_tagging_loss=0.0181, over 4953170.08 frames. ], batch size: 100, lr: 3.60e-02, grad_scale: 256.0 2023-12-21 13:25:05,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=56106.666666666664, ans=0.2 2023-12-21 13:25:29,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=56240.0, ans=0.035 2023-12-21 13:25:34,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=56306.666666666664, ans=0.125 2023-12-21 13:25:47,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56373.333333333336, ans=0.1 2023-12-21 13:25:49,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=56373.333333333336, ans=0.125 2023-12-21 13:25:50,342 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.593e-01 2023-12-21 13:25:56,762 INFO [train.py:886] (0/4) Epoch 2, batch 3700, loss[loss=0.01918, audio_tagging_loss=0.01918, over 25000.00 frames. ], tot_loss[loss=0.01822, audio_tagging_loss=0.01822, over 4956227.46 frames. ], batch size: 100, lr: 3.59e-02, grad_scale: 256.0 2023-12-21 13:25:57,360 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5 2023-12-21 13:25:59,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.85 vs. limit=22.5 2023-12-21 13:26:05,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.92 vs. limit=22.5 2023-12-21 13:26:10,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=56506.666666666664, ans=0.125 2023-12-21 13:26:12,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.95 vs. 
limit=12.0 2023-12-21 13:26:27,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=56640.0, ans=0.125 2023-12-21 13:26:28,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=56640.0, ans=0.0 2023-12-21 13:26:35,075 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.535e+01 2.837e+01 3.085e+01 3.878e+01, threshold=5.675e+01, percent-clipped=0.0 2023-12-21 13:26:37,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=56706.666666666664, ans=0.125 2023-12-21 13:26:41,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=56706.666666666664, ans=0.125 2023-12-21 13:26:50,304 INFO [train.py:886] (0/4) Epoch 2, batch 3750, loss[loss=0.0157, audio_tagging_loss=0.0157, over 24750.00 frames. ], tot_loss[loss=0.01826, audio_tagging_loss=0.01826, over 4952132.96 frames. ], batch size: 99, lr: 3.59e-02, grad_scale: 256.0 2023-12-21 13:27:22,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=56973.333333333336, ans=0.125 2023-12-21 13:27:36,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=57040.0, ans=0.07 2023-12-21 13:27:41,334 INFO [train.py:886] (0/4) Epoch 2, batch 3800, loss[loss=0.01799, audio_tagging_loss=0.01799, over 24750.00 frames. ], tot_loss[loss=0.01831, audio_tagging_loss=0.01831, over 4943890.47 frames. ], batch size: 99, lr: 3.58e-02, grad_scale: 256.0 2023-12-21 13:27:49,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=57106.666666666664, ans=0.025 2023-12-21 13:27:50,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.93 vs. limit=15.0 2023-12-21 13:27:58,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.16 vs. limit=15.0 2023-12-21 13:27:59,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.48 vs. limit=15.0 2023-12-21 13:28:06,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=57240.0, ans=22.5 2023-12-21 13:28:10,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=57240.0, ans=0.0 2023-12-21 13:28:20,921 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.558e+01 2.812e+01 3.070e+01 5.505e+01, threshold=5.625e+01, percent-clipped=0.0 2023-12-21 13:28:27,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=57373.333333333336, ans=0.0 2023-12-21 13:28:34,249 INFO [train.py:886] (0/4) Epoch 2, batch 3850, loss[loss=0.0194, audio_tagging_loss=0.0194, over 24750.00 frames. ], tot_loss[loss=0.01826, audio_tagging_loss=0.01826, over 4945982.06 frames. 
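The lr field decays smoothly across this excerpt, from 3.87e-02 near batch 1500 to 3.56e-02 by batch 3950, with no step changes. That shape is consistent with a schedule that decays polynomially in both batch index and epoch, such as icefall's Eden scheduler. The sketch below shows the shape only; the exact formula, the lr_batches/lr_epochs constants, and any warmup handling in this run are assumptions, and the printed numbers are not meant to reproduce the logged values:

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Near-flat early on, asymptotically ~ batch**-0.5 * epoch**-0.5.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

for batch in (0, 2000, 10000, 50000):
    print(f"batch {batch}: lr = {eden_lr(0.05, batch, epoch=2):.3e}")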
], batch size: 99, lr: 3.58e-02, grad_scale: 256.0 2023-12-21 13:29:27,429 INFO [train.py:886] (0/4) Epoch 2, batch 3900, loss[loss=0.01756, audio_tagging_loss=0.01756, over 25000.00 frames. ], tot_loss[loss=0.01814, audio_tagging_loss=0.01814, over 4950395.11 frames. ], batch size: 100, lr: 3.57e-02, grad_scale: 256.0 2023-12-21 13:29:28,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=57773.333333333336, ans=0.125 2023-12-21 13:29:28,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=15.0 2023-12-21 13:29:34,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=57773.333333333336, ans=0.125 2023-12-21 13:29:54,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=57906.666666666664, ans=0.125 2023-12-21 13:30:02,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=57973.333333333336, ans=0.2 2023-12-21 13:30:05,871 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.478e+01 2.658e+01 2.976e+01 3.993e+01, threshold=5.317e+01, percent-clipped=0.0 2023-12-21 13:30:14,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2023-12-21 13:30:19,059 INFO [train.py:886] (0/4) Epoch 2, batch 3950, loss[loss=0.01725, audio_tagging_loss=0.01725, over 25000.00 frames. ], tot_loss[loss=0.01805, audio_tagging_loss=0.01805, over 4953084.28 frames. ], batch size: 100, lr: 3.56e-02, grad_scale: 256.0 2023-12-21 13:30:25,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=58106.666666666664, ans=0.0 2023-12-21 13:30:36,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=58173.333333333336, ans=0.2 2023-12-21 13:30:37,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.21 vs. limit=10.0 2023-12-21 13:30:54,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=58306.666666666664, ans=0.1 2023-12-21 13:31:00,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=58373.333333333336, ans=0.125 2023-12-21 13:31:12,218 INFO [train.py:886] (0/4) Epoch 2, batch 4000, loss[loss=0.01757, audio_tagging_loss=0.01757, over 25000.00 frames. ], tot_loss[loss=0.01816, audio_tagging_loss=0.01816, over 4954608.17 frames. ], batch size: 100, lr: 3.56e-02, grad_scale: 256.0 2023-12-21 13:31:23,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.37 vs. 
limit=15.0 2023-12-21 13:31:49,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=58640.0, ans=0.04949747468305833 2023-12-21 13:31:50,750 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.643e+01 2.871e+01 3.262e+01 4.395e+01, threshold=5.743e+01, percent-clipped=0.0 2023-12-21 13:32:04,039 INFO [train.py:886] (0/4) Epoch 2, batch 4050, loss[loss=0.01838, audio_tagging_loss=0.01838, over 24750.00 frames. ], tot_loss[loss=0.01819, audio_tagging_loss=0.01819, over 4955919.56 frames. ], batch size: 99, lr: 3.55e-02, grad_scale: 256.0 2023-12-21 13:32:05,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=58773.333333333336, ans=0.125 2023-12-21 13:32:19,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=58840.0, ans=0.0 2023-12-21 13:32:19,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=58840.0, ans=0.0 2023-12-21 13:32:32,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=58906.666666666664, ans=0.125 2023-12-21 13:32:32,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2023-12-21 13:32:34,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=58973.333333333336, ans=0.125 2023-12-21 13:32:44,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=15.0 2023-12-21 13:32:51,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.53 vs. limit=22.5 2023-12-21 13:32:55,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=59106.666666666664, ans=0.125 2023-12-21 13:32:56,517 INFO [train.py:886] (0/4) Epoch 2, batch 4100, loss[loss=0.01874, audio_tagging_loss=0.01874, over 24750.00 frames. ], tot_loss[loss=0.01836, audio_tagging_loss=0.01836, over 4953832.52 frames. ], batch size: 99, lr: 3.55e-02, grad_scale: 256.0 2023-12-21 13:33:05,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=59106.666666666664, ans=0.125 2023-12-21 13:33:16,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=59240.0, ans=0.125 2023-12-21 13:33:16,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.11 vs. 
limit=15.0 2023-12-21 13:33:21,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=59240.0, ans=0.125 2023-12-21 13:33:34,103 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.565e+01 2.821e+01 3.074e+01 4.312e+01, threshold=5.642e+01, percent-clipped=0.0 2023-12-21 13:33:34,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=59306.666666666664, ans=0.125 2023-12-21 13:33:35,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=59306.666666666664, ans=0.0 2023-12-21 13:33:37,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=59373.333333333336, ans=0.0 2023-12-21 13:33:39,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.36 vs. limit=15.0 2023-12-21 13:33:48,678 INFO [train.py:886] (0/4) Epoch 2, batch 4150, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.0183, audio_tagging_loss=0.0183, over 4947388.10 frames. ], batch size: 99, lr: 3.54e-02, grad_scale: 256.0 2023-12-21 13:34:02,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=59506.666666666664, ans=0.125 2023-12-21 13:34:03,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.22 vs. limit=10.0 2023-12-21 13:34:09,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=59573.333333333336, ans=0.125 2023-12-21 13:34:26,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=59640.0, ans=0.125 2023-12-21 13:34:34,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=59706.666666666664, ans=0.2 2023-12-21 13:34:38,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=59773.333333333336, ans=0.0 2023-12-21 13:34:40,075 INFO [train.py:886] (0/4) Epoch 2, batch 4200, loss[loss=0.01922, audio_tagging_loss=0.01922, over 24750.00 frames. ], tot_loss[loss=0.01819, audio_tagging_loss=0.01819, over 4948689.54 frames. ], batch size: 99, lr: 3.53e-02, grad_scale: 256.0 2023-12-21 13:34:56,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59840.0, ans=0.1 2023-12-21 13:35:15,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=59973.333333333336, ans=0.05 2023-12-21 13:35:18,623 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.515e+01 2.712e+01 3.027e+01 3.804e+01, threshold=5.424e+01, percent-clipped=0.0 2023-12-21 13:35:31,820 INFO [train.py:886] (0/4) Epoch 2, batch 4250, loss[loss=0.01585, audio_tagging_loss=0.01585, over 24750.00 frames. ], tot_loss[loss=0.01815, audio_tagging_loss=0.01815, over 4948777.81 frames. 
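The `ScheduledFloat` lines show module hyper-parameters (balancer probabilities, skip rates, dropout_p, whitening limits) being annealed as a function of batch_count; by batch ~59k most skip rates have reached 0.0 and most balancer probs 0.125. A sketch of piecewise-linear scheduling consistent with that behaviour; the breakpoints below are invented for illustration:

```python
def scheduled_float(batch_count, schedule):
    """Piecewise-linear value of a scheduled hyper-parameter at batch_count.

    `schedule` is a sorted list of (batch_count, value) breakpoints; the value
    is clamped to the first/last breakpoint outside the schedule's range.
    """
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    if batch_count >= schedule[-1][0]:
        return schedule[-1][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# a skip-rate annealed from 0.5 to 0.0 over the first 20k batches (made-up breakpoints):
print(scheduled_float(58840.0, [(0.0, 0.5), (20000.0, 0.0)]))   # -> 0.0
```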
], batch size: 99, lr: 3.53e-02, grad_scale: 256.0 2023-12-21 13:35:51,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=60173.333333333336, ans=0.125 2023-12-21 13:35:54,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=60240.0, ans=0.125 2023-12-21 13:35:54,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=60240.0, ans=0.125 2023-12-21 13:35:57,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=25.01 vs. limit=22.5 2023-12-21 13:36:04,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=60306.666666666664, ans=0.0 2023-12-21 13:36:24,761 INFO [train.py:886] (0/4) Epoch 2, batch 4300, loss[loss=0.01382, audio_tagging_loss=0.01382, over 24085.00 frames. ], tot_loss[loss=0.01811, audio_tagging_loss=0.01811, over 4952502.11 frames. ], batch size: 100, lr: 3.52e-02, grad_scale: 256.0 2023-12-21 13:36:25,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.19 vs. limit=22.5 2023-12-21 13:36:28,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=60440.0, ans=0.5 2023-12-21 13:36:29,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-12-21 13:36:36,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=60506.666666666664, ans=0.125 2023-12-21 13:36:47,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2023-12-21 13:37:03,036 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.538e+01 2.771e+01 3.049e+01 3.843e+01, threshold=5.542e+01, percent-clipped=0.0 2023-12-21 13:37:08,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.32 vs. limit=10.0 2023-12-21 13:37:15,379 INFO [train.py:886] (0/4) Epoch 2, batch 4350, loss[loss=0.02109, audio_tagging_loss=0.02109, over 25000.00 frames. ], tot_loss[loss=0.01816, audio_tagging_loss=0.01816, over 4951782.70 frames. ], batch size: 100, lr: 3.52e-02, grad_scale: 256.0 2023-12-21 13:37:16,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0 2023-12-21 13:37:27,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2023-12-21 13:37:30,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.18 vs. 
limit=12.0 2023-12-21 13:37:36,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60906.666666666664, ans=0.1 2023-12-21 13:37:44,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=60906.666666666664, ans=0.04949747468305833 2023-12-21 13:37:49,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=60973.333333333336, ans=0.0 2023-12-21 13:38:08,515 INFO [train.py:886] (0/4) Epoch 2, batch 4400, loss[loss=0.01827, audio_tagging_loss=0.01827, over 24750.00 frames. ], tot_loss[loss=0.0183, audio_tagging_loss=0.0183, over 4951425.58 frames. ], batch size: 99, lr: 3.51e-02, grad_scale: 256.0 2023-12-21 13:38:17,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=61173.333333333336, ans=0.125 2023-12-21 13:38:27,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=12.0 2023-12-21 13:38:35,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2023-12-21 13:38:46,108 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.594e+01 2.828e+01 3.102e+01 3.980e+01, threshold=5.657e+01, percent-clipped=0.0 2023-12-21 13:38:46,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=61306.666666666664, ans=0.2 2023-12-21 13:38:47,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61306.666666666664, ans=0.1 2023-12-21 13:38:49,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=61373.333333333336, ans=0.07 2023-12-21 13:38:54,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=61373.333333333336, ans=0.04949747468305833 2023-12-21 13:38:59,965 INFO [train.py:886] (0/4) Epoch 2, batch 4450, loss[loss=0.01667, audio_tagging_loss=0.01667, over 25000.00 frames. ], tot_loss[loss=0.01823, audio_tagging_loss=0.01823, over 4950446.21 frames. 
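Each `Whitening` line compares a whiteness statistic of a module's activations against a limit (10.0 to 22.5 at this point in training); the Whiten module only applies a corrective gradient when the metric exceeds its limit, which is why most logged metrics hover near the limit. Roughly, the statistic is 1.0 when the feature covariance is a multiple of the identity and grows toward the per-group channel count as the covariance collapses; a sketch under that assumption (the exact formula in scaling.py may differ in details):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Whiteness statistic for activations x of shape (frames, channels):
    1.0 when the covariance is a multiple of the identity, approaching the
    per-group channel count as the covariance degenerates."""
    num_frames, num_channels = x.shape
    c = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, c).transpose(0, 1)   # (groups, frames, c)
    covar = x.transpose(1, 2) @ x / num_frames                 # (groups, c, c)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
    return (covar ** 2).mean() / (mean_diag ** 2 / c)

print(whitening_metric(torch.randn(1000, 384)))   # small (near 1) for white noise
```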
], batch size: 100, lr: 3.51e-02, grad_scale: 256.0 2023-12-21 13:39:05,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61440.0, ans=0.1 2023-12-21 13:39:17,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=61506.666666666664, ans=0.125 2023-12-21 13:39:18,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=61506.666666666664, ans=0.125 2023-12-21 13:39:22,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=61573.333333333336, ans=0.95 2023-12-21 13:39:22,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=61573.333333333336, ans=0.125 2023-12-21 13:39:23,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=61573.333333333336, ans=0.125 2023-12-21 13:39:44,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=61706.666666666664, ans=0.0 2023-12-21 13:39:47,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2023-12-21 13:39:51,921 INFO [train.py:886] (0/4) Epoch 2, batch 4500, loss[loss=0.02085, audio_tagging_loss=0.02085, over 24008.00 frames. ], tot_loss[loss=0.01819, audio_tagging_loss=0.01819, over 4948379.57 frames. ], batch size: 100, lr: 3.50e-02, grad_scale: 256.0 2023-12-21 13:39:54,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=61773.333333333336, ans=0.125 2023-12-21 13:40:06,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=61840.0, ans=0.0 2023-12-21 13:40:10,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=61840.0, ans=0.2 2023-12-21 13:40:10,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=61840.0, ans=0.0 2023-12-21 13:40:16,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2023-12-21 13:40:16,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=61906.666666666664, ans=15.0 2023-12-21 13:40:19,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=61906.666666666664, ans=0.0 2023-12-21 13:40:22,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=61973.333333333336, ans=0.125 2023-12-21 13:40:22,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.41 vs. 
limit=15.0 2023-12-21 13:40:29,936 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.527e+01 2.862e+01 3.154e+01 4.163e+01, threshold=5.724e+01, percent-clipped=0.0 2023-12-21 13:40:30,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2023-12-21 13:40:44,706 INFO [train.py:886] (0/4) Epoch 2, batch 4550, loss[loss=0.01564, audio_tagging_loss=0.01564, over 25000.00 frames. ], tot_loss[loss=0.01811, audio_tagging_loss=0.01811, over 4952326.30 frames. ], batch size: 100, lr: 3.49e-02, grad_scale: 256.0 2023-12-21 13:41:09,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=62240.0, ans=0.09899494936611666 2023-12-21 13:41:26,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.95 vs. limit=15.0 2023-12-21 13:41:29,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.98 vs. limit=22.5 2023-12-21 13:41:30,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=62373.333333333336, ans=0.125 2023-12-21 13:41:32,914 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=8.512e+01 2023-12-21 13:41:35,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.00 vs. limit=22.5 2023-12-21 13:41:35,493 INFO [train.py:886] (0/4) Epoch 2, batch 4600, loss[loss=0.01746, audio_tagging_loss=0.01746, over 25000.00 frames. ], tot_loss[loss=0.01799, audio_tagging_loss=0.01799, over 4955027.22 frames. ], batch size: 100, lr: 3.49e-02, grad_scale: 256.0 2023-12-21 13:41:45,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=62440.0, ans=0.0 2023-12-21 13:41:46,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=62506.666666666664, ans=0.125 2023-12-21 13:41:54,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=12.0 2023-12-21 13:42:15,551 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.477e+01 2.661e+01 2.958e+01 4.591e+01, threshold=5.321e+01, percent-clipped=0.0 2023-12-21 13:42:17,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=62640.0, ans=0.0 2023-12-21 13:42:28,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=62773.333333333336, ans=0.125 2023-12-21 13:42:29,640 INFO [train.py:886] (0/4) Epoch 2, batch 4650, loss[loss=0.01737, audio_tagging_loss=0.01737, over 25000.00 frames. ], tot_loss[loss=0.01811, audio_tagging_loss=0.01811, over 4956852.54 frames. 
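The `WithLoss` line above (loss-sum=8.512e+01 on self_attn_weights) reports an auxiliary loss attached directly to the attention-weight tensor. One standard way to do this is an autograd function that is the identity in the forward pass but hands the auxiliary loss a gradient of ones in the backward pass, as if loss.sum() had been added to the objective; a sketch of that mechanism, with a hypothetical penalty in the usage:

```python
import torch

class WithLoss(torch.autograd.Function):
    """Identity on x in the forward pass; in the backward pass the auxiliary
    loss tensor receives a gradient of ones, as if loss.sum() had been added
    to the training objective."""

    @staticmethod
    def forward(ctx, x, loss):
        ctx.loss_shape = loss.shape
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, torch.ones(ctx.loss_shape,
                                       device=grad_output.device,
                                       dtype=grad_output.dtype)

x = torch.randn(4, 8, requires_grad=True)
aux = x.pow(2).mean(dim=-1)    # hypothetical penalty on attention weights
y = WithLoss.apply(x, aux)     # y has the same values as x
y.sum().backward()             # ...but aux.sum() now also contributes to x.grad
```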
], batch size: 100, lr: 3.48e-02, grad_scale: 256.0 2023-12-21 13:42:29,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=62773.333333333336, ans=0.0 2023-12-21 13:42:34,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=62773.333333333336, ans=0.1 2023-12-21 13:42:47,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=62840.0, ans=0.05 2023-12-21 13:42:53,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=62906.666666666664, ans=0.125 2023-12-21 13:43:05,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=62973.333333333336, ans=0.0 2023-12-21 13:43:13,707 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.58 vs. limit=22.5 2023-12-21 13:43:19,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.43 vs. limit=22.5 2023-12-21 13:43:19,726 INFO [train.py:886] (0/4) Epoch 2, batch 4700, loss[loss=0.01542, audio_tagging_loss=0.01542, over 24750.00 frames. ], tot_loss[loss=0.01825, audio_tagging_loss=0.01825, over 4952665.81 frames. ], batch size: 99, lr: 3.48e-02, grad_scale: 256.0 2023-12-21 13:43:20,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=63106.666666666664, ans=0.0 2023-12-21 13:43:44,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=63240.0, ans=0.125 2023-12-21 13:43:52,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=63306.666666666664, ans=0.035 2023-12-21 13:43:55,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.72 vs. limit=15.0 2023-12-21 13:43:55,353 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.602e+01 2.941e+01 3.244e+01 4.815e+01, threshold=5.882e+01, percent-clipped=0.0 2023-12-21 13:43:58,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.60 vs. limit=15.0 2023-12-21 13:44:05,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.57 vs. limit=15.0 2023-12-21 13:44:07,435 INFO [train.py:886] (0/4) Epoch 2, batch 4750, loss[loss=0.01674, audio_tagging_loss=0.01674, over 24750.00 frames. ], tot_loss[loss=0.01841, audio_tagging_loss=0.01841, over 4948505.39 frames. ], batch size: 99, lr: 3.47e-02, grad_scale: 256.0 2023-12-21 13:44:22,406 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-2.pt 2023-12-21 13:44:45,606 INFO [train.py:886] (0/4) Epoch 3, batch 0, loss[loss=0.05402, audio_tagging_loss=0.05402, over 21159.00 frames. ], tot_loss[loss=0.05402, audio_tagging_loss=0.05402, over 21159.00 frames. 
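Epoch 2 ends above with a checkpoint written to zipformer/exp_at_as_full/epoch-2.pt, after which epoch 3's first batch and a fresh validation pass are logged. A sketch of what an end-of-epoch checkpoint plausibly bundles; the key names are illustrative rather than icefall's actual schema:

```python
import torch

def save_epoch_checkpoint(exp_dir, epoch, model, optimizer, scheduler=None, scaler=None):
    """Bundle everything needed to resume training; key names are illustrative."""
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": None if scheduler is None else scheduler.state_dict(),
            "grad_scaler": None if scaler is None else scaler.state_dict(),
            "epoch": epoch,
        },
        f"{exp_dir}/epoch-{epoch}.pt",
    )
```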
], batch size: 107, lr: 3.30e-02, grad_scale: 256.0 2023-12-21 13:44:45,608 INFO [train.py:909] (0/4) Computing validation loss 2023-12-21 13:45:06,900 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.8064, 1.7191, 3.0258, 2.3958], device='cuda:0') 2023-12-21 13:45:08,234 INFO [train.py:917] (0/4) Epoch 3, validation: loss=0.04026, audio_tagging_loss=0.04026, over 3737520.00 frames. 2023-12-21 13:45:08,234 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-21 13:45:09,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=63546.666666666664, ans=0.2 2023-12-21 13:45:11,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.21 vs. limit=15.0 2023-12-21 13:45:24,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=63613.333333333336, ans=0.2 2023-12-21 13:45:24,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=63613.333333333336, ans=0.0 2023-12-21 13:45:28,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=63613.333333333336, ans=0.0 2023-12-21 13:45:31,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=63680.0, ans=0.125 2023-12-21 13:45:34,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=12.0 2023-12-21 13:45:35,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2023-12-21 13:45:36,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=63680.0, ans=0.1 2023-12-21 13:45:42,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.25 vs. limit=10.0 2023-12-21 13:45:43,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.44 vs. limit=10.0 2023-12-21 13:45:56,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-12-21 13:46:01,490 INFO [train.py:886] (0/4) Epoch 3, batch 50, loss[loss=0.0247, audio_tagging_loss=0.0247, over 22113.00 frames. ], tot_loss[loss=0.0291, audio_tagging_loss=0.0291, over 1117553.68 frames. ], batch size: 107, lr: 3.29e-02, grad_scale: 64.0 2023-12-21 13:46:14,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.94 vs. limit=22.5 2023-12-21 13:46:20,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. 
limit=6.0 2023-12-21 13:46:23,536 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 2.959e+01 3.288e+01 3.830e+01 1.189e+02, threshold=6.575e+01, percent-clipped=4.0 2023-12-21 13:46:49,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.53 vs. limit=22.5 2023-12-21 13:46:51,658 INFO [train.py:886] (0/4) Epoch 3, batch 100, loss[loss=0.02255, audio_tagging_loss=0.02255, over 21772.00 frames. ], tot_loss[loss=0.02516, audio_tagging_loss=0.02516, over 1967382.32 frames. ], batch size: 107, lr: 3.29e-02, grad_scale: 64.0 2023-12-21 13:47:02,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=64280.0, ans=0.125 2023-12-21 13:47:03,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=64280.0, ans=0.125 2023-12-21 13:47:35,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=64480.0, ans=0.0 2023-12-21 13:47:44,255 INFO [train.py:886] (0/4) Epoch 3, batch 150, loss[loss=0.02004, audio_tagging_loss=0.02004, over 25000.00 frames. ], tot_loss[loss=0.02276, audio_tagging_loss=0.02276, over 2633965.03 frames. ], batch size: 100, lr: 3.28e-02, grad_scale: 64.0 2023-12-21 13:47:58,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.77 vs. limit=22.5 2023-12-21 13:48:00,216 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.98 vs. limit=22.5 2023-12-21 13:48:07,592 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.621e+01 2.918e+01 3.112e+01 3.943e+01, threshold=5.836e+01, percent-clipped=0.0 2023-12-21 13:48:17,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0 2023-12-21 13:48:18,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=64746.666666666664, ans=0.0 2023-12-21 13:48:26,457 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.664e+01 2023-12-21 13:48:31,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64813.333333333336, ans=0.1 2023-12-21 13:48:32,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=15.0 2023-12-21 13:48:35,601 INFO [train.py:886] (0/4) Epoch 3, batch 200, loss[loss=0.01909, audio_tagging_loss=0.01909, over 25000.00 frames. ], tot_loss[loss=0.02117, audio_tagging_loss=0.02117, over 3147515.86 frames. ], batch size: 100, lr: 3.28e-02, grad_scale: 64.0 2023-12-21 13:48:40,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=64880.0, ans=0.2 2023-12-21 13:48:44,035 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. 
limit=15.0 2023-12-21 13:48:51,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64946.666666666664, ans=0.1 2023-12-21 13:49:07,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65080.0, ans=0.1 2023-12-21 13:49:17,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=65146.666666666664, ans=0.0 2023-12-21 13:49:23,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2023-12-21 13:49:28,031 INFO [train.py:886] (0/4) Epoch 3, batch 250, loss[loss=0.0172, audio_tagging_loss=0.0172, over 25000.00 frames. ], tot_loss[loss=0.02024, audio_tagging_loss=0.02024, over 3547188.61 frames. ], batch size: 100, lr: 3.27e-02, grad_scale: 64.0 2023-12-21 13:49:32,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=65213.333333333336, ans=0.2 2023-12-21 13:49:39,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=65280.0, ans=0.04949747468305833 2023-12-21 13:49:51,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=65346.666666666664, ans=0.125 2023-12-21 13:49:51,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=65346.666666666664, ans=0.2 2023-12-21 13:49:51,899 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.523e+01 2.809e+01 3.152e+01 4.163e+01, threshold=5.618e+01, percent-clipped=0.0 2023-12-21 13:49:57,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=65346.666666666664, ans=0.1 2023-12-21 13:50:10,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=65480.0, ans=0.0 2023-12-21 13:50:16,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=65480.0, ans=0.125 2023-12-21 13:50:17,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=65480.0, ans=0.0 2023-12-21 13:50:20,237 INFO [train.py:886] (0/4) Epoch 3, batch 300, loss[loss=0.01756, audio_tagging_loss=0.01756, over 24750.00 frames. ], tot_loss[loss=0.01978, audio_tagging_loss=0.01978, over 3862129.01 frames. ], batch size: 99, lr: 3.27e-02, grad_scale: 64.0 2023-12-21 13:50:23,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=65546.66666666667, ans=0.125 2023-12-21 13:50:24,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.76 vs. 
limit=15.0 2023-12-21 13:50:31,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=65613.33333333333, ans=0.125 2023-12-21 13:50:33,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=65613.33333333333, ans=0.125 2023-12-21 13:50:35,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.25 vs. limit=22.5 2023-12-21 13:50:40,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=65680.0, ans=0.0 2023-12-21 13:50:48,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=65680.0, ans=0.0 2023-12-21 13:50:57,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=65746.66666666667, ans=0.0 2023-12-21 13:51:01,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=65813.33333333333, ans=0.1 2023-12-21 13:51:01,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=65813.33333333333, ans=0.125 2023-12-21 13:51:06,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.04 vs. limit=22.5 2023-12-21 13:51:10,612 INFO [train.py:886] (0/4) Epoch 3, batch 350, loss[loss=0.02091, audio_tagging_loss=0.02091, over 25000.00 frames. ], tot_loss[loss=0.01931, audio_tagging_loss=0.01931, over 4097862.02 frames. ], batch size: 100, lr: 3.26e-02, grad_scale: 64.0 2023-12-21 13:51:18,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=20.23 vs. limit=15.0 2023-12-21 13:51:19,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.82 vs. limit=22.5 2023-12-21 13:51:21,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.96 vs. 
limit=15.0 2023-12-21 13:51:23,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=65946.66666666667, ans=0.125 2023-12-21 13:51:24,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=65946.66666666667, ans=0.125 2023-12-21 13:51:35,290 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.550e+01 2.783e+01 3.112e+01 3.866e+01, threshold=5.566e+01, percent-clipped=0.0 2023-12-21 13:51:48,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66080.0, ans=0.1 2023-12-21 13:51:54,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=66146.66666666667, ans=10.0 2023-12-21 13:52:00,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=66146.66666666667, ans=0.125 2023-12-21 13:52:03,162 INFO [train.py:886] (0/4) Epoch 3, batch 400, loss[loss=0.02206, audio_tagging_loss=0.02206, over 25000.00 frames. ], tot_loss[loss=0.01884, audio_tagging_loss=0.01884, over 4286373.40 frames. ], batch size: 100, lr: 3.25e-02, grad_scale: 64.0 2023-12-21 13:52:06,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=66213.33333333333, ans=0.125 2023-12-21 13:52:06,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2023-12-21 13:52:12,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2023-12-21 13:52:32,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=19.67 vs. limit=15.0 2023-12-21 13:52:54,316 INFO [train.py:886] (0/4) Epoch 3, batch 450, loss[loss=0.01817, audio_tagging_loss=0.01817, over 25000.00 frames. ], tot_loss[loss=0.01848, audio_tagging_loss=0.01848, over 4429422.60 frames. 
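Across these entries the learning rate decays smoothly within the epoch (3.30e-02 at epoch 3, batch 0, down to 3.24e-02 by batch 550) and steps down at each epoch boundary, consistent with an Eden-style schedule that multiplies the base LR by batch- and epoch-dependent power-law factors plus a brief warm-up. A sketch of that form; the base LR and per-epoch step count in the example are assumptions about this run:

```python
def eden_lr(base_lr, batch, epoch,
            lr_batches=7500.0, lr_epochs=3.5, warmup_batches=500.0):
    """Eden-style LR: power-law decay in both batch count and epoch,
    with a short warm-up from 0.5x to 1x."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    warmup = min(1.0, 0.5 + 0.5 * batch / warmup_batches)
    return base_lr * batch_factor * epoch_factor * warmup

# assuming a base LR of 0.045 and ~4750 optimizer steps per epoch, two
# completed epochs give ~3.3e-02, in line with the epoch-3 entries above:
print(eden_lr(0.045, batch=9500, epoch=2))
```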
], batch size: 100, lr: 3.25e-02, grad_scale: 64.0 2023-12-21 13:52:55,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=66546.66666666667, ans=0.0 2023-12-21 13:53:10,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=66613.33333333333, ans=0.2 2023-12-21 13:53:13,260 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.386e+00 2023-12-21 13:53:17,868 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.418e+01 2.694e+01 2.965e+01 4.467e+01, threshold=5.389e+01, percent-clipped=0.0 2023-12-21 13:53:35,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=66813.33333333333, ans=0.125 2023-12-21 13:53:37,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=66813.33333333333, ans=0.0 2023-12-21 13:53:43,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=15.0 2023-12-21 13:53:46,577 INFO [train.py:886] (0/4) Epoch 3, batch 500, loss[loss=0.01972, audio_tagging_loss=0.01972, over 25000.00 frames. ], tot_loss[loss=0.01817, audio_tagging_loss=0.01817, over 4546763.32 frames. ], batch size: 100, lr: 3.24e-02, grad_scale: 64.0 2023-12-21 13:53:47,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.08 vs. limit=6.0 2023-12-21 13:54:20,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=67080.0, ans=0.0 2023-12-21 13:54:36,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-12-21 13:54:38,662 INFO [train.py:886] (0/4) Epoch 3, batch 550, loss[loss=0.02014, audio_tagging_loss=0.02014, over 25000.00 frames. ], tot_loss[loss=0.01807, audio_tagging_loss=0.01807, over 4638683.91 frames. ], batch size: 100, lr: 3.24e-02, grad_scale: 64.0 2023-12-21 13:54:53,161 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.39 vs. limit=15.0 2023-12-21 13:54:57,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=67346.66666666667, ans=0.125 2023-12-21 13:55:01,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67346.66666666667, ans=0.1 2023-12-21 13:55:02,552 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.576e+01 2.807e+01 3.063e+01 3.994e+01, threshold=5.614e+01, percent-clipped=0.0 2023-12-21 13:55:29,815 INFO [train.py:886] (0/4) Epoch 3, batch 600, loss[loss=0.02265, audio_tagging_loss=0.02265, over 24948.00 frames. ], tot_loss[loss=0.01814, audio_tagging_loss=0.01814, over 4705784.81 frames. 
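In the `train.py:886` lines, `loss[...]` is the current batch while `tot_loss[...]` is a smoothed aggregate: its frame count climbs through early epoch 3 (1.1M up to 4.6M frames here) and then saturates near 5M, about 200 average batches' worth, which is exactly what a per-batch decay of (1 - 1/200) produces. A sketch of that smoothing, assuming a 200-batch horizon (the actual tracker may differ):

```python
def update_tot_loss(tot_frames, tot_loss_sum, batch_frames, batch_loss, horizon=200):
    """Decayed running aggregate behind tot_loss[...]: scale old statistics by
    (1 - 1/horizon) each batch, then add the new batch, so the frame count
    saturates near horizon * frames_per_batch (~5e6 frames here)."""
    decay = 1.0 - 1.0 / horizon
    tot_frames = tot_frames * decay + batch_frames
    tot_loss_sum = tot_loss_sum * decay + batch_loss * batch_frames
    return tot_frames, tot_loss_sum, tot_loss_sum / tot_frames

# replaying a constant 25k-frame batch shows the saturation:
f, s = 0.0, 0.0
for _ in range(1000):
    f, s, avg = update_tot_loss(f, s, 25000.0, 0.018)
print(f, avg)   # ~5.0e6 frames, loss ~0.018
```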
], batch size: 100, lr: 3.23e-02, grad_scale: 64.0 2023-12-21 13:55:38,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=67546.66666666667, ans=0.125 2023-12-21 13:55:40,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=67613.33333333333, ans=0.09899494936611666 2023-12-21 13:55:48,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=67613.33333333333, ans=0.0 2023-12-21 13:55:55,861 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.37 vs. limit=22.5 2023-12-21 13:56:22,132 INFO [train.py:886] (0/4) Epoch 3, batch 650, loss[loss=0.01676, audio_tagging_loss=0.01676, over 24750.00 frames. ], tot_loss[loss=0.01807, audio_tagging_loss=0.01807, over 4759225.40 frames. ], batch size: 99, lr: 3.23e-02, grad_scale: 64.0 2023-12-21 13:56:26,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=67880.0, ans=0.125 2023-12-21 13:56:27,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=67880.0, ans=0.125 2023-12-21 13:56:31,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67946.66666666667, ans=0.1 2023-12-21 13:56:41,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.82 vs. limit=15.0 2023-12-21 13:56:43,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=68013.33333333333, ans=0.1 2023-12-21 13:56:46,692 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.530e+01 2.784e+01 3.010e+01 3.967e+01, threshold=5.567e+01, percent-clipped=0.0 2023-12-21 13:56:53,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=68080.0, ans=0.125 2023-12-21 13:57:04,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=68146.66666666667, ans=0.0 2023-12-21 13:57:08,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=68146.66666666667, ans=0.125 2023-12-21 13:57:15,056 INFO [train.py:886] (0/4) Epoch 3, batch 700, loss[loss=0.0199, audio_tagging_loss=0.0199, over 24750.00 frames. ], tot_loss[loss=0.01808, audio_tagging_loss=0.01808, over 4804423.12 frames. ], batch size: 99, lr: 3.22e-02, grad_scale: 64.0 2023-12-21 13:57:39,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=68346.66666666667, ans=0.05 2023-12-21 13:57:44,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68346.66666666667, ans=0.1 2023-12-21 13:57:50,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.86 vs. 
limit=12.0 2023-12-21 13:57:54,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=68413.33333333333, ans=0.125 2023-12-21 13:58:00,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=68480.0, ans=0.0 2023-12-21 13:58:06,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.03 vs. limit=6.0 2023-12-21 13:58:06,474 INFO [train.py:886] (0/4) Epoch 3, batch 750, loss[loss=0.01609, audio_tagging_loss=0.01609, over 24750.00 frames. ], tot_loss[loss=0.01807, audio_tagging_loss=0.01807, over 4836954.77 frames. ], batch size: 99, lr: 3.22e-02, grad_scale: 64.0 2023-12-21 13:58:09,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=68546.66666666667, ans=0.0 2023-12-21 13:58:13,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=68546.66666666667, ans=0.09899494936611666 2023-12-21 13:58:20,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=68613.33333333333, ans=0.2 2023-12-21 13:58:22,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=68613.33333333333, ans=0.125 2023-12-21 13:58:30,493 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.631e+01 2.824e+01 3.219e+01 3.992e+01, threshold=5.647e+01, percent-clipped=0.0 2023-12-21 13:58:59,104 INFO [train.py:886] (0/4) Epoch 3, batch 800, loss[loss=0.01819, audio_tagging_loss=0.01819, over 25000.00 frames. ], tot_loss[loss=0.018, audio_tagging_loss=0.018, over 4867315.04 frames. ], batch size: 100, lr: 3.21e-02, grad_scale: 64.0 2023-12-21 13:59:00,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.48 vs. limit=22.5 2023-12-21 13:59:14,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=68946.66666666667, ans=0.05 2023-12-21 13:59:24,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=69013.33333333333, ans=0.0 2023-12-21 13:59:33,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.33 vs. limit=15.0 2023-12-21 13:59:33,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2023-12-21 13:59:48,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=69146.66666666667, ans=0.5 2023-12-21 13:59:50,506 INFO [train.py:886] (0/4) Epoch 3, batch 850, loss[loss=0.01724, audio_tagging_loss=0.01724, over 25000.00 frames. ], tot_loss[loss=0.01801, audio_tagging_loss=0.01801, over 4890147.72 frames. 
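`grad_scale` in the batch lines is the dynamic fp16 loss scale: 256.0 through epoch 2, 64.0 by early epoch 3, i.e. halved twice after batches with overflowing gradients were skipped. The same mechanics with PyTorch's stock GradScaler (the recipe may use its own wrapper); requires a CUDA device:

```python
import torch

model = torch.nn.Linear(80, 527).cuda()            # toy stand-in for the tagger
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=256.0)

for _ in range(10):
    x = torch.randn(8, 80, device="cuda")
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    scaler.scale(loss).backward()  # backprop through loss * grad_scale
    scaler.step(opt)               # unscales; skips the step on inf/nan grads
    scaler.update()                # halves the scale on overflow (256 -> 128 -> 64), else slowly grows it
print(scaler.get_scale())
```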
], batch size: 100, lr: 3.21e-02, grad_scale: 64.0 2023-12-21 14:00:14,044 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.575e+01 2.735e+01 3.023e+01 4.765e+01, threshold=5.470e+01, percent-clipped=0.0 2023-12-21 14:00:21,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=69413.33333333333, ans=0.125 2023-12-21 14:00:25,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=69413.33333333333, ans=0.125 2023-12-21 14:00:29,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0 2023-12-21 14:00:30,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2023-12-21 14:00:41,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2023-12-21 14:00:42,363 INFO [train.py:886] (0/4) Epoch 3, batch 900, loss[loss=0.0251, audio_tagging_loss=0.0251, over 24750.00 frames. ], tot_loss[loss=0.01808, audio_tagging_loss=0.01808, over 4906836.46 frames. ], batch size: 99, lr: 3.20e-02, grad_scale: 64.0 2023-12-21 14:00:44,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=69546.66666666667, ans=0.0 2023-12-21 14:00:46,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=69546.66666666667, ans=0.125 2023-12-21 14:00:56,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=69613.33333333333, ans=0.125 2023-12-21 14:01:20,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.16 vs. limit=12.0 2023-12-21 14:01:35,265 INFO [train.py:886] (0/4) Epoch 3, batch 950, loss[loss=0.02049, audio_tagging_loss=0.02049, over 24945.00 frames. ], tot_loss[loss=0.01815, audio_tagging_loss=0.01815, over 4910226.25 frames. ], batch size: 100, lr: 3.20e-02, grad_scale: 64.0 2023-12-21 14:01:38,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=15.0 2023-12-21 14:01:42,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.34 vs. limit=15.0 2023-12-21 14:01:44,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=69946.66666666667, ans=0.125 2023-12-21 14:01:44,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69946.66666666667, ans=0.1 2023-12-21 14:01:45,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.17 vs. 
limit=15.0 2023-12-21 14:01:58,498 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.594e+01 2.861e+01 3.061e+01 4.080e+01, threshold=5.722e+01, percent-clipped=0.0 2023-12-21 14:02:17,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=70146.66666666667, ans=0.125 2023-12-21 14:02:19,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2023-12-21 14:02:25,375 INFO [train.py:886] (0/4) Epoch 3, batch 1000, loss[loss=0.0175, audio_tagging_loss=0.0175, over 22696.00 frames. ], tot_loss[loss=0.01809, audio_tagging_loss=0.01809, over 4909445.23 frames. ], batch size: 107, lr: 3.19e-02, grad_scale: 64.0 2023-12-21 14:02:45,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-12-21 14:03:06,701 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.757e+00 2023-12-21 14:03:10,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=70480.0, ans=0.125 2023-12-21 14:03:11,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.16 vs. limit=22.5 2023-12-21 14:03:12,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=15.0 2023-12-21 14:03:16,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=70480.0, ans=0.125 2023-12-21 14:03:18,556 INFO [train.py:886] (0/4) Epoch 3, batch 1050, loss[loss=0.01653, audio_tagging_loss=0.01653, over 25000.00 frames. ], tot_loss[loss=0.01806, audio_tagging_loss=0.01806, over 4920165.29 frames. ], batch size: 100, lr: 3.19e-02, grad_scale: 64.0 2023-12-21 14:03:43,853 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.533e+01 2.739e+01 3.016e+01 3.717e+01, threshold=5.478e+01, percent-clipped=0.0 2023-12-21 14:03:50,075 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.84 vs. limit=15.0 2023-12-21 14:03:54,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=70746.66666666667, ans=0.0 2023-12-21 14:04:11,143 INFO [train.py:886] (0/4) Epoch 3, batch 1100, loss[loss=0.01885, audio_tagging_loss=0.01885, over 24024.00 frames. ], tot_loss[loss=0.0179, audio_tagging_loss=0.0179, over 4926749.89 frames. 
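The attn_weights_entropy tensor printed during the epoch-3 validation above gives one mean entropy per attention head of the named module, a quick check that no head has collapsed onto a single key (entropy near zero) or gone uniformly flat. A sketch of that diagnostic:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Mean entropy per attention head for attn of shape
    (num_heads, batch, query_len, key_len) with rows summing to 1."""
    entropy = -(attn * (attn + 1e-20).log()).sum(dim=-1)   # (heads, batch, q)
    return entropy.mean(dim=(1, 2))

attn = torch.softmax(torch.randn(4, 2, 16, 16), dim=-1)
print(attn_weights_entropy(attn))   # one value per head, like the logged tensor
```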
], batch size: 100, lr: 3.18e-02, grad_scale: 64.0 2023-12-21 14:04:28,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=70946.66666666667, ans=0.125 2023-12-21 14:04:44,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=71080.0, ans=0.0 2023-12-21 14:04:44,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=71080.0, ans=0.2 2023-12-21 14:04:56,097 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.804e+00 2023-12-21 14:05:00,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=71146.66666666667, ans=0.0 2023-12-21 14:05:02,649 INFO [train.py:886] (0/4) Epoch 3, batch 1150, loss[loss=0.01882, audio_tagging_loss=0.01882, over 24750.00 frames. ], tot_loss[loss=0.01771, audio_tagging_loss=0.01771, over 4936835.33 frames. ], batch size: 99, lr: 3.18e-02, grad_scale: 64.0 2023-12-21 14:05:23,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=71280.0, ans=0.0 2023-12-21 14:05:27,719 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 2.554e+01 2.817e+01 3.036e+01 4.286e+01, threshold=5.635e+01, percent-clipped=0.0 2023-12-21 14:05:30,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=71346.66666666667, ans=0.125 2023-12-21 14:05:39,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=71413.33333333333, ans=0.125 2023-12-21 14:05:56,103 INFO [train.py:886] (0/4) Epoch 3, batch 1200, loss[loss=0.01726, audio_tagging_loss=0.01726, over 24750.00 frames. ], tot_loss[loss=0.01784, audio_tagging_loss=0.01784, over 4945018.41 frames. ], batch size: 99, lr: 3.17e-02, grad_scale: 64.0 2023-12-21 14:05:58,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=71546.66666666667, ans=0.125 2023-12-21 14:06:10,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.94 vs. limit=15.0 2023-12-21 14:06:28,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2023-12-21 14:06:30,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=71746.66666666667, ans=0.125 2023-12-21 14:06:32,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=71746.66666666667, ans=0.125 2023-12-21 14:06:37,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=71813.33333333333, ans=0.0 2023-12-21 14:06:46,212 INFO [train.py:886] (0/4) Epoch 3, batch 1250, loss[loss=0.02068, audio_tagging_loss=0.02068, over 24750.00 frames. ], tot_loss[loss=0.01799, audio_tagging_loss=0.01799, over 4944876.01 frames. 
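The "Maximum memory allocated so far" figures logged at each validation pass (14759MB by epoch 3) come from CUDA's peak-allocation counter, which only ever increases within a run. The equivalent query:

```python
import torch

def max_memory_mb(device=0):
    # peak bytes ever allocated by the caching allocator on this device
    return torch.cuda.max_memory_allocated(device) // (1024 * 1024)
```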
], batch size: 99, lr: 3.17e-02, grad_scale: 64.0 2023-12-21 14:07:00,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.41 vs. limit=22.5 2023-12-21 14:07:02,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=71946.66666666667, ans=0.2 2023-12-21 14:07:07,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=15.0 2023-12-21 14:07:10,847 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.406e+01 2.702e+01 2.933e+01 3.398e+01, threshold=5.404e+01, percent-clipped=0.0 2023-12-21 14:07:11,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=72013.33333333333, ans=0.125 2023-12-21 14:07:33,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=72146.66666666667, ans=0.125 2023-12-21 14:07:38,887 INFO [train.py:886] (0/4) Epoch 3, batch 1300, loss[loss=0.0213, audio_tagging_loss=0.0213, over 24750.00 frames. ], tot_loss[loss=0.01807, audio_tagging_loss=0.01807, over 4942507.47 frames. ], batch size: 99, lr: 3.16e-02, grad_scale: 64.0 2023-12-21 14:07:39,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=12.0 2023-12-21 14:07:45,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=72213.33333333333, ans=0.2 2023-12-21 14:07:45,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72213.33333333333, ans=0.1 2023-12-21 14:07:48,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=72280.0, ans=0.0 2023-12-21 14:07:54,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. limit=6.0 2023-12-21 14:08:08,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=72346.66666666667, ans=0.125 2023-12-21 14:08:09,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72413.33333333333, ans=0.1 2023-12-21 14:08:31,415 INFO [train.py:886] (0/4) Epoch 3, batch 1350, loss[loss=0.02043, audio_tagging_loss=0.02043, over 25000.00 frames. ], tot_loss[loss=0.01799, audio_tagging_loss=0.01799, over 4945136.65 frames. ], batch size: 100, lr: 3.16e-02, grad_scale: 64.0 2023-12-21 14:08:48,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=72613.33333333333, ans=0.0 2023-12-21 14:08:49,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.09 vs. 
limit=22.5 2023-12-21 14:08:54,017 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 2.555e+01 2.757e+01 3.030e+01 4.227e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 14:09:02,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=72746.66666666667, ans=0.2 2023-12-21 14:09:04,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=72746.66666666667, ans=0.125 2023-12-21 14:09:05,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=72746.66666666667, ans=0.125 2023-12-21 14:09:06,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72746.66666666667, ans=0.1 2023-12-21 14:09:07,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=72746.66666666667, ans=0.0 2023-12-21 14:09:16,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72813.33333333333, ans=0.1 2023-12-21 14:09:21,820 INFO [train.py:886] (0/4) Epoch 3, batch 1400, loss[loss=0.01631, audio_tagging_loss=0.01631, over 24750.00 frames. ], tot_loss[loss=0.0179, audio_tagging_loss=0.0179, over 4945818.31 frames. ], batch size: 99, lr: 3.15e-02, grad_scale: 64.0 2023-12-21 14:10:13,696 INFO [train.py:886] (0/4) Epoch 3, batch 1450, loss[loss=0.02043, audio_tagging_loss=0.02043, over 21407.00 frames. ], tot_loss[loss=0.0178, audio_tagging_loss=0.0178, over 4948766.83 frames. ], batch size: 107, lr: 3.15e-02, grad_scale: 64.0 2023-12-21 14:10:15,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=73213.33333333333, ans=0.125 2023-12-21 14:10:18,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=73213.33333333333, ans=0.0 2023-12-21 14:10:24,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=73280.0, ans=0.0 2023-12-21 14:10:26,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=73280.0, ans=0.2 2023-12-21 14:10:27,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2023-12-21 14:10:38,204 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.554e+01 2.770e+01 3.046e+01 3.677e+01, threshold=5.540e+01, percent-clipped=0.0 2023-12-21 14:10:47,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=73413.33333333333, ans=0.125 2023-12-21 14:10:50,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=73413.33333333333, ans=0.125 2023-12-21 14:11:05,630 INFO [train.py:886] (0/4) Epoch 3, batch 1500, loss[loss=0.01806, audio_tagging_loss=0.01806, over 25000.00 frames. ], tot_loss[loss=0.01785, audio_tagging_loss=0.01785, over 4951653.77 frames. 
], batch size: 100, lr: 3.14e-02, grad_scale: 64.0 2023-12-21 14:11:09,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=73546.66666666667, ans=0.2 2023-12-21 14:11:40,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=73746.66666666667, ans=0.125 2023-12-21 14:11:50,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.98 vs. limit=10.0 2023-12-21 14:11:58,021 INFO [train.py:886] (0/4) Epoch 3, batch 1550, loss[loss=0.01804, audio_tagging_loss=0.01804, over 24750.00 frames. ], tot_loss[loss=0.01799, audio_tagging_loss=0.01799, over 4950620.72 frames. ], batch size: 99, lr: 3.14e-02, grad_scale: 64.0 2023-12-21 14:11:59,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=73880.0, ans=0.0 2023-12-21 14:12:21,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=74013.33333333333, ans=0.125 2023-12-21 14:12:22,091 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.602e+01 2.823e+01 3.207e+01 3.964e+01, threshold=5.646e+01, percent-clipped=0.0 2023-12-21 14:12:25,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=74013.33333333333, ans=0.0 2023-12-21 14:12:26,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=74013.33333333333, ans=0.04949747468305833 2023-12-21 14:12:32,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=35.79 vs. limit=22.5 2023-12-21 14:12:49,722 INFO [train.py:886] (0/4) Epoch 3, batch 1600, loss[loss=0.01911, audio_tagging_loss=0.01911, over 24750.00 frames. ], tot_loss[loss=0.01801, audio_tagging_loss=0.01801, over 4946033.18 frames. ], batch size: 99, lr: 3.13e-02, grad_scale: 64.0 2023-12-21 14:13:14,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=74346.66666666667, ans=10.0 2023-12-21 14:13:18,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=74346.66666666667, ans=0.1 2023-12-21 14:13:29,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.65 vs. limit=22.5 2023-12-21 14:13:39,720 INFO [train.py:886] (0/4) Epoch 3, batch 1650, loss[loss=0.02055, audio_tagging_loss=0.02055, over 24009.00 frames. ], tot_loss[loss=0.01803, audio_tagging_loss=0.01803, over 4944407.37 frames. 
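The ScheduledFloat entries above track hyperparameters (dropout probabilities, skip rates, balancer limits) whose current value `ans` is a function of `batch_count`. In the zipformer recipe these schedules appear to be piecewise linear in batch count; the sketch below shows that idea with made-up breakpoints (the `schedule` pairs are hypothetical, not taken from the recipe).

```python
# Minimal sketch of a piecewise-linear schedule, assuming the
# (batch_count, value) breakpoints below; they are illustrative only.
def scheduled_float(batch_count: float,
                    schedule=((0.0, 0.5), (4000.0, 0.125))) -> float:
    points = sorted(schedule)
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            # linear interpolation between the two surrounding breakpoints
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

print(scheduled_float(2000.0))  # -> 0.3125, halfway between the breakpoints
```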
2023-12-21 14:13:42,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=74546.66666666667, ans=0.125
2023-12-21 14:13:42,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=74546.66666666667, ans=0.0
2023-12-21 14:13:46,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=74546.66666666667, ans=0.0
2023-12-21 14:14:03,656 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.550e+01 2.728e+01 3.001e+01 3.650e+01, threshold=5.456e+01, percent-clipped=0.0
2023-12-21 14:14:18,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=74746.66666666667, ans=0.125
2023-12-21 14:14:30,596 INFO [train.py:886] (0/4) Epoch 3, batch 1700, loss[loss=0.01798, audio_tagging_loss=0.01798, over 21326.00 frames. ], tot_loss[loss=0.01796, audio_tagging_loss=0.01796, over 4943332.65 frames. ], batch size: 107, lr: 3.12e-02, grad_scale: 64.0
2023-12-21 14:14:35,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=74880.0, ans=0.0
2023-12-21 14:14:37,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=74880.0, ans=0.2
2023-12-21 14:14:51,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=75013.33333333333, ans=0.2
2023-12-21 14:14:56,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=75013.33333333333, ans=0.0
2023-12-21 14:15:03,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=75080.0, ans=0.0
2023-12-21 14:15:10,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=75146.66666666667, ans=0.0
2023-12-21 14:15:18,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=75146.66666666667, ans=0.2
2023-12-21 14:15:22,036 INFO [train.py:886] (0/4) Epoch 3, batch 1750, loss[loss=0.01798, audio_tagging_loss=0.01798, over 25000.00 frames. ], tot_loss[loss=0.01794, audio_tagging_loss=0.01794, over 4951963.54 frames. ], batch size: 100, lr: 3.12e-02, grad_scale: 64.0
2023-12-21 14:15:37,356 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.808e-02
2023-12-21 14:15:40,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.21 vs. limit=22.5
2023-12-21 14:15:44,387 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.559e+01 2.774e+01 3.008e+01 3.944e+01, threshold=5.548e+01, percent-clipped=0.0
2023-12-21 14:15:50,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=75413.33333333333, ans=0.025
2023-12-21 14:16:02,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=75480.0, ans=0.125
2023-12-21 14:16:12,116 INFO [train.py:886] (0/4) Epoch 3, batch 1800, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.01789, audio_tagging_loss=0.01789, over 4958157.28 frames. ], batch size: 100, lr: 3.11e-02, grad_scale: 64.0
2023-12-21 14:16:30,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=15.0
2023-12-21 14:16:34,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=75680.0, ans=0.1
2023-12-21 14:16:38,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=75680.0, ans=0.125
2023-12-21 14:16:42,985 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.948e-01
2023-12-21 14:16:48,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0
2023-12-21 14:16:51,249 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 14:17:01,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=75813.33333333333, ans=0.2
2023-12-21 14:17:04,490 INFO [train.py:886] (0/4) Epoch 3, batch 1850, loss[loss=0.01774, audio_tagging_loss=0.01774, over 24750.00 frames. ], tot_loss[loss=0.01804, audio_tagging_loss=0.01804, over 4959986.33 frames. ], batch size: 99, lr: 3.11e-02, grad_scale: 64.0
2023-12-21 14:17:13,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=75946.66666666667, ans=0.125
2023-12-21 14:17:17,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=75946.66666666667, ans=0.0
2023-12-21 14:17:17,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=75946.66666666667, ans=0.2
2023-12-21 14:17:28,691 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.592e+01 2.758e+01 3.000e+01 3.750e+01, threshold=5.517e+01, percent-clipped=0.0
2023-12-21 14:17:29,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=76013.33333333333, ans=0.1
2023-12-21 14:17:55,756 INFO [train.py:886] (0/4) Epoch 3, batch 1900, loss[loss=0.0157, audio_tagging_loss=0.0157, over 24750.00 frames. ], tot_loss[loss=0.01801, audio_tagging_loss=0.01801, over 4959677.32 frames. ], batch size: 99, lr: 3.11e-02, grad_scale: 64.0
2023-12-21 14:18:01,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=76213.33333333333, ans=0.95
2023-12-21 14:18:09,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=12.0
2023-12-21 14:18:10,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=12.0
2023-12-21 14:18:21,305 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.34 vs. limit=12.0
2023-12-21 14:18:40,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=76480.0, ans=0.125
2023-12-21 14:18:46,714 INFO [train.py:886] (0/4) Epoch 3, batch 1950, loss[loss=0.01626, audio_tagging_loss=0.01626, over 25000.00 frames. ], tot_loss[loss=0.01788, audio_tagging_loss=0.01788, over 4956106.59 frames. ], batch size: 100, lr: 3.10e-02, grad_scale: 64.0
2023-12-21 14:18:48,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=76546.66666666667, ans=0.2
2023-12-21 14:19:00,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0
2023-12-21 14:19:10,601 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.559e+01 2.784e+01 3.135e+01 5.134e+01, threshold=5.569e+01, percent-clipped=0.0
2023-12-21 14:19:15,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76680.0, ans=0.1
2023-12-21 14:19:22,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76746.66666666667, ans=0.1
2023-12-21 14:19:27,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=76813.33333333333, ans=0.1
2023-12-21 14:19:29,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=76813.33333333333, ans=0.0
2023-12-21 14:19:37,598 INFO [train.py:886] (0/4) Epoch 3, batch 2000, loss[loss=0.0172, audio_tagging_loss=0.0172, over 25000.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 4957851.62 frames. ], batch size: 100, lr: 3.10e-02, grad_scale: 64.0
2023-12-21 14:19:39,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76880.0, ans=0.1
2023-12-21 14:19:39,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.31 vs. limit=22.5
2023-12-21 14:19:41,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0
2023-12-21 14:19:46,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=76946.66666666667, ans=0.125
2023-12-21 14:19:51,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.59 vs. limit=15.0
2023-12-21 14:19:57,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.65 vs. limit=15.0
2023-12-21 14:20:16,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=77080.0, ans=0.125
2023-12-21 14:20:21,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.24 vs. limit=6.0
2023-12-21 14:20:27,160 INFO [train.py:886] (0/4) Epoch 3, batch 2050, loss[loss=0.01525, audio_tagging_loss=0.01525, over 25000.00 frames. ], tot_loss[loss=0.01744, audio_tagging_loss=0.01744, over 4953788.76 frames. ], batch size: 100, lr: 3.09e-02, grad_scale: 128.0
2023-12-21 14:20:33,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.74 vs. limit=5.0
2023-12-21 14:20:51,404 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.488e+01 2.688e+01 3.036e+01 3.855e+01, threshold=5.376e+01, percent-clipped=0.0
2023-12-21 14:20:52,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=12.0
2023-12-21 14:21:08,897 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.071e+01
2023-12-21 14:21:12,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=77480.0, ans=0.2
2023-12-21 14:21:19,148 INFO [train.py:886] (0/4) Epoch 3, batch 2100, loss[loss=0.0173, audio_tagging_loss=0.0173, over 25000.00 frames. ], tot_loss[loss=0.01742, audio_tagging_loss=0.01742, over 4949037.65 frames. ], batch size: 100, lr: 3.09e-02, grad_scale: 128.0
2023-12-21 14:21:34,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=77613.33333333333, ans=0.125
2023-12-21 14:21:41,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=77680.0, ans=0.125
2023-12-21 14:22:08,609 INFO [train.py:886] (0/4) Epoch 3, batch 2150, loss[loss=0.01623, audio_tagging_loss=0.01623, over 24750.00 frames. ], tot_loss[loss=0.01759, audio_tagging_loss=0.01759, over 4945384.13 frames. ], batch size: 99, lr: 3.08e-02, grad_scale: 128.0
2023-12-21 14:22:13,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0
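In the optim.py WARNING lines the reported threshold consistently equals Clipping_scale times the median of the grad-norm quartiles (for example, 2.0 x 2.770e+01 = 5.540e+01 in the warning at 14:10:38). A minimal sketch of that reading follows, assuming norms are tracked over a recent window; it is inferred from the log output, not lifted from optim.py.

```python
# Sketch of quartile-based gradient clipping, assuming the threshold rule
# threshold = clipping_scale * median(recent grad norms) observed above.
import torch

def clip_by_median(grad: torch.Tensor, recent_norms: torch.Tensor,
                   clipping_scale: float = 2.0) -> torch.Tensor:
    # quartiles as printed in the WARNING lines: min, Q1, median, Q3, max
    quartiles = torch.quantile(recent_norms,
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]   # 2.0 * median
    norm = grad.norm()
    if norm > threshold:                        # would count toward "percent-clipped"
        grad = grad * (threshold / norm)
    return grad
```

With percent-clipped=0.0 throughout this stretch, no batch in the window exceeded that threshold.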
2023-12-21 14:22:14,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=77880.0, ans=0.95
2023-12-21 14:22:26,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=77946.66666666667, ans=0.125
2023-12-21 14:22:32,515 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.494e+01 2.667e+01 2.941e+01 3.537e+01, threshold=5.333e+01, percent-clipped=0.0
2023-12-21 14:22:33,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=78013.33333333333, ans=0.125
2023-12-21 14:22:44,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=78080.0, ans=0.125
2023-12-21 14:22:46,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=78080.0, ans=0.125
2023-12-21 14:22:56,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=78146.66666666667, ans=0.125
2023-12-21 14:23:01,001 INFO [train.py:886] (0/4) Epoch 3, batch 2200, loss[loss=0.01905, audio_tagging_loss=0.01905, over 24750.00 frames. ], tot_loss[loss=0.01778, audio_tagging_loss=0.01778, over 4940287.56 frames. ], batch size: 99, lr: 3.08e-02, grad_scale: 128.0
2023-12-21 14:23:14,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=78280.0, ans=0.125
2023-12-21 14:23:16,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=78280.0, ans=0.0
2023-12-21 14:23:18,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0
2023-12-21 14:23:53,457 INFO [train.py:886] (0/4) Epoch 3, batch 2250, loss[loss=0.02304, audio_tagging_loss=0.02304, over 24938.00 frames. ], tot_loss[loss=0.01784, audio_tagging_loss=0.01784, over 4934055.24 frames. ], batch size: 100, lr: 3.07e-02, grad_scale: 128.0
2023-12-21 14:24:04,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=78613.33333333333, ans=0.0
2023-12-21 14:24:05,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=78613.33333333333, ans=0.125
2023-12-21 14:24:11,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=78680.0, ans=0.125
2023-12-21 14:24:16,833 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.575e+01 2.774e+01 3.056e+01 3.945e+01, threshold=5.548e+01, percent-clipped=0.0
2023-12-21 14:24:36,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=78813.33333333333, ans=0.09899494936611666
2023-12-21 14:24:42,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=78880.0, ans=0.0
2023-12-21 14:24:43,246 INFO [train.py:886] (0/4) Epoch 3, batch 2300, loss[loss=0.0172, audio_tagging_loss=0.0172, over 25000.00 frames. ], tot_loss[loss=0.01774, audio_tagging_loss=0.01774, over 4940453.89 frames. ], batch size: 100, lr: 3.07e-02, grad_scale: 128.0
2023-12-21 14:24:47,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=78880.0, ans=0.125
2023-12-21 14:24:49,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=15.0
2023-12-21 14:25:01,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=78946.66666666667, ans=10.0
2023-12-21 14:25:04,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=79013.33333333333, ans=0.0
2023-12-21 14:25:05,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.28 vs. limit=22.5
2023-12-21 14:25:20,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=79080.0, ans=0.0
2023-12-21 14:25:23,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.07 vs. limit=22.5
2023-12-21 14:25:27,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=79146.66666666667, ans=0.0
2023-12-21 14:25:35,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.11 vs. limit=10.0
2023-12-21 14:25:35,570 INFO [train.py:886] (0/4) Epoch 3, batch 2350, loss[loss=0.01583, audio_tagging_loss=0.01583, over 25000.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 4944196.77 frames. ], batch size: 100, lr: 3.06e-02, grad_scale: 128.0
2023-12-21 14:25:55,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79346.66666666667, ans=0.1
2023-12-21 14:25:59,417 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.547e+01 2.838e+01 3.114e+01 3.985e+01, threshold=5.677e+01, percent-clipped=0.0
2023-12-21 14:26:02,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=79346.66666666667, ans=0.125
2023-12-21 14:26:03,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0
2023-12-21 14:26:10,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79413.33333333333, ans=0.1
2023-12-21 14:26:10,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.55 vs. limit=15.0
2023-12-21 14:26:12,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=79413.33333333333, ans=0.07
2023-12-21 14:26:26,849 INFO [train.py:886] (0/4) Epoch 3, batch 2400, loss[loss=0.01976, audio_tagging_loss=0.01976, over 22601.00 frames. ], tot_loss[loss=0.0176, audio_tagging_loss=0.0176, over 4944536.69 frames. ], batch size: 107, lr: 3.06e-02, grad_scale: 128.0
2023-12-21 14:26:32,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79546.66666666667, ans=0.1
2023-12-21 14:26:39,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=79613.33333333333, ans=0.125
2023-12-21 14:26:51,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=79680.0, ans=0.0
2023-12-21 14:26:54,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0
2023-12-21 14:27:17,473 INFO [train.py:886] (0/4) Epoch 3, batch 2450, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01751, audio_tagging_loss=0.01751, over 4945189.84 frames. ], batch size: 100, lr: 3.05e-02, grad_scale: 128.0
2023-12-21 14:27:28,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=79946.66666666667, ans=0.0
2023-12-21 14:27:35,253 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-12000.pt
2023-12-21 14:27:37,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=79946.66666666667, ans=0.0
2023-12-21 14:27:38,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=79946.66666666667, ans=0.07
2023-12-21 14:27:39,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=80013.33333333333, ans=0.0
2023-12-21 14:27:43,402 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.542e+01 2.747e+01 3.025e+01 4.680e+01, threshold=5.493e+01, percent-clipped=0.0
2023-12-21 14:28:04,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=80146.66666666667, ans=0.0
2023-12-21 14:28:11,031 INFO [train.py:886] (0/4) Epoch 3, batch 2500, loss[loss=0.02037, audio_tagging_loss=0.02037, over 24750.00 frames. ], tot_loss[loss=0.0177, audio_tagging_loss=0.0177, over 4950683.54 frames. ], batch size: 99, lr: 3.05e-02, grad_scale: 128.0
2023-12-21 14:28:16,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.06 vs. limit=6.0
2023-12-21 14:28:20,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0
2023-12-21 14:28:24,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.95 vs. limit=10.0
2023-12-21 14:28:41,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80413.33333333333, ans=0.1
2023-12-21 14:28:43,336 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.158e+00
2023-12-21 14:28:44,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0
2023-12-21 14:28:47,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=80413.33333333333, ans=0.125
2023-12-21 14:29:01,851 INFO [train.py:886] (0/4) Epoch 3, batch 2550, loss[loss=0.02399, audio_tagging_loss=0.02399, over 24944.00 frames. ], tot_loss[loss=0.01782, audio_tagging_loss=0.01782, over 4951747.26 frames. ], batch size: 100, lr: 3.05e-02, grad_scale: 128.0
2023-12-21 14:29:25,734 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.659e+01 2.860e+01 3.083e+01 5.383e+01, threshold=5.720e+01, percent-clipped=0.0
2023-12-21 14:29:32,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=15.0
2023-12-21 14:29:42,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=80746.66666666667, ans=0.0
2023-12-21 14:29:45,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=80813.33333333333, ans=0.125
2023-12-21 14:29:46,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0
2023-12-21 14:29:54,289 INFO [train.py:886] (0/4) Epoch 3, batch 2600, loss[loss=0.01634, audio_tagging_loss=0.01634, over 24750.00 frames. ], tot_loss[loss=0.01776, audio_tagging_loss=0.01776, over 4950835.87 frames. ], batch size: 99, lr: 3.04e-02, grad_scale: 128.0
2023-12-21 14:30:04,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80946.66666666667, ans=0.1
2023-12-21 14:30:11,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=80946.66666666667, ans=0.125
2023-12-21 14:30:21,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=15.0
2023-12-21 14:30:23,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81013.33333333333, ans=0.1
2023-12-21 14:30:32,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=81080.0, ans=0.2
2023-12-21 14:30:34,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0
2023-12-21 14:30:37,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0
2023-12-21 14:30:39,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=81146.66666666667, ans=0.125
2023-12-21 14:30:46,921 INFO [train.py:886] (0/4) Epoch 3, batch 2650, loss[loss=0.01654, audio_tagging_loss=0.01654, over 25000.00 frames. ], tot_loss[loss=0.01769, audio_tagging_loss=0.01769, over 4952300.14 frames. ], batch size: 100, lr: 3.04e-02, grad_scale: 128.0
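The Whitening lines compare a per-module statistic (`metric`) of grouped activations against a scheduled `limit` (the `whitening_limit` itself appears as a ScheduledFloat above). One plausible way to define such a whiteness statistic, with the property that it equals 1.0 for an isotropic channel covariance and grows with anisotropy, is sketched below; the recipe's actual metric may be computed differently.

```python
# Hypothetical proxy for the "metric" in the Whitening lines; illustrative
# only, not the recipe's exact formula.
import torch

def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one whitening group
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]          # channel covariance
    d = cov.shape[0]
    # trace(C @ C) * d / trace(C)**2 == 1 iff C is a multiple of identity,
    # and grows as the covariance eigenvalues spread apart
    return torch.trace(cov @ cov) * d / torch.trace(cov) ** 2
```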
2023-12-21 14:30:47,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=81213.33333333333, ans=0.125
2023-12-21 14:30:48,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=81213.33333333333, ans=0.035
2023-12-21 14:30:49,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=81213.33333333333, ans=0.0
2023-12-21 14:30:53,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81213.33333333333, ans=0.1
2023-12-21 14:30:55,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=81213.33333333333, ans=0.5
2023-12-21 14:30:59,613 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.46 vs. limit=15.0
2023-12-21 14:31:10,393 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.677e+01 2.899e+01 3.171e+01 4.559e+01, threshold=5.799e+01, percent-clipped=0.0
2023-12-21 14:31:22,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=15.0
2023-12-21 14:31:31,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=81480.0, ans=0.125
2023-12-21 14:31:32,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.31 vs. limit=22.5
2023-12-21 14:31:38,078 INFO [train.py:886] (0/4) Epoch 3, batch 2700, loss[loss=0.01704, audio_tagging_loss=0.01704, over 25000.00 frames. ], tot_loss[loss=0.01755, audio_tagging_loss=0.01755, over 4957956.88 frames. ], batch size: 100, lr: 3.03e-02, grad_scale: 128.0
2023-12-21 14:31:56,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0
2023-12-21 14:32:07,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.22 vs. limit=22.5
2023-12-21 14:32:12,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.84 vs. limit=22.5
2023-12-21 14:32:16,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=81746.66666666667, ans=0.0
2023-12-21 14:32:21,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81813.33333333333, ans=0.125
2023-12-21 14:32:23,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0
2023-12-21 14:32:26,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.45 vs. limit=10.0
2023-12-21 14:32:29,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81880.0, ans=0.1
2023-12-21 14:32:29,877 INFO [train.py:886] (0/4) Epoch 3, batch 2750, loss[loss=0.01914, audio_tagging_loss=0.01914, over 25000.00 frames. ], tot_loss[loss=0.01753, audio_tagging_loss=0.01753, over 4956109.37 frames. ], batch size: 100, lr: 3.03e-02, grad_scale: 128.0
2023-12-21 14:32:53,651 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.528e+01 2.695e+01 2.960e+01 4.010e+01, threshold=5.390e+01, percent-clipped=0.0
2023-12-21 14:33:03,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=82080.0, ans=0.125
2023-12-21 14:33:15,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=82146.66666666667, ans=0.125
2023-12-21 14:33:20,861 INFO [train.py:886] (0/4) Epoch 3, batch 2800, loss[loss=0.01811, audio_tagging_loss=0.01811, over 24750.00 frames. ], tot_loss[loss=0.01769, audio_tagging_loss=0.01769, over 4950850.55 frames. ], batch size: 99, lr: 3.02e-02, grad_scale: 128.0
2023-12-21 14:33:21,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.27 vs. limit=22.5
2023-12-21 14:33:26,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.30 vs. limit=10.0
2023-12-21 14:33:32,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.55 vs. limit=22.5
2023-12-21 14:33:41,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0
2023-12-21 14:33:51,764 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 14:33:53,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=12.0
2023-12-21 14:33:58,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=82413.33333333333, ans=0.0
2023-12-21 14:34:03,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.31 vs. limit=15.0
2023-12-21 14:34:08,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.60 vs. limit=10.0
2023-12-21 14:34:12,658 INFO [train.py:886] (0/4) Epoch 3, batch 2850, loss[loss=0.01676, audio_tagging_loss=0.01676, over 24750.00 frames. ], tot_loss[loss=0.01786, audio_tagging_loss=0.01786, over 4944487.19 frames. ], batch size: 99, lr: 3.02e-02, grad_scale: 128.0
2023-12-21 14:34:28,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=82613.33333333333, ans=0.125
2023-12-21 14:34:34,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0
2023-12-21 14:34:36,513 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.481e+01 2.766e+01 3.054e+01 4.031e+01, threshold=5.532e+01, percent-clipped=0.0
2023-12-21 14:34:40,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=82680.0, ans=0.0
2023-12-21 14:35:04,869 INFO [train.py:886] (0/4) Epoch 3, batch 2900, loss[loss=0.01766, audio_tagging_loss=0.01766, over 24750.00 frames. ], tot_loss[loss=0.01776, audio_tagging_loss=0.01776, over 4945564.16 frames. ], batch size: 99, lr: 3.01e-02, grad_scale: 128.0
2023-12-21 14:35:09,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=15.0
2023-12-21 14:35:25,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.07 vs. limit=22.5
2023-12-21 14:35:26,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=83013.33333333333, ans=0.0
2023-12-21 14:35:41,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=83080.0, ans=0.125
2023-12-21 14:35:49,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=15.0
2023-12-21 14:35:51,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=83146.66666666667, ans=0.0
2023-12-21 14:35:51,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=83146.66666666667, ans=0.2
2023-12-21 14:35:56,373 INFO [train.py:886] (0/4) Epoch 3, batch 2950, loss[loss=0.01948, audio_tagging_loss=0.01948, over 25000.00 frames. ], tot_loss[loss=0.01767, audio_tagging_loss=0.01767, over 4951912.18 frames. ], batch size: 100, lr: 3.01e-02, grad_scale: 128.0
2023-12-21 14:35:57,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0
2023-12-21 14:36:07,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=83280.0, ans=0.125
2023-12-21 14:36:19,501 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.548e+01 2.774e+01 2.996e+01 3.948e+01, threshold=5.549e+01, percent-clipped=0.0
2023-12-21 14:36:22,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=83346.66666666667, ans=0.125
2023-12-21 14:36:22,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=83346.66666666667, ans=0.125
2023-12-21 14:36:24,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=83346.66666666667, ans=0.125
2023-12-21 14:36:31,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=83413.33333333333, ans=0.0
2023-12-21 14:36:37,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=83480.0, ans=0.02
2023-12-21 14:36:47,855 INFO [train.py:886] (0/4) Epoch 3, batch 3000, loss[loss=0.01819, audio_tagging_loss=0.01819, over 25000.00 frames. ], tot_loss[loss=0.01761, audio_tagging_loss=0.01761, over 4957553.88 frames. ], batch size: 100, lr: 3.00e-02, grad_scale: 128.0
2023-12-21 14:36:47,857 INFO [train.py:909] (0/4) Computing validation loss
2023-12-21 14:37:09,001 INFO [train.py:917] (0/4) Epoch 3, validation: loss=0.04203, audio_tagging_loss=0.04203, over 3737520.00 frames.
2023-12-21 14:37:09,002 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-21 14:37:14,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=83546.66666666667, ans=0.125
2023-12-21 14:37:39,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83746.66666666667, ans=0.1
2023-12-21 14:37:59,629 INFO [train.py:886] (0/4) Epoch 3, batch 3050, loss[loss=0.02206, audio_tagging_loss=0.02206, over 25000.00 frames. ], tot_loss[loss=0.01758, audio_tagging_loss=0.01758, over 4963248.93 frames. ], batch size: 100, lr: 3.00e-02, grad_scale: 128.0
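Each train.py:886 line pairs a per-batch loss (`loss[...] over N frames`) with a frame-weighted aggregate (`tot_loss[...] over roughly 4.95M frames`). The aggregate frame count stays nearly constant from batch to batch, which suggests a bounded window rather than a cumulative sum; the window size below is an assumption, as is the class name.

```python
# Sketch of a frame-weighted running loss like the tot_loss figures above;
# whether train.py uses a windowed or decayed aggregate is not visible in
# the log, so this is illustrative only.
from collections import deque

class FrameWeightedLoss:
    def __init__(self, max_batches: int = 200):
        self.window = deque(maxlen=max_batches)   # (loss_sum, num_frames)

    def update(self, loss: float, num_frames: float) -> None:
        self.window.append((loss * num_frames, num_frames))

    def average(self) -> float:
        loss_sum = sum(l for l, _ in self.window)
        frames = sum(f for _, f in self.window)
        return loss_sum / max(frames, 1.0)

tracker = FrameWeightedLoss()
tracker.update(0.01819, 25000.0)   # values from the "batch 3000" line above
print(tracker.average())           # -> 0.01819 after a single batch
```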
2023-12-21 14:38:22,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=84013.33333333333, ans=0.125
2023-12-21 14:38:23,516 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.446e+01 2.688e+01 2.913e+01 3.941e+01, threshold=5.377e+01, percent-clipped=0.0
2023-12-21 14:38:24,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=84013.33333333333, ans=0.0
2023-12-21 14:38:32,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=84080.0, ans=0.125
2023-12-21 14:38:41,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=84146.66666666667, ans=15.0
2023-12-21 14:38:47,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=84146.66666666667, ans=0.125
2023-12-21 14:38:51,848 INFO [train.py:886] (0/4) Epoch 3, batch 3100, loss[loss=0.01666, audio_tagging_loss=0.01666, over 24750.00 frames. ], tot_loss[loss=0.01776, audio_tagging_loss=0.01776, over 4963738.74 frames. ], batch size: 99, lr: 3.00e-02, grad_scale: 128.0
2023-12-21 14:38:53,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=15.0
2023-12-21 14:39:02,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=84280.0, ans=0.125
2023-12-21 14:39:19,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.89 vs. limit=15.0
2023-12-21 14:39:44,555 INFO [train.py:886] (0/4) Epoch 3, batch 3150, loss[loss=0.01556, audio_tagging_loss=0.01556, over 24045.00 frames. ], tot_loss[loss=0.01785, audio_tagging_loss=0.01785, over 4955371.71 frames. ], batch size: 100, lr: 2.99e-02, grad_scale: 128.0
2023-12-21 14:40:06,858 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.586e+01 2.797e+01 3.017e+01 3.878e+01, threshold=5.594e+01, percent-clipped=0.0
2023-12-21 14:40:17,686 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.152e+00
2023-12-21 14:40:23,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.46 vs. limit=22.5
2023-12-21 14:40:26,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=84813.33333333333, ans=0.0
2023-12-21 14:40:31,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=84813.33333333333, ans=0.0
2023-12-21 14:40:33,202 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.43 vs. limit=10.0
2023-12-21 14:40:34,475 INFO [train.py:886] (0/4) Epoch 3, batch 3200, loss[loss=0.01807, audio_tagging_loss=0.01807, over 24750.00 frames. ], tot_loss[loss=0.01781, audio_tagging_loss=0.01781, over 4943521.48 frames. ], batch size: 99, lr: 2.99e-02, grad_scale: 128.0
2023-12-21 14:40:53,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=84946.66666666667, ans=15.0
2023-12-21 14:41:05,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=85080.0, ans=0.125
2023-12-21 14:41:08,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=85080.0, ans=0.125
2023-12-21 14:41:11,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=85080.0, ans=0.2
2023-12-21 14:41:12,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.46 vs. limit=22.5
2023-12-21 14:41:23,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.50 vs. limit=15.0
2023-12-21 14:41:27,344 INFO [train.py:886] (0/4) Epoch 3, batch 3250, loss[loss=0.01833, audio_tagging_loss=0.01833, over 25000.00 frames. ], tot_loss[loss=0.01774, audio_tagging_loss=0.01774, over 4942277.07 frames. ], batch size: 100, lr: 2.98e-02, grad_scale: 128.0
2023-12-21 14:41:38,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=85280.0, ans=0.125
2023-12-21 14:41:39,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=85280.0, ans=0.125
2023-12-21 14:41:51,356 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.547e+01 2.711e+01 3.063e+01 4.521e+01, threshold=5.423e+01, percent-clipped=0.0
2023-12-21 14:41:58,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=85413.33333333333, ans=0.125
2023-12-21 14:42:01,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=85413.33333333333, ans=0.2
2023-12-21 14:42:05,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=85413.33333333333, ans=0.125
2023-12-21 14:42:06,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=85480.0, ans=0.0
2023-12-21 14:42:09,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=85480.0, ans=0.2
2023-12-21 14:42:10,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=85480.0, ans=0.0
2023-12-21 14:42:17,712 INFO [train.py:886] (0/4) Epoch 3, batch 3300, loss[loss=0.01841, audio_tagging_loss=0.01841, over 25000.00 frames. ], tot_loss[loss=0.01764, audio_tagging_loss=0.01764, over 4948267.74 frames. ], batch size: 100, lr: 2.98e-02, grad_scale: 128.0
2023-12-21 14:42:36,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=85613.33333333333, ans=0.0
2023-12-21 14:42:53,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=85746.66666666667, ans=0.0
2023-12-21 14:43:03,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=85813.33333333333, ans=0.0
2023-12-21 14:43:05,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=85813.33333333333, ans=0.0
2023-12-21 14:43:10,627 INFO [train.py:886] (0/4) Epoch 3, batch 3350, loss[loss=0.01741, audio_tagging_loss=0.01741, over 25000.00 frames. ], tot_loss[loss=0.0176, audio_tagging_loss=0.0176, over 4951531.58 frames. ], batch size: 100, lr: 2.97e-02, grad_scale: 128.0
2023-12-21 14:43:12,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=85880.0, ans=0.125
2023-12-21 14:43:15,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.98 vs. limit=15.0
2023-12-21 14:43:31,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=86013.33333333333, ans=0.125
2023-12-21 14:43:34,512 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.545e+01 2.748e+01 2.993e+01 3.826e+01, threshold=5.495e+01, percent-clipped=0.0
2023-12-21 14:43:37,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=86013.33333333333, ans=0.0
2023-12-21 14:43:41,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=86080.0, ans=0.125
2023-12-21 14:43:42,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0
2023-12-21 14:43:58,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=86146.66666666667, ans=0.2
2023-12-21 14:43:58,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=86146.66666666667, ans=0.125
2023-12-21 14:44:00,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.11 vs. limit=10.0
2023-12-21 14:44:00,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.86 vs. limit=15.0
2023-12-21 14:44:03,008 INFO [train.py:886] (0/4) Epoch 3, batch 3400, loss[loss=0.01606, audio_tagging_loss=0.01606, over 25000.00 frames. ], tot_loss[loss=0.01755, audio_tagging_loss=0.01755, over 4955733.82 frames. ], batch size: 100, lr: 2.97e-02, grad_scale: 128.0
2023-12-21 14:44:06,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.82 vs. limit=15.0
2023-12-21 14:44:08,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=86213.33333333333, ans=0.125
2023-12-21 14:44:18,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=86280.0, ans=0.125
2023-12-21 14:44:19,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=86280.0, ans=0.0
2023-12-21 14:44:20,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=86280.0, ans=15.0
2023-12-21 14:44:30,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=86346.66666666667, ans=0.0
2023-12-21 14:44:35,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=86413.33333333333, ans=0.125
2023-12-21 14:44:38,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=86413.33333333333, ans=0.125
2023-12-21 14:44:41,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=86413.33333333333, ans=0.125
2023-12-21 14:44:47,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=86480.0, ans=0.0
2023-12-21 14:44:53,104 INFO [train.py:886] (0/4) Epoch 3, batch 3450, loss[loss=0.01936, audio_tagging_loss=0.01936, over 24750.00 frames. ], tot_loss[loss=0.01761, audio_tagging_loss=0.01761, over 4949735.06 frames. ], batch size: 99, lr: 2.97e-02, grad_scale: 128.0
2023-12-21 14:44:53,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.04 vs. limit=15.0
2023-12-21 14:44:57,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=86546.66666666667, ans=0.125
2023-12-21 14:45:18,161 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.604e+01 2.823e+01 3.074e+01 4.024e+01, threshold=5.647e+01, percent-clipped=0.0
2023-12-21 14:45:19,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0
2023-12-21 14:45:30,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=86746.66666666667, ans=0.2
2023-12-21 14:45:30,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=86746.66666666667, ans=0.1
2023-12-21 14:45:36,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=86813.33333333333, ans=0.125
2023-12-21 14:45:46,441 INFO [train.py:886] (0/4) Epoch 3, batch 3500, loss[loss=0.0168, audio_tagging_loss=0.0168, over 24750.00 frames. ], tot_loss[loss=0.01761, audio_tagging_loss=0.01761, over 4943149.17 frames. ], batch size: 99, lr: 2.96e-02, grad_scale: 128.0
2023-12-21 14:46:00,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=86946.66666666667, ans=0.125
2023-12-21 14:46:01,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=86946.66666666667, ans=0.2
2023-12-21 14:46:06,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=87013.33333333333, ans=0.125
2023-12-21 14:46:25,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=87080.0, ans=0.125
2023-12-21 14:46:38,190 INFO [train.py:886] (0/4) Epoch 3, batch 3550, loss[loss=0.01653, audio_tagging_loss=0.01653, over 25000.00 frames. ], tot_loss[loss=0.01761, audio_tagging_loss=0.01761, over 4944406.47 frames. ], batch size: 100, lr: 2.96e-02, grad_scale: 128.0
2023-12-21 14:46:43,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.33 vs. limit=15.0
2023-12-21 14:46:54,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=87280.0, ans=0.125
2023-12-21 14:47:01,416 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 2.500e+01 2.743e+01 3.014e+01 4.154e+01, threshold=5.485e+01, percent-clipped=0.0
2023-12-21 14:47:10,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.17 vs. limit=15.0
2023-12-21 14:47:17,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87413.33333333333, ans=0.1
2023-12-21 14:47:29,934 INFO [train.py:886] (0/4) Epoch 3, batch 3600, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. ], tot_loss[loss=0.01754, audio_tagging_loss=0.01754, over 4950108.55 frames. ], batch size: 100, lr: 2.95e-02, grad_scale: 128.0
2023-12-21 14:47:30,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=87546.66666666667, ans=0.125
2023-12-21 14:47:39,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87613.33333333333, ans=0.1
2023-12-21 14:47:44,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=87613.33333333333, ans=0.125
2023-12-21 14:47:51,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=87680.0, ans=0.125
2023-12-21 14:47:52,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=87680.0, ans=0.125
2023-12-21 14:48:01,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=87746.66666666667, ans=0.05
2023-12-21 14:48:17,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0
2023-12-21 14:48:22,024 INFO [train.py:886] (0/4) Epoch 3, batch 3650, loss[loss=0.0192, audio_tagging_loss=0.0192, over 25000.00 frames. ], tot_loss[loss=0.01743, audio_tagging_loss=0.01743, over 4950361.82 frames. ], batch size: 100, lr: 2.95e-02, grad_scale: 128.0
2023-12-21 14:48:30,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=87880.0, ans=0.0
2023-12-21 14:48:30,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0
2023-12-21 14:48:45,861 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.549e+01 2.738e+01 3.044e+01 4.294e+01, threshold=5.477e+01, percent-clipped=0.0
2023-12-21 14:48:56,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=88080.0, ans=0.0
2023-12-21 14:49:10,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=88146.66666666667, ans=0.2
2023-12-21 14:49:12,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.52 vs. limit=12.0
2023-12-21 14:49:13,676 INFO [train.py:886] (0/4) Epoch 3, batch 3700, loss[loss=0.01667, audio_tagging_loss=0.01667, over 25000.00 frames. ], tot_loss[loss=0.01744, audio_tagging_loss=0.01744, over 4952741.47 frames. ], batch size: 100, lr: 2.94e-02, grad_scale: 128.0
2023-12-21 14:49:19,294 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.436e+01
2023-12-21 14:49:21,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.98 vs. limit=15.0
2023-12-21 14:49:22,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.86 vs. limit=10.0
2023-12-21 14:49:30,211 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.167e+00
2023-12-21 14:49:39,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=12.0
2023-12-21 14:49:53,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=88413.33333333333, ans=0.125
2023-12-21 14:49:56,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0
2023-12-21 14:49:58,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=88480.0, ans=0.125
2023-12-21 14:50:00,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=88480.0, ans=0.125
2023-12-21 14:50:05,626 INFO [train.py:886] (0/4) Epoch 3, batch 3750, loss[loss=0.02052, audio_tagging_loss=0.02052, over 25000.00 frames. ], tot_loss[loss=0.01762, audio_tagging_loss=0.01762, over 4956981.04 frames.
], batch size: 100, lr: 2.94e-02, grad_scale: 128.0 2023-12-21 14:50:07,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=88546.66666666667, ans=0.05 2023-12-21 14:50:08,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=88546.66666666667, ans=0.2 2023-12-21 14:50:14,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=88613.33333333333, ans=0.05 2023-12-21 14:50:23,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=88613.33333333333, ans=0.2 2023-12-21 14:50:29,285 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.571e+01 2.734e+01 2.974e+01 3.491e+01, threshold=5.468e+01, percent-clipped=0.0 2023-12-21 14:50:48,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=88813.33333333333, ans=0.0 2023-12-21 14:50:57,877 INFO [train.py:886] (0/4) Epoch 3, batch 3800, loss[loss=0.0193, audio_tagging_loss=0.0193, over 24750.00 frames. ], tot_loss[loss=0.01777, audio_tagging_loss=0.01777, over 4948325.60 frames. ], batch size: 99, lr: 2.94e-02, grad_scale: 128.0 2023-12-21 14:51:18,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=89013.33333333333, ans=0.2 2023-12-21 14:51:31,128 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 14:51:33,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2023-12-21 14:51:34,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=89080.0, ans=0.0 2023-12-21 14:51:44,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=21.20 vs. limit=22.5 2023-12-21 14:51:49,279 INFO [train.py:886] (0/4) Epoch 3, batch 3850, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01766, audio_tagging_loss=0.01766, over 4950828.26 frames. ], batch size: 100, lr: 2.93e-02, grad_scale: 128.0 2023-12-21 14:51:56,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=89213.33333333333, ans=0.125 2023-12-21 14:51:59,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=89280.0, ans=0.0 2023-12-21 14:52:03,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=89280.0, ans=0.2 2023-12-21 14:52:03,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89280.0, ans=0.1 2023-12-21 14:52:09,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.87 vs. 
limit=22.5 2023-12-21 14:52:12,666 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.537e+01 2.733e+01 2.945e+01 4.366e+01, threshold=5.467e+01, percent-clipped=0.0 2023-12-21 14:52:12,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=89346.66666666667, ans=0.125 2023-12-21 14:52:35,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.08 vs. limit=15.0 2023-12-21 14:52:36,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=89480.0, ans=0.0 2023-12-21 14:52:39,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=89480.0, ans=0.0 2023-12-21 14:52:40,927 INFO [train.py:886] (0/4) Epoch 3, batch 3900, loss[loss=0.01676, audio_tagging_loss=0.01676, over 24750.00 frames. ], tot_loss[loss=0.0176, audio_tagging_loss=0.0176, over 4943825.42 frames. ], batch size: 99, lr: 2.93e-02, grad_scale: 128.0 2023-12-21 14:52:41,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.21 vs. limit=22.5 2023-12-21 14:52:54,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-12-21 14:52:55,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=89613.33333333333, ans=0.125 2023-12-21 14:52:57,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=89613.33333333333, ans=0.0 2023-12-21 14:53:01,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=89680.0, ans=0.125 2023-12-21 14:53:02,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2023-12-21 14:53:03,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=89680.0, ans=10.0 2023-12-21 14:53:06,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=89680.0, ans=0.125 2023-12-21 14:53:06,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=89680.0, ans=0.125 2023-12-21 14:53:09,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.19 vs. limit=10.0 2023-12-21 14:53:22,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=89813.33333333333, ans=0.125 2023-12-21 14:53:32,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.57 vs. limit=22.5 2023-12-21 14:53:33,205 INFO [train.py:886] (0/4) Epoch 3, batch 3950, loss[loss=0.01823, audio_tagging_loss=0.01823, over 25000.00 frames. ], tot_loss[loss=0.01757, audio_tagging_loss=0.01757, over 4950408.92 frames. 
], batch size: 100, lr: 2.92e-02, grad_scale: 128.0 2023-12-21 14:53:43,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=89946.66666666667, ans=0.0 2023-12-21 14:53:57,090 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.521e+01 2.746e+01 2.975e+01 3.800e+01, threshold=5.493e+01, percent-clipped=0.0 2023-12-21 14:53:58,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=90013.33333333333, ans=0.125 2023-12-21 14:54:24,088 INFO [train.py:886] (0/4) Epoch 3, batch 4000, loss[loss=0.01522, audio_tagging_loss=0.01522, over 22521.00 frames. ], tot_loss[loss=0.01753, audio_tagging_loss=0.01753, over 4953523.69 frames. ], batch size: 107, lr: 2.92e-02, grad_scale: 128.0 2023-12-21 14:54:26,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=90213.33333333333, ans=0.1 2023-12-21 14:54:57,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=90413.33333333333, ans=10.0 2023-12-21 14:55:16,287 INFO [train.py:886] (0/4) Epoch 3, batch 4050, loss[loss=0.02075, audio_tagging_loss=0.02075, over 24750.00 frames. ], tot_loss[loss=0.01759, audio_tagging_loss=0.01759, over 4954065.06 frames. ], batch size: 99, lr: 2.92e-02, grad_scale: 256.0 2023-12-21 14:55:27,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=90613.33333333333, ans=0.07 2023-12-21 14:55:29,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=90613.33333333333, ans=0.0 2023-12-21 14:55:29,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=90613.33333333333, ans=0.125 2023-12-21 14:55:30,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.36 vs. 
limit=15.0 2023-12-21 14:55:33,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=90613.33333333333, ans=0.0 2023-12-21 14:55:35,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=90613.33333333333, ans=0.125 2023-12-21 14:55:39,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=90680.0, ans=0.1 2023-12-21 14:55:39,790 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.624e+01 2.857e+01 3.103e+01 4.116e+01, threshold=5.714e+01, percent-clipped=0.0 2023-12-21 14:55:42,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=90680.0, ans=0.0 2023-12-21 14:55:47,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=90746.66666666667, ans=0.125 2023-12-21 14:55:52,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=90746.66666666667, ans=0.125 2023-12-21 14:55:55,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=90746.66666666667, ans=0.125 2023-12-21 14:56:08,201 INFO [train.py:886] (0/4) Epoch 3, batch 4100, loss[loss=0.0184, audio_tagging_loss=0.0184, over 25000.00 frames. ], tot_loss[loss=0.01774, audio_tagging_loss=0.01774, over 4948406.04 frames. ], batch size: 100, lr: 2.91e-02, grad_scale: 256.0 2023-12-21 14:56:13,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.57 vs. limit=15.0 2023-12-21 14:56:20,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=90946.66666666667, ans=0.0 2023-12-21 14:56:21,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=90946.66666666667, ans=0.125 2023-12-21 14:56:25,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90946.66666666667, ans=0.1 2023-12-21 14:56:37,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-21 14:56:38,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=91080.0, ans=15.0 2023-12-21 14:56:44,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=91080.0, ans=0.125 2023-12-21 14:56:53,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=91146.66666666667, ans=0.125 2023-12-21 14:56:59,600 INFO [train.py:886] (0/4) Epoch 3, batch 4150, loss[loss=0.01538, audio_tagging_loss=0.01538, over 25000.00 frames. ], tot_loss[loss=0.01769, audio_tagging_loss=0.01769, over 4941150.73 frames. ], batch size: 100, lr: 2.91e-02, grad_scale: 256.0 2023-12-21 14:57:00,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.12 vs. 
limit=15.0 2023-12-21 14:57:06,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.12 vs. limit=15.0 2023-12-21 14:57:22,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=15.0 2023-12-21 14:57:24,202 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.605e+01 2.901e+01 3.180e+01 4.178e+01, threshold=5.801e+01, percent-clipped=0.0 2023-12-21 14:57:29,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=15.0 2023-12-21 14:57:31,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=91413.33333333333, ans=0.125 2023-12-21 14:57:37,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=91413.33333333333, ans=0.035 2023-12-21 14:57:45,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91480.0, ans=0.1 2023-12-21 14:57:52,735 INFO [train.py:886] (0/4) Epoch 3, batch 4200, loss[loss=0.01661, audio_tagging_loss=0.01661, over 24750.00 frames. ], tot_loss[loss=0.01759, audio_tagging_loss=0.01759, over 4938099.27 frames. ], batch size: 99, lr: 2.90e-02, grad_scale: 256.0 2023-12-21 14:57:59,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=91546.66666666667, ans=0.0 2023-12-21 14:58:09,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=91613.33333333333, ans=0.125 2023-12-21 14:58:13,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0 2023-12-21 14:58:22,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=91746.66666666667, ans=0.2 2023-12-21 14:58:26,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.94 vs. limit=22.5 2023-12-21 14:58:35,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=91813.33333333333, ans=0.125 2023-12-21 14:58:40,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=91813.33333333333, ans=0.07 2023-12-21 14:58:42,613 INFO [train.py:886] (0/4) Epoch 3, batch 4250, loss[loss=0.0187, audio_tagging_loss=0.0187, over 25000.00 frames. ], tot_loss[loss=0.01748, audio_tagging_loss=0.01748, over 4949237.36 frames. ], batch size: 100, lr: 2.90e-02, grad_scale: 256.0 2023-12-21 14:58:58,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=91946.66666666667, ans=0.125 2023-12-21 14:59:04,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.18 vs. 
limit=15.0 2023-12-21 14:59:07,471 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.544e+01 2.694e+01 3.014e+01 4.277e+01, threshold=5.388e+01, percent-clipped=0.0 2023-12-21 14:59:07,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.30 vs. limit=22.5 2023-12-21 14:59:17,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92080.0, ans=0.1 2023-12-21 14:59:35,998 INFO [train.py:886] (0/4) Epoch 3, batch 4300, loss[loss=0.01662, audio_tagging_loss=0.01662, over 25000.00 frames. ], tot_loss[loss=0.0175, audio_tagging_loss=0.0175, over 4955861.46 frames. ], batch size: 100, lr: 2.89e-02, grad_scale: 128.0 2023-12-21 14:59:45,805 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.977e+01 2023-12-21 14:59:53,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=92280.0, ans=0.0 2023-12-21 15:00:13,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=92413.33333333333, ans=0.0 2023-12-21 15:00:13,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2023-12-21 15:00:23,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=92480.0, ans=0.0 2023-12-21 15:00:26,771 INFO [train.py:886] (0/4) Epoch 3, batch 4350, loss[loss=0.02014, audio_tagging_loss=0.02014, over 25000.00 frames. ], tot_loss[loss=0.01763, audio_tagging_loss=0.01763, over 4958803.05 frames. ], batch size: 100, lr: 2.89e-02, grad_scale: 128.0 2023-12-21 15:00:28,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=92546.66666666667, ans=0.2 2023-12-21 15:00:29,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=92546.66666666667, ans=0.125 2023-12-21 15:00:29,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2023-12-21 15:00:32,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=92546.66666666667, ans=0.0 2023-12-21 15:00:50,894 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.529e+01 2.731e+01 2.913e+01 4.342e+01, threshold=5.462e+01, percent-clipped=0.0 2023-12-21 15:00:51,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=92680.0, ans=0.015 2023-12-21 15:00:58,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=92746.66666666667, ans=0.0 2023-12-21 15:01:03,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.24 vs. limit=22.5 2023-12-21 15:01:17,662 INFO [train.py:886] (0/4) Epoch 3, batch 4400, loss[loss=0.01686, audio_tagging_loss=0.01686, over 24750.00 frames. ], tot_loss[loss=0.01761, audio_tagging_loss=0.01761, over 4953380.75 frames. 
], batch size: 99, lr: 2.89e-02, grad_scale: 128.0 2023-12-21 15:01:25,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=92880.0, ans=0.125 2023-12-21 15:01:32,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=92946.66666666667, ans=0.1 2023-12-21 15:01:39,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=93013.33333333333, ans=0.125 2023-12-21 15:02:04,270 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.95 vs. limit=22.5 2023-12-21 15:02:10,666 INFO [train.py:886] (0/4) Epoch 3, batch 4450, loss[loss=0.01616, audio_tagging_loss=0.01616, over 25000.00 frames. ], tot_loss[loss=0.01754, audio_tagging_loss=0.01754, over 4953097.89 frames. ], batch size: 100, lr: 2.88e-02, grad_scale: 128.0 2023-12-21 15:02:34,337 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.637e+01 2.853e+01 3.131e+01 4.120e+01, threshold=5.707e+01, percent-clipped=0.0 2023-12-21 15:02:36,140 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:02:39,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.94 vs. limit=22.5 2023-12-21 15:02:39,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.73 vs. limit=15.0 2023-12-21 15:02:48,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=93413.33333333333, ans=0.07 2023-12-21 15:03:00,739 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2023-12-21 15:03:02,128 INFO [train.py:886] (0/4) Epoch 3, batch 4500, loss[loss=0.01578, audio_tagging_loss=0.01578, over 25000.00 frames. ], tot_loss[loss=0.01749, audio_tagging_loss=0.01749, over 4956890.96 frames. ], batch size: 100, lr: 2.88e-02, grad_scale: 128.0 2023-12-21 15:03:05,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=93546.66666666667, ans=0.0 2023-12-21 15:03:14,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.91 vs. limit=10.0 2023-12-21 15:03:21,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=93613.33333333333, ans=22.5 2023-12-21 15:03:22,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2023-12-21 15:03:24,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=93680.0, ans=0.0 2023-12-21 15:03:34,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.88 vs. 
limit=22.5 2023-12-21 15:03:35,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=93746.66666666667, ans=0.125 2023-12-21 15:03:54,128 INFO [train.py:886] (0/4) Epoch 3, batch 4550, loss[loss=0.0208, audio_tagging_loss=0.0208, over 25000.00 frames. ], tot_loss[loss=0.01745, audio_tagging_loss=0.01745, over 4951143.28 frames. ], batch size: 100, lr: 2.87e-02, grad_scale: 128.0 2023-12-21 15:04:09,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.14 vs. limit=15.0 2023-12-21 15:04:18,729 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.146e+01 2.554e+01 2.788e+01 2.993e+01 3.924e+01, threshold=5.575e+01, percent-clipped=0.0 2023-12-21 15:04:19,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=15.0 2023-12-21 15:04:30,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.21 vs. limit=15.0 2023-12-21 15:04:45,096 INFO [train.py:886] (0/4) Epoch 3, batch 4600, loss[loss=0.01698, audio_tagging_loss=0.01698, over 24750.00 frames. ], tot_loss[loss=0.01741, audio_tagging_loss=0.01741, over 4949056.22 frames. ], batch size: 99, lr: 2.87e-02, grad_scale: 128.0 2023-12-21 15:04:50,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=94213.33333333333, ans=0.0 2023-12-21 15:04:55,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2023-12-21 15:04:57,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=94280.0, ans=0.2 2023-12-21 15:05:32,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=94480.0, ans=0.125 2023-12-21 15:05:32,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0 2023-12-21 15:05:37,594 INFO [train.py:886] (0/4) Epoch 3, batch 4650, loss[loss=0.01778, audio_tagging_loss=0.01778, over 25000.00 frames. ], tot_loss[loss=0.0175, audio_tagging_loss=0.0175, over 4954500.19 frames. ], batch size: 100, lr: 2.87e-02, grad_scale: 128.0 2023-12-21 15:05:45,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94546.66666666667, ans=0.1 2023-12-21 15:06:00,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=15.0 2023-12-21 15:06:03,478 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.559e+01 2.807e+01 3.071e+01 3.874e+01, threshold=5.614e+01, percent-clipped=0.0 2023-12-21 15:06:04,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.21 vs. 
limit=22.5 2023-12-21 15:06:08,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=94746.66666666667, ans=0.1 2023-12-21 15:06:09,254 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.707e+00 2023-12-21 15:06:17,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=94746.66666666667, ans=0.125 2023-12-21 15:06:20,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=15.0 2023-12-21 15:06:24,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.29 vs. limit=22.5 2023-12-21 15:06:28,127 INFO [train.py:886] (0/4) Epoch 3, batch 4700, loss[loss=0.01722, audio_tagging_loss=0.01722, over 24750.00 frames. ], tot_loss[loss=0.01764, audio_tagging_loss=0.01764, over 4955157.52 frames. ], batch size: 99, lr: 2.86e-02, grad_scale: 128.0 2023-12-21 15:06:28,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=94880.0, ans=0.07 2023-12-21 15:06:42,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.59 vs. limit=22.5 2023-12-21 15:06:43,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=94946.66666666667, ans=0.09899494936611666 2023-12-21 15:06:46,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=95013.33333333333, ans=0.125 2023-12-21 15:07:00,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=95080.0, ans=0.125 2023-12-21 15:07:03,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=95080.0, ans=0.09899494936611666 2023-12-21 15:07:07,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=95146.66666666667, ans=0.125 2023-12-21 15:07:15,307 INFO [train.py:886] (0/4) Epoch 3, batch 4750, loss[loss=0.01511, audio_tagging_loss=0.01511, over 24750.00 frames. ], tot_loss[loss=0.0177, audio_tagging_loss=0.0177, over 4954308.44 frames. ], batch size: 99, lr: 2.86e-02, grad_scale: 128.0 2023-12-21 15:07:16,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0 2023-12-21 15:07:30,867 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-3.pt 2023-12-21 15:07:52,988 INFO [train.py:886] (0/4) Epoch 4, batch 0, loss[loss=0.04866, audio_tagging_loss=0.04866, over 21403.00 frames. ], tot_loss[loss=0.04866, audio_tagging_loss=0.04866, over 21403.00 frames. ], batch size: 107, lr: 2.67e-02, grad_scale: 128.0 2023-12-21 15:07:52,989 INFO [train.py:909] (0/4) Computing validation loss 2023-12-21 15:08:16,368 INFO [train.py:917] (0/4) Epoch 4, validation: loss=0.03936, audio_tagging_loss=0.03936, over 3737520.00 frames. 
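A note on the recurring optim.py:484 warnings above: each one prints five grad-norm statistics followed by a threshold, and throughout this section the threshold tracks 2.0 times the third printed value, matching the logged Clipping_scale=2.0 (e.g. 2.0 * 2.748e+01 ≈ 5.495e+01, 2.0 * 2.738e+01 ≈ 5.477e+01). Below is a minimal sketch of that bookkeeping, assuming the five numbers are the min/25%/50%/75%/max quantiles over a window of recent gradient norms; the function and variable names are illustrative, not icefall's actual optim.py internals:

```python
import torch

def clipping_stats(grad_norm_history: torch.Tensor, clipping_scale: float = 2.0):
    """Summarize recent gradient norms the way the optim.py warnings appear to.

    Sketch only: the five logged numbers look like min/25%/50%/75%/max
    quantiles of recently observed gradient norms, and the logged threshold
    equals clipping_scale times the median.
    """
    quartiles = torch.quantile(
        grad_norm_history, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
    )
    median = quartiles[2]
    threshold = clipping_scale * median  # norms above this would be clipped
    percent_clipped = 100.0 * (grad_norm_history > threshold).float().mean()
    return quartiles, threshold, percent_clipped

# Synthetic norms in the 20-40 range seen in this stretch of the log;
# with threshold ~= 2x the median, nothing gets clipped (percent-clipped=0.0),
# consistent with the logged percent-clipped values.
norms = 20.0 + 20.0 * torch.rand(200)
q, thr, pct = clipping_stats(norms)
print(q.tolist(), float(thr), float(pct))
```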
2023-12-21 15:08:16,368 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-21 15:08:25,266 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.624e+01 2.822e+01 3.250e+01 1.153e+02, threshold=5.643e+01, percent-clipped=3.0 2023-12-21 15:08:44,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=95453.33333333333, ans=0.125 2023-12-21 15:08:46,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=95520.0, ans=0.125 2023-12-21 15:08:54,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=95520.0, ans=0.0 2023-12-21 15:09:00,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95586.66666666667, ans=0.1 2023-12-21 15:09:05,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=15.0 2023-12-21 15:09:08,005 INFO [train.py:886] (0/4) Epoch 4, batch 50, loss[loss=0.02273, audio_tagging_loss=0.02273, over 25000.00 frames. ], tot_loss[loss=0.02771, audio_tagging_loss=0.02771, over 1118185.13 frames. ], batch size: 100, lr: 2.67e-02, grad_scale: 128.0 2023-12-21 15:09:13,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=12.0 2023-12-21 15:09:15,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2023-12-21 15:09:15,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2023-12-21 15:09:20,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=95720.0, ans=0.125 2023-12-21 15:09:23,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95720.0, ans=0.1 2023-12-21 15:09:36,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=95786.66666666667, ans=0.0 2023-12-21 15:09:38,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=95853.33333333333, ans=0.125 2023-12-21 15:09:42,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95853.33333333333, ans=0.1 2023-12-21 15:09:54,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=12.0 2023-12-21 15:09:57,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.62 vs. limit=10.0 2023-12-21 15:10:00,220 INFO [train.py:886] (0/4) Epoch 4, batch 100, loss[loss=0.01936, audio_tagging_loss=0.01936, over 25000.00 frames. ], tot_loss[loss=0.02385, audio_tagging_loss=0.02385, over 1973255.51 frames. 
], batch size: 100, lr: 2.67e-02, grad_scale: 128.0 2023-12-21 15:10:02,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.70 vs. limit=10.0 2023-12-21 15:10:03,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=95986.66666666667, ans=0.2 2023-12-21 15:10:04,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=95986.66666666667, ans=0.0 2023-12-21 15:10:06,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=95986.66666666667, ans=0.125 2023-12-21 15:10:07,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.40 vs. limit=10.0 2023-12-21 15:10:08,521 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+01 2.883e+01 3.182e+01 3.510e+01 4.274e+01, threshold=6.364e+01, percent-clipped=0.0 2023-12-21 15:10:08,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=95986.66666666667, ans=0.125 2023-12-21 15:10:21,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=96120.0, ans=0.125 2023-12-21 15:10:47,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=96253.33333333333, ans=0.015 2023-12-21 15:10:51,517 INFO [train.py:886] (0/4) Epoch 4, batch 150, loss[loss=0.01841, audio_tagging_loss=0.01841, over 25000.00 frames. ], tot_loss[loss=0.02158, audio_tagging_loss=0.02158, over 2636722.46 frames. ], batch size: 100, lr: 2.66e-02, grad_scale: 128.0 2023-12-21 15:10:56,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=96320.0, ans=0.125 2023-12-21 15:11:06,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=96386.66666666667, ans=15.0 2023-12-21 15:11:13,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=96453.33333333333, ans=0.04949747468305833 2023-12-21 15:11:23,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.94 vs. limit=10.0 2023-12-21 15:11:27,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.96 vs. limit=12.0 2023-12-21 15:11:44,277 INFO [train.py:886] (0/4) Epoch 4, batch 200, loss[loss=0.01751, audio_tagging_loss=0.01751, over 25000.00 frames. ], tot_loss[loss=0.02038, audio_tagging_loss=0.02038, over 3153169.01 frames. 
], batch size: 100, lr: 2.66e-02, grad_scale: 128.0 2023-12-21 15:11:51,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=96653.33333333333, ans=0.05 2023-12-21 15:11:51,858 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.596e+01 2.833e+01 2.992e+01 3.762e+01, threshold=5.666e+01, percent-clipped=0.0 2023-12-21 15:12:35,176 INFO [train.py:886] (0/4) Epoch 4, batch 250, loss[loss=0.01914, audio_tagging_loss=0.01914, over 25000.00 frames. ], tot_loss[loss=0.01959, audio_tagging_loss=0.01959, over 3556204.35 frames. ], batch size: 100, lr: 2.65e-02, grad_scale: 128.0 2023-12-21 15:12:52,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=15.0 2023-12-21 15:12:53,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=97053.33333333333, ans=0.125 2023-12-21 15:12:56,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.81 vs. limit=22.5 2023-12-21 15:13:07,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=97186.66666666667, ans=0.1 2023-12-21 15:13:17,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2023-12-21 15:13:26,981 INFO [train.py:886] (0/4) Epoch 4, batch 300, loss[loss=0.01814, audio_tagging_loss=0.01814, over 24750.00 frames. ], tot_loss[loss=0.01905, audio_tagging_loss=0.01905, over 3863069.85 frames. ], batch size: 99, lr: 2.65e-02, grad_scale: 128.0 2023-12-21 15:13:34,777 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.584e+01 2.801e+01 3.020e+01 3.817e+01, threshold=5.601e+01, percent-clipped=0.0 2023-12-21 15:13:59,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=97520.0, ans=0.2 2023-12-21 15:14:18,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=97653.33333333333, ans=0.125 2023-12-21 15:14:19,494 INFO [train.py:886] (0/4) Epoch 4, batch 350, loss[loss=0.01704, audio_tagging_loss=0.01704, over 25000.00 frames. ], tot_loss[loss=0.01869, audio_tagging_loss=0.01869, over 4095402.61 frames. ], batch size: 100, lr: 2.65e-02, grad_scale: 128.0 2023-12-21 15:14:39,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=97786.66666666667, ans=0.125 2023-12-21 15:14:41,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=97786.66666666667, ans=0.0 2023-12-21 15:14:44,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.93 vs. limit=15.0 2023-12-21 15:14:57,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.17 vs. 
limit=10.0 2023-12-21 15:15:01,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=97920.0, ans=10.0 2023-12-21 15:15:09,508 INFO [train.py:886] (0/4) Epoch 4, batch 400, loss[loss=0.02309, audio_tagging_loss=0.02309, over 21999.00 frames. ], tot_loss[loss=0.01835, audio_tagging_loss=0.01835, over 4282663.06 frames. ], batch size: 107, lr: 2.64e-02, grad_scale: 128.0 2023-12-21 15:15:11,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.89 vs. limit=22.5 2023-12-21 15:15:18,625 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.499e+01 2.678e+01 2.862e+01 4.047e+01, threshold=5.355e+01, percent-clipped=0.0 2023-12-21 15:15:47,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=98186.66666666667, ans=0.125 2023-12-21 15:15:49,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.79 vs. limit=22.5 2023-12-21 15:15:50,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=98253.33333333333, ans=0.125 2023-12-21 15:15:58,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=98253.33333333333, ans=0.0 2023-12-21 15:16:01,870 INFO [train.py:886] (0/4) Epoch 4, batch 450, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01797, audio_tagging_loss=0.01797, over 4419077.13 frames. ], batch size: 100, lr: 2.64e-02, grad_scale: 128.0 2023-12-21 15:16:13,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0 2023-12-21 15:16:21,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=98453.33333333333, ans=0.125 2023-12-21 15:16:24,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98453.33333333333, ans=0.1 2023-12-21 15:16:38,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=98520.0, ans=0.0 2023-12-21 15:16:50,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=98586.66666666667, ans=0.0 2023-12-21 15:16:52,052 INFO [train.py:886] (0/4) Epoch 4, batch 500, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24750.00 frames. ], tot_loss[loss=0.01781, audio_tagging_loss=0.01781, over 4541120.14 frames. ], batch size: 99, lr: 2.64e-02, grad_scale: 128.0 2023-12-21 15:16:54,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=98653.33333333333, ans=10.0 2023-12-21 15:17:02,592 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.476e+01 2.666e+01 2.895e+01 4.028e+01, threshold=5.332e+01, percent-clipped=0.0 2023-12-21 15:17:06,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.50 vs. 
limit=15.0 2023-12-21 15:17:15,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=98786.66666666667, ans=0.125 2023-12-21 15:17:18,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=98786.66666666667, ans=0.2 2023-12-21 15:17:25,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=98853.33333333333, ans=0.125 2023-12-21 15:17:44,834 INFO [train.py:886] (0/4) Epoch 4, batch 550, loss[loss=0.01891, audio_tagging_loss=0.01891, over 25000.00 frames. ], tot_loss[loss=0.01767, audio_tagging_loss=0.01767, over 4634718.00 frames. ], batch size: 100, lr: 2.63e-02, grad_scale: 128.0 2023-12-21 15:17:51,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=98986.66666666667, ans=0.125 2023-12-21 15:17:55,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=99053.33333333333, ans=0.0 2023-12-21 15:18:02,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.11 vs. limit=10.0 2023-12-21 15:18:07,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=99120.0, ans=0.04949747468305833 2023-12-21 15:18:08,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=99120.0, ans=0.0 2023-12-21 15:18:11,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=99120.0, ans=0.0 2023-12-21 15:18:20,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=99186.66666666667, ans=0.125 2023-12-21 15:18:23,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=99186.66666666667, ans=0.125 2023-12-21 15:18:36,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=99320.0, ans=0.125 2023-12-21 15:18:37,710 INFO [train.py:886] (0/4) Epoch 4, batch 600, loss[loss=0.01948, audio_tagging_loss=0.01948, over 25000.00 frames. ], tot_loss[loss=0.0176, audio_tagging_loss=0.0176, over 4706908.54 frames. 
], batch size: 100, lr: 2.63e-02, grad_scale: 128.0 2023-12-21 15:18:38,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=99320.0, ans=0.125 2023-12-21 15:18:45,281 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 2.527e+01 2.841e+01 3.010e+01 4.382e+01, threshold=5.682e+01, percent-clipped=0.0 2023-12-21 15:18:49,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=99386.66666666667, ans=0.0 2023-12-21 15:18:57,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=99453.33333333333, ans=0.125 2023-12-21 15:19:19,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=99586.66666666667, ans=0.0 2023-12-21 15:19:27,909 INFO [train.py:886] (0/4) Epoch 4, batch 650, loss[loss=0.01906, audio_tagging_loss=0.01906, over 24750.00 frames. ], tot_loss[loss=0.01763, audio_tagging_loss=0.01763, over 4758760.58 frames. ], batch size: 99, lr: 2.63e-02, grad_scale: 128.0 2023-12-21 15:19:31,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=99653.33333333333, ans=0.125 2023-12-21 15:19:33,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=99653.33333333333, ans=0.0 2023-12-21 15:19:58,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99853.33333333333, ans=0.1 2023-12-21 15:20:19,119 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:20:19,904 INFO [train.py:886] (0/4) Epoch 4, batch 700, loss[loss=0.01936, audio_tagging_loss=0.01936, over 22339.00 frames. ], tot_loss[loss=0.01763, audio_tagging_loss=0.01763, over 4796401.78 frames. ], batch size: 107, lr: 2.62e-02, grad_scale: 128.0 2023-12-21 15:20:27,573 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.543e+01 2.759e+01 3.003e+01 3.794e+01, threshold=5.518e+01, percent-clipped=0.0 2023-12-21 15:20:32,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=100053.33333333333, ans=0.2 2023-12-21 15:21:01,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2023-12-21 15:21:06,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100253.33333333333, ans=0.1 2023-12-21 15:21:07,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=100253.33333333333, ans=0.0 2023-12-21 15:21:09,201 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:21:12,372 INFO [train.py:886] (0/4) Epoch 4, batch 750, loss[loss=0.01652, audio_tagging_loss=0.01652, over 25000.00 frames. ], tot_loss[loss=0.01744, audio_tagging_loss=0.01744, over 4830994.06 frames. 
], batch size: 100, lr: 2.62e-02, grad_scale: 128.0 2023-12-21 15:21:24,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.30 vs. limit=15.0 2023-12-21 15:21:25,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=100386.66666666667, ans=0.0 2023-12-21 15:21:25,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=100386.66666666667, ans=0.0 2023-12-21 15:21:28,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=100386.66666666667, ans=0.0 2023-12-21 15:21:33,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-12-21 15:21:46,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=100520.0, ans=10.0 2023-12-21 15:21:54,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=100586.66666666667, ans=0.125 2023-12-21 15:22:03,767 INFO [train.py:886] (0/4) Epoch 4, batch 800, loss[loss=0.01435, audio_tagging_loss=0.01435, over 25000.00 frames. ], tot_loss[loss=0.01725, audio_tagging_loss=0.01725, over 4859132.72 frames. ], batch size: 100, lr: 2.62e-02, grad_scale: 128.0 2023-12-21 15:22:04,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=100653.33333333333, ans=0.2 2023-12-21 15:22:05,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=100653.33333333333, ans=0.125 2023-12-21 15:22:09,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100653.33333333333, ans=0.1 2023-12-21 15:22:11,434 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.426e+01 2.587e+01 2.849e+01 3.873e+01, threshold=5.173e+01, percent-clipped=0.0 2023-12-21 15:22:33,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.96 vs. limit=22.5 2023-12-21 15:22:41,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=100853.33333333333, ans=0.0 2023-12-21 15:22:46,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=100920.0, ans=0.125 2023-12-21 15:22:52,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2023-12-21 15:22:54,915 INFO [train.py:886] (0/4) Epoch 4, batch 850, loss[loss=0.01731, audio_tagging_loss=0.01731, over 25000.00 frames. ], tot_loss[loss=0.01733, audio_tagging_loss=0.01733, over 4884783.73 frames. 
2023-12-21 15:23:02,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=100986.66666666667, ans=0.125 2023-12-21 15:23:03,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0 2023-12-21 15:23:11,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=101053.33333333333, ans=0.125 2023-12-21 15:23:13,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=101053.33333333333, ans=0.125 2023-12-21 15:23:31,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=101186.66666666667, ans=0.125 2023-12-21 15:23:36,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=101253.33333333333, ans=0.04949747468305833 2023-12-21 15:23:41,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=101253.33333333333, ans=0.125 2023-12-21 15:23:45,073 INFO [train.py:886] (0/4) Epoch 4, batch 900, loss[loss=0.01852, audio_tagging_loss=0.01852, over 25000.00 frames. ], tot_loss[loss=0.01735, audio_tagging_loss=0.01735, over 4897902.69 frames. ], batch size: 100, lr: 2.61e-02, grad_scale: 128.0 2023-12-21 15:23:49,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=101320.0, ans=0.2 2023-12-21 15:23:54,880 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.627e+01 2.825e+01 3.078e+01 4.421e+01, threshold=5.651e+01, percent-clipped=0.0 2023-12-21 15:23:59,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=101386.66666666667, ans=0.2 2023-12-21 15:24:01,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=101386.66666666667, ans=0.125 2023-12-21 15:24:02,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=101386.66666666667, ans=0.125 2023-12-21 15:24:25,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=101520.0, ans=0.0 2023-12-21 15:24:33,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=101586.66666666667, ans=0.0 2023-12-21 15:24:34,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=101586.66666666667, ans=0.125 2023-12-21 15:24:37,280 INFO [train.py:886] (0/4) Epoch 4, batch 950, loss[loss=0.01904, audio_tagging_loss=0.01904, over 24750.00 frames. ], tot_loss[loss=0.0174, audio_tagging_loss=0.0174, over 4907160.20 frames. ], batch size: 99, lr: 2.60e-02, grad_scale: 128.0 2023-12-21 15:25:04,395 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.68 vs.
limit=22.5 2023-12-21 15:25:11,500 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:25:18,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-12-21 15:25:29,006 INFO [train.py:886] (0/4) Epoch 4, batch 1000, loss[loss=0.01953, audio_tagging_loss=0.01953, over 25000.00 frames. ], tot_loss[loss=0.01747, audio_tagging_loss=0.01747, over 4914688.47 frames. ], batch size: 100, lr: 2.60e-02, grad_scale: 128.0 2023-12-21 15:25:33,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=101986.66666666667, ans=0.0 2023-12-21 15:25:35,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=101986.66666666667, ans=0.1 2023-12-21 15:25:35,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.76 vs. limit=22.5 2023-12-21 15:25:36,516 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.499e+01 2.686e+01 2.949e+01 3.703e+01, threshold=5.372e+01, percent-clipped=0.0 2023-12-21 15:25:38,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.40 vs. limit=15.0 2023-12-21 15:25:52,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=102120.0, ans=0.0 2023-12-21 15:25:59,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=102186.66666666667, ans=0.125 2023-12-21 15:26:13,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0 2023-12-21 15:26:14,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102253.33333333333, ans=0.1 2023-12-21 15:26:17,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.06 vs. limit=22.5 2023-12-21 15:26:19,846 INFO [train.py:886] (0/4) Epoch 4, batch 1050, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.01729, audio_tagging_loss=0.01729, over 4921075.58 frames. ], batch size: 99, lr: 2.60e-02, grad_scale: 128.0 2023-12-21 15:26:37,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=102386.66666666667, ans=0.0 2023-12-21 15:26:57,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=102520.0, ans=0.0 2023-12-21 15:27:00,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=102586.66666666667, ans=0.0 2023-12-21 15:27:08,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=102586.66666666667, ans=0.0 2023-12-21 15:27:11,324 INFO [train.py:886] (0/4) Epoch 4, batch 1100, loss[loss=0.01713, audio_tagging_loss=0.01713, over 24750.00 frames. ], tot_loss[loss=0.01734, audio_tagging_loss=0.01734, over 4926130.91 frames. 
], batch size: 99, lr: 2.59e-02, grad_scale: 128.0 2023-12-21 15:27:13,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=102653.33333333333, ans=0.0 2023-12-21 15:27:19,904 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.582e+01 2.804e+01 3.074e+01 3.939e+01, threshold=5.607e+01, percent-clipped=0.0 2023-12-21 15:27:21,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=102720.0, ans=0.125 2023-12-21 15:27:38,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=102786.66666666667, ans=0.125 2023-12-21 15:27:41,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=102853.33333333333, ans=0.125 2023-12-21 15:27:48,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=102853.33333333333, ans=0.09899494936611666 2023-12-21 15:27:56,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102920.0, ans=0.1 2023-12-21 15:27:58,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2023-12-21 15:27:59,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=102920.0, ans=0.125 2023-12-21 15:28:02,455 INFO [train.py:886] (0/4) Epoch 4, batch 1150, loss[loss=0.01786, audio_tagging_loss=0.01786, over 25000.00 frames. ], tot_loss[loss=0.01733, audio_tagging_loss=0.01733, over 4935654.12 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 128.0 2023-12-21 15:28:16,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=103053.33333333333, ans=0.125 2023-12-21 15:28:40,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=103186.66666666667, ans=0.125 2023-12-21 15:28:47,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=103253.33333333333, ans=0.04949747468305833 2023-12-21 15:28:53,961 INFO [train.py:886] (0/4) Epoch 4, batch 1200, loss[loss=0.01493, audio_tagging_loss=0.01493, over 24750.00 frames. ], tot_loss[loss=0.01738, audio_tagging_loss=0.01738, over 4937409.83 frames. ], batch size: 99, lr: 2.59e-02, grad_scale: 128.0 2023-12-21 15:28:54,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=103320.0, ans=0.125 2023-12-21 15:28:56,502 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. 
limit=15.0 2023-12-21 15:29:02,238 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.608e+01 2.769e+01 2.975e+01 3.396e+01, threshold=5.537e+01, percent-clipped=0.0 2023-12-21 15:29:07,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=103386.66666666667, ans=0.125 2023-12-21 15:29:18,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=103453.33333333333, ans=0.0 2023-12-21 15:29:33,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=103520.0, ans=0.2 2023-12-21 15:29:36,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.75 vs. limit=10.0 2023-12-21 15:29:46,013 INFO [train.py:886] (0/4) Epoch 4, batch 1250, loss[loss=0.01866, audio_tagging_loss=0.01866, over 24750.00 frames. ], tot_loss[loss=0.01742, audio_tagging_loss=0.01742, over 4941285.99 frames. ], batch size: 99, lr: 2.58e-02, grad_scale: 128.0 2023-12-21 15:29:46,195 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:29:48,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=103653.33333333333, ans=0.0 2023-12-21 15:29:53,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=103653.33333333333, ans=0.0 2023-12-21 15:30:04,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=103786.66666666667, ans=0.125 2023-12-21 15:30:05,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=103786.66666666667, ans=0.125 2023-12-21 15:30:05,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=103786.66666666667, ans=0.125 2023-12-21 15:30:12,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.14 vs. limit=22.5 2023-12-21 15:30:18,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=103853.33333333333, ans=0.0 2023-12-21 15:30:30,859 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.672e-02 2023-12-21 15:30:31,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=103920.0, ans=0.0 2023-12-21 15:30:35,963 INFO [train.py:886] (0/4) Epoch 4, batch 1300, loss[loss=0.01417, audio_tagging_loss=0.01417, over 25000.00 frames. ], tot_loss[loss=0.01748, audio_tagging_loss=0.01748, over 4938293.69 frames. 
], batch size: 100, lr: 2.58e-02, grad_scale: 128.0 2023-12-21 15:30:45,066 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.570e+01 2.794e+01 3.000e+01 3.597e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 15:30:47,198 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:30:50,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=104053.33333333333, ans=0.2 2023-12-21 15:31:08,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=104186.66666666667, ans=0.0 2023-12-21 15:31:09,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=104186.66666666667, ans=0.125 2023-12-21 15:31:22,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0 2023-12-21 15:31:27,400 INFO [train.py:886] (0/4) Epoch 4, batch 1350, loss[loss=0.02087, audio_tagging_loss=0.02087, over 24750.00 frames. ], tot_loss[loss=0.01741, audio_tagging_loss=0.01741, over 4934446.84 frames. ], batch size: 99, lr: 2.58e-02, grad_scale: 128.0 2023-12-21 15:31:32,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=104320.0, ans=0.2 2023-12-21 15:32:01,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=104520.0, ans=0.0 2023-12-21 15:32:19,154 INFO [train.py:886] (0/4) Epoch 4, batch 1400, loss[loss=0.01673, audio_tagging_loss=0.01673, over 25000.00 frames. ], tot_loss[loss=0.01738, audio_tagging_loss=0.01738, over 4944003.88 frames. ], batch size: 100, lr: 2.57e-02, grad_scale: 128.0 2023-12-21 15:32:19,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=104653.33333333333, ans=0.125 2023-12-21 15:32:26,648 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.479e+01 2.726e+01 3.027e+01 3.760e+01, threshold=5.453e+01, percent-clipped=0.0 2023-12-21 15:32:28,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2023-12-21 15:32:33,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=104720.0, ans=0.1 2023-12-21 15:32:43,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.70 vs. limit=10.0 2023-12-21 15:32:57,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=104853.33333333333, ans=0.125 2023-12-21 15:33:06,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104920.0, ans=0.1 2023-12-21 15:33:08,516 INFO [train.py:886] (0/4) Epoch 4, batch 1450, loss[loss=0.01578, audio_tagging_loss=0.01578, over 25000.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 4943141.87 frames. 
], batch size: 100, lr: 2.57e-02, grad_scale: 128.0 2023-12-21 15:33:08,698 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:33:26,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=105053.33333333333, ans=0.125 2023-12-21 15:33:28,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0 2023-12-21 15:33:40,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.11 vs. limit=22.5 2023-12-21 15:33:49,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2023-12-21 15:34:00,924 INFO [train.py:886] (0/4) Epoch 4, batch 1500, loss[loss=0.01823, audio_tagging_loss=0.01823, over 25000.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 4952567.98 frames. ], batch size: 100, lr: 2.57e-02, grad_scale: 128.0 2023-12-21 15:34:01,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=105320.0, ans=0.125 2023-12-21 15:34:08,808 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.551e+01 2.727e+01 2.901e+01 3.751e+01, threshold=5.454e+01, percent-clipped=0.0 2023-12-21 15:34:25,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2023-12-21 15:34:26,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=105453.33333333333, ans=0.2 2023-12-21 15:34:42,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=105586.66666666667, ans=0.125 2023-12-21 15:34:50,182 INFO [train.py:886] (0/4) Epoch 4, batch 1550, loss[loss=0.01851, audio_tagging_loss=0.01851, over 24750.00 frames. ], tot_loss[loss=0.01729, audio_tagging_loss=0.01729, over 4951752.82 frames. ], batch size: 99, lr: 2.56e-02, grad_scale: 256.0 2023-12-21 15:35:04,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=105720.0, ans=0.125 2023-12-21 15:35:32,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=105920.0, ans=0.125 2023-12-21 15:35:33,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=105920.0, ans=0.2 2023-12-21 15:35:34,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=105920.0, ans=0.2 2023-12-21 15:35:41,574 INFO [train.py:886] (0/4) Epoch 4, batch 1600, loss[loss=0.01817, audio_tagging_loss=0.01817, over 24750.00 frames. ], tot_loss[loss=0.01736, audio_tagging_loss=0.01736, over 4947631.01 frames. ], batch size: 99, lr: 2.56e-02, grad_scale: 128.0 2023-12-21 15:35:47,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. 
limit=15.0 2023-12-21 15:35:49,971 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.541e+01 2.790e+01 3.125e+01 4.127e+01, threshold=5.579e+01, percent-clipped=0.0 2023-12-21 15:35:58,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=106053.33333333333, ans=0.125 2023-12-21 15:36:04,613 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.55 vs. limit=15.0 2023-12-21 15:36:14,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=106186.66666666667, ans=0.125 2023-12-21 15:36:14,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=106186.66666666667, ans=0.125 2023-12-21 15:36:18,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=106186.66666666667, ans=0.125 2023-12-21 15:36:33,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=106320.0, ans=0.125 2023-12-21 15:36:34,462 INFO [train.py:886] (0/4) Epoch 4, batch 1650, loss[loss=0.01861, audio_tagging_loss=0.01861, over 25000.00 frames. ], tot_loss[loss=0.01727, audio_tagging_loss=0.01727, over 4950194.14 frames. ], batch size: 100, lr: 2.56e-02, grad_scale: 128.0 2023-12-21 15:36:34,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.19 vs. limit=15.0 2023-12-21 15:36:40,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=106320.0, ans=0.2 2023-12-21 15:36:52,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=106453.33333333333, ans=0.0 2023-12-21 15:37:06,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=106520.0, ans=0.0 2023-12-21 15:37:24,415 INFO [train.py:886] (0/4) Epoch 4, batch 1700, loss[loss=0.0173, audio_tagging_loss=0.0173, over 25000.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 4947122.54 frames. ], batch size: 100, lr: 2.55e-02, grad_scale: 128.0 2023-12-21 15:37:25,579 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-16000.pt 2023-12-21 15:37:37,932 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.494e+01 2.744e+01 2.970e+01 4.279e+01, threshold=5.487e+01, percent-clipped=0.0 2023-12-21 15:37:49,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2023-12-21 15:37:53,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=106786.66666666667, ans=0.125 2023-12-21 15:38:18,928 INFO [train.py:886] (0/4) Epoch 4, batch 1750, loss[loss=0.01732, audio_tagging_loss=0.01732, over 25000.00 frames. ], tot_loss[loss=0.0172, audio_tagging_loss=0.0172, over 4946420.19 frames. 
], batch size: 100, lr: 2.55e-02, grad_scale: 128.0 2023-12-21 15:38:20,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-12-21 15:38:32,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.65 vs. limit=22.5 2023-12-21 15:38:34,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=12.0 2023-12-21 15:38:34,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=107053.33333333333, ans=12.0 2023-12-21 15:38:36,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.47 vs. limit=15.0 2023-12-21 15:38:43,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=107120.0, ans=0.125 2023-12-21 15:38:48,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2023-12-21 15:38:53,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.25 vs. limit=15.0 2023-12-21 15:39:00,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=15.0 2023-12-21 15:39:04,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.88 vs. limit=10.0 2023-12-21 15:39:09,775 INFO [train.py:886] (0/4) Epoch 4, batch 1800, loss[loss=0.01796, audio_tagging_loss=0.01796, over 24750.00 frames. ], tot_loss[loss=0.01724, audio_tagging_loss=0.01724, over 4953692.44 frames. ], batch size: 99, lr: 2.55e-02, grad_scale: 128.0 2023-12-21 15:39:13,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=12.0 2023-12-21 15:39:19,655 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.593e+01 2.724e+01 2.959e+01 3.559e+01, threshold=5.448e+01, percent-clipped=0.0 2023-12-21 15:39:20,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=107386.66666666667, ans=0.125 2023-12-21 15:39:38,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=107453.33333333333, ans=0.125 2023-12-21 15:39:44,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2023-12-21 15:39:54,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107586.66666666667, ans=0.1 2023-12-21 15:39:55,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=107586.66666666667, ans=0.2 2023-12-21 15:40:00,864 INFO [train.py:886] (0/4) Epoch 4, batch 1850, loss[loss=0.02281, audio_tagging_loss=0.02281, over 24950.00 frames. 
], tot_loss[loss=0.01729, audio_tagging_loss=0.01729, over 4952330.20 frames. ], batch size: 100, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:40:02,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=107653.33333333333, ans=0.125 2023-12-21 15:40:03,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107653.33333333333, ans=0.1 2023-12-21 15:40:08,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=107653.33333333333, ans=15.0 2023-12-21 15:40:15,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.69 vs. limit=10.0 2023-12-21 15:40:22,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=107786.66666666667, ans=0.1 2023-12-21 15:40:40,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=12.0 2023-12-21 15:40:44,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=107920.0, ans=0.125 2023-12-21 15:40:51,376 INFO [train.py:886] (0/4) Epoch 4, batch 1900, loss[loss=0.01818, audio_tagging_loss=0.01818, over 24750.00 frames. ], tot_loss[loss=0.01739, audio_tagging_loss=0.01739, over 4943614.72 frames. ], batch size: 99, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:41:00,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.24 vs. limit=6.0 2023-12-21 15:41:00,679 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.376e+01 2.677e+01 2.844e+01 3.083e+01 3.785e+01, threshold=5.688e+01, percent-clipped=0.0 2023-12-21 15:41:00,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=108053.33333333333, ans=0.125 2023-12-21 15:41:06,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=108053.33333333333, ans=0.125 2023-12-21 15:41:25,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2023-12-21 15:41:25,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=108186.66666666667, ans=0.125 2023-12-21 15:41:34,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=108253.33333333333, ans=15.0 2023-12-21 15:41:35,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=108253.33333333333, ans=0.1 2023-12-21 15:41:36,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=108253.33333333333, ans=0.2 2023-12-21 15:41:41,753 INFO [train.py:886] (0/4) Epoch 4, batch 1950, loss[loss=0.0175, audio_tagging_loss=0.0175, over 24750.00 frames. ], tot_loss[loss=0.01739, audio_tagging_loss=0.01739, over 4941880.81 frames. 
], batch size: 99, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:41:45,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2023-12-21 15:41:56,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2023-12-21 15:42:01,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=108453.33333333333, ans=0.1 2023-12-21 15:42:04,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.95 vs. limit=6.0 2023-12-21 15:42:07,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=108453.33333333333, ans=0.0 2023-12-21 15:42:09,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.64 vs. limit=22.5 2023-12-21 15:42:29,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=108586.66666666667, ans=0.125 2023-12-21 15:42:33,628 INFO [train.py:886] (0/4) Epoch 4, batch 2000, loss[loss=0.02196, audio_tagging_loss=0.02196, over 25000.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 4946882.74 frames. ], batch size: 100, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:42:42,111 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.508e+01 2.716e+01 2.994e+01 3.930e+01, threshold=5.432e+01, percent-clipped=0.0 2023-12-21 15:42:56,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=108786.66666666667, ans=0.2 2023-12-21 15:43:10,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.28 vs. limit=15.0 2023-12-21 15:43:24,454 INFO [train.py:886] (0/4) Epoch 4, batch 2050, loss[loss=0.01577, audio_tagging_loss=0.01577, over 25000.00 frames. ], tot_loss[loss=0.01718, audio_tagging_loss=0.01718, over 4949343.86 frames. ], batch size: 100, lr: 2.53e-02, grad_scale: 128.0 2023-12-21 15:43:47,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-12-21 15:44:04,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.18 vs. limit=22.5 2023-12-21 15:44:05,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=109253.33333333333, ans=0.125 2023-12-21 15:44:13,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=109320.0, ans=0.125 2023-12-21 15:44:14,505 INFO [train.py:886] (0/4) Epoch 4, batch 2100, loss[loss=0.01639, audio_tagging_loss=0.01639, over 25000.00 frames. ], tot_loss[loss=0.01713, audio_tagging_loss=0.01713, over 4958606.37 frames. 
], batch size: 100, lr: 2.53e-02, grad_scale: 128.0 2023-12-21 15:44:17,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=109320.0, ans=0.125 2023-12-21 15:44:23,905 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.560e+01 2.764e+01 2.957e+01 4.648e+01, threshold=5.528e+01, percent-clipped=0.0 2023-12-21 15:44:28,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5 2023-12-21 15:44:33,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.95 vs. limit=6.0 2023-12-21 15:44:34,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.85 vs. limit=22.5 2023-12-21 15:44:47,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109520.0, ans=0.1 2023-12-21 15:44:54,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=109586.66666666667, ans=0.0 2023-12-21 15:45:00,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=109586.66666666667, ans=0.125 2023-12-21 15:45:02,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=109586.66666666667, ans=0.0 2023-12-21 15:45:05,237 INFO [train.py:886] (0/4) Epoch 4, batch 2150, loss[loss=0.01983, audio_tagging_loss=0.01983, over 24750.00 frames. ], tot_loss[loss=0.0172, audio_tagging_loss=0.0172, over 4957498.22 frames. ], batch size: 99, lr: 2.53e-02, grad_scale: 128.0 2023-12-21 15:45:05,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.85 vs. limit=15.0 2023-12-21 15:45:18,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=109720.0, ans=0.0 2023-12-21 15:45:23,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=109720.0, ans=0.125 2023-12-21 15:45:24,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0 2023-12-21 15:45:29,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=109786.66666666667, ans=0.125 2023-12-21 15:45:34,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=109786.66666666667, ans=0.0 2023-12-21 15:45:42,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=109853.33333333333, ans=10.0 2023-12-21 15:45:50,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109920.0, ans=0.1 2023-12-21 15:45:55,486 INFO [train.py:886] (0/4) Epoch 4, batch 2200, loss[loss=0.01891, audio_tagging_loss=0.01891, over 24750.00 frames. ], tot_loss[loss=0.01731, audio_tagging_loss=0.01731, over 4952859.15 frames. 
], batch size: 99, lr: 2.52e-02, grad_scale: 128.0 2023-12-21 15:45:57,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=109986.66666666667, ans=0.0 2023-12-21 15:46:02,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=109986.66666666667, ans=0.125 2023-12-21 15:46:04,582 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.592e+01 2.698e+01 2.922e+01 4.235e+01, threshold=5.395e+01, percent-clipped=0.0 2023-12-21 15:46:12,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=110053.33333333333, ans=0.125 2023-12-21 15:46:31,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=110186.66666666667, ans=0.125 2023-12-21 15:46:34,275 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:46:45,324 INFO [train.py:886] (0/4) Epoch 4, batch 2250, loss[loss=0.0146, audio_tagging_loss=0.0146, over 24750.00 frames. ], tot_loss[loss=0.01735, audio_tagging_loss=0.01735, over 4947126.42 frames. ], batch size: 99, lr: 2.52e-02, grad_scale: 128.0 2023-12-21 15:47:23,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=110520.0, ans=0.125 2023-12-21 15:47:32,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=110586.66666666667, ans=0.0 2023-12-21 15:47:37,167 INFO [train.py:886] (0/4) Epoch 4, batch 2300, loss[loss=0.01756, audio_tagging_loss=0.01756, over 24049.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 4946824.66 frames. ], batch size: 100, lr: 2.52e-02, grad_scale: 128.0 2023-12-21 15:47:37,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-12-21 15:47:45,881 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.543e+01 2.713e+01 2.924e+01 4.099e+01, threshold=5.427e+01, percent-clipped=0.0 2023-12-21 15:47:50,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.03 vs. limit=22.5 2023-12-21 15:47:59,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=110786.66666666667, ans=0.1 2023-12-21 15:47:59,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=110786.66666666667, ans=0.125 2023-12-21 15:48:06,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=110786.66666666667, ans=0.125 2023-12-21 15:48:16,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.42 vs. limit=22.5 2023-12-21 15:48:27,596 INFO [train.py:886] (0/4) Epoch 4, batch 2350, loss[loss=0.01628, audio_tagging_loss=0.01628, over 25000.00 frames. ], tot_loss[loss=0.01723, audio_tagging_loss=0.01723, over 4946900.39 frames. 
], batch size: 100, lr: 2.51e-02, grad_scale: 128.0 2023-12-21 15:48:33,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=110986.66666666667, ans=0.0 2023-12-21 15:48:34,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110986.66666666667, ans=0.1 2023-12-21 15:48:39,057 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.60 vs. limit=15.0 2023-12-21 15:48:44,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=111053.33333333333, ans=0.0 2023-12-21 15:48:48,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=111120.0, ans=0.07 2023-12-21 15:48:48,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=111120.0, ans=0.125 2023-12-21 15:48:49,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2023-12-21 15:48:51,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=111120.0, ans=0.2 2023-12-21 15:48:51,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=111120.0, ans=0.0 2023-12-21 15:48:51,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=12.0 2023-12-21 15:48:52,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111120.0, ans=0.1 2023-12-21 15:48:56,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-12-21 15:49:01,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=111186.66666666667, ans=0.2 2023-12-21 15:49:12,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2023-12-21 15:49:18,926 INFO [train.py:886] (0/4) Epoch 4, batch 2400, loss[loss=0.01711, audio_tagging_loss=0.01711, over 25000.00 frames. ], tot_loss[loss=0.01715, audio_tagging_loss=0.01715, over 4948449.66 frames. ], batch size: 100, lr: 2.51e-02, grad_scale: 128.0 2023-12-21 15:49:22,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.88 vs. limit=22.5 2023-12-21 15:49:26,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=12.0 2023-12-21 15:49:27,413 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.500e+01 2.719e+01 2.953e+01 3.930e+01, threshold=5.438e+01, percent-clipped=0.0 2023-12-21 15:49:30,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.72 vs. 
limit=12.0 2023-12-21 15:49:33,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=111386.66666666667, ans=0.0 2023-12-21 15:49:45,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=111453.33333333333, ans=0.0 2023-12-21 15:49:46,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=111453.33333333333, ans=0.0 2023-12-21 15:49:48,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2023-12-21 15:49:52,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=111520.0, ans=0.0 2023-12-21 15:50:04,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111586.66666666667, ans=0.1 2023-12-21 15:50:05,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.34 vs. limit=22.5 2023-12-21 15:50:11,409 INFO [train.py:886] (0/4) Epoch 4, batch 2450, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01724, audio_tagging_loss=0.01724, over 4957363.02 frames. ], batch size: 100, lr: 2.51e-02, grad_scale: 128.0 2023-12-21 15:50:11,599 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.030e+00 2023-12-21 15:50:21,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111720.0, ans=0.1 2023-12-21 15:50:35,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=111786.66666666667, ans=0.125 2023-12-21 15:50:38,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=111786.66666666667, ans=0.5 2023-12-21 15:51:02,659 INFO [train.py:886] (0/4) Epoch 4, batch 2500, loss[loss=0.01835, audio_tagging_loss=0.01835, over 24750.00 frames. ], tot_loss[loss=0.01738, audio_tagging_loss=0.01738, over 4951568.07 frames. 
], batch size: 99, lr: 2.50e-02, grad_scale: 128.0 2023-12-21 15:51:05,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=111986.66666666667, ans=0.1 2023-12-21 15:51:12,016 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.590e+01 2.772e+01 2.981e+01 3.773e+01, threshold=5.543e+01, percent-clipped=0.0 2023-12-21 15:51:15,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112053.33333333333, ans=0.1 2023-12-21 15:51:15,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=112053.33333333333, ans=0.0 2023-12-21 15:51:18,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=112053.33333333333, ans=0.5 2023-12-21 15:51:30,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112120.0, ans=0.1 2023-12-21 15:51:35,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=112186.66666666667, ans=0.125 2023-12-21 15:51:37,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=112186.66666666667, ans=0.125 2023-12-21 15:51:45,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=112253.33333333333, ans=0.125 2023-12-21 15:51:56,342 INFO [train.py:886] (0/4) Epoch 4, batch 2550, loss[loss=0.01846, audio_tagging_loss=0.01846, over 24750.00 frames. ], tot_loss[loss=0.01761, audio_tagging_loss=0.01761, over 4943586.22 frames. ], batch size: 99, lr: 2.50e-02, grad_scale: 128.0 2023-12-21 15:52:05,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=112386.66666666667, ans=0.125 2023-12-21 15:52:22,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=112453.33333333333, ans=0.125 2023-12-21 15:52:22,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=112453.33333333333, ans=0.0 2023-12-21 15:52:35,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=25.92 vs. limit=15.0 2023-12-21 15:52:42,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=112586.66666666667, ans=0.125 2023-12-21 15:52:47,834 INFO [train.py:886] (0/4) Epoch 4, batch 2600, loss[loss=0.01778, audio_tagging_loss=0.01778, over 24750.00 frames. ], tot_loss[loss=0.01743, audio_tagging_loss=0.01743, over 4941900.48 frames. 
], batch size: 99, lr: 2.50e-02, grad_scale: 128.0 2023-12-21 15:52:58,415 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+01 2.608e+01 2.784e+01 3.013e+01 3.853e+01, threshold=5.568e+01, percent-clipped=0.0 2023-12-21 15:53:15,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=112786.66666666667, ans=0.125 2023-12-21 15:53:17,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=112853.33333333333, ans=0.2 2023-12-21 15:53:24,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=112853.33333333333, ans=0.125 2023-12-21 15:53:35,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0 2023-12-21 15:53:40,008 INFO [train.py:886] (0/4) Epoch 4, batch 2650, loss[loss=0.01369, audio_tagging_loss=0.01369, over 25000.00 frames. ], tot_loss[loss=0.01727, audio_tagging_loss=0.01727, over 4949754.84 frames. ], batch size: 100, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:53:41,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=112986.66666666667, ans=0.125 2023-12-21 15:53:46,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.20 vs. limit=15.0 2023-12-21 15:53:47,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.84 vs. limit=22.5 2023-12-21 15:53:47,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=15.0 2023-12-21 15:54:06,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-12-21 15:54:14,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=113186.66666666667, ans=0.125 2023-12-21 15:54:31,517 INFO [train.py:886] (0/4) Epoch 4, batch 2700, loss[loss=0.01743, audio_tagging_loss=0.01743, over 24750.00 frames. ], tot_loss[loss=0.01723, audio_tagging_loss=0.01723, over 4953389.77 frames. 
], batch size: 99, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:54:32,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=113320.0, ans=0.07 2023-12-21 15:54:34,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=113320.0, ans=0.125 2023-12-21 15:54:39,418 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:54:40,197 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.542e+01 2.756e+01 2.979e+01 4.286e+01, threshold=5.511e+01, percent-clipped=0.0 2023-12-21 15:54:45,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=113386.66666666667, ans=0.1 2023-12-21 15:54:50,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=113453.33333333333, ans=0.0 2023-12-21 15:55:09,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113520.0, ans=0.1 2023-12-21 15:55:11,463 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.06 vs. limit=22.5 2023-12-21 15:55:17,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=113586.66666666667, ans=0.125 2023-12-21 15:55:20,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=113653.33333333333, ans=0.5 2023-12-21 15:55:21,440 INFO [train.py:886] (0/4) Epoch 4, batch 2750, loss[loss=0.01709, audio_tagging_loss=0.01709, over 25000.00 frames. ], tot_loss[loss=0.01713, audio_tagging_loss=0.01713, over 4951842.44 frames. ], batch size: 100, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:55:36,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=113720.0, ans=0.0 2023-12-21 15:55:49,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=113786.66666666667, ans=0.0 2023-12-21 15:55:50,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.99 vs. limit=22.5 2023-12-21 15:55:55,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2023-12-21 15:56:13,778 INFO [train.py:886] (0/4) Epoch 4, batch 2800, loss[loss=0.01693, audio_tagging_loss=0.01693, over 25000.00 frames. ], tot_loss[loss=0.01717, audio_tagging_loss=0.01717, over 4957631.35 frames. ], batch size: 100, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:56:22,307 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.583e+01 2.780e+01 3.121e+01 4.159e+01, threshold=5.559e+01, percent-clipped=0.0 2023-12-21 15:56:43,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=114186.66666666667, ans=0.2 2023-12-21 15:56:46,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.11 vs. 
limit=12.0 2023-12-21 15:56:47,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=114186.66666666667, ans=0.125 2023-12-21 15:56:51,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=114186.66666666667, ans=0.07 2023-12-21 15:57:03,908 INFO [train.py:886] (0/4) Epoch 4, batch 2850, loss[loss=0.01638, audio_tagging_loss=0.01638, over 24750.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 4956440.78 frames. ], batch size: 99, lr: 2.48e-02, grad_scale: 128.0 2023-12-21 15:57:15,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0 2023-12-21 15:57:16,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=114386.66666666667, ans=0.2 2023-12-21 15:57:20,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=114386.66666666667, ans=0.0 2023-12-21 15:57:23,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.89 vs. limit=15.0 2023-12-21 15:57:31,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=114453.33333333333, ans=0.1 2023-12-21 15:57:32,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2023-12-21 15:57:53,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.73 vs. limit=22.5 2023-12-21 15:57:55,956 INFO [train.py:886] (0/4) Epoch 4, batch 2900, loss[loss=0.01772, audio_tagging_loss=0.01772, over 24750.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 4950949.49 frames. ], batch size: 99, lr: 2.48e-02, grad_scale: 128.0 2023-12-21 15:58:01,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=12.0 2023-12-21 15:58:02,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=114653.33333333333, ans=0.125 2023-12-21 15:58:04,679 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 2.617e+01 2.786e+01 3.010e+01 3.894e+01, threshold=5.572e+01, percent-clipped=0.0 2023-12-21 15:58:11,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=114720.0, ans=0.125 2023-12-21 15:58:19,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=114786.66666666667, ans=0.1 2023-12-21 15:58:19,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.73 vs. 
2023-12-21 15:58:25,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=114853.33333333333, ans=0.025
2023-12-21 15:58:37,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=114920.0, ans=0.125
2023-12-21 15:58:37,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=114920.0, ans=0.125
2023-12-21 15:58:41,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=114920.0, ans=0.125
2023-12-21 15:58:48,194 INFO [train.py:886] (0/4) Epoch 4, batch 2950, loss[loss=0.01715, audio_tagging_loss=0.01715, over 25000.00 frames. ], tot_loss[loss=0.01705, audio_tagging_loss=0.01705, over 4951905.66 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 128.0
2023-12-21 15:59:18,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=115186.66666666667, ans=0.2
2023-12-21 15:59:30,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0
2023-12-21 15:59:38,322 INFO [train.py:886] (0/4) Epoch 4, batch 3000, loss[loss=0.01715, audio_tagging_loss=0.01715, over 25000.00 frames. ], tot_loss[loss=0.01698, audio_tagging_loss=0.01698, over 4959077.87 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 128.0
2023-12-21 15:59:38,324 INFO [train.py:909] (0/4) Computing validation loss
2023-12-21 15:59:59,363 INFO [train.py:917] (0/4) Epoch 4, validation: loss=0.04177, audio_tagging_loss=0.04177, over 3737520.00 frames.
2023-12-21 15:59:59,363 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-21 16:00:02,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=115320.0, ans=0.125
2023-12-21 16:00:07,861 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.539e+01 2.719e+01 2.990e+01 3.720e+01, threshold=5.438e+01, percent-clipped=0.0
2023-12-21 16:00:20,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.39 vs. limit=22.5
2023-12-21 16:00:39,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=115586.66666666667, ans=0.0
2023-12-21 16:00:45,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=115586.66666666667, ans=0.125
2023-12-21 16:00:46,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=115586.66666666667, ans=0.125
2023-12-21 16:00:50,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=115653.33333333333, ans=0.125
2023-12-21 16:00:51,249 INFO [train.py:886] (0/4) Epoch 4, batch 3050, loss[loss=0.01566, audio_tagging_loss=0.01566, over 21495.00 frames. ], tot_loss[loss=0.01698, audio_tagging_loss=0.01698, over 4952964.91 frames. ], batch size: 107, lr: 2.47e-02, grad_scale: 128.0
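The periodic optim.py warnings summarize the recent distribution of gradient norms as five quartiles (min/25%/median/75%/max) together with a clipping threshold; note that in these lines the threshold equals Clipping_scale (2.0) times the median, e.g. 2 x 2.719e+01 = 5.438e+01 above. A rough sketch of how such a diagnostic can be produced from a sliding window of norms (the window size and exact rule in optim.py may differ):

```python
import torch

def grad_norm_stats(model, history, clipping_scale=2.0, window=1000):
    """Track recent gradient norms; report quartiles and a median-based
    clipping threshold. Window size is a hypothetical choice."""
    norm = torch.norm(
        torch.stack([p.grad.norm() for p in model.parameters() if p.grad is not None])
    ).item()
    history.append(norm)
    recent = torch.tensor(history[-window:])
    quartiles = torch.quantile(recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2].item()  # scale times the median
    clipped = norm > threshold  # feeds the "percent-clipped" statistic
    return quartiles.tolist(), threshold, clipped
```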
2023-12-21 16:01:11,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=115786.66666666667, ans=0.125
2023-12-21 16:01:21,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=115853.33333333333, ans=0.125
2023-12-21 16:01:32,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=115920.0, ans=0.125
2023-12-21 16:01:42,100 INFO [train.py:886] (0/4) Epoch 4, batch 3100, loss[loss=0.01856, audio_tagging_loss=0.01856, over 25000.00 frames. ], tot_loss[loss=0.01704, audio_tagging_loss=0.01704, over 4957958.45 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 128.0
2023-12-21 16:01:52,025 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.579e+01 2.744e+01 2.909e+01 3.915e+01, threshold=5.487e+01, percent-clipped=0.0
2023-12-21 16:01:56,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=116053.33333333333, ans=0.0
2023-12-21 16:02:05,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=116120.0, ans=0.125
2023-12-21 16:02:06,761 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.585e+00
2023-12-21 16:02:29,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=116253.33333333333, ans=0.2
2023-12-21 16:02:34,949 INFO [train.py:886] (0/4) Epoch 4, batch 3150, loss[loss=0.0219, audio_tagging_loss=0.0219, over 24750.00 frames. ], tot_loss[loss=0.01724, audio_tagging_loss=0.01724, over 4955087.81 frames. ], batch size: 99, lr: 2.46e-02, grad_scale: 128.0
2023-12-21 16:02:36,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=116320.0, ans=0.125
2023-12-21 16:02:38,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5
2023-12-21 16:02:45,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116386.66666666667, ans=0.1
2023-12-21 16:03:03,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=116453.33333333333, ans=0.125
2023-12-21 16:03:11,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=116520.0, ans=0.0
2023-12-21 16:03:22,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.80 vs. limit=15.0
2023-12-21 16:03:27,446 INFO [train.py:886] (0/4) Epoch 4, batch 3200, loss[loss=0.01674, audio_tagging_loss=0.01674, over 24750.00 frames. ], tot_loss[loss=0.0172, audio_tagging_loss=0.0172, over 4952347.52 frames. ], batch size: 99, lr: 2.46e-02, grad_scale: 128.0
2023-12-21 16:03:28,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=116653.33333333333, ans=0.95
2023-12-21 16:03:29,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0
2023-12-21 16:03:36,016 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.555e+01 2.782e+01 3.048e+01 4.020e+01, threshold=5.565e+01, percent-clipped=0.0
2023-12-21 16:03:43,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0
2023-12-21 16:03:47,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=116786.66666666667, ans=0.125
2023-12-21 16:03:54,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=116786.66666666667, ans=0.2
2023-12-21 16:04:14,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=116920.0, ans=15.0
2023-12-21 16:04:18,489 INFO [train.py:886] (0/4) Epoch 4, batch 3250, loss[loss=0.01904, audio_tagging_loss=0.01904, over 25000.00 frames. ], tot_loss[loss=0.01713, audio_tagging_loss=0.01713, over 4954979.97 frames. ], batch size: 100, lr: 2.46e-02, grad_scale: 128.0
2023-12-21 16:04:38,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=117120.0, ans=0.125
2023-12-21 16:04:45,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0
2023-12-21 16:04:46,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=117120.0, ans=10.0
2023-12-21 16:04:48,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0
2023-12-21 16:05:01,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=117253.33333333333, ans=0.2
2023-12-21 16:05:08,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=117253.33333333333, ans=0.125
2023-12-21 16:05:11,463 INFO [train.py:886] (0/4) Epoch 4, batch 3300, loss[loss=0.01548, audio_tagging_loss=0.01548, over 24750.00 frames. ], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 4954189.73 frames. ], batch size: 99, lr: 2.46e-02, grad_scale: 128.0
2023-12-21 16:05:19,390 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.096e-01
2023-12-21 16:05:20,924 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.625e+01 2.805e+01 3.075e+01 3.853e+01, threshold=5.610e+01, percent-clipped=0.0
2023-12-21 16:05:21,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=117386.66666666667, ans=0.2
2023-12-21 16:05:25,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0
2023-12-21 16:05:26,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.90 vs. limit=15.0
2023-12-21 16:05:34,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=117453.33333333333, ans=0.125
2023-12-21 16:05:45,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=117520.0, ans=0.0
2023-12-21 16:05:47,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=117520.0, ans=0.125
2023-12-21 16:06:03,536 INFO [train.py:886] (0/4) Epoch 4, batch 3350, loss[loss=0.01728, audio_tagging_loss=0.01728, over 25000.00 frames. ], tot_loss[loss=0.017, audio_tagging_loss=0.017, over 4956622.79 frames. ], batch size: 100, lr: 2.45e-02, grad_scale: 128.0
2023-12-21 16:06:04,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.22 vs. limit=15.0
2023-12-21 16:06:15,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=117720.0, ans=0.125
2023-12-21 16:06:21,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=117720.0, ans=0.125
2023-12-21 16:06:33,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=117853.33333333333, ans=0.0
2023-12-21 16:06:40,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=117853.33333333333, ans=0.125
2023-12-21 16:06:54,561 INFO [train.py:886] (0/4) Epoch 4, batch 3400, loss[loss=0.01976, audio_tagging_loss=0.01976, over 24944.00 frames. ], tot_loss[loss=0.01699, audio_tagging_loss=0.01699, over 4956882.36 frames. ], batch size: 100, lr: 2.45e-02, grad_scale: 128.0
2023-12-21 16:06:54,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.65 vs. limit=22.5
2023-12-21 16:07:03,685 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.590e+01 2.740e+01 3.014e+01 4.535e+01, threshold=5.480e+01, percent-clipped=0.0
2023-12-21 16:07:14,813 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.09 vs. limit=10.0
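The Whitening lines fire when a module's output covariance drifts too far from isotropy: a per-module metric is compared against a limit that is itself scheduled (the whitening_limit values also appear above as ScheduledFloat entries). The exact metric is defined in scaling.py; the sketch below is only one plausible anisotropy measure with the right fixed point, equal to 1.0 for a perfectly white covariance and growing as variance concentrates in few directions, and should not be read as the recipe's actual formula.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Plausible covariance-anisotropy measure (assumption: not necessarily
    the formula in scaling.py). x: (frames, channels)."""
    metrics = []
    for g in x.chunk(num_groups, dim=-1):       # split channels into groups
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / g.shape[0]            # (c, c) covariance
        eigs = torch.linalg.eigvalsh(cov)       # real eigenvalues, ascending
        d = eigs.numel()
        # d * sum(l^2) / (sum l)^2 == 1 iff all eigenvalues are equal
        metrics.append((d * (eigs ** 2).sum() / eigs.sum() ** 2).item())
    return sum(metrics) / len(metrics)
```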
2023-12-21 16:07:16,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.69 vs. limit=15.0
2023-12-21 16:07:20,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0
2023-12-21 16:07:22,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118120.0, ans=0.1
2023-12-21 16:07:30,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=118186.66666666667, ans=0.0
2023-12-21 16:07:43,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.60 vs. limit=10.0
2023-12-21 16:07:47,994 INFO [train.py:886] (0/4) Epoch 4, batch 3450, loss[loss=0.01885, audio_tagging_loss=0.01885, over 24750.00 frames. ], tot_loss[loss=0.01703, audio_tagging_loss=0.01703, over 4957939.32 frames. ], batch size: 99, lr: 2.45e-02, grad_scale: 128.0
2023-12-21 16:07:49,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118320.0, ans=0.1
2023-12-21 16:07:59,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=15.0
2023-12-21 16:08:06,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=22.5
2023-12-21 16:08:10,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=118453.33333333333, ans=0.125
2023-12-21 16:08:10,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=15.0
2023-12-21 16:08:11,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=118453.33333333333, ans=10.0
2023-12-21 16:08:36,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=118586.66666666667, ans=0.125
2023-12-21 16:08:36,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=118586.66666666667, ans=0.0
2023-12-21 16:08:38,279 INFO [train.py:886] (0/4) Epoch 4, batch 3500, loss[loss=0.01982, audio_tagging_loss=0.01982, over 24750.00 frames. ], tot_loss[loss=0.01703, audio_tagging_loss=0.01703, over 4954181.97 frames. ], batch size: 99, lr: 2.44e-02, grad_scale: 128.0
2023-12-21 16:08:43,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=118653.33333333333, ans=0.125
2023-12-21 16:08:48,902 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.564e+01 2.726e+01 3.075e+01 4.208e+01, threshold=5.453e+01, percent-clipped=0.0
2023-12-21 16:08:54,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0
2023-12-21 16:08:54,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=118720.0, ans=0.125
2023-12-21 16:08:59,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.26 vs. limit=15.0
2023-12-21 16:09:10,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=118853.33333333333, ans=0.2
2023-12-21 16:09:16,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=118853.33333333333, ans=0.0
2023-12-21 16:09:30,643 INFO [train.py:886] (0/4) Epoch 4, batch 3550, loss[loss=0.01741, audio_tagging_loss=0.01741, over 24750.00 frames. ], tot_loss[loss=0.01697, audio_tagging_loss=0.01697, over 4951673.99 frames. ], batch size: 99, lr: 2.44e-02, grad_scale: 128.0
2023-12-21 16:09:32,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=118986.66666666667, ans=0.0
2023-12-21 16:09:39,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=119053.33333333333, ans=0.125
2023-12-21 16:09:43,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=119053.33333333333, ans=0.2
2023-12-21 16:09:53,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=119120.0, ans=0.125
2023-12-21 16:10:04,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.20 vs. limit=15.0
2023-12-21 16:10:17,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=119253.33333333333, ans=0.0
2023-12-21 16:10:19,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=119253.33333333333, ans=0.125
2023-12-21 16:10:22,227 INFO [train.py:886] (0/4) Epoch 4, batch 3600, loss[loss=0.01588, audio_tagging_loss=0.01588, over 25000.00 frames. ], tot_loss[loss=0.01691, audio_tagging_loss=0.01691, over 4957573.13 frames. ], batch size: 100, lr: 2.44e-02, grad_scale: 128.0
2023-12-21 16:10:26,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=119320.0, ans=0.2
2023-12-21 16:10:29,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=119320.0, ans=0.125
2023-12-21 16:10:32,448 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.504e+01 2.701e+01 2.952e+01 4.327e+01, threshold=5.401e+01, percent-clipped=0.0
2023-12-21 16:10:35,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=119386.66666666667, ans=0.04949747468305833
2023-12-21 16:10:55,502 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.15 vs. limit=10.0
2023-12-21 16:11:00,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=119520.0, ans=0.125
2023-12-21 16:11:12,783 INFO [train.py:886] (0/4) Epoch 4, batch 3650, loss[loss=0.01444, audio_tagging_loss=0.01444, over 25000.00 frames. ], tot_loss[loss=0.01676, audio_tagging_loss=0.01676, over 4958830.46 frames. ], batch size: 100, lr: 2.43e-02, grad_scale: 128.0
2023-12-21 16:11:14,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=119653.33333333333, ans=0.2
2023-12-21 16:11:18,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0
2023-12-21 16:11:22,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0
2023-12-21 16:11:22,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=119720.0, ans=0.0
2023-12-21 16:11:26,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.60 vs. limit=22.5
2023-12-21 16:11:26,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=119720.0, ans=0.0
2023-12-21 16:11:47,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119853.33333333333, ans=0.1
2023-12-21 16:11:51,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=119853.33333333333, ans=0.0
2023-12-21 16:12:04,965 INFO [train.py:886] (0/4) Epoch 4, batch 3700, loss[loss=0.01973, audio_tagging_loss=0.01973, over 25000.00 frames. ], tot_loss[loss=0.01672, audio_tagging_loss=0.01672, over 4953627.60 frames. ], batch size: 100, lr: 2.43e-02, grad_scale: 128.0
2023-12-21 16:12:05,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=119986.66666666667, ans=0.05
2023-12-21 16:12:14,755 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.607e+01 2.781e+01 3.058e+01 3.851e+01, threshold=5.562e+01, percent-clipped=0.0
2023-12-21 16:12:22,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=120053.33333333333, ans=0.2
2023-12-21 16:12:41,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=120186.66666666667, ans=0.09899494936611666
2023-12-21 16:12:44,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=120186.66666666667, ans=0.125
2023-12-21 16:12:44,992 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.401e+01
2023-12-21 16:12:45,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=120253.33333333333, ans=0.125
2023-12-21 16:12:48,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=120253.33333333333, ans=0.125
2023-12-21 16:12:53,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.30 vs. limit=22.5
2023-12-21 16:12:55,109 INFO [train.py:886] (0/4) Epoch 4, batch 3750, loss[loss=0.02028, audio_tagging_loss=0.02028, over 24750.00 frames. ], tot_loss[loss=0.01696, audio_tagging_loss=0.01696, over 4951417.62 frames. ], batch size: 99, lr: 2.43e-02, grad_scale: 128.0
2023-12-21 16:12:56,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0
2023-12-21 16:13:00,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=120320.0, ans=0.04949747468305833
2023-12-21 16:13:05,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0
2023-12-21 16:13:11,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=120386.66666666667, ans=0.125
2023-12-21 16:13:17,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=120453.33333333333, ans=0.125
2023-12-21 16:13:32,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0
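Many of the scheduled names end in skip_rate (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate, ...), and by this stage most have annealed to 0.0 or to small values such as 0.07 or 0.0495. These look like stochastic-depth-style rates: during training, a sub-module is bypassed with the scheduled probability. A generic sketch of the pattern follows; the real Zipformer bypass in icefall also learns a per-channel interpolation weight, which is omitted here, so treat this as an illustration of the concept rather than the recipe's code.

```python
import torch
import torch.nn as nn

class StochasticBypass(nn.Module):
    """Stochastic-depth-style wrapper (a sketch, not icefall's implementation):
    with probability skip_rate, the wrapped module is skipped for this step."""
    def __init__(self, module: nn.Module, skip_rate: float = 0.07):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate  # would be a ScheduledFloat in the recipe

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return x                  # bypass the module entirely
        return x + self.module(x)     # residual application otherwise
```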
2023-12-21 16:13:34,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=120520.0, ans=0.125
2023-12-21 16:13:35,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=120520.0, ans=0.0
2023-12-21 16:13:45,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=120653.33333333333, ans=0.04949747468305833
2023-12-21 16:13:46,565 INFO [train.py:886] (0/4) Epoch 4, batch 3800, loss[loss=0.01777, audio_tagging_loss=0.01777, over 24750.00 frames. ], tot_loss[loss=0.01713, audio_tagging_loss=0.01713, over 4945171.85 frames. ], batch size: 99, lr: 2.43e-02, grad_scale: 128.0
2023-12-21 16:13:56,038 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.567e+01 2.797e+01 3.040e+01 4.165e+01, threshold=5.595e+01, percent-clipped=0.0
2023-12-21 16:13:56,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=120720.0, ans=0.2
2023-12-21 16:13:56,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0
2023-12-21 16:13:59,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=120720.0, ans=0.0
2023-12-21 16:14:02,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.25 vs. limit=22.5
2023-12-21 16:14:05,906 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.542e+00
2023-12-21 16:14:18,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=17.62 vs. limit=15.0
2023-12-21 16:14:24,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.23 vs. limit=22.5
2023-12-21 16:14:25,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=120920.0, ans=0.07
2023-12-21 16:14:34,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=120920.0, ans=0.0
2023-12-21 16:14:38,124 INFO [train.py:886] (0/4) Epoch 4, batch 3850, loss[loss=0.01619, audio_tagging_loss=0.01619, over 25000.00 frames. ], tot_loss[loss=0.01697, audio_tagging_loss=0.01697, over 4946417.62 frames. ], batch size: 100, lr: 2.42e-02, grad_scale: 128.0
2023-12-21 16:14:58,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121120.0, ans=0.1
2023-12-21 16:15:06,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0
2023-12-21 16:15:08,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.17 vs. limit=22.5
2023-12-21 16:15:12,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=121186.66666666667, ans=0.1
2023-12-21 16:15:14,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
2023-12-21 16:15:28,374 INFO [train.py:886] (0/4) Epoch 4, batch 3900, loss[loss=0.01693, audio_tagging_loss=0.01693, over 25000.00 frames. ], tot_loss[loss=0.0169, audio_tagging_loss=0.0169, over 4947984.70 frames. ], batch size: 100, lr: 2.42e-02, grad_scale: 128.0
2023-12-21 16:15:35,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0
2023-12-21 16:15:39,372 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.569e+01 2.731e+01 2.970e+01 3.861e+01, threshold=5.462e+01, percent-clipped=0.0
2023-12-21 16:16:02,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=121520.0, ans=0.0
2023-12-21 16:16:13,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=121586.66666666667, ans=0.5
2023-12-21 16:16:19,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=121586.66666666667, ans=0.125
2023-12-21 16:16:21,507 INFO [train.py:886] (0/4) Epoch 4, batch 3950, loss[loss=0.01537, audio_tagging_loss=0.01537, over 25000.00 frames. ], tot_loss[loss=0.01695, audio_tagging_loss=0.01695, over 4947759.71 frames. ], batch size: 100, lr: 2.42e-02, grad_scale: 128.0
2023-12-21 16:16:23,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=121653.33333333333, ans=0.0
2023-12-21 16:16:23,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=121653.33333333333, ans=0.125
2023-12-21 16:16:28,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=121653.33333333333, ans=0.0
2023-12-21 16:16:36,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=121720.0, ans=0.1
2023-12-21 16:16:37,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=121720.0, ans=0.0
2023-12-21 16:16:39,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=121720.0, ans=0.125
2023-12-21 16:16:41,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0
2023-12-21 16:16:50,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=121786.66666666667, ans=0.125
2023-12-21 16:17:12,381 INFO [train.py:886] (0/4) Epoch 4, batch 4000, loss[loss=0.01792, audio_tagging_loss=0.01792, over 25000.00 frames. ], tot_loss[loss=0.01701, audio_tagging_loss=0.01701, over 4950468.09 frames. ], batch size: 100, lr: 2.41e-02, grad_scale: 128.0
2023-12-21 16:17:14,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=15.0
2023-12-21 16:17:23,136 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.583e+01 2.753e+01 2.897e+01 3.593e+01, threshold=5.506e+01, percent-clipped=0.0
2023-12-21 16:17:27,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.37 vs. limit=15.0
2023-12-21 16:17:36,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=122120.0, ans=0.1
2023-12-21 16:17:38,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=122120.0, ans=0.125
2023-12-21 16:17:38,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=122120.0, ans=15.0
2023-12-21 16:17:39,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.74 vs. limit=15.0
2023-12-21 16:17:59,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=122253.33333333333, ans=0.0
2023-12-21 16:18:03,842 INFO [train.py:886] (0/4) Epoch 4, batch 4050, loss[loss=0.01936, audio_tagging_loss=0.01936, over 24750.00 frames. ], tot_loss[loss=0.01709, audio_tagging_loss=0.01709, over 4957398.86 frames. ], batch size: 99, lr: 2.41e-02, grad_scale: 128.0
2023-12-21 16:18:32,527 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.033e+00
2023-12-21 16:18:41,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=122520.0, ans=0.1
2023-12-21 16:18:56,103 INFO [train.py:886] (0/4) Epoch 4, batch 4100, loss[loss=0.02116, audio_tagging_loss=0.02116, over 24750.00 frames. ], tot_loss[loss=0.01715, audio_tagging_loss=0.01715, over 4953653.75 frames. ], batch size: 99, lr: 2.41e-02, grad_scale: 128.0
2023-12-21 16:19:06,544 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.561e+01 2.807e+01 3.053e+01 3.767e+01, threshold=5.614e+01, percent-clipped=0.0
2023-12-21 16:19:12,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=122720.0, ans=0.125
2023-12-21 16:19:27,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=122853.33333333333, ans=0.2
2023-12-21 16:19:32,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0
2023-12-21 16:19:47,652 INFO [train.py:886] (0/4) Epoch 4, batch 4150, loss[loss=0.0176, audio_tagging_loss=0.0176, over 24013.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 4954804.48 frames. ], batch size: 100, lr: 2.41e-02, grad_scale: 128.0
2023-12-21 16:19:53,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.30 vs. limit=15.0
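The lr field in the summary lines decays slowly within the epoch (2.49e-02 at batch 2750 down to 2.41e-02 here), and drops more sharply at the epoch boundary further below. This is consistent with an Eden-style scheduler that discounts by both batch index and epoch index; the formula below is a reconstruction from memory of icefall's Eden and should be treated as an assumption, with base_lr, lr_batches and lr_epochs as placeholder parameters.

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style learning rate (reconstruction; treat as an assumption)."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```

Under this shape the batch factor changes slowly once batch >> lr_batches, while the epoch factor produces the visible step between epochs, matching the pattern in the log.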
2023-12-21 16:20:07,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=123053.33333333333, ans=0.125
2023-12-21 16:20:24,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=123186.66666666667, ans=10.0
2023-12-21 16:20:33,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=123253.33333333333, ans=0.125
2023-12-21 16:20:34,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=123253.33333333333, ans=0.0
2023-12-21 16:20:40,631 INFO [train.py:886] (0/4) Epoch 4, batch 4200, loss[loss=0.01605, audio_tagging_loss=0.01605, over 25000.00 frames. ], tot_loss[loss=0.01707, audio_tagging_loss=0.01707, over 4960479.93 frames. ], batch size: 100, lr: 2.40e-02, grad_scale: 128.0
2023-12-21 16:20:44,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0
2023-12-21 16:20:50,051 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.575e+01 2.802e+01 3.033e+01 3.875e+01, threshold=5.604e+01, percent-clipped=0.0
2023-12-21 16:21:05,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=123453.33333333333, ans=0.0
2023-12-21 16:21:10,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123453.33333333333, ans=0.1
2023-12-21 16:21:12,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=123520.0, ans=0.125
2023-12-21 16:21:19,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=123520.0, ans=0.125
2023-12-21 16:21:23,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=123586.66666666667, ans=0.2
2023-12-21 16:21:32,183 INFO [train.py:886] (0/4) Epoch 4, batch 4250, loss[loss=0.0155, audio_tagging_loss=0.0155, over 24750.00 frames. ], tot_loss[loss=0.01704, audio_tagging_loss=0.01704, over 4958234.06 frames. ], batch size: 99, lr: 2.40e-02, grad_scale: 128.0
2023-12-21 16:21:32,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=123653.33333333333, ans=15.0
2023-12-21 16:21:36,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0
2023-12-21 16:21:56,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=123786.66666666667, ans=0.125
2023-12-21 16:22:00,983 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.588e+00
2023-12-21 16:22:02,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=123853.33333333333, ans=0.125
2023-12-21 16:22:09,106 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=9.394e-01
2023-12-21 16:22:11,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=15.0
2023-12-21 16:22:11,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0
2023-12-21 16:22:23,668 INFO [train.py:886] (0/4) Epoch 4, batch 4300, loss[loss=0.0154, audio_tagging_loss=0.0154, over 25000.00 frames. ], tot_loss[loss=0.01703, audio_tagging_loss=0.01703, over 4954946.26 frames. ], batch size: 100, lr: 2.40e-02, grad_scale: 128.0
2023-12-21 16:22:33,904 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.616e+01 2.824e+01 3.044e+01 4.145e+01, threshold=5.649e+01, percent-clipped=0.0
2023-12-21 16:22:34,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=124053.33333333333, ans=0.0
2023-12-21 16:22:35,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.66 vs. limit=15.0
2023-12-21 16:22:38,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0
2023-12-21 16:22:42,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=124053.33333333333, ans=0.015
2023-12-21 16:22:55,040 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.705e+00
2023-12-21 16:23:15,658 INFO [train.py:886] (0/4) Epoch 4, batch 4350, loss[loss=0.01498, audio_tagging_loss=0.01498, over 24750.00 frames. ], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 4960157.65 frames. ], batch size: 99, lr: 2.40e-02, grad_scale: 128.0
2023-12-21 16:23:20,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=124320.0, ans=0.1
2023-12-21 16:23:23,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=124320.0, ans=0.125
2023-12-21 16:23:28,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=124386.66666666667, ans=0.0
2023-12-21 16:23:30,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=124386.66666666667, ans=0.125
2023-12-21 16:23:37,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0
2023-12-21 16:23:41,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=124453.33333333333, ans=0.125
2023-12-21 16:23:43,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=124453.33333333333, ans=0.125
2023-12-21 16:24:04,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=124586.66666666667, ans=0.125
2023-12-21 16:24:07,053 INFO [train.py:886] (0/4) Epoch 4, batch 4400, loss[loss=0.01871, audio_tagging_loss=0.01871, over 24750.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 4949070.11 frames. ], batch size: 99, lr: 2.39e-02, grad_scale: 128.0
2023-12-21 16:24:11,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=124653.33333333333, ans=0.125
2023-12-21 16:24:15,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=124653.33333333333, ans=0.2
2023-12-21 16:24:15,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=124653.33333333333, ans=0.07
2023-12-21 16:24:17,861 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.690e+01 2.854e+01 3.099e+01 4.055e+01, threshold=5.708e+01, percent-clipped=0.0
2023-12-21 16:24:31,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=124786.66666666667, ans=0.0
2023-12-21 16:24:58,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=15.0
2023-12-21 16:24:58,563 INFO [train.py:886] (0/4) Epoch 4, batch 4450, loss[loss=0.01719, audio_tagging_loss=0.01719, over 25000.00 frames. ], tot_loss[loss=0.01725, audio_tagging_loss=0.01725, over 4947382.88 frames. ], batch size: 100, lr: 2.39e-02, grad_scale: 128.0
2023-12-21 16:25:04,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=124986.66666666667, ans=0.2
2023-12-21 16:25:06,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=124986.66666666667, ans=0.2
2023-12-21 16:25:06,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=124986.66666666667, ans=0.125
2023-12-21 16:25:16,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.65 vs. limit=10.0
2023-12-21 16:25:23,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=125120.0, ans=0.0
2023-12-21 16:25:27,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=125120.0, ans=0.2
2023-12-21 16:25:37,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.86 vs. limit=22.5
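Every summary line ends with grad_scale: 128.0, the current loss scale of fp16 mixed-precision training; it stays constant through this stretch because no overflow forces it down. The standard PyTorch pattern that produces and maintains such a scale looks like the sketch below (icefall wraps this differently, and the init_scale shown is just the value currently reported in the log):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=128.0)

def train_step(model, batch, optimizer, criterion):
    """One fp16 training step (a generic sketch, not train.py itself)."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # fp16 forward pass
        loss = criterion(model(batch["x"]), batch["y"])
    scaler.scale(loss).backward()             # scale loss to avoid fp16 underflow
    scaler.step(optimizer)                    # unscales grads, steps if finite
    scaler.update()                           # grows/shrinks the scale over time
    return loss.item(), scaler.get_scale()    # the "grad_scale" shown in the log
```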
2023-12-21 16:25:40,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=125253.33333333333, ans=0.125
2023-12-21 16:25:40,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0
2023-12-21 16:25:45,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0
2023-12-21 16:25:51,700 INFO [train.py:886] (0/4) Epoch 4, batch 4500, loss[loss=0.01542, audio_tagging_loss=0.01542, over 24750.00 frames. ], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 4948814.82 frames. ], batch size: 99, lr: 2.39e-02, grad_scale: 128.0
2023-12-21 16:26:01,671 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.563e+01 2.777e+01 3.038e+01 3.654e+01, threshold=5.554e+01, percent-clipped=0.0
2023-12-21 16:26:43,101 INFO [train.py:886] (0/4) Epoch 4, batch 4550, loss[loss=0.01628, audio_tagging_loss=0.01628, over 24750.00 frames. ], tot_loss[loss=0.01701, audio_tagging_loss=0.01701, over 4952566.95 frames. ], batch size: 99, lr: 2.38e-02, grad_scale: 128.0
2023-12-21 16:26:46,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=125653.33333333333, ans=0.0
2023-12-21 16:26:54,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=125720.0, ans=0.0
2023-12-21 16:27:28,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=125920.0, ans=0.125
2023-12-21 16:27:35,638 INFO [train.py:886] (0/4) Epoch 4, batch 4600, loss[loss=0.01705, audio_tagging_loss=0.01705, over 25000.00 frames. ], tot_loss[loss=0.017, audio_tagging_loss=0.017, over 4958557.25 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 128.0
2023-12-21 16:27:36,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.60 vs. limit=22.5
2023-12-21 16:27:40,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=125986.66666666667, ans=0.125
2023-12-21 16:27:43,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.29 vs. limit=22.5
2023-12-21 16:27:45,210 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.614e+01 2.813e+01 2.992e+01 3.999e+01, threshold=5.626e+01, percent-clipped=0.0
2023-12-21 16:27:56,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126120.0, ans=0.1
2023-12-21 16:28:02,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=126120.0, ans=0.2
2023-12-21 16:28:21,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=126253.33333333333, ans=0.1
2023-12-21 16:28:23,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0
2023-12-21 16:28:27,568 INFO [train.py:886] (0/4) Epoch 4, batch 4650, loss[loss=0.01729, audio_tagging_loss=0.01729, over 25000.00 frames. ], tot_loss[loss=0.01705, audio_tagging_loss=0.01705, over 4963467.29 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 128.0
2023-12-21 16:28:33,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=126320.0, ans=0.0
2023-12-21 16:28:34,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0
2023-12-21 16:28:36,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=126320.0, ans=0.125
2023-12-21 16:28:52,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=126453.33333333333, ans=0.07
2023-12-21 16:28:55,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=126453.33333333333, ans=0.125
2023-12-21 16:28:57,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=126520.0, ans=0.0
2023-12-21 16:29:03,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=126520.0, ans=0.0
2023-12-21 16:29:14,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=126586.66666666667, ans=0.125
2023-12-21 16:29:18,464 INFO [train.py:886] (0/4) Epoch 4, batch 4700, loss[loss=0.01673, audio_tagging_loss=0.01673, over 24750.00 frames. ], tot_loss[loss=0.01723, audio_tagging_loss=0.01723, over 4955473.70 frames. ], batch size: 99, lr: 2.38e-02, grad_scale: 128.0
2023-12-21 16:29:27,723 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.637e+01 2.864e+01 3.107e+01 3.954e+01, threshold=5.728e+01, percent-clipped=0.0
2023-12-21 16:29:30,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0
2023-12-21 16:29:41,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=126786.66666666667, ans=0.07
2023-12-21 16:29:47,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0
2023-12-21 16:30:05,315 INFO [train.py:886] (0/4) Epoch 4, batch 4750, loss[loss=0.01862, audio_tagging_loss=0.01862, over 24750.00 frames. ], tot_loss[loss=0.01743, audio_tagging_loss=0.01743, over 4949574.95 frames. ], batch size: 99, lr: 2.37e-02, grad_scale: 128.0
2023-12-21 16:30:16,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=127053.33333333333, ans=0.1
2023-12-21 16:30:20,859 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-4.pt
2023-12-21 16:30:42,774 INFO [train.py:886] (0/4) Epoch 5, batch 0, loss[loss=0.04636, audio_tagging_loss=0.04636, over 21066.00 frames. ], tot_loss[loss=0.04636, audio_tagging_loss=0.04636, over 21066.00 frames. ], batch size: 107, lr: 2.21e-02, grad_scale: 128.0
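Just above, rank 0 closes epoch 4 by writing zipformer/exp_at_as_full/epoch-4.pt before epoch 5 begins (and the learning rate steps from 2.37e-02 to 2.21e-02 across the boundary). A minimal sketch of such an end-of-epoch checkpoint follows; the actual contents saved by icefall's checkpoint.py include more state (e.g. sampler and grad-scaler), so the field names here are illustrative only.

```python
import torch

def save_epoch_checkpoint(exp_dir: str, epoch: int, model, optimizer, scheduler):
    """Sketch of an end-of-epoch checkpoint; field names are illustrative."""
    torch.save(
        {
            "epoch": epoch,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
        },
        f"{exp_dir}/epoch-{epoch}.pt",
    )
```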
2023-12-21 16:30:42,776 INFO [train.py:909] (0/4) Computing validation loss
2023-12-21 16:31:04,473 INFO [train.py:917] (0/4) Epoch 5, validation: loss=0.03772, audio_tagging_loss=0.03772, over 3737520.00 frames.
2023-12-21 16:31:04,473 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-21 16:31:17,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=127160.0, ans=0.125
2023-12-21 16:31:23,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=127226.66666666667, ans=0.125
2023-12-21 16:31:32,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=127293.33333333333, ans=0.05
2023-12-21 16:31:34,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=127293.33333333333, ans=0.0
2023-12-21 16:31:37,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=127293.33333333333, ans=0.1
2023-12-21 16:31:47,042 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.748e+01 3.184e+01 3.727e+01 1.037e+02, threshold=6.368e+01, percent-clipped=5.0
2023-12-21 16:31:52,722 INFO [train.py:886] (0/4) Epoch 5, batch 50, loss[loss=0.02143, audio_tagging_loss=0.02143, over 25000.00 frames. ], tot_loss[loss=0.02736, audio_tagging_loss=0.02736, over 1116358.62 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 128.0
2023-12-21 16:32:12,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. limit=10.0
2023-12-21 16:32:16,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=127560.0, ans=0.125
2023-12-21 16:32:28,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=127626.66666666667, ans=0.0
2023-12-21 16:32:31,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=127693.33333333333, ans=0.0
2023-12-21 16:32:32,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=127693.33333333333, ans=0.0
2023-12-21 16:32:35,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=127693.33333333333, ans=0.2
2023-12-21 16:32:43,318 INFO [train.py:886] (0/4) Epoch 5, batch 100, loss[loss=0.01823, audio_tagging_loss=0.01823, over 25000.00 frames. ], tot_loss[loss=0.02345, audio_tagging_loss=0.02345, over 1972127.08 frames. ], batch size: 100, lr: 2.20e-02, grad_scale: 128.0
2023-12-21 16:32:57,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=33.04 vs. limit=22.5
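As at every validation interval, training pauses at the start of the epoch to run a full pass over the dev set (here loss=0.03772 over 3737520.00 frames; the frame count is always the same because the dev set is fixed). The usual shape of such a loop, as a sketch with illustrative batch-key names:

```python
import torch

def compute_validation_loss(model, valid_loader, criterion, device):
    """Frame-weighted validation loss over the whole dev set (a sketch)."""
    model.eval()
    total_loss, total_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            feats = batch["features"].to(device)   # key names illustrative
            labels = batch["labels"].to(device)
            num_frames = feats.shape[0] * feats.shape[1]
            total_loss += criterion(model(feats), labels).item() * num_frames
            total_frames += num_frames
    model.train()
    return total_loss / total_frames  # the "validation: loss=..." figure
```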
2023-12-21 16:32:59,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=127826.66666666667, ans=0.0
2023-12-21 16:33:03,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=127893.33333333333, ans=0.05
2023-12-21 16:33:06,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=127893.33333333333, ans=0.125
2023-12-21 16:33:06,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.15 vs. limit=15.0
2023-12-21 16:33:12,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=127960.0, ans=0.2
2023-12-21 16:33:26,370 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.735e+01 2.959e+01 3.135e+01 3.807e+01, threshold=5.918e+01, percent-clipped=0.0
2023-12-21 16:33:26,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=128026.66666666667, ans=0.0
2023-12-21 16:33:27,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=12.0
2023-12-21 16:33:32,078 INFO [train.py:886] (0/4) Epoch 5, batch 150, loss[loss=0.01937, audio_tagging_loss=0.01937, over 25000.00 frames. ], tot_loss[loss=0.02121, audio_tagging_loss=0.02121, over 2633776.02 frames. ], batch size: 100, lr: 2.20e-02, grad_scale: 128.0
2023-12-21 16:33:46,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128160.0, ans=0.1
2023-12-21 16:33:56,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=128226.66666666667, ans=0.2
2023-12-21 16:33:59,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=128226.66666666667, ans=0.0
2023-12-21 16:34:15,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=128360.0, ans=0.125
2023-12-21 16:34:19,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=128360.0, ans=0.125
2023-12-21 16:34:19,078 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.134e+00
2023-12-21 16:34:20,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=128360.0, ans=0.07
2023-12-21 16:34:23,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.14 vs. limit=12.0
2023-12-21 16:34:23,635 INFO [train.py:886] (0/4) Epoch 5, batch 200, loss[loss=0.01628, audio_tagging_loss=0.01628, over 25000.00 frames. ], tot_loss[loss=0.01972, audio_tagging_loss=0.01972, over 3155248.52 frames. ], batch size: 100, lr: 2.20e-02, grad_scale: 128.0
2023-12-21 16:34:24,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0
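At the start of epoch 5 the frame counts inside tot_loss[...] grow batch by batch (1116358 -> 1972127 -> 2633776 -> 3155248 frames), so tot_loss is a frame-weighted running average of the training loss that restarts with the epoch, which is why it begins high (0.04636 at batch 0) and relaxes toward the per-batch level. A sketch of the accumulator; during epoch 4 the reported frame totals hover around 4.95M rather than growing without bound, so the recipe evidently also decays or caps the accumulated statistics, which this sketch omits.

```python
class RunningLoss:
    """Frame-weighted running average of the training loss (a sketch; the
    recipe appears to additionally decay/cap the accumulated statistics)."""
    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames
        return self.loss_sum / self.frames  # the tot_loss[...] figure
```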
limit=15.0 2023-12-21 16:34:25,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2023-12-21 16:34:32,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-12-21 16:34:35,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=128493.33333333333, ans=0.09899494936611666 2023-12-21 16:34:46,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=128560.0, ans=0.025 2023-12-21 16:34:51,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=128560.0, ans=0.125 2023-12-21 16:34:57,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.00 vs. limit=15.0 2023-12-21 16:35:07,582 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.571e+01 2.693e+01 2.979e+01 3.922e+01, threshold=5.386e+01, percent-clipped=0.0 2023-12-21 16:35:09,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128693.33333333333, ans=0.1 2023-12-21 16:35:13,420 INFO [train.py:886] (0/4) Epoch 5, batch 250, loss[loss=0.0162, audio_tagging_loss=0.0162, over 25000.00 frames. ], tot_loss[loss=0.01887, audio_tagging_loss=0.01887, over 3557593.86 frames. ], batch size: 100, lr: 2.20e-02, grad_scale: 128.0 2023-12-21 16:35:19,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.00 vs. limit=22.5 2023-12-21 16:36:04,545 INFO [train.py:886] (0/4) Epoch 5, batch 300, loss[loss=0.01672, audio_tagging_loss=0.01672, over 24750.00 frames. ], tot_loss[loss=0.01842, audio_tagging_loss=0.01842, over 3868300.49 frames. ], batch size: 99, lr: 2.19e-02, grad_scale: 128.0 2023-12-21 16:36:13,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0 2023-12-21 16:36:19,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2023-12-21 16:36:42,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.29 vs. limit=22.5 2023-12-21 16:36:50,050 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.527e+01 2.747e+01 2.947e+01 3.578e+01, threshold=5.493e+01, percent-clipped=0.0 2023-12-21 16:36:56,498 INFO [train.py:886] (0/4) Epoch 5, batch 350, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24750.00 frames. ], tot_loss[loss=0.01806, audio_tagging_loss=0.01806, over 4105235.46 frames. 
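The "Whitening: name=..., metric=M vs. limit=L" entries fire when a layer's output covariance looks too far from white, i.e. the metric exceeds its (scheduled) limit. A plausible reconstruction of such a metric is the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue, computable from traces alone; it is 1.0 for perfectly white features and grows as variance concentrates in a few directions. The formula below is an assumption in that spirit, not a quote of scaling.py.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels) activations.
    n, c = x.shape
    cg = c // num_groups
    x = x.reshape(n, num_groups, cg)
    x = x - x.mean(dim=0, keepdim=True)
    total = 0.0
    for g in range(num_groups):
        cov = x[:, g, :].T @ x[:, g, :] / n                    # (cg, cg) covariance
        mean_sq_eig = torch.einsum('ij,ji->', cov, cov) / cg   # trace(cov @ cov) / cg
        sq_mean_eig = torch.diagonal(cov).mean() ** 2          # (trace(cov) / cg) ** 2
        total += (mean_sq_eig / sq_mean_eig).item()
    return total / num_groups

print(whitening_metric(torch.randn(20000, 64)))  # ~1.0: white, far below limit=15.0

On this reading, a logged "metric=33.04 vs. limit=22.5" means a strongly non-white layer output, which triggers the module's corrective gradient.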
], batch size: 99, lr: 2.19e-02, grad_scale: 128.0 2023-12-21 16:37:10,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=129493.33333333333, ans=0.125 2023-12-21 16:37:30,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=129626.66666666667, ans=0.125 2023-12-21 16:37:30,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=129626.66666666667, ans=0.0 2023-12-21 16:37:44,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=129693.33333333333, ans=0.125 2023-12-21 16:37:46,220 INFO [train.py:886] (0/4) Epoch 5, batch 400, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01767, audio_tagging_loss=0.01767, over 4292279.35 frames. ], batch size: 100, lr: 2.19e-02, grad_scale: 128.0 2023-12-21 16:38:10,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.06 vs. limit=15.0 2023-12-21 16:38:13,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=129893.33333333333, ans=0.0 2023-12-21 16:38:31,456 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.542e+01 2.726e+01 2.924e+01 4.010e+01, threshold=5.453e+01, percent-clipped=0.0 2023-12-21 16:38:37,873 INFO [train.py:886] (0/4) Epoch 5, batch 450, loss[loss=0.01453, audio_tagging_loss=0.01453, over 25000.00 frames. ], tot_loss[loss=0.01731, audio_tagging_loss=0.01731, over 4438054.46 frames. ], batch size: 100, lr: 2.19e-02, grad_scale: 128.0 2023-12-21 16:38:48,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.17 vs. limit=22.5 2023-12-21 16:39:02,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2023-12-21 16:39:19,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=130360.0, ans=0.125 2023-12-21 16:39:27,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=130360.0, ans=0.125 2023-12-21 16:39:28,839 INFO [train.py:886] (0/4) Epoch 5, batch 500, loss[loss=0.01702, audio_tagging_loss=0.01702, over 25000.00 frames. ], tot_loss[loss=0.0172, audio_tagging_loss=0.0172, over 4551347.68 frames. ], batch size: 100, lr: 2.18e-02, grad_scale: 128.0 2023-12-21 16:39:29,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=130426.66666666667, ans=0.125 2023-12-21 16:39:30,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=130426.66666666667, ans=0.1 2023-12-21 16:39:40,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=130493.33333333333, ans=0.2 2023-12-21 16:39:47,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. 
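Each "Epoch 5, batch N" line reports the current batch's loss alongside tot_loss, an aggregate over a growing frame count (1116358.62 frames at batch 50, up to 4105235.46 by batch 350). The fractional frame counts suggest a decayed, frame-weighted accumulation rather than a plain sum; a minimal sketch of that reading, with the decay rate a pure guess:

class DecayedFrameLoss:
    """Frame-weighted running loss with exponential forgetting (sketch)."""
    def __init__(self, decay: float = 0.999):   # decay value is a guess
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed frame count (hence the fractions)

    def update(self, loss: float, frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + loss * frames
        self.frames = self.decay * self.frames + frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = DecayedFrameLoss()
tracker.update(0.02143, 25000.0)   # the batch-50 figures from the log above
print(f"tot_loss={tracker.value:.5f} over {tracker.frames:.2f} frames")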
limit=15.0 2023-12-21 16:40:06,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=130626.66666666667, ans=0.0 2023-12-21 16:40:13,611 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 2.467e+01 2.664e+01 2.880e+01 3.369e+01, threshold=5.329e+01, percent-clipped=0.0 2023-12-21 16:40:19,479 INFO [train.py:886] (0/4) Epoch 5, batch 550, loss[loss=0.0149, audio_tagging_loss=0.0149, over 24750.00 frames. ], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 4641557.74 frames. ], batch size: 99, lr: 2.18e-02, grad_scale: 128.0 2023-12-21 16:40:22,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130760.0, ans=0.1 2023-12-21 16:40:30,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=130826.66666666667, ans=0.1 2023-12-21 16:40:38,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=130826.66666666667, ans=0.125 2023-12-21 16:41:02,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=131026.66666666667, ans=0.125 2023-12-21 16:41:03,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=131026.66666666667, ans=0.0 2023-12-21 16:41:05,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.78 vs. limit=10.0 2023-12-21 16:41:10,427 INFO [train.py:886] (0/4) Epoch 5, batch 600, loss[loss=0.01867, audio_tagging_loss=0.01867, over 24750.00 frames. ], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 4712089.07 frames. ], batch size: 99, lr: 2.18e-02, grad_scale: 128.0 2023-12-21 16:41:33,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.03 vs. limit=22.5 2023-12-21 16:41:40,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.03 vs. limit=22.5 2023-12-21 16:41:44,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=131293.33333333334, ans=0.0 2023-12-21 16:41:53,918 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.543e+01 2.791e+01 2.900e+01 3.878e+01, threshold=5.581e+01, percent-clipped=0.0 2023-12-21 16:42:00,248 INFO [train.py:886] (0/4) Epoch 5, batch 650, loss[loss=0.01812, audio_tagging_loss=0.01812, over 24750.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 4759460.05 frames. 
], batch size: 99, lr: 2.18e-02, grad_scale: 128.0 2023-12-21 16:42:02,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=131426.66666666666, ans=0.125 2023-12-21 16:42:10,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=131493.33333333334, ans=0.0 2023-12-21 16:42:23,873 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.534e-02 2023-12-21 16:42:34,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0 2023-12-21 16:42:36,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=131626.66666666666, ans=0.125 2023-12-21 16:42:50,308 INFO [train.py:886] (0/4) Epoch 5, batch 700, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.01717, audio_tagging_loss=0.01717, over 4799245.79 frames. ], batch size: 100, lr: 2.18e-02, grad_scale: 128.0 2023-12-21 16:42:57,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131760.0, ans=0.1 2023-12-21 16:43:08,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=131826.66666666666, ans=0.125 2023-12-21 16:43:14,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2023-12-21 16:43:24,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=131960.0, ans=0.125 2023-12-21 16:43:35,716 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 2.570e+01 2.719e+01 2.922e+01 3.851e+01, threshold=5.438e+01, percent-clipped=0.0 2023-12-21 16:43:41,358 INFO [train.py:886] (0/4) Epoch 5, batch 750, loss[loss=0.01892, audio_tagging_loss=0.01892, over 25000.00 frames. ], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 4830027.70 frames. ], batch size: 100, lr: 2.17e-02, grad_scale: 128.0 2023-12-21 16:43:56,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=132160.0, ans=0.1 2023-12-21 16:44:02,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2023-12-21 16:44:16,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=132293.33333333334, ans=0.125 2023-12-21 16:44:20,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=12.0 2023-12-21 16:44:31,578 INFO [train.py:886] (0/4) Epoch 5, batch 800, loss[loss=0.01708, audio_tagging_loss=0.01708, over 24750.00 frames. ], tot_loss[loss=0.01699, audio_tagging_loss=0.01699, over 4862511.50 frames. 
], batch size: 99, lr: 2.17e-02, grad_scale: 128.0 2023-12-21 16:44:37,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=132426.66666666666, ans=0.125 2023-12-21 16:44:39,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=132426.66666666666, ans=0.2 2023-12-21 16:44:43,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=132493.33333333334, ans=0.125 2023-12-21 16:44:45,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=132493.33333333334, ans=0.0 2023-12-21 16:44:53,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132560.0, ans=0.125 2023-12-21 16:45:16,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=132693.33333333334, ans=0.1 2023-12-21 16:45:18,288 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.636e+01 2.805e+01 3.064e+01 4.001e+01, threshold=5.610e+01, percent-clipped=0.0 2023-12-21 16:45:23,040 INFO [train.py:886] (0/4) Epoch 5, batch 850, loss[loss=0.01411, audio_tagging_loss=0.01411, over 24750.00 frames. ], tot_loss[loss=0.01688, audio_tagging_loss=0.01688, over 4887637.32 frames. ], batch size: 99, lr: 2.17e-02, grad_scale: 128.0 2023-12-21 16:45:41,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=132826.66666666666, ans=0.1 2023-12-21 16:45:49,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=132893.33333333334, ans=0.125 2023-12-21 16:45:49,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2023-12-21 16:45:51,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=132893.33333333334, ans=0.125 2023-12-21 16:46:14,105 INFO [train.py:886] (0/4) Epoch 5, batch 900, loss[loss=0.01733, audio_tagging_loss=0.01733, over 24750.00 frames. ], tot_loss[loss=0.01695, audio_tagging_loss=0.01695, over 4904189.00 frames. 
], batch size: 99, lr: 2.17e-02, grad_scale: 128.0 2023-12-21 16:46:23,782 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.293e-02 2023-12-21 16:46:42,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=133226.66666666666, ans=0.125 2023-12-21 16:46:49,902 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-20000.pt 2023-12-21 16:46:55,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=133360.0, ans=0.0 2023-12-21 16:47:00,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=133360.0, ans=0.025 2023-12-21 16:47:01,325 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.545e+01 2.742e+01 2.943e+01 3.695e+01, threshold=5.484e+01, percent-clipped=0.0 2023-12-21 16:47:05,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=133426.66666666666, ans=10.0 2023-12-21 16:47:06,125 INFO [train.py:886] (0/4) Epoch 5, batch 950, loss[loss=0.01723, audio_tagging_loss=0.01723, over 24750.00 frames. ], tot_loss[loss=0.01697, audio_tagging_loss=0.01697, over 4909198.55 frames. ], batch size: 99, lr: 2.16e-02, grad_scale: 128.0 2023-12-21 16:47:08,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=133426.66666666666, ans=0.0 2023-12-21 16:47:11,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=133426.66666666666, ans=0.1 2023-12-21 16:47:33,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=133560.0, ans=0.0 2023-12-21 16:47:57,562 INFO [train.py:886] (0/4) Epoch 5, batch 1000, loss[loss=0.0181, audio_tagging_loss=0.0181, over 25000.00 frames. ], tot_loss[loss=0.01689, audio_tagging_loss=0.01689, over 4913874.69 frames. ], batch size: 100, lr: 2.16e-02, grad_scale: 128.0 2023-12-21 16:48:04,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133760.0, ans=0.1 2023-12-21 16:48:08,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=133826.66666666666, ans=0.2 2023-12-21 16:48:09,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. 
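The checkpoint.py line above writes checkpoint-20000.pt into the experiment directory, i.e. checkpoints are named by the global training-batch index and written at a fixed cadence. A minimal sketch of that pattern follows; the cadence, the saved fields, and the function name are assumptions (icefall's checkpoint.py stores additional state beyond this).

from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: Path, save_every_n: int = 4000) -> None:
    # Write e.g. exp_dir/checkpoint-20000.pt every save_every_n batches.
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        exp_dir / f"checkpoint-{batch_idx_train}.pt",
    )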
limit=6.0 2023-12-21 16:48:10,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=133826.66666666666, ans=0.2 2023-12-21 16:48:11,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=133826.66666666666, ans=0.07 2023-12-21 16:48:13,777 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=7.950e-03 2023-12-21 16:48:28,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=133960.0, ans=0.1 2023-12-21 16:48:38,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=134026.66666666666, ans=0.125 2023-12-21 16:48:38,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-12-21 16:48:38,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=134026.66666666666, ans=0.125 2023-12-21 16:48:42,463 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.511e+01 2.666e+01 2.885e+01 3.641e+01, threshold=5.332e+01, percent-clipped=0.0 2023-12-21 16:48:48,791 INFO [train.py:886] (0/4) Epoch 5, batch 1050, loss[loss=0.01619, audio_tagging_loss=0.01619, over 25000.00 frames. ], tot_loss[loss=0.01672, audio_tagging_loss=0.01672, over 4921982.97 frames. ], batch size: 100, lr: 2.16e-02, grad_scale: 128.0 2023-12-21 16:48:51,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=134093.33333333334, ans=0.0 2023-12-21 16:48:52,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=134093.33333333334, ans=0.125 2023-12-21 16:48:57,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=134093.33333333334, ans=0.09899494936611666 2023-12-21 16:48:57,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=15.0 2023-12-21 16:49:04,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2023-12-21 16:49:06,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2023-12-21 16:49:15,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2023-12-21 16:49:38,673 INFO [train.py:886] (0/4) Epoch 5, batch 1100, loss[loss=0.01632, audio_tagging_loss=0.01632, over 24909.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4926372.58 frames. 
], batch size: 100, lr: 2.16e-02, grad_scale: 128.0 2023-12-21 16:50:18,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=134693.33333333334, ans=0.015 2023-12-21 16:50:24,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=134693.33333333334, ans=0.125 2023-12-21 16:50:25,437 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.502e+01 2.717e+01 2.945e+01 3.841e+01, threshold=5.435e+01, percent-clipped=0.0 2023-12-21 16:50:30,924 INFO [train.py:886] (0/4) Epoch 5, batch 1150, loss[loss=0.01712, audio_tagging_loss=0.01712, over 24750.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4933030.37 frames. ], batch size: 99, lr: 2.15e-02, grad_scale: 128.0 2023-12-21 16:50:34,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=134760.0, ans=0.125 2023-12-21 16:50:36,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=134760.0, ans=0.025 2023-12-21 16:50:50,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=134893.33333333334, ans=0.0 2023-12-21 16:50:52,147 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.067e-01 2023-12-21 16:50:52,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=134893.33333333334, ans=0.125 2023-12-21 16:50:57,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=134893.33333333334, ans=10.0 2023-12-21 16:51:07,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=134960.0, ans=0.125 2023-12-21 16:51:21,036 INFO [train.py:886] (0/4) Epoch 5, batch 1200, loss[loss=0.01445, audio_tagging_loss=0.01445, over 24750.00 frames. ], tot_loss[loss=0.01678, audio_tagging_loss=0.01678, over 4938349.06 frames. ], batch size: 99, lr: 2.15e-02, grad_scale: 128.0 2023-12-21 16:51:36,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=135160.0, ans=0.0 2023-12-21 16:51:49,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.91 vs. limit=6.0 2023-12-21 16:51:51,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=135293.33333333334, ans=0.05 2023-12-21 16:52:06,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=135360.0, ans=0.125 2023-12-21 16:52:07,533 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.551e+01 2.723e+01 2.888e+01 4.328e+01, threshold=5.445e+01, percent-clipped=0.0 2023-12-21 16:52:08,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.02 vs. 
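Throughout the log, loss equals audio_tagging_loss exactly, so the tagging objective is the only loss term. For AudioSet-style multi-label tagging the conventional choice is per-class binary cross-entropy against multi-hot targets; the sketch below shows that convention and is an assumption about, not a quote of, this recipe's train.py (527 is the AudioSet class count).

import torch
import torch.nn.functional as F

def audio_tagging_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: (batch, 527) raw scores; targets: (batch, 527) multi-hot {0, 1}.
    return F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")

logits = torch.randn(8, 527)
targets = torch.randint(0, 2, (8, 527)).float()
print(audio_tagging_loss(logits, targets))   # ~0.8 for random scores; values near
                                             # the logged ~0.017 reflect a trained
                                             # model on mostly-zero targets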
limit=15.0 2023-12-21 16:52:08,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=135360.0, ans=0.125 2023-12-21 16:52:12,135 INFO [train.py:886] (0/4) Epoch 5, batch 1250, loss[loss=0.01641, audio_tagging_loss=0.01641, over 21498.00 frames. ], tot_loss[loss=0.01685, audio_tagging_loss=0.01685, over 4938140.65 frames. ], batch size: 107, lr: 2.15e-02, grad_scale: 128.0 2023-12-21 16:52:30,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.32 vs. limit=15.0 2023-12-21 16:52:33,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=135560.0, ans=0.125 2023-12-21 16:52:46,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0 2023-12-21 16:52:57,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=135693.33333333334, ans=0.1 2023-12-21 16:52:57,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=135693.33333333334, ans=0.07 2023-12-21 16:53:04,276 INFO [train.py:886] (0/4) Epoch 5, batch 1300, loss[loss=0.01934, audio_tagging_loss=0.01934, over 24750.00 frames. ], tot_loss[loss=0.01689, audio_tagging_loss=0.01689, over 4938971.69 frames. ], batch size: 99, lr: 2.15e-02, grad_scale: 128.0 2023-12-21 16:53:25,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=135893.33333333334, ans=0.125 2023-12-21 16:53:41,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=135960.0, ans=0.125 2023-12-21 16:53:46,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=136026.66666666666, ans=0.1 2023-12-21 16:53:49,038 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.561e+01 2.737e+01 2.945e+01 3.593e+01, threshold=5.474e+01, percent-clipped=0.0 2023-12-21 16:53:53,815 INFO [train.py:886] (0/4) Epoch 5, batch 1350, loss[loss=0.01454, audio_tagging_loss=0.01454, over 25000.00 frames. ], tot_loss[loss=0.01675, audio_tagging_loss=0.01675, over 4942529.76 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 128.0 2023-12-21 16:54:07,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136160.0, ans=0.125 2023-12-21 16:54:11,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.74 vs. limit=22.5 2023-12-21 16:54:18,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=136226.66666666666, ans=0.05 2023-12-21 16:54:28,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=136293.33333333334, ans=0.125 2023-12-21 16:54:33,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.11 vs. 
limit=15.0 2023-12-21 16:54:46,085 INFO [train.py:886] (0/4) Epoch 5, batch 1400, loss[loss=0.01717, audio_tagging_loss=0.01717, over 25000.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4947963.48 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 128.0 2023-12-21 16:54:46,299 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.455e+01 2023-12-21 16:55:05,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.42 vs. limit=10.0 2023-12-21 16:55:11,457 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.82 vs. limit=22.5 2023-12-21 16:55:18,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=136626.66666666666, ans=0.0 2023-12-21 16:55:31,496 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.570e+01 2.791e+01 2.991e+01 3.776e+01, threshold=5.582e+01, percent-clipped=0.0 2023-12-21 16:55:36,226 INFO [train.py:886] (0/4) Epoch 5, batch 1450, loss[loss=0.01624, audio_tagging_loss=0.01624, over 24750.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 4950023.54 frames. ], batch size: 99, lr: 2.14e-02, grad_scale: 128.0 2023-12-21 16:55:48,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136826.66666666666, ans=0.125 2023-12-21 16:55:48,949 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.883e+01 2023-12-21 16:55:50,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=136826.66666666666, ans=0.0 2023-12-21 16:55:50,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136826.66666666666, ans=0.125 2023-12-21 16:55:52,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.48 vs. limit=22.5 2023-12-21 16:55:52,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=136826.66666666666, ans=0.125 2023-12-21 16:56:21,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=137026.66666666666, ans=0.125 2023-12-21 16:56:22,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=137026.66666666666, ans=0.0 2023-12-21 16:56:26,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.01 vs. limit=15.0 2023-12-21 16:56:29,142 INFO [train.py:886] (0/4) Epoch 5, batch 1500, loss[loss=0.01714, audio_tagging_loss=0.01714, over 25000.00 frames. ], tot_loss[loss=0.01668, audio_tagging_loss=0.01668, over 4953688.94 frames. 
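The "WithLoss: name=...self_attn_weights, loss-sum=..." entries report small auxiliary losses attached directly to attention-weight tensors (the sums range from 0.000e+00 up to ~2e+01 across layers in this section). The sketch below shows one way such a penalty could be computed and logged; the entropy-based penalty, its scale, and the wiring are illustrative assumptions, since the actual mechanism lives in scaling.py.

import torch

def attn_aux_loss(attn_weights: torch.Tensor, scale: float = 1e-4) -> torch.Tensor:
    # attn_weights: (batch, heads, query, key), rows summing to 1.
    # Hypothetical penalty: negative entropy, discouraging overly peaked rows.
    p = attn_weights.clamp(min=1e-20)
    aux = scale * (p * p.log()).sum(dim=-1).mean()
    print(f"WithLoss: name=self_attn_weights, loss-sum={aux.item():.3e}")
    return aux

attn = torch.softmax(torch.randn(2, 4, 10, 10), dim=-1)
main_loss = torch.tensor(0.01665)          # stand-in for the tagging loss
total = main_loss + attn_aux_loss(attn)    # the extra term joins the training loss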
], batch size: 100, lr: 2.14e-02, grad_scale: 128.0 2023-12-21 16:56:29,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=137093.33333333334, ans=0.125 2023-12-21 16:56:45,420 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-21 16:56:52,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=137226.66666666666, ans=0.125 2023-12-21 16:57:09,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=137360.0, ans=0.0 2023-12-21 16:57:14,689 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.623e+01 2.830e+01 3.033e+01 3.476e+01, threshold=5.660e+01, percent-clipped=0.0 2023-12-21 16:57:14,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=137360.0, ans=0.0 2023-12-21 16:57:15,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=137360.0, ans=0.0 2023-12-21 16:57:20,724 INFO [train.py:886] (0/4) Epoch 5, batch 1550, loss[loss=0.02132, audio_tagging_loss=0.02132, over 24750.00 frames. ], tot_loss[loss=0.01684, audio_tagging_loss=0.01684, over 4949304.33 frames. ], batch size: 99, lr: 2.14e-02, grad_scale: 128.0 2023-12-21 16:57:20,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=137426.66666666666, ans=0.125 2023-12-21 16:57:25,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=12.0 2023-12-21 16:57:26,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2023-12-21 16:57:27,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. limit=10.0 2023-12-21 16:57:41,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=137560.0, ans=0.125 2023-12-21 16:57:44,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=137560.0, ans=0.125 2023-12-21 16:57:55,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.60 vs. limit=22.5 2023-12-21 16:57:56,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=15.0 2023-12-21 16:58:02,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2023-12-21 16:58:10,326 INFO [train.py:886] (0/4) Epoch 5, batch 1600, loss[loss=0.01952, audio_tagging_loss=0.01952, over 25000.00 frames. ], tot_loss[loss=0.01685, audio_tagging_loss=0.01685, over 4945091.27 frames. 
], batch size: 100, lr: 2.13e-02, grad_scale: 128.0 2023-12-21 16:58:56,109 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.601e+01 2.757e+01 2.954e+01 3.912e+01, threshold=5.513e+01, percent-clipped=0.0 2023-12-21 16:59:01,652 INFO [train.py:886] (0/4) Epoch 5, batch 1650, loss[loss=0.02169, audio_tagging_loss=0.02169, over 21502.00 frames. ], tot_loss[loss=0.01677, audio_tagging_loss=0.01677, over 4943517.91 frames. ], batch size: 107, lr: 2.13e-02, grad_scale: 128.0 2023-12-21 16:59:13,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=138160.0, ans=0.125 2023-12-21 16:59:14,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=15.0 2023-12-21 16:59:23,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=138226.66666666666, ans=0.125 2023-12-21 16:59:31,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=138226.66666666666, ans=0.125 2023-12-21 16:59:47,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138360.0, ans=0.1 2023-12-21 16:59:51,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=138426.66666666666, ans=0.0 2023-12-21 16:59:52,665 INFO [train.py:886] (0/4) Epoch 5, batch 1700, loss[loss=0.01595, audio_tagging_loss=0.01595, over 25000.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4945824.72 frames. ], batch size: 100, lr: 2.13e-02, grad_scale: 128.0 2023-12-21 17:00:06,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=138493.33333333334, ans=0.125 2023-12-21 17:00:10,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138493.33333333334, ans=0.1 2023-12-21 17:00:16,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.78 vs. limit=15.0 2023-12-21 17:00:18,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2023-12-21 17:00:32,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=138626.66666666666, ans=0.025 2023-12-21 17:00:35,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138693.33333333334, ans=0.1 2023-12-21 17:00:40,306 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.593e+01 2.813e+01 3.022e+01 3.717e+01, threshold=5.626e+01, percent-clipped=0.0 2023-12-21 17:00:43,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2023-12-21 17:00:45,134 INFO [train.py:886] (0/4) Epoch 5, batch 1750, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01666, audio_tagging_loss=0.01666, over 4949137.00 frames. 
], batch size: 100, lr: 2.13e-02, grad_scale: 128.0 2023-12-21 17:00:51,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138760.0, ans=0.1 2023-12-21 17:00:54,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=138826.66666666666, ans=0.125 2023-12-21 17:01:04,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=138826.66666666666, ans=0.125 2023-12-21 17:01:09,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=138893.33333333334, ans=0.0 2023-12-21 17:01:14,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.53 vs. limit=15.0 2023-12-21 17:01:27,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2023-12-21 17:01:37,533 INFO [train.py:886] (0/4) Epoch 5, batch 1800, loss[loss=0.01517, audio_tagging_loss=0.01517, over 22097.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 4950335.63 frames. ], batch size: 107, lr: 2.12e-02, grad_scale: 128.0 2023-12-21 17:01:41,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139093.33333333334, ans=0.1 2023-12-21 17:01:42,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=139093.33333333334, ans=0.125 2023-12-21 17:01:50,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=139160.0, ans=0.125 2023-12-21 17:01:52,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=139160.0, ans=0.125 2023-12-21 17:01:54,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=139160.0, ans=0.125 2023-12-21 17:02:23,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=139360.0, ans=0.125 2023-12-21 17:02:23,762 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.569e+01 2.782e+01 2.969e+01 3.788e+01, threshold=5.564e+01, percent-clipped=0.0 2023-12-21 17:02:27,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139426.66666666666, ans=0.1 2023-12-21 17:02:28,440 INFO [train.py:886] (0/4) Epoch 5, batch 1850, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.01666, audio_tagging_loss=0.01666, over 4951796.23 frames. ], batch size: 99, lr: 2.12e-02, grad_scale: 128.0 2023-12-21 17:02:36,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.38 vs. 
limit=12.0 2023-12-21 17:03:06,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=139626.66666666666, ans=0.0 2023-12-21 17:03:08,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=139626.66666666666, ans=0.1 2023-12-21 17:03:19,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=139760.0, ans=0.125 2023-12-21 17:03:19,899 INFO [train.py:886] (0/4) Epoch 5, batch 1900, loss[loss=0.02169, audio_tagging_loss=0.02169, over 24750.00 frames. ], tot_loss[loss=0.01684, audio_tagging_loss=0.01684, over 4946613.70 frames. ], batch size: 99, lr: 2.12e-02, grad_scale: 128.0 2023-12-21 17:03:46,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=139893.33333333334, ans=0.04949747468305833 2023-12-21 17:03:50,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.84 vs. limit=22.5 2023-12-21 17:04:00,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=140026.66666666666, ans=0.0 2023-12-21 17:04:05,391 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 2.621e+01 2.837e+01 3.063e+01 3.713e+01, threshold=5.675e+01, percent-clipped=0.0 2023-12-21 17:04:05,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2023-12-21 17:04:06,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=140026.66666666666, ans=0.2 2023-12-21 17:04:11,608 INFO [train.py:886] (0/4) Epoch 5, batch 1950, loss[loss=0.01461, audio_tagging_loss=0.01461, over 24750.00 frames. ], tot_loss[loss=0.01675, audio_tagging_loss=0.01675, over 4939946.13 frames. ], batch size: 99, lr: 2.12e-02, grad_scale: 128.0 2023-12-21 17:04:20,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=140093.33333333334, ans=0.125 2023-12-21 17:04:29,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=140160.0, ans=0.125 2023-12-21 17:04:41,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.00 vs. limit=10.0 2023-12-21 17:04:46,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.02 vs. limit=15.0 2023-12-21 17:04:47,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140293.33333333334, ans=0.1 2023-12-21 17:05:00,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=140360.0, ans=0.125 2023-12-21 17:05:02,944 INFO [train.py:886] (0/4) Epoch 5, batch 2000, loss[loss=0.01866, audio_tagging_loss=0.01866, over 25000.00 frames. ], tot_loss[loss=0.01668, audio_tagging_loss=0.01668, over 4940623.70 frames. 
], batch size: 100, lr: 2.11e-02, grad_scale: 64.0 2023-12-21 17:05:03,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=140426.66666666666, ans=0.125 2023-12-21 17:05:05,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=140426.66666666666, ans=0.07 2023-12-21 17:05:19,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140493.33333333334, ans=0.1 2023-12-21 17:05:37,443 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.902e-02 2023-12-21 17:05:40,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140626.66666666666, ans=0.125 2023-12-21 17:05:42,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=140626.66666666666, ans=0.2 2023-12-21 17:05:46,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=140693.33333333334, ans=0.125 2023-12-21 17:05:51,608 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.499e+01 2.652e+01 2.874e+01 3.620e+01, threshold=5.305e+01, percent-clipped=0.0 2023-12-21 17:05:55,398 INFO [train.py:886] (0/4) Epoch 5, batch 2050, loss[loss=0.01538, audio_tagging_loss=0.01538, over 25000.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4950575.11 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 64.0 2023-12-21 17:06:00,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=140760.0, ans=0.5 2023-12-21 17:06:09,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140826.66666666666, ans=0.125 2023-12-21 17:06:12,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=140826.66666666666, ans=0.125 2023-12-21 17:06:21,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=140893.33333333334, ans=0.2 2023-12-21 17:06:23,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=140893.33333333334, ans=0.125 2023-12-21 17:06:24,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2023-12-21 17:06:25,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=140960.0, ans=0.125 2023-12-21 17:06:39,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=141026.66666666666, ans=0.125 2023-12-21 17:06:46,194 INFO [train.py:886] (0/4) Epoch 5, batch 2100, loss[loss=0.01762, audio_tagging_loss=0.01762, over 25000.00 frames. ], tot_loss[loss=0.0166, audio_tagging_loss=0.0166, over 4958715.18 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 64.0 2023-12-21 17:06:47,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.92 vs. 
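grad_scale, logged on every batch line, drops from 128.0 to 64.0 right around batch 2000 of this epoch: with fp16 training, dynamic loss scaling halves the scale when an overflow (inf/nan gradient) is detected and skips that step. The standard torch.cuda.amp loop below illustrates the mechanism; whether this recipe's train.py is wired exactly this way is an assumption, and model, optimizer, and loss_fn are placeholders.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=128.0)

def train_step(model, optimizer, feats, targets, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # fp16 forward pass
        loss = loss_fn(model(feats), targets)
    scaler.scale(loss).backward()              # backward on the scaled loss
    scaler.step(optimizer)                     # skipped if grads overflowed
    scaler.update()                            # halves the scale after an
                                               # overflow, e.g. 128.0 -> 64.0
    return loss.detach(), scaler.get_scale()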
limit=22.5 2023-12-21 17:06:52,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=141093.33333333334, ans=0.125 2023-12-21 17:06:57,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=141160.0, ans=0.125 2023-12-21 17:06:59,834 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 17:07:02,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.49 vs. limit=10.0 2023-12-21 17:07:05,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=15.0 2023-12-21 17:07:12,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=141226.66666666666, ans=0.0 2023-12-21 17:07:23,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=141293.33333333334, ans=0.0 2023-12-21 17:07:24,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=141293.33333333334, ans=0.0 2023-12-21 17:07:34,669 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.574e+01 2.738e+01 2.896e+01 3.657e+01, threshold=5.477e+01, percent-clipped=0.0 2023-12-21 17:07:38,494 INFO [train.py:886] (0/4) Epoch 5, batch 2150, loss[loss=0.01537, audio_tagging_loss=0.01537, over 25000.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 4963296.67 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 64.0 2023-12-21 17:07:44,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=141426.66666666666, ans=0.125 2023-12-21 17:07:48,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141493.33333333334, ans=0.1 2023-12-21 17:08:04,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=141560.0, ans=0.125 2023-12-21 17:08:27,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=141693.33333333334, ans=0.125 2023-12-21 17:08:31,222 INFO [train.py:886] (0/4) Epoch 5, batch 2200, loss[loss=0.01658, audio_tagging_loss=0.01658, over 24750.00 frames. ], tot_loss[loss=0.01689, audio_tagging_loss=0.01689, over 4951824.63 frames. 
], batch size: 99, lr: 2.11e-02, grad_scale: 64.0 2023-12-21 17:08:33,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=141760.0, ans=15.0 2023-12-21 17:08:36,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=141760.0, ans=0.2 2023-12-21 17:08:44,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=141826.66666666666, ans=0.025 2023-12-21 17:08:48,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=141826.66666666666, ans=0.2 2023-12-21 17:08:55,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=141893.33333333334, ans=0.0 2023-12-21 17:09:09,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=141960.0, ans=0.125 2023-12-21 17:09:17,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=142026.66666666666, ans=0.125 2023-12-21 17:09:17,620 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.618e+01 2.828e+01 3.065e+01 3.912e+01, threshold=5.656e+01, percent-clipped=0.0 2023-12-21 17:09:17,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=142026.66666666666, ans=0.0 2023-12-21 17:09:17,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=142026.66666666666, ans=0.07 2023-12-21 17:09:18,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=142026.66666666666, ans=0.125 2023-12-21 17:09:21,470 INFO [train.py:886] (0/4) Epoch 5, batch 2250, loss[loss=0.01587, audio_tagging_loss=0.01587, over 24750.00 frames. ], tot_loss[loss=0.01685, audio_tagging_loss=0.01685, over 4949357.30 frames. ], batch size: 99, lr: 2.10e-02, grad_scale: 64.0 2023-12-21 17:09:29,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=142093.33333333334, ans=0.125 2023-12-21 17:09:43,542 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=12.0 2023-12-21 17:09:44,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142226.66666666666, ans=0.1 2023-12-21 17:09:59,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=142293.33333333334, ans=0.05 2023-12-21 17:10:14,388 INFO [train.py:886] (0/4) Epoch 5, batch 2300, loss[loss=0.01696, audio_tagging_loss=0.01696, over 25000.00 frames. ], tot_loss[loss=0.0168, audio_tagging_loss=0.0168, over 4950625.55 frames. 
2023-12-21 17:10:21,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=142426.66666666666, ans=10.0
2023-12-21 17:10:21,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=142426.66666666666, ans=0.125
2023-12-21 17:10:25,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=142493.33333333334, ans=0.125
2023-12-21 17:10:30,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=142493.33333333334, ans=0.2
2023-12-21 17:10:48,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.87 vs. limit=22.5
2023-12-21 17:11:00,691 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.568e+01 2.751e+01 2.902e+01 4.993e+01, threshold=5.502e+01, percent-clipped=0.0
2023-12-21 17:11:01,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=15.0
2023-12-21 17:11:04,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=142760.0, ans=0.125
2023-12-21 17:11:05,210 INFO [train.py:886] (0/4) Epoch 5, batch 2350, loss[loss=0.01567, audio_tagging_loss=0.01567, over 25000.00 frames. ], tot_loss[loss=0.01674, audio_tagging_loss=0.01674, over 4953692.56 frames. ], batch size: 100, lr: 2.10e-02, grad_scale: 64.0
2023-12-21 17:11:16,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=142826.66666666666, ans=0.125
2023-12-21 17:11:16,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=142826.66666666666, ans=0.125
2023-12-21 17:11:20,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.69 vs. limit=22.5
2023-12-21 17:11:20,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=21.37 vs. limit=15.0
2023-12-21 17:11:23,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=142826.66666666666, ans=0.2
2023-12-21 17:11:23,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.81 vs. limit=22.5
2023-12-21 17:11:28,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=12.0
2023-12-21 17:11:30,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142893.33333333334, ans=0.1
2023-12-21 17:11:33,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=142893.33333333334, ans=0.125
2023-12-21 17:11:57,052 INFO [train.py:886] (0/4) Epoch 5, batch 2400, loss[loss=0.01814, audio_tagging_loss=0.01814, over 21663.00 frames. ], tot_loss[loss=0.01677, audio_tagging_loss=0.01677, over 4950232.31 frames. ], batch size: 107, lr: 2.10e-02, grad_scale: 64.0
2023-12-21 17:12:00,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=143093.33333333334, ans=0.0
2023-12-21 17:12:07,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=15.0
2023-12-21 17:12:10,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.95 vs. limit=15.0
2023-12-21 17:12:34,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143293.33333333334, ans=0.1
2023-12-21 17:12:44,893 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.521e+01 2.685e+01 2.894e+01 4.113e+01, threshold=5.370e+01, percent-clipped=0.0
2023-12-21 17:12:49,445 INFO [train.py:886] (0/4) Epoch 5, batch 2450, loss[loss=0.01959, audio_tagging_loss=0.01959, over 24750.00 frames. ], tot_loss[loss=0.01681, audio_tagging_loss=0.01681, over 4957238.56 frames. ], batch size: 99, lr: 2.10e-02, grad_scale: 64.0
2023-12-21 17:12:50,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=143426.66666666666, ans=0.0
2023-12-21 17:13:15,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=143560.0, ans=0.07
2023-12-21 17:13:17,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=143560.0, ans=0.125
2023-12-21 17:13:18,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=143560.0, ans=0.125
2023-12-21 17:13:28,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=15.0
2023-12-21 17:13:36,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=143693.33333333334, ans=0.125
2023-12-21 17:13:39,917 INFO [train.py:886] (0/4) Epoch 5, batch 2500, loss[loss=0.02229, audio_tagging_loss=0.02229, over 24750.00 frames. ], tot_loss[loss=0.01687, audio_tagging_loss=0.01687, over 4956006.16 frames. ], batch size: 99, lr: 2.09e-02, grad_scale: 64.0
2023-12-21 17:13:54,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.68 vs. limit=22.5
2023-12-21 17:14:04,457 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0
2023-12-21 17:14:07,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.91 vs. limit=6.0
2023-12-21 17:14:18,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=143960.0, ans=0.125
2023-12-21 17:14:27,830 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.613e+01 2.844e+01 3.079e+01 3.579e+01, threshold=5.689e+01, percent-clipped=0.0
2023-12-21 17:14:31,606 INFO [train.py:886] (0/4) Epoch 5, batch 2550, loss[loss=0.01588, audio_tagging_loss=0.01588, over 24750.00 frames. ], tot_loss[loss=0.01691, audio_tagging_loss=0.01691, over 4948685.10 frames. ], batch size: 99, lr: 2.09e-02, grad_scale: 64.0
2023-12-21 17:14:34,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.88 vs. limit=10.0
2023-12-21 17:14:40,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=144160.0, ans=22.5
2023-12-21 17:14:42,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=144160.0, ans=0.09899494936611666
2023-12-21 17:15:08,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=144293.33333333334, ans=0.0
2023-12-21 17:15:18,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=144360.0, ans=0.125
2023-12-21 17:15:21,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=144426.66666666666, ans=0.125
2023-12-21 17:15:22,871 INFO [train.py:886] (0/4) Epoch 5, batch 2600, loss[loss=0.01674, audio_tagging_loss=0.01674, over 25000.00 frames. ], tot_loss[loss=0.01684, audio_tagging_loss=0.01684, over 4947074.72 frames. ], batch size: 100, lr: 2.09e-02, grad_scale: 64.0
2023-12-21 17:15:31,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.61 vs. limit=15.0
2023-12-21 17:15:53,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=144626.66666666666, ans=0.0
2023-12-21 17:16:07,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=144693.33333333334, ans=0.0
2023-12-21 17:16:10,459 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.544e+01 2.733e+01 3.057e+01 4.063e+01, threshold=5.467e+01, percent-clipped=0.0
2023-12-21 17:16:14,197 INFO [train.py:886] (0/4) Epoch 5, batch 2650, loss[loss=0.0151, audio_tagging_loss=0.0151, over 25000.00 frames. ], tot_loss[loss=0.01682, audio_tagging_loss=0.01682, over 4949942.35 frames. ], batch size: 100, lr: 2.09e-02, grad_scale: 64.0
2023-12-21 17:16:42,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0
2023-12-21 17:17:04,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0
2023-12-21 17:17:05,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0
2023-12-21 17:17:07,052 INFO [train.py:886] (0/4) Epoch 5, batch 2700, loss[loss=0.01607, audio_tagging_loss=0.01607, over 25000.00 frames. ], tot_loss[loss=0.01674, audio_tagging_loss=0.01674, over 4950633.04 frames. ], batch size: 100, lr: 2.08e-02, grad_scale: 64.0
2023-12-21 17:17:08,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0
2023-12-21 17:17:13,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=145093.33333333334, ans=0.1
2023-12-21 17:17:36,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0
2023-12-21 17:17:39,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=145293.33333333334, ans=0.0
2023-12-21 17:17:39,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=145293.33333333334, ans=0.0
2023-12-21 17:17:45,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=145293.33333333334, ans=0.125
2023-12-21 17:17:49,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=145360.0, ans=0.2
2023-12-21 17:17:53,872 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.469e+01 2.654e+01 2.877e+01 3.564e+01, threshold=5.308e+01, percent-clipped=0.0
2023-12-21 17:17:55,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=145360.0, ans=0.04949747468305833
2023-12-21 17:17:57,714 INFO [train.py:886] (0/4) Epoch 5, batch 2750, loss[loss=0.01609, audio_tagging_loss=0.01609, over 24750.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4955100.66 frames. ], batch size: 99, lr: 2.08e-02, grad_scale: 64.0
2023-12-21 17:18:16,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0
2023-12-21 17:18:29,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=145626.66666666666, ans=0.125
2023-12-21 17:18:43,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=145693.33333333334, ans=0.0
2023-12-21 17:18:50,489 INFO [train.py:886] (0/4) Epoch 5, batch 2800, loss[loss=0.0187, audio_tagging_loss=0.0187, over 24954.00 frames. ], tot_loss[loss=0.01681, audio_tagging_loss=0.01681, over 4955188.11 frames. ], batch size: 100, lr: 2.08e-02, grad_scale: 64.0
2023-12-21 17:19:00,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=145826.66666666666, ans=0.0
2023-12-21 17:19:01,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0
2023-12-21 17:19:22,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=145960.0, ans=0.0
2023-12-21 17:19:26,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=145960.0, ans=0.1
2023-12-21 17:19:38,107 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.375e+01 2.607e+01 2.811e+01 3.107e+01 4.329e+01, threshold=5.621e+01, percent-clipped=0.0
2023-12-21 17:19:42,660 INFO [train.py:886] (0/4) Epoch 5, batch 2850, loss[loss=0.01697, audio_tagging_loss=0.01697, over 24750.00 frames. ], tot_loss[loss=0.01689, audio_tagging_loss=0.01689, over 4952561.99 frames. ], batch size: 99, lr: 2.08e-02, grad_scale: 64.0
2023-12-21 17:19:51,473 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0
2023-12-21 17:20:14,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=146293.33333333334, ans=0.0
2023-12-21 17:20:30,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=146360.0, ans=0.125
2023-12-21 17:20:33,323 INFO [train.py:886] (0/4) Epoch 5, batch 2900, loss[loss=0.01701, audio_tagging_loss=0.01701, over 25000.00 frames. ], tot_loss[loss=0.01672, audio_tagging_loss=0.01672, over 4956786.38 frames. ], batch size: 100, lr: 2.08e-02, grad_scale: 64.0
2023-12-21 17:20:49,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=15.0
2023-12-21 17:20:55,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=146560.0, ans=0.07
2023-12-21 17:20:58,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=146560.0, ans=0.125
2023-12-21 17:21:01,841 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 17:21:06,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.16 vs. limit=22.5
2023-12-21 17:21:21,920 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.595e+01 2.753e+01 2.968e+01 3.866e+01, threshold=5.507e+01, percent-clipped=0.0
2023-12-21 17:21:25,874 INFO [train.py:886] (0/4) Epoch 5, batch 2950, loss[loss=0.01648, audio_tagging_loss=0.01648, over 24750.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 4959685.14 frames. ], batch size: 99, lr: 2.07e-02, grad_scale: 64.0
2023-12-21 17:21:33,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.63 vs. limit=15.0
2023-12-21 17:21:34,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=146760.0, ans=0.0
2023-12-21 17:21:37,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=146826.66666666666, ans=0.1
2023-12-21 17:21:47,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=146893.33333333334, ans=0.125
2023-12-21 17:21:51,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=146893.33333333334, ans=0.125
2023-12-21 17:21:56,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.04 vs. limit=22.5
2023-12-21 17:22:03,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=146960.0, ans=0.0
2023-12-21 17:22:05,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.73 vs. limit=15.0
2023-12-21 17:22:14,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=15.0
2023-12-21 17:22:18,285 INFO [train.py:886] (0/4) Epoch 5, batch 3000, loss[loss=0.01788, audio_tagging_loss=0.01788, over 25000.00 frames. ], tot_loss[loss=0.01651, audio_tagging_loss=0.01651, over 4960687.62 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 64.0
2023-12-21 17:22:18,287 INFO [train.py:909] (0/4) Computing validation loss
2023-12-21 17:22:29,114 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0789, 4.5342, 4.8898, 4.5683], device='cuda:0')
2023-12-21 17:22:39,428 INFO [train.py:917] (0/4) Epoch 5, validation: loss=0.04009, audio_tagging_loss=0.04009, over 3737520.00 frames.
2023-12-21 17:22:39,429 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-21 17:22:42,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=147093.33333333334, ans=0.1
2023-12-21 17:22:54,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=147160.0, ans=0.0
2023-12-21 17:23:03,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=147226.66666666666, ans=0.2
2023-12-21 17:23:04,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=147226.66666666666, ans=0.125
2023-12-21 17:23:11,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.79 vs. limit=22.5
2023-12-21 17:23:13,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=15.0
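The scaling.py:1022 Whitening entries pair a per-module metric with a scheduled limit; the metric sits near 1.0 when the channel covariance of the module's output is close to a multiple of the identity and grows as channels become correlated, and an entry is logged when it approaches or exceeds the limit. A rough sketch of one such covariance-based metric follows; it illustrates the idea and is not necessarily the exact formula in icefall's scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Illustrative whitening diagnostic for activations x of shape
    (num_frames, num_channels): 1.0 when the per-group channel covariance
    is a scaled identity, larger the more the channels are correlated."""
    num_frames, num_channels = x.shape
    channels_per_group = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, channels_per_group).transpose(0, 1)
    covar = torch.matmul(x.transpose(1, 2), x) / num_frames  # (groups, c, c)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean(dim=1)   # (groups,)
    # Squared Frobenius norm, normalized so a scaled identity scores 1.0;
    # by Cauchy-Schwarz the value is always >= 1.0.
    metric = (covar ** 2).sum(dim=(1, 2)) / (channels_per_group * mean_diag ** 2)
    return metric.mean().item()

# White noise scores close to 1; perfectly correlated channels score ~num_channels:
print(whitening_metric(torch.randn(1000, 384)))               # roughly 1
print(whitening_metric(torch.randn(1000, 1).repeat(1, 384)))  # roughly 384
```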
2023-12-21 17:23:23,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=147360.0, ans=0.1
2023-12-21 17:23:27,623 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.543e+01 2.712e+01 2.939e+01 3.335e+01, threshold=5.424e+01, percent-clipped=0.0
2023-12-21 17:23:31,445 INFO [train.py:886] (0/4) Epoch 5, batch 3050, loss[loss=0.0197, audio_tagging_loss=0.0197, over 25000.00 frames. ], tot_loss[loss=0.01652, audio_tagging_loss=0.01652, over 4963630.55 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 64.0
2023-12-21 17:23:42,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=147493.33333333334, ans=0.95
2023-12-21 17:23:52,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.24 vs. limit=22.5
2023-12-21 17:23:54,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=147560.0, ans=0.0
2023-12-21 17:23:58,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0
2023-12-21 17:24:12,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0
2023-12-21 17:24:19,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=147693.33333333334, ans=0.0
2023-12-21 17:24:23,425 INFO [train.py:886] (0/4) Epoch 5, batch 3100, loss[loss=0.02172, audio_tagging_loss=0.02172, over 25000.00 frames. ], tot_loss[loss=0.01652, audio_tagging_loss=0.01652, over 4960509.68 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 64.0
2023-12-21 17:24:45,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=147893.33333333334, ans=0.125
2023-12-21 17:25:05,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0
2023-12-21 17:25:06,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=148026.66666666666, ans=0.125
2023-12-21 17:25:09,743 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.613e+01 2.780e+01 2.978e+01 3.396e+01, threshold=5.559e+01, percent-clipped=0.0
2023-12-21 17:25:13,593 INFO [train.py:886] (0/4) Epoch 5, batch 3150, loss[loss=0.01967, audio_tagging_loss=0.01967, over 24750.00 frames. ], tot_loss[loss=0.01659, audio_tagging_loss=0.01659, over 4948603.44 frames. ], batch size: 99, lr: 2.07e-02, grad_scale: 64.0
2023-12-21 17:25:21,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=148093.33333333334, ans=0.2
2023-12-21 17:25:24,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=148160.0, ans=0.0
2023-12-21 17:25:24,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0
2023-12-21 17:25:25,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=148160.0, ans=0.0
2023-12-21 17:25:31,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=148160.0, ans=0.2
2023-12-21 17:25:33,944 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 17:25:41,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0
2023-12-21 17:25:45,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=148293.33333333334, ans=0.125
2023-12-21 17:25:58,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=148360.0, ans=0.1
2023-12-21 17:26:05,977 INFO [train.py:886] (0/4) Epoch 5, batch 3200, loss[loss=0.01712, audio_tagging_loss=0.01712, over 24750.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 4947284.42 frames. ], batch size: 99, lr: 2.06e-02, grad_scale: 64.0
2023-12-21 17:26:25,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0
2023-12-21 17:26:30,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=148560.0, ans=0.0
2023-12-21 17:26:38,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5
2023-12-21 17:26:44,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=148626.66666666666, ans=0.125
2023-12-21 17:26:48,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=148693.33333333334, ans=0.0
2023-12-21 17:26:51,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=148693.33333333334, ans=0.0
2023-12-21 17:26:52,670 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.499e+01 2.673e+01 2.895e+01 4.175e+01, threshold=5.346e+01, percent-clipped=0.0
2023-12-21 17:26:52,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=148693.33333333334, ans=0.125
2023-12-21 17:26:57,279 INFO [train.py:886] (0/4) Epoch 5, batch 3250, loss[loss=0.01608, audio_tagging_loss=0.01608, over 25000.00 frames. ], tot_loss[loss=0.01653, audio_tagging_loss=0.01653, over 4946811.75 frames. ], batch size: 100, lr: 2.06e-02, grad_scale: 64.0
2023-12-21 17:27:47,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=15.0
2023-12-21 17:27:48,900 INFO [train.py:886] (0/4) Epoch 5, batch 3300, loss[loss=0.01468, audio_tagging_loss=0.01468, over 24112.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4941058.28 frames. ], batch size: 100, lr: 2.06e-02, grad_scale: 64.0
2023-12-21 17:27:56,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=149093.33333333334, ans=0.0
2023-12-21 17:28:11,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149226.66666666666, ans=0.1
2023-12-21 17:28:14,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=12.0
2023-12-21 17:28:26,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=149293.33333333334, ans=0.125
2023-12-21 17:28:29,746 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=8.138e-02
2023-12-21 17:28:36,618 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.590e+01 2.838e+01 3.066e+01 3.975e+01, threshold=5.677e+01, percent-clipped=0.0
2023-12-21 17:28:41,794 INFO [train.py:886] (0/4) Epoch 5, batch 3350, loss[loss=0.01685, audio_tagging_loss=0.01685, over 25000.00 frames. ], tot_loss[loss=0.01653, audio_tagging_loss=0.01653, over 4940335.31 frames. ], batch size: 100, lr: 2.06e-02, grad_scale: 64.0
2023-12-21 17:28:44,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=149426.66666666666, ans=0.125
2023-12-21 17:28:51,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=149493.33333333334, ans=0.125
2023-12-21 17:28:51,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=149493.33333333334, ans=0.125
2023-12-21 17:28:52,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=149493.33333333334, ans=0.1
2023-12-21 17:29:01,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=149560.0, ans=0.0
2023-12-21 17:29:04,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0
2023-12-21 17:29:11,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=149626.66666666666, ans=0.1
2023-12-21 17:29:15,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=149626.66666666666, ans=0.125
2023-12-21 17:29:31,395 INFO [train.py:886] (0/4) Epoch 5, batch 3400, loss[loss=0.01676, audio_tagging_loss=0.01676, over 25000.00 frames. ], tot_loss[loss=0.01664, audio_tagging_loss=0.01664, over 4946085.30 frames. ], batch size: 100, lr: 2.05e-02, grad_scale: 64.0
2023-12-21 17:29:41,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=149826.66666666666, ans=0.125
2023-12-21 17:29:41,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5
2023-12-21 17:29:46,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=149826.66666666666, ans=0.1
2023-12-21 17:29:48,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.22 vs. limit=22.5
2023-12-21 17:29:54,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=149893.33333333334, ans=0.125
2023-12-21 17:30:02,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0
2023-12-21 17:30:07,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=12.0
2023-12-21 17:30:19,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=150026.66666666666, ans=0.125
2023-12-21 17:30:20,654 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.600e+01 2.817e+01 3.077e+01 4.267e+01, threshold=5.633e+01, percent-clipped=0.0
2023-12-21 17:30:22,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0
2023-12-21 17:30:24,432 INFO [train.py:886] (0/4) Epoch 5, batch 3450, loss[loss=0.01599, audio_tagging_loss=0.01599, over 23993.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4936546.34 frames. ], batch size: 100, lr: 2.05e-02, grad_scale: 64.0
2023-12-21 17:30:36,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=150160.0, ans=0.0
2023-12-21 17:30:53,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0
2023-12-21 17:31:09,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=150360.0, ans=0.125
2023-12-21 17:31:15,700 INFO [train.py:886] (0/4) Epoch 5, batch 3500, loss[loss=0.01754, audio_tagging_loss=0.01754, over 24750.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4938206.25 frames. ], batch size: 99, lr: 2.05e-02, grad_scale: 64.0
2023-12-21 17:31:24,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.00 vs. limit=15.0
2023-12-21 17:31:31,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=150493.33333333334, ans=0.0
2023-12-21 17:31:40,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150560.0, ans=0.1
2023-12-21 17:31:41,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150560.0, ans=0.1
2023-12-21 17:32:00,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=150693.33333333334, ans=0.0
2023-12-21 17:32:00,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150693.33333333334, ans=0.1
2023-12-21 17:32:01,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=150693.33333333334, ans=0.0
2023-12-21 17:32:03,513 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.549e+01 2.777e+01 3.022e+01 4.198e+01, threshold=5.554e+01, percent-clipped=0.0
2023-12-21 17:32:07,368 INFO [train.py:886] (0/4) Epoch 5, batch 3550, loss[loss=0.01667, audio_tagging_loss=0.01667, over 24750.00 frames. ], tot_loss[loss=0.01659, audio_tagging_loss=0.01659, over 4938879.19 frames. ], batch size: 99, lr: 2.05e-02, grad_scale: 64.0
2023-12-21 17:32:09,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.89 vs. limit=15.0
2023-12-21 17:32:17,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=150826.66666666666, ans=0.125
2023-12-21 17:32:43,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=150960.0, ans=0.0
2023-12-21 17:32:59,023 INFO [train.py:886] (0/4) Epoch 5, batch 3600, loss[loss=0.01571, audio_tagging_loss=0.01571, over 25000.00 frames. ], tot_loss[loss=0.01644, audio_tagging_loss=0.01644, over 4948336.89 frames. ], batch size: 100, lr: 2.05e-02, grad_scale: 64.0
2023-12-21 17:33:03,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151093.33333333334, ans=0.1
2023-12-21 17:33:15,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=151160.0, ans=0.0
2023-12-21 17:33:16,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=12.0
2023-12-21 17:33:19,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=151226.66666666666, ans=0.125
2023-12-21 17:33:40,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=151360.0, ans=0.04949747468305833
2023-12-21 17:33:46,184 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.518e+01 2.646e+01 2.841e+01 3.411e+01, threshold=5.291e+01, percent-clipped=0.0
2023-12-21 17:33:49,953 INFO [train.py:886] (0/4) Epoch 5, batch 3650, loss[loss=0.01718, audio_tagging_loss=0.01718, over 25000.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 4954527.65 frames. ], batch size: 100, lr: 2.04e-02, grad_scale: 64.0
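The train.py:886 entries follow a fixed template, so the loss trajectory in a log like this can be pulled out with a short script. Below is a sketch matched against the format shown above (the file name train.log is hypothetical):

```python
import re

# Matches entries such as:
#   ... INFO [train.py:886] (0/4) Epoch 5, batch 3650, loss[loss=0.01718, ...],
#   tot_loss[loss=0.01639, ...], batch size: 100, lr: 2.04e-02, ...
PATTERN = re.compile(
    r"Epoch (\d+), batch (\d+), loss\[loss=([\d.e+-]+)"
    r".*?tot_loss\[loss=([\d.e+-]+)"
    r".*?lr: ([\d.e+-]+)"
)

def parse_train_log(path: str = "train.log"):
    rows = []
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                epoch, batch = int(m.group(1)), int(m.group(2))
                loss, tot_loss, lr = map(float, m.group(3, 4, 5))
                rows.append((epoch, batch, loss, tot_loss, lr))
    return rows

# Example: print the running tot_loss for epoch 5.
for epoch, batch, loss, tot_loss, lr in parse_train_log():
    if epoch == 5:
        print(batch, tot_loss, lr)
```

Validation entries (train.py:917) have a different shape and are deliberately not matched here.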
2023-12-21 17:34:19,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=151560.0, ans=0.125
2023-12-21 17:34:28,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.60 vs. limit=22.5
2023-12-21 17:34:38,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=151693.33333333334, ans=0.0
2023-12-21 17:34:42,997 INFO [train.py:886] (0/4) Epoch 5, batch 3700, loss[loss=0.01459, audio_tagging_loss=0.01459, over 25000.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 4958518.95 frames. ], batch size: 100, lr: 2.04e-02, grad_scale: 64.0
2023-12-21 17:34:55,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151826.66666666666, ans=0.1
2023-12-21 17:34:56,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=151826.66666666666, ans=0.2
2023-12-21 17:35:18,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=151960.0, ans=0.0
2023-12-21 17:35:30,111 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 2.577e+01 2.848e+01 3.103e+01 4.088e+01, threshold=5.696e+01, percent-clipped=0.0
2023-12-21 17:35:34,624 INFO [train.py:886] (0/4) Epoch 5, batch 3750, loss[loss=0.01785, audio_tagging_loss=0.01785, over 24750.00 frames. ], tot_loss[loss=0.01673, audio_tagging_loss=0.01673, over 4953017.87 frames. ], batch size: 99, lr: 2.04e-02, grad_scale: 64.0
2023-12-21 17:35:49,547 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.290e+00
2023-12-21 17:35:55,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.96 vs. limit=22.5
2023-12-21 17:35:56,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=152226.66666666666, ans=0.125
2023-12-21 17:35:58,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=152226.66666666666, ans=0.1
2023-12-21 17:35:59,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=152226.66666666666, ans=0.125
2023-12-21 17:36:01,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=152226.66666666666, ans=0.0
2023-12-21 17:36:26,439 INFO [train.py:886] (0/4) Epoch 5, batch 3800, loss[loss=0.01674, audio_tagging_loss=0.01674, over 24750.00 frames. ], tot_loss[loss=0.01673, audio_tagging_loss=0.01673, over 4940278.15 frames. ], batch size: 99, lr: 2.04e-02, grad_scale: 64.0
2023-12-21 17:36:30,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=152426.66666666666, ans=0.0
2023-12-21 17:36:39,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=152493.33333333334, ans=0.125
2023-12-21 17:37:14,679 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.532e+01 2.751e+01 2.984e+01 4.150e+01, threshold=5.501e+01, percent-clipped=0.0
2023-12-21 17:37:18,456 INFO [train.py:886] (0/4) Epoch 5, batch 3850, loss[loss=0.01655, audio_tagging_loss=0.01655, over 25000.00 frames. ], tot_loss[loss=0.01656, audio_tagging_loss=0.01656, over 4942207.51 frames. ], batch size: 100, lr: 2.04e-02, grad_scale: 64.0
2023-12-21 17:37:56,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=152960.0, ans=0.0
2023-12-21 17:38:07,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=153026.66666666666, ans=0.035
2023-12-21 17:38:11,302 INFO [train.py:886] (0/4) Epoch 5, batch 3900, loss[loss=0.01547, audio_tagging_loss=0.01547, over 25000.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 4944453.81 frames. ], batch size: 100, lr: 2.03e-02, grad_scale: 64.0
2023-12-21 17:38:17,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=153093.33333333334, ans=0.125
2023-12-21 17:38:23,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0
2023-12-21 17:38:24,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=153160.0, ans=0.0
2023-12-21 17:38:28,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=153160.0, ans=0.0
2023-12-21 17:38:36,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=153226.66666666666, ans=0.125
2023-12-21 17:38:58,379 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.553e+01 2.751e+01 2.942e+01 3.918e+01, threshold=5.503e+01, percent-clipped=0.0
2023-12-21 17:38:58,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=153360.0, ans=0.125
2023-12-21 17:38:59,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=12.0
2023-12-21 17:39:02,235 INFO [train.py:886] (0/4) Epoch 5, batch 3950, loss[loss=0.01763, audio_tagging_loss=0.01763, over 25000.00 frames. ], tot_loss[loss=0.01651, audio_tagging_loss=0.01651, over 4950655.60 frames. ], batch size: 100, lr: 2.03e-02, grad_scale: 64.0
2023-12-21 17:39:07,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=153426.66666666666, ans=0.125
2023-12-21 17:39:08,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.21 vs. limit=10.0
2023-12-21 17:39:38,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0
2023-12-21 17:39:39,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0
2023-12-21 17:39:40,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=153626.66666666666, ans=0.1
2023-12-21 17:39:40,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=153626.66666666666, ans=0.2
2023-12-21 17:39:55,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=153760.0, ans=0.125
2023-12-21 17:39:55,695 INFO [train.py:886] (0/4) Epoch 5, batch 4000, loss[loss=0.01663, audio_tagging_loss=0.01663, over 25000.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 4951076.54 frames. ], batch size: 100, lr: 2.03e-02, grad_scale: 128.0
2023-12-21 17:40:00,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=153760.0, ans=0.0
2023-12-21 17:40:06,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=153826.66666666666, ans=0.05
2023-12-21 17:40:19,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=153893.33333333334, ans=0.125
2023-12-21 17:40:22,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=153893.33333333334, ans=0.09899494936611666
2023-12-21 17:40:27,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=153960.0, ans=0.1
2023-12-21 17:40:34,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=153960.0, ans=0.125
2023-12-21 17:40:42,019 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.602e+01 2.728e+01 2.900e+01 3.775e+01, threshold=5.457e+01, percent-clipped=0.0
2023-12-21 17:40:46,609 INFO [train.py:886] (0/4) Epoch 5, batch 4050, loss[loss=0.02024, audio_tagging_loss=0.02024, over 24750.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 4951772.27 frames. ], batch size: 99, lr: 2.03e-02, grad_scale: 128.0
2023-12-21 17:40:48,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=154093.33333333334, ans=0.125
2023-12-21 17:40:59,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=154160.0, ans=0.1
2023-12-21 17:41:02,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=154160.0, ans=0.5
2023-12-21 17:41:38,556 INFO [train.py:886] (0/4) Epoch 5, batch 4100, loss[loss=0.01547, audio_tagging_loss=0.01547, over 24750.00 frames. ], tot_loss[loss=0.01661, audio_tagging_loss=0.01661, over 4941905.37 frames. ], batch size: 99, lr: 2.03e-02, grad_scale: 128.0
2023-12-21 17:41:39,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=154426.66666666666, ans=0.0
2023-12-21 17:41:57,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=154493.33333333334, ans=0.125
2023-12-21 17:41:59,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=154560.0, ans=0.125
2023-12-21 17:42:03,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=154560.0, ans=0.0
2023-12-21 17:42:08,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=154626.66666666666, ans=0.125
2023-12-21 17:42:26,231 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.535e+01 2.761e+01 2.990e+01 3.473e+01, threshold=5.523e+01, percent-clipped=0.0
2023-12-21 17:42:30,732 INFO [train.py:886] (0/4) Epoch 5, batch 4150, loss[loss=0.0165, audio_tagging_loss=0.0165, over 24750.00 frames. ], tot_loss[loss=0.01672, audio_tagging_loss=0.01672, over 4936315.02 frames. ], batch size: 99, lr: 2.02e-02, grad_scale: 128.0
2023-12-21 17:42:33,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=154760.0, ans=0.0
2023-12-21 17:42:38,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=154760.0, ans=0.125
2023-12-21 17:42:57,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=154893.33333333334, ans=0.125
2023-12-21 17:43:17,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=155026.66666666666, ans=0.04949747468305833
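Nearly every scaling.py:213 entry is a ScheduledFloat report: a named hyperparameter (skip rates, balancer probabilities, dropout, bypass scales) whose current value (ans=) depends on batch_count. A minimal sketch of such a schedule follows, assuming plain piecewise-linear interpolation over batch count; it illustrates the behavior visible in the log and is not a copy of icefall's ScheduledFloat class:

```python
class ScheduledFloat:
    """Illustrative piecewise-linear schedule over batch count."""

    def __init__(self, *points):
        # points are (batch_count, value) breakpoints, e.g.
        # (0.0, 0.5), (4000.0, 0.05): decays from 0.5 down to 0.05.
        self.points = sorted(points)
        self.batch_count = 0.0  # updated by the training loop

    def __float__(self):
        x, pts = self.batch_count, self.points
        if x <= pts[0][0]:
            return float(pts[0][1])
        if x >= pts[-1][0]:
            return float(pts[-1][1])
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:  # linear interpolation inside the segment
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05))
skip_rate.batch_count = 2000.0
print(float(skip_rate))  # 0.275, halfway between the two breakpoints
```

Logging the interpolated value together with the current batch_count, as sketched here, is what produces the ans= entries above.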
], batch size: 100, lr: 2.02e-02, grad_scale: 128.0 2023-12-21 17:44:25,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2023-12-21 17:44:28,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=155493.33333333334, ans=0.125 2023-12-21 17:44:33,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.12 vs. limit=15.0 2023-12-21 17:44:36,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=155560.0, ans=0.0 2023-12-21 17:44:54,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=155693.33333333334, ans=0.2 2023-12-21 17:45:00,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=155693.33333333334, ans=0.125 2023-12-21 17:45:03,643 INFO [train.py:886] (0/4) Epoch 5, batch 4300, loss[loss=0.01903, audio_tagging_loss=0.01903, over 25000.00 frames. ], tot_loss[loss=0.01668, audio_tagging_loss=0.01668, over 4954717.82 frames. ], batch size: 100, lr: 2.02e-02, grad_scale: 128.0 2023-12-21 17:45:03,933 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=6.748e+00 2023-12-21 17:45:08,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=155760.0, ans=0.0 2023-12-21 17:45:10,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.48 vs. limit=10.0 2023-12-21 17:45:12,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=155760.0, ans=0.125 2023-12-21 17:45:17,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=155826.66666666666, ans=0.0 2023-12-21 17:45:27,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=155893.33333333334, ans=0.125 2023-12-21 17:45:31,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=155893.33333333334, ans=0.2 2023-12-21 17:45:34,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.39 vs. limit=15.0 2023-12-21 17:45:37,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=155960.0, ans=0.125 2023-12-21 17:45:49,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-12-21 17:45:52,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.11 vs. 
limit=15.0 2023-12-21 17:45:53,520 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+01 2.657e+01 2.804e+01 3.021e+01 3.869e+01, threshold=5.607e+01, percent-clipped=0.0 2023-12-21 17:45:54,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=156026.66666666666, ans=0.125 2023-12-21 17:45:56,391 INFO [train.py:886] (0/4) Epoch 5, batch 4350, loss[loss=0.01582, audio_tagging_loss=0.01582, over 24750.00 frames. ], tot_loss[loss=0.01669, audio_tagging_loss=0.01669, over 4959907.94 frames. ], batch size: 99, lr: 2.02e-02, grad_scale: 64.0 2023-12-21 17:45:57,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=156093.33333333334, ans=0.125 2023-12-21 17:45:58,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=156093.33333333334, ans=0.125 2023-12-21 17:46:05,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.21 vs. limit=22.5 2023-12-21 17:46:06,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2023-12-21 17:46:07,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. limit=15.0 2023-12-21 17:46:24,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=156226.66666666666, ans=0.125 2023-12-21 17:46:29,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156293.33333333334, ans=0.1 2023-12-21 17:46:41,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=156360.0, ans=0.125 2023-12-21 17:46:48,701 INFO [train.py:886] (0/4) Epoch 5, batch 4400, loss[loss=0.01556, audio_tagging_loss=0.01556, over 24137.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4953359.74 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:46:52,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=156426.66666666666, ans=0.125 2023-12-21 17:46:55,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.69 vs. limit=22.5 2023-12-21 17:47:00,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=156493.33333333334, ans=0.125 2023-12-21 17:47:13,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2023-12-21 17:47:22,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.69 vs. 
limit=15.0 2023-12-21 17:47:22,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=156626.66666666666, ans=0.0 2023-12-21 17:47:35,935 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.647e+01 2.808e+01 3.114e+01 3.579e+01, threshold=5.615e+01, percent-clipped=0.0 2023-12-21 17:47:37,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=156693.33333333334, ans=0.125 2023-12-21 17:47:38,836 INFO [train.py:886] (0/4) Epoch 5, batch 4450, loss[loss=0.01545, audio_tagging_loss=0.01545, over 24750.00 frames. ], tot_loss[loss=0.01676, audio_tagging_loss=0.01676, over 4948128.70 frames. ], batch size: 99, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:47:40,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=12.0 2023-12-21 17:47:54,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=156826.66666666666, ans=0.125 2023-12-21 17:47:57,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=156826.66666666666, ans=0.0 2023-12-21 17:48:08,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.58 vs. limit=15.0 2023-12-21 17:48:14,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=156960.0, ans=0.05 2023-12-21 17:48:23,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=157026.66666666666, ans=0.0 2023-12-21 17:48:28,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=157026.66666666666, ans=0.125 2023-12-21 17:48:31,770 INFO [train.py:886] (0/4) Epoch 5, batch 4500, loss[loss=0.01903, audio_tagging_loss=0.01903, over 25000.00 frames. ], tot_loss[loss=0.01667, audio_tagging_loss=0.01667, over 4950330.79 frames. 
], batch size: 100, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:48:39,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=157093.33333333334, ans=0.0 2023-12-21 17:48:48,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=157160.0, ans=0.125 2023-12-21 17:48:57,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=157226.66666666666, ans=0.125 2023-12-21 17:49:02,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=157293.33333333334, ans=0.125 2023-12-21 17:49:07,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=157293.33333333334, ans=0.125 2023-12-21 17:49:19,754 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+01 2.535e+01 2.688e+01 2.916e+01 3.475e+01, threshold=5.375e+01, percent-clipped=0.0 2023-12-21 17:49:20,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=157360.0, ans=0.125 2023-12-21 17:49:23,251 INFO [train.py:886] (0/4) Epoch 5, batch 4550, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.01653, audio_tagging_loss=0.01653, over 4957077.51 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:49:28,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=157426.66666666666, ans=0.0 2023-12-21 17:49:28,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0 2023-12-21 17:49:58,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=157626.66666666666, ans=0.0 2023-12-21 17:50:08,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157693.33333333334, ans=0.1 2023-12-21 17:50:10,872 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 17:50:15,388 INFO [train.py:886] (0/4) Epoch 5, batch 4600, loss[loss=0.01725, audio_tagging_loss=0.01725, over 25000.00 frames. ], tot_loss[loss=0.01646, audio_tagging_loss=0.01646, over 4960686.26 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:50:19,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2023-12-21 17:50:20,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=157760.0, ans=0.125 2023-12-21 17:50:29,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=157826.66666666666, ans=0.0 2023-12-21 17:50:30,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.43 vs. 
limit=22.5 2023-12-21 17:50:42,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=157893.33333333334, ans=0.125 2023-12-21 17:50:47,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=157960.0, ans=0.0 2023-12-21 17:50:51,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=157960.0, ans=0.125 2023-12-21 17:50:52,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=15.0 2023-12-21 17:51:04,726 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.622e+01 2.861e+01 3.041e+01 4.016e+01, threshold=5.722e+01, percent-clipped=0.0 2023-12-21 17:51:06,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=158026.66666666666, ans=0.0 2023-12-21 17:51:08,381 INFO [train.py:886] (0/4) Epoch 5, batch 4650, loss[loss=0.01507, audio_tagging_loss=0.01507, over 25000.00 frames. ], tot_loss[loss=0.01642, audio_tagging_loss=0.01642, over 4960884.53 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:51:08,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158093.33333333334, ans=0.1 2023-12-21 17:51:08,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=158093.33333333334, ans=0.125 2023-12-21 17:51:09,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=158093.33333333334, ans=0.125 2023-12-21 17:51:16,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=158093.33333333334, ans=0.0 2023-12-21 17:51:17,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=158093.33333333334, ans=6.0 2023-12-21 17:51:23,579 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.175e+00 2023-12-21 17:51:29,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=158226.66666666666, ans=0.0 2023-12-21 17:51:30,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=158226.66666666666, ans=0.025 2023-12-21 17:51:35,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=15.0 2023-12-21 17:51:36,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=158226.66666666666, ans=0.125 2023-12-21 17:51:44,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.56 vs. 
limit=15.0 2023-12-21 17:51:47,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=158293.33333333334, ans=0.1 2023-12-21 17:51:48,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=158293.33333333334, ans=0.0 2023-12-21 17:51:52,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=158360.0, ans=0.0 2023-12-21 17:51:58,920 INFO [train.py:886] (0/4) Epoch 5, batch 4700, loss[loss=0.01652, audio_tagging_loss=0.01652, over 24750.00 frames. ], tot_loss[loss=0.01654, audio_tagging_loss=0.01654, over 4958151.64 frames. ], batch size: 99, lr: 2.00e-02, grad_scale: 64.0 2023-12-21 17:52:19,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=158560.0, ans=0.125 2023-12-21 17:52:39,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.61 vs. limit=15.0 2023-12-21 17:52:43,575 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 2.632e+01 2.850e+01 3.066e+01 3.768e+01, threshold=5.700e+01, percent-clipped=0.0 2023-12-21 17:52:46,341 INFO [train.py:886] (0/4) Epoch 5, batch 4750, loss[loss=0.01702, audio_tagging_loss=0.01702, over 24750.00 frames. ], tot_loss[loss=0.01661, audio_tagging_loss=0.01661, over 4952648.07 frames. ], batch size: 99, lr: 2.00e-02, grad_scale: 64.0 2023-12-21 17:52:47,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=158760.0, ans=0.125 2023-12-21 17:53:01,991 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-5.pt 2023-12-21 17:53:24,101 INFO [train.py:886] (0/4) Epoch 6, batch 0, loss[loss=0.03513, audio_tagging_loss=0.03513, over 25000.00 frames. ], tot_loss[loss=0.03513, audio_tagging_loss=0.03513, over 25000.00 frames. ], batch size: 100, lr: 1.87e-02, grad_scale: 64.0 2023-12-21 17:53:24,103 INFO [train.py:909] (0/4) Computing validation loss 2023-12-21 17:53:45,408 INFO [train.py:917] (0/4) Epoch 6, validation: loss=0.03649, audio_tagging_loss=0.03649, over 3737520.00 frames. 2023-12-21 17:53:45,409 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-21 17:53:48,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=15.0 2023-12-21 17:54:09,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=159000.0, ans=0.1 2023-12-21 17:54:15,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=159066.66666666666, ans=0.1 2023-12-21 17:54:21,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. 
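
The drop from lr: 2.01e-02 at the end of epoch 5 to lr: 1.87e-02 at the start of epoch 6 is consistent with a schedule that decays in both the batch index and the epoch index, like icefall's Eden scheduler. A sketch of that rule as I understand it (treat the exact exponents as an assumption), using the base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 settings from the run header:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Decay smoothly as training progresses, per batch and per epoch.
        batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor

With batch around 24000 and epoch 6, eden_lr(0.045, 24000, 6) gives roughly 1.7e-02 to 1.8e-02, in the same range as the values logged here.
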
2023-12-21 17:54:21,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0
2023-12-21 17:54:27,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=159133.33333333334, ans=0.1
2023-12-21 17:54:29,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=159133.33333333334, ans=0.125
2023-12-21 17:54:36,664 INFO [train.py:886] (0/4) Epoch 6, batch 50, loss[loss=0.02295, audio_tagging_loss=0.02295, over 25000.00 frames. ], tot_loss[loss=0.02577, audio_tagging_loss=0.02577, over 1109438.56 frames. ], batch size: 100, lr: 1.87e-02, grad_scale: 64.0
2023-12-21 17:54:36,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=159200.0, ans=0.125
2023-12-21 17:54:43,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=159200.0, ans=0.2
2023-12-21 17:54:47,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=159266.66666666666, ans=0.0
2023-12-21 17:55:00,481 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 17:55:08,012 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.015e+01 3.367e+01 3.698e+01 8.619e+01, threshold=6.734e+01, percent-clipped=4.0
2023-12-21 17:55:19,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=159466.66666666666, ans=0.125
2023-12-21 17:55:22,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=159466.66666666666, ans=0.0
2023-12-21 17:55:28,823 INFO [train.py:886] (0/4) Epoch 6, batch 100, loss[loss=0.01774, audio_tagging_loss=0.01774, over 25000.00 frames. ], tot_loss[loss=0.02247, audio_tagging_loss=0.02247, over 1962745.97 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0
2023-12-21 17:55:44,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0
2023-12-21 17:56:06,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=159733.33333333334, ans=15.0
2023-12-21 17:56:09,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0
2023-12-21 17:56:17,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=159800.0, ans=0.125
2023-12-21 17:56:18,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=159800.0, ans=0.125
2023-12-21 17:56:19,827 INFO [train.py:886] (0/4) Epoch 6, batch 150, loss[loss=0.02044, audio_tagging_loss=0.02044, over 25000.00 frames. ], tot_loss[loss=0.02039, audio_tagging_loss=0.02039, over 2630888.89 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0
2023-12-21 17:56:22,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=159866.66666666666, ans=0.1
2023-12-21 17:56:40,111 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-24000.pt
2023-12-21 17:56:43,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.33 vs. limit=15.0
2023-12-21 17:56:54,051 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.731e+01 2.909e+01 3.115e+01 3.553e+01, threshold=5.819e+01, percent-clipped=0.0
2023-12-21 17:56:57,996 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 17:57:05,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.05 vs. limit=15.0
2023-12-21 17:57:12,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=160133.33333333334, ans=0.125
2023-12-21 17:57:14,124 INFO [train.py:886] (0/4) Epoch 6, batch 200, loss[loss=0.0186, audio_tagging_loss=0.0186, over 25000.00 frames. ], tot_loss[loss=0.01919, audio_tagging_loss=0.01919, over 3148726.99 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0
2023-12-21 17:57:23,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=160266.66666666666, ans=0.2
2023-12-21 17:57:26,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.69 vs. limit=15.0
2023-12-21 17:57:29,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=160266.66666666666, ans=0.2
2023-12-21 17:57:45,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=160400.0, ans=0.125
2023-12-21 17:57:59,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=160466.66666666666, ans=0.125
2023-12-21 17:57:59,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160466.66666666666, ans=0.1
2023-12-21 18:58:05,853 INFO [train.py:886] (0/4) Epoch 6, batch 250, loss[loss=0.01777, audio_tagging_loss=0.01777, over 25000.00 frames. ], tot_loss[loss=0.01833, audio_tagging_loss=0.01833, over 3550323.13 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0
2023-12-21 17:58:11,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=160533.33333333334, ans=0.2
2023-12-21 17:58:14,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.23 vs. limit=22.5
2023-12-21 17:58:16,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=160600.0, ans=0.1
2023-12-21 17:58:33,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=160666.66666666666, ans=0.0
2023-12-21 17:58:37,628 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.574e+01 2.757e+01 2.978e+01 3.329e+01, threshold=5.514e+01, percent-clipped=0.0
2023-12-21 17:58:37,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=160733.33333333334, ans=0.2
2023-12-21 17:58:40,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=160733.33333333334, ans=0.0
2023-12-21 17:58:45,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=160733.33333333334, ans=0.2
2023-12-21 17:58:53,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0
2023-12-21 17:58:54,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=160800.0, ans=0.125
2023-12-21 17:58:57,039 INFO [train.py:886] (0/4) Epoch 6, batch 300, loss[loss=0.01753, audio_tagging_loss=0.01753, over 24750.00 frames. ], tot_loss[loss=0.01801, audio_tagging_loss=0.01801, over 3861195.15 frames. ], batch size: 99, lr: 1.86e-02, grad_scale: 64.0
2023-12-21 17:59:00,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=160866.66666666666, ans=22.5
2023-12-21 17:59:09,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=160933.33333333334, ans=0.125
2023-12-21 17:59:33,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=161066.66666666666, ans=0.125
2023-12-21 17:59:36,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=161066.66666666666, ans=0.125
2023-12-21 17:59:40,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0
2023-12-21 17:59:43,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=161133.33333333334, ans=0.125
2023-12-21 17:59:47,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=161133.33333333334, ans=0.0
2023-12-21 17:59:49,340 INFO [train.py:886] (0/4) Epoch 6, batch 350, loss[loss=0.01561, audio_tagging_loss=0.01561, over 24750.00 frames. ], tot_loss[loss=0.01765, audio_tagging_loss=0.01765, over 4097452.23 frames. ], batch size: 99, lr: 1.85e-02, grad_scale: 64.0
2023-12-21 17:59:59,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=161266.66666666666, ans=0.2
2023-12-21 18:00:01,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0
2023-12-21 18:00:06,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=161266.66666666666, ans=0.0
2023-12-21 18:00:19,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=161400.0, ans=0.125
2023-12-21 18:00:20,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=161400.0, ans=0.0
2023-12-21 18:00:21,394 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.561e+01 2.750e+01 3.002e+01 3.673e+01, threshold=5.501e+01, percent-clipped=0.0
2023-12-21 18:00:33,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0
2023-12-21 18:00:40,947 INFO [train.py:886] (0/4) Epoch 6, batch 400, loss[loss=0.01545, audio_tagging_loss=0.01545, over 24750.00 frames. ], tot_loss[loss=0.01715, audio_tagging_loss=0.01715, over 4285915.22 frames. ], batch size: 99, lr: 1.85e-02, grad_scale: 64.0
2023-12-21 18:00:45,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=161533.33333333334, ans=0.035
2023-12-21 18:01:07,696 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.916e-02
2023-12-21 18:01:10,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=161666.66666666666, ans=0.125
2023-12-21 18:01:15,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.08 vs. limit=22.5
2023-12-21 18:01:32,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.28 vs. limit=15.0
2023-12-21 18:01:32,972 INFO [train.py:886] (0/4) Epoch 6, batch 450, loss[loss=0.0143, audio_tagging_loss=0.0143, over 21973.00 frames. ], tot_loss[loss=0.01688, audio_tagging_loss=0.01688, over 4433861.27 frames. ], batch size: 107, lr: 1.85e-02, grad_scale: 64.0
2023-12-21 18:01:50,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.91 vs. limit=6.0
2023-12-21 18:01:55,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.87 vs. limit=22.5
2023-12-21 18:02:04,635 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.486e+01 2.731e+01 2.952e+01 3.646e+01, threshold=5.463e+01, percent-clipped=0.0
2023-12-21 18:02:18,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=162133.33333333334, ans=0.2
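
The checkpoint.py:75 line above saves checkpoint-24000.pt, which matches the save_every_n=4000 setting: a batch-indexed checkpoint is written every 4000 training batches, alongside the per-epoch epoch-N.pt files such as epoch-5.pt. A simplified sketch of that pattern (the function name is illustrative; the real code typically also saves optimizer, scheduler and sampler state):

    import torch

    def maybe_save_checkpoint(model, batch_idx_train: int,
                              exp_dir: str, save_every_n: int = 4000) -> None:
        # Write a batch-indexed checkpoint at a fixed interval.
        if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
            path = f"{exp_dir}/checkpoint-{batch_idx_train}.pt"
            torch.save({"model": model.state_dict(),
                        "batch_idx_train": batch_idx_train}, path)
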
2023-12-21 18:02:25,549 INFO [train.py:886] (0/4) Epoch 6, batch 500, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4551089.00 frames. ], batch size: 100, lr: 1.85e-02, grad_scale: 64.0
2023-12-21 18:02:30,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=162200.0, ans=0.125
2023-12-21 18:02:45,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.57 vs. limit=22.5
2023-12-21 18:02:47,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=162333.33333333334, ans=0.125
2023-12-21 18:02:48,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=15.0
2023-12-21 18:02:48,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=15.0
2023-12-21 18:02:49,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.60 vs. limit=15.0
2023-12-21 18:03:03,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162400.0, ans=0.1
2023-12-21 18:03:11,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=162466.66666666666, ans=0.125
2023-12-21 18:03:17,207 INFO [train.py:886] (0/4) Epoch 6, batch 550, loss[loss=0.01623, audio_tagging_loss=0.01623, over 25000.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4641676.04 frames. ], batch size: 100, lr: 1.85e-02, grad_scale: 64.0
2023-12-21 18:03:33,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=15.0
2023-12-21 18:03:37,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=162666.66666666666, ans=0.125
2023-12-21 18:03:49,381 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.522e+01 2.669e+01 2.950e+01 3.849e+01, threshold=5.338e+01, percent-clipped=0.0
2023-12-21 18:03:52,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=162733.33333333334, ans=0.125
2023-12-21 18:03:54,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0
2023-12-21 18:04:08,690 INFO [train.py:886] (0/4) Epoch 6, batch 600, loss[loss=0.0199, audio_tagging_loss=0.0199, over 24750.00 frames. ], tot_loss[loss=0.01672, audio_tagging_loss=0.01672, over 4713963.41 frames. ], batch size: 99, lr: 1.85e-02, grad_scale: 64.0
2023-12-21 18:04:23,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=162933.33333333334, ans=0.07
2023-12-21 18:04:33,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=163000.0, ans=0.2
2023-12-21 18:04:33,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.55 vs. limit=15.0
2023-12-21 18:04:45,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=163066.66666666666, ans=0.125
2023-12-21 18:05:01,098 INFO [train.py:886] (0/4) Epoch 6, batch 650, loss[loss=0.01669, audio_tagging_loss=0.01669, over 23189.00 frames. ], tot_loss[loss=0.0167, audio_tagging_loss=0.0167, over 4751810.14 frames. ], batch size: 107, lr: 1.84e-02, grad_scale: 64.0
2023-12-21 18:05:06,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=163200.0, ans=0.125
2023-12-21 18:05:08,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=163200.0, ans=0.2
2023-12-21 18:05:16,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.94 vs. limit=22.5
2023-12-21 18:05:31,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=163400.0, ans=0.125
2023-12-21 18:05:33,143 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.590e+01 2.810e+01 2.971e+01 4.076e+01, threshold=5.620e+01, percent-clipped=0.0
2023-12-21 18:05:52,686 INFO [train.py:886] (0/4) Epoch 6, batch 700, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 4793372.88 frames. ], batch size: 99, lr: 1.84e-02, grad_scale: 64.0
2023-12-21 18:06:22,250 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.518e-01
2023-12-21 18:06:28,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=163733.33333333334, ans=0.125
2023-12-21 18:06:44,770 INFO [train.py:886] (0/4) Epoch 6, batch 750, loss[loss=0.01655, audio_tagging_loss=0.01655, over 21620.00 frames. ], tot_loss[loss=0.0166, audio_tagging_loss=0.0166, over 4821273.70 frames. ], batch size: 107, lr: 1.84e-02, grad_scale: 64.0
2023-12-21 18:07:08,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0
2023-12-21 18:07:16,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=164066.66666666666, ans=0.125
2023-12-21 18:07:17,166 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.591e+01 2.731e+01 2.911e+01 3.574e+01, threshold=5.461e+01, percent-clipped=0.0
2023-12-21 18:07:31,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.77 vs. limit=15.0
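
The Whitening lines compare a per-module metric against a schedule limit; when the metric exceeds the limit, a penalty nudges the module's feature covariance back toward a scaled identity, i.e. toward "white" features. One plausible way to compute such a metric, shown only to illustrate the idea (icefall's actual formula lives in scaling.py):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one module.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]            # channel covariance
        eigs = torch.linalg.eigvalsh(cov)       # nonnegative spectrum
        # Ratio of the mean squared eigenvalue to the squared mean eigenvalue:
        # exactly 1.0 for perfectly white features, and larger when a few
        # directions dominate the variance.
        return (eigs**2).mean() / eigs.mean() ** 2
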
2023-12-21 18:07:35,943 INFO [train.py:886] (0/4) Epoch 6, batch 800, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.01648, audio_tagging_loss=0.01648, over 4850694.42 frames. ], batch size: 100, lr: 1.84e-02, grad_scale: 64.0
2023-12-21 18:07:44,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=164200.0, ans=0.2
2023-12-21 18:07:44,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=164200.0, ans=0.0
2023-12-21 18:07:58,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=164333.33333333334, ans=0.0
2023-12-21 18:07:58,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0
2023-12-21 18:08:05,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=164400.0, ans=0.125
2023-12-21 18:08:15,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0
2023-12-21 18:08:26,508 INFO [train.py:886] (0/4) Epoch 6, batch 850, loss[loss=0.01865, audio_tagging_loss=0.01865, over 25000.00 frames. ], tot_loss[loss=0.01647, audio_tagging_loss=0.01647, over 4879644.19 frames. ], batch size: 100, lr: 1.84e-02, grad_scale: 64.0
2023-12-21 18:08:57,859 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.614e+01 2.752e+01 3.030e+01 4.008e+01, threshold=5.503e+01, percent-clipped=0.0
2023-12-21 18:09:08,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=164800.0, ans=0.1
2023-12-21 18:09:08,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=164800.0, ans=0.2
2023-12-21 18:09:17,283 INFO [train.py:886] (0/4) Epoch 6, batch 900, loss[loss=0.01552, audio_tagging_loss=0.01552, over 21091.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 4891609.13 frames. ], batch size: 107, lr: 1.84e-02, grad_scale: 64.0
2023-12-21 18:09:38,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=165000.0, ans=0.1
2023-12-21 18:09:50,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=165066.66666666666, ans=0.1
2023-12-21 18:09:57,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=165066.66666666666, ans=0.0
2023-12-21 18:10:08,987 INFO [train.py:886] (0/4) Epoch 6, batch 950, loss[loss=0.01532, audio_tagging_loss=0.01532, over 24750.00 frames. ], tot_loss[loss=0.0166, audio_tagging_loss=0.0166, over 4903569.47 frames. ], batch size: 99, lr: 1.83e-02, grad_scale: 64.0
2023-12-21 18:10:09,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0
2023-12-21 18:10:10,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=165200.0, ans=0.0
2023-12-21 18:10:16,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5
2023-12-21 18:10:33,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=165333.33333333334, ans=0.125
2023-12-21 18:10:34,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=165333.33333333334, ans=0.125
2023-12-21 18:10:36,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=165333.33333333334, ans=0.2
2023-12-21 18:10:39,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=165400.0, ans=0.125
2023-12-21 18:10:41,005 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.630e+01 2.802e+01 3.017e+01 4.236e+01, threshold=5.603e+01, percent-clipped=0.0
2023-12-21 18:11:00,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=165533.33333333334, ans=10.0
2023-12-21 18:11:00,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.77 vs. limit=15.0
2023-12-21 18:11:01,348 INFO [train.py:886] (0/4) Epoch 6, batch 1000, loss[loss=0.0172, audio_tagging_loss=0.0172, over 24750.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 4913897.05 frames. ], batch size: 99, lr: 1.83e-02, grad_scale: 64.0
2023-12-21 18:11:13,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=15.0
2023-12-21 18:11:18,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=165600.0, ans=0.125
2023-12-21 18:11:18,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.12 vs. limit=15.0
2023-12-21 18:11:22,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=165666.66666666666, ans=0.125
2023-12-21 18:11:33,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=165733.33333333334, ans=0.02
2023-12-21 18:11:34,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0
2023-12-21 18:11:34,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=165733.33333333334, ans=0.2
2023-12-21 18:11:49,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=165800.0, ans=0.125
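
The ScheduledFloat lines record the current value (ans=...) of hyperparameters that are annealed as a function of batch_count: dropout rates, skip probabilities, balancer bounds and so on. As I understand the class in scaling.py, the value is a piecewise-linear interpolation between (batch_count, value) breakpoints; a minimal sketch of that behaviour (the real class carries more machinery, e.g. a default value):

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over batch_count, e.g.
        ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))."""

        def __init__(self, *points):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            (x0, y0), (xn, yn) = self.points[0], self.points[-1]
            if batch_count <= x0:
                return y0
            if batch_count >= xn:
                return yn
            for (xa, ya), (xb, yb) in zip(self.points, self.points[1:]):
                if xa <= batch_count <= xb:
                    t = (batch_count - xa) / (xb - xa)
                    return ya + t * (yb - ya)
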
2023-12-21 18:11:52,568 INFO [train.py:886] (0/4) Epoch 6, batch 1050, loss[loss=0.01381, audio_tagging_loss=0.01381, over 23978.00 frames. ], tot_loss[loss=0.01633, audio_tagging_loss=0.01633, over 4918647.25 frames. ], batch size: 100, lr: 1.83e-02, grad_scale: 64.0
2023-12-21 18:11:58,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=165866.66666666666, ans=0.2
2023-12-21 18:11:58,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=165866.66666666666, ans=0.09899494936611666
2023-12-21 18:12:00,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165866.66666666666, ans=0.1
2023-12-21 18:12:02,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.05 vs. limit=15.0
2023-12-21 18:12:16,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=15.0
2023-12-21 18:12:25,142 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.084e+01 2.514e+01 2.651e+01 2.892e+01 3.448e+01, threshold=5.301e+01, percent-clipped=0.0
2023-12-21 18:12:38,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166133.33333333334, ans=0.1
2023-12-21 18:12:43,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=166133.33333333334, ans=0.0
2023-12-21 18:12:45,367 INFO [train.py:886] (0/4) Epoch 6, batch 1100, loss[loss=0.01611, audio_tagging_loss=0.01611, over 24750.00 frames. ], tot_loss[loss=0.01621, audio_tagging_loss=0.01621, over 4925848.11 frames. ], batch size: 99, lr: 1.83e-02, grad_scale: 64.0
2023-12-21 18:12:51,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.07 vs. limit=22.5
2023-12-21 18:12:54,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=166266.66666666666, ans=0.0
2023-12-21 18:13:17,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=166400.0, ans=0.125
2023-12-21 18:13:20,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=166400.0, ans=0.1
2023-12-21 18:13:26,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=166466.66666666666, ans=0.2
2023-12-21 18:13:37,813 INFO [train.py:886] (0/4) Epoch 6, batch 1150, loss[loss=0.01817, audio_tagging_loss=0.01817, over 25000.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4937366.89 frames. ], batch size: 100, lr: 1.83e-02, grad_scale: 64.0
2023-12-21 18:13:42,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166533.33333333334, ans=0.1
2023-12-21 18:13:42,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=166533.33333333334, ans=0.1
2023-12-21 18:13:49,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.23 vs. limit=10.0
2023-12-21 18:13:50,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=12.0
2023-12-21 18:13:56,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5
2023-12-21 18:13:58,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=166666.66666666666, ans=0.05
2023-12-21 18:14:10,170 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 2.615e+01 2.716e+01 2.897e+01 3.579e+01, threshold=5.433e+01, percent-clipped=0.0
2023-12-21 18:14:10,579 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0
2023-12-21 18:14:26,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=166800.0, ans=0.1
2023-12-21 18:14:29,580 INFO [train.py:886] (0/4) Epoch 6, batch 1200, loss[loss=0.01728, audio_tagging_loss=0.01728, over 25000.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4934609.21 frames. ], batch size: 100, lr: 1.83e-02, grad_scale: 64.0
2023-12-21 18:14:29,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=166866.66666666666, ans=0.125
2023-12-21 18:14:35,336 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 18:14:44,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0
2023-12-21 18:14:51,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=167000.0, ans=0.0
2023-12-21 18:14:56,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=167000.0, ans=0.1
2023-12-21 18:14:58,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=167000.0, ans=0.0
2023-12-21 18:15:07,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.05 vs. limit=15.0
2023-12-21 18:15:21,718 INFO [train.py:886] (0/4) Epoch 6, batch 1250, loss[loss=0.01604, audio_tagging_loss=0.01604, over 24750.00 frames. ], tot_loss[loss=0.01648, audio_tagging_loss=0.01648, over 4934364.96 frames. ], batch size: 99, lr: 1.82e-02, grad_scale: 64.0
2023-12-21 18:15:33,953 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 18:15:45,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=167333.33333333334, ans=0.125
2023-12-21 18:15:46,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=167333.33333333334, ans=0.0
2023-12-21 18:15:51,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=167400.0, ans=0.2
2023-12-21 18:15:53,903 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.561e+01 2.733e+01 2.928e+01 3.774e+01, threshold=5.467e+01, percent-clipped=0.0
2023-12-21 18:16:00,428 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.55 vs. limit=22.5
2023-12-21 18:16:08,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=167466.66666666666, ans=0.04949747468305833
2023-12-21 18:16:13,500 INFO [train.py:886] (0/4) Epoch 6, batch 1300, loss[loss=0.01805, audio_tagging_loss=0.01805, over 25000.00 frames. ], tot_loss[loss=0.01655, audio_tagging_loss=0.01655, over 4927259.50 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0
2023-12-21 18:16:40,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=167666.66666666666, ans=0.125
2023-12-21 18:16:51,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=167733.33333333334, ans=0.0
2023-12-21 18:16:57,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=167800.0, ans=10.0
2023-12-21 18:17:02,574 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.185e+01
2023-12-21 18:17:05,875 INFO [train.py:886] (0/4) Epoch 6, batch 1350, loss[loss=0.01692, audio_tagging_loss=0.01692, over 25000.00 frames. ], tot_loss[loss=0.01646, audio_tagging_loss=0.01646, over 4932756.39 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0
2023-12-21 18:17:06,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=167866.66666666666, ans=0.125
2023-12-21 18:17:11,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=167866.66666666666, ans=0.2
2023-12-21 18:17:14,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.74 vs. limit=22.5
2023-12-21 18:17:19,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=167933.33333333334, ans=0.0
2023-12-21 18:17:24,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=167933.33333333334, ans=0.125
2023-12-21 18:17:32,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=168000.0, ans=0.125
2023-12-21 18:17:36,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=168066.66666666666, ans=0.0
2023-12-21 18:17:37,724 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.576e+01 2.766e+01 2.896e+01 3.623e+01, threshold=5.532e+01, percent-clipped=0.0
2023-12-21 18:17:43,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=168066.66666666666, ans=0.0
2023-12-21 18:17:57,280 INFO [train.py:886] (0/4) Epoch 6, batch 1400, loss[loss=0.0138, audio_tagging_loss=0.0138, over 25000.00 frames. ], tot_loss[loss=0.01634, audio_tagging_loss=0.01634, over 4942052.48 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0
2023-12-21 18:18:08,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=168266.66666666666, ans=0.0
2023-12-21 18:18:17,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0
2023-12-21 18:18:19,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=168333.33333333334, ans=0.0
2023-12-21 18:18:27,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=168400.0, ans=0.0
2023-12-21 18:18:39,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=168466.66666666666, ans=0.0
2023-12-21 18:18:48,997 INFO [train.py:886] (0/4) Epoch 6, batch 1450, loss[loss=0.0153, audio_tagging_loss=0.0153, over 25000.00 frames. ], tot_loss[loss=0.01629, audio_tagging_loss=0.01629, over 4937708.87 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0
2023-12-21 18:18:51,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=168533.33333333334, ans=0.125
2023-12-21 18:18:51,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=168533.33333333334, ans=0.04949747468305833
2023-12-21 18:19:02,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=168600.0, ans=0.025
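
In the train.py:886 lines, loss always equals audio_tagging_loss: the run optimizes a single audio tagging criterion. For AudioSet-style multi-label tagging over the num_events=527 classes from the run header, that criterion is typically binary cross-entropy on the logits; a sketch under that assumption (the reduction is a guess):

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # logits: (batch, 527) raw scores; targets: (batch, 527) multi-hot labels.
        # Sum the per-class BCE over classes, then average over the batch.
        return F.binary_cross_entropy_with_logits(
            logits, targets.float(), reduction="none"
        ).sum(dim=-1).mean()
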
2023-12-21 18:19:03,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.90 vs. limit=22.5
2023-12-21 18:19:12,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=168666.66666666666, ans=0.125
2023-12-21 18:19:21,217 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.551e+01 2.724e+01 2.909e+01 3.642e+01, threshold=5.448e+01, percent-clipped=0.0
2023-12-21 18:19:25,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=168733.33333333334, ans=0.125
2023-12-21 18:19:30,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=168800.0, ans=0.1
2023-12-21 18:19:34,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=168800.0, ans=0.1
2023-12-21 18:19:40,610 INFO [train.py:886] (0/4) Epoch 6, batch 1500, loss[loss=0.01552, audio_tagging_loss=0.01552, over 25000.00 frames. ], tot_loss[loss=0.01636, audio_tagging_loss=0.01636, over 4942993.91 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0
2023-12-21 18:19:41,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=168866.66666666666, ans=0.0
2023-12-21 18:19:48,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=168866.66666666666, ans=0.0
2023-12-21 18:20:16,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169066.66666666666, ans=0.1
2023-12-21 18:20:24,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169133.33333333334, ans=0.1
2023-12-21 18:20:33,295 INFO [train.py:886] (0/4) Epoch 6, batch 1550, loss[loss=0.0175, audio_tagging_loss=0.0175, over 24750.00 frames. ], tot_loss[loss=0.01658, audio_tagging_loss=0.01658, over 4942745.59 frames. ], batch size: 99, lr: 1.81e-02, grad_scale: 64.0
2023-12-21 18:20:50,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=15.0
2023-12-21 18:21:02,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=169400.0, ans=0.2
2023-12-21 18:21:04,687 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.623e+01 2.761e+01 2.988e+01 3.455e+01, threshold=5.522e+01, percent-clipped=0.0
2023-12-21 18:21:18,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169466.66666666666, ans=0.1
2023-12-21 18:21:23,949 INFO [train.py:886] (0/4) Epoch 6, batch 1600, loss[loss=0.0161, audio_tagging_loss=0.0161, over 24750.00 frames. ], tot_loss[loss=0.01668, audio_tagging_loss=0.01668, over 4936560.35 frames. ], batch size: 99, lr: 1.81e-02, grad_scale: 64.0
2023-12-21 18:21:25,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=169533.33333333334, ans=0.125
2023-12-21 18:21:28,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=169533.33333333334, ans=0.0
2023-12-21 18:21:29,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=169533.33333333334, ans=0.0
2023-12-21 18:21:33,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=169600.0, ans=0.07
2023-12-21 18:21:36,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=169600.0, ans=0.0
2023-12-21 18:21:48,214 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.026e-01
2023-12-21 18:21:58,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.99 vs. limit=22.5
2023-12-21 18:22:07,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=169800.0, ans=0.125
2023-12-21 18:22:08,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=169800.0, ans=0.2
2023-12-21 18:22:14,568 INFO [train.py:886] (0/4) Epoch 6, batch 1650, loss[loss=0.01889, audio_tagging_loss=0.01889, over 22311.00 frames. ], tot_loss[loss=0.01656, audio_tagging_loss=0.01656, over 4938280.79 frames. ], batch size: 107, lr: 1.81e-02, grad_scale: 64.0
2023-12-21 18:22:47,809 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.597e+01 2.773e+01 2.993e+01 4.191e+01, threshold=5.546e+01, percent-clipped=0.0
2023-12-21 18:22:56,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=170133.33333333334, ans=0.125
2023-12-21 18:23:06,295 INFO [train.py:886] (0/4) Epoch 6, batch 1700, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.01644, audio_tagging_loss=0.01644, over 4938933.64 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 64.0
2023-12-21 18:23:20,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=170266.66666666666, ans=0.125
2023-12-21 18:23:32,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.08 vs. limit=22.5
2023-12-21 18:23:34,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=170333.33333333334, ans=0.2
2023-12-21 18:23:46,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=170400.0, ans=0.1
2023-12-21 18:23:49,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.15 vs. limit=15.0
2023-12-21 18:23:50,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.92 vs. limit=15.0
2023-12-21 18:23:53,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.21 vs. limit=15.0
2023-12-21 18:23:57,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=170533.33333333334, ans=0.04949747468305833
2023-12-21 18:23:58,253 INFO [train.py:886] (0/4) Epoch 6, batch 1750, loss[loss=0.01575, audio_tagging_loss=0.01575, over 24750.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 4943100.20 frames. ], batch size: 99, lr: 1.81e-02, grad_scale: 64.0
2023-12-21 18:24:01,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=170533.33333333334, ans=0.0
2023-12-21 18:24:07,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=170533.33333333334, ans=0.125
2023-12-21 18:24:15,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=170600.0, ans=0.125
2023-12-21 18:24:31,727 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.494e+01 2.677e+01 2.882e+01 3.741e+01, threshold=5.353e+01, percent-clipped=0.0
2023-12-21 18:24:49,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0
2023-12-21 18:24:51,582 INFO [train.py:886] (0/4) Epoch 6, batch 1800, loss[loss=0.01595, audio_tagging_loss=0.01595, over 21257.00 frames. ], tot_loss[loss=0.01641, audio_tagging_loss=0.01641, over 4942044.55 frames. ], batch size: 107, lr: 1.81e-02, grad_scale: 64.0
2023-12-21 18:25:42,680 INFO [train.py:886] (0/4) Epoch 6, batch 1850, loss[loss=0.01378, audio_tagging_loss=0.01378, over 24750.00 frames. ], tot_loss[loss=0.01646, audio_tagging_loss=0.01646, over 4947631.47 frames. ], batch size: 99, lr: 1.80e-02, grad_scale: 64.0
2023-12-21 18:25:52,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=171266.66666666666, ans=0.2
2023-12-21 18:25:56,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=171266.66666666666, ans=0.125
2023-12-21 18:26:00,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0
2023-12-21 18:26:03,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.53 vs. limit=15.0
2023-12-21 18:26:15,599 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.610e+01 2.790e+01 3.016e+01 3.716e+01, threshold=5.580e+01, percent-clipped=0.0
2023-12-21 18:26:16,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=171400.0, ans=0.1
2023-12-21 18:26:34,798 INFO [train.py:886] (0/4) Epoch 6, batch 1900, loss[loss=0.01548, audio_tagging_loss=0.01548, over 24750.00 frames. ], tot_loss[loss=0.01656, audio_tagging_loss=0.01656, over 4946504.96 frames. ], batch size: 99, lr: 1.80e-02, grad_scale: 64.0
2023-12-21 18:26:35,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=171533.33333333334, ans=0.0
2023-12-21 18:26:38,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=171533.33333333334, ans=0.125
2023-12-21 18:26:42,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.06 vs. limit=15.0
2023-12-21 18:26:46,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=15.0
2023-12-21 18:27:27,090 INFO [train.py:886] (0/4) Epoch 6, batch 1950, loss[loss=0.01661, audio_tagging_loss=0.01661, over 25000.00 frames. ], tot_loss[loss=0.01658, audio_tagging_loss=0.01658, over 4941726.24 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 64.0
2023-12-21 18:27:38,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=171933.33333333334, ans=0.125
2023-12-21 18:27:40,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=171933.33333333334, ans=0.0
2023-12-21 18:27:47,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=172000.0, ans=0.2
2023-12-21 18:27:50,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=172000.0, ans=0.125
2023-12-21 18:28:00,985 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.565e+01 2.716e+01 2.900e+01 3.603e+01, threshold=5.432e+01, percent-clipped=0.0
2023-12-21 18:28:05,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=172066.66666666666, ans=0.125
2023-12-21 18:28:13,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=172133.33333333334, ans=0.125
2023-12-21 18:28:18,826 INFO [train.py:886] (0/4) Epoch 6, batch 2000, loss[loss=0.01504, audio_tagging_loss=0.01504, over 24750.00 frames. ], tot_loss[loss=0.01648, audio_tagging_loss=0.01648, over 4942015.94 frames. ], batch size: 99, lr: 1.80e-02, grad_scale: 64.0
2023-12-21 18:28:19,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=172200.0, ans=0.0
2023-12-21 18:28:24,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0
2023-12-21 18:28:39,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=12.0
2023-12-21 18:28:43,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172333.33333333334, ans=0.1
2023-12-21 18:29:10,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=172533.33333333334, ans=0.1
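
In these train.py:886 lines, loss[...] is the value on the current batch while tot_loss[...] is a running summary whose frame count hovers near five million rather than growing without bound, which suggests an exponentially forgetting, frame-weighted average. A sketch of that interpretation; the decay constant is a guess chosen so that 25000-frame batches settle near the observed ~5e6-frame steady state:

    def update_tot_loss(tot_loss_sum: float, tot_frames: float,
                        batch_loss: float, batch_frames: float,
                        decay: float = 0.995):
        # Decay the history, then fold in the current batch, weighted by frames.
        tot_loss_sum = decay * tot_loss_sum + batch_loss * batch_frames
        tot_frames = decay * tot_frames + batch_frames
        return tot_loss_sum / tot_frames, tot_loss_sum, tot_frames
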
], tot_loss[loss=0.01634, audio_tagging_loss=0.01634, over 4942587.10 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:29:16,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=172533.33333333334, ans=0.0 2023-12-21 18:29:43,370 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.562e+01 2.748e+01 2.968e+01 3.569e+01, threshold=5.496e+01, percent-clipped=0.0 2023-12-21 18:29:44,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.63 vs. limit=22.5 2023-12-21 18:29:52,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=172800.0, ans=0.0 2023-12-21 18:30:01,209 INFO [train.py:886] (0/4) Epoch 6, batch 2100, loss[loss=0.01778, audio_tagging_loss=0.01778, over 25000.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4941718.32 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:30:02,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172866.66666666666, ans=0.1 2023-12-21 18:30:24,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.59 vs. limit=10.0 2023-12-21 18:30:31,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=173066.66666666666, ans=0.1 2023-12-21 18:30:32,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=173066.66666666666, ans=0.1 2023-12-21 18:30:36,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=173066.66666666666, ans=0.0 2023-12-21 18:30:44,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=173133.33333333334, ans=0.1 2023-12-21 18:30:44,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.69 vs. limit=15.0 2023-12-21 18:30:49,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=173133.33333333334, ans=0.0 2023-12-21 18:30:53,342 INFO [train.py:886] (0/4) Epoch 6, batch 2150, loss[loss=0.01775, audio_tagging_loss=0.01775, over 24750.00 frames. ], tot_loss[loss=0.01633, audio_tagging_loss=0.01633, over 4940079.45 frames. ], batch size: 99, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:30:58,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=173200.0, ans=0.025 2023-12-21 18:31:02,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0 2023-12-21 18:31:12,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. 
limit=15.0 2023-12-21 18:31:25,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=173400.0, ans=0.0 2023-12-21 18:31:26,792 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.590e+01 2.794e+01 3.040e+01 3.581e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 18:31:30,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=173400.0, ans=0.02 2023-12-21 18:31:38,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-12-21 18:31:41,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=173466.66666666666, ans=0.125 2023-12-21 18:31:46,022 INFO [train.py:886] (0/4) Epoch 6, batch 2200, loss[loss=0.01905, audio_tagging_loss=0.01905, over 24750.00 frames. ], tot_loss[loss=0.01641, audio_tagging_loss=0.01641, over 4938032.66 frames. ], batch size: 99, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:31:50,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=173533.33333333334, ans=0.125 2023-12-21 18:31:54,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=173533.33333333334, ans=0.0 2023-12-21 18:31:59,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=173600.0, ans=0.1 2023-12-21 18:32:05,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=173666.66666666666, ans=0.09899494936611666 2023-12-21 18:32:06,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173666.66666666666, ans=0.1 2023-12-21 18:32:15,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=173666.66666666666, ans=0.125 2023-12-21 18:32:18,520 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.294e+00 2023-12-21 18:32:37,600 INFO [train.py:886] (0/4) Epoch 6, batch 2250, loss[loss=0.01574, audio_tagging_loss=0.01574, over 25000.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 4938458.70 frames. ], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:32:37,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=173866.66666666666, ans=0.2 2023-12-21 18:32:39,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=173866.66666666666, ans=0.035 2023-12-21 18:32:43,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=173866.66666666666, ans=0.025 2023-12-21 18:32:43,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=173866.66666666666, ans=0.125 2023-12-21 18:32:43,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.15 vs. 
limit=15.0 2023-12-21 18:32:50,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=173933.33333333334, ans=0.125 2023-12-21 18:32:52,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=173933.33333333334, ans=0.05 2023-12-21 18:33:02,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=174000.0, ans=0.0 2023-12-21 18:33:10,616 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.579e+01 2.731e+01 2.928e+01 3.593e+01, threshold=5.463e+01, percent-clipped=0.0 2023-12-21 18:33:15,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=174066.66666666666, ans=0.125 2023-12-21 18:33:30,079 INFO [train.py:886] (0/4) Epoch 6, batch 2300, loss[loss=0.01382, audio_tagging_loss=0.01382, over 24750.00 frames. ], tot_loss[loss=0.01629, audio_tagging_loss=0.01629, over 4940637.75 frames. ], batch size: 99, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:33:45,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=174266.66666666666, ans=0.125 2023-12-21 18:33:54,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=28.60 vs. limit=22.5 2023-12-21 18:34:11,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.35 vs. limit=15.0 2023-12-21 18:34:15,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=174466.66666666666, ans=0.125 2023-12-21 18:34:17,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-12-21 18:34:21,987 INFO [train.py:886] (0/4) Epoch 6, batch 2350, loss[loss=0.01431, audio_tagging_loss=0.01431, over 25000.00 frames. ], tot_loss[loss=0.01629, audio_tagging_loss=0.01629, over 4940292.98 frames. ], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:34:25,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=174533.33333333334, ans=0.0 2023-12-21 18:34:32,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=174600.0, ans=0.0 2023-12-21 18:34:42,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=174666.66666666666, ans=0.0 2023-12-21 18:34:50,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2023-12-21 18:34:54,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-12-21 18:34:55,210 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.528e+01 2.689e+01 2.848e+01 3.552e+01, threshold=5.378e+01, percent-clipped=0.0 2023-12-21 18:35:12,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.80 vs. 
limit=15.0 2023-12-21 18:35:13,770 INFO [train.py:886] (0/4) Epoch 6, batch 2400, loss[loss=0.01611, audio_tagging_loss=0.01611, over 25000.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4947534.17 frames. ], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:35:14,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=174866.66666666666, ans=0.0 2023-12-21 18:35:22,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0 2023-12-21 18:35:28,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=174933.33333333334, ans=0.1 2023-12-21 18:35:44,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.81 vs. limit=22.5 2023-12-21 18:35:48,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=175066.66666666666, ans=0.0 2023-12-21 18:36:05,896 INFO [train.py:886] (0/4) Epoch 6, batch 2450, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4947795.36 frames. ], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:36:07,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=175200.0, ans=0.125 2023-12-21 18:36:21,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=175266.66666666666, ans=0.125 2023-12-21 18:36:38,853 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.641e+01 2.797e+01 2.976e+01 3.945e+01, threshold=5.593e+01, percent-clipped=0.0 2023-12-21 18:36:42,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=175400.0, ans=0.2 2023-12-21 18:36:54,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=175466.66666666666, ans=0.125 2023-12-21 18:36:57,348 INFO [train.py:886] (0/4) Epoch 6, batch 2500, loss[loss=0.01743, audio_tagging_loss=0.01743, over 24750.00 frames. ], tot_loss[loss=0.01638, audio_tagging_loss=0.01638, over 4947442.65 frames. ], batch size: 99, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:37:05,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. 
limit=15.0 2023-12-21 18:37:06,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=175600.0, ans=0.2 2023-12-21 18:37:10,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=175600.0, ans=0.1 2023-12-21 18:37:15,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=175600.0, ans=0.125 2023-12-21 18:37:16,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=175600.0, ans=0.0 2023-12-21 18:37:20,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=175666.66666666666, ans=0.0 2023-12-21 18:37:49,682 INFO [train.py:886] (0/4) Epoch 6, batch 2550, loss[loss=0.01662, audio_tagging_loss=0.01662, over 24750.00 frames. ], tot_loss[loss=0.01641, audio_tagging_loss=0.01641, over 4944222.81 frames. ], batch size: 99, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:37:50,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2023-12-21 18:37:55,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=175866.66666666666, ans=0.125 2023-12-21 18:38:04,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=175933.33333333334, ans=0.125 2023-12-21 18:38:11,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=176000.0, ans=0.125 2023-12-21 18:38:22,985 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.592e+01 2.752e+01 3.040e+01 4.422e+01, threshold=5.504e+01, percent-clipped=0.0 2023-12-21 18:38:26,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=176066.66666666666, ans=0.2 2023-12-21 18:38:35,004 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 18:38:42,265 INFO [train.py:886] (0/4) Epoch 6, batch 2600, loss[loss=0.01691, audio_tagging_loss=0.01691, over 25000.00 frames. ], tot_loss[loss=0.01633, audio_tagging_loss=0.01633, over 4949130.32 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:38:44,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2023-12-21 18:38:44,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.34 vs. 
limit=15.0 2023-12-21 18:38:53,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=176266.66666666666, ans=0.1 2023-12-21 18:39:02,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=176333.33333333334, ans=0.1 2023-12-21 18:39:10,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=176333.33333333334, ans=0.1 2023-12-21 18:39:12,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=176400.0, ans=0.125 2023-12-21 18:39:20,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.99 vs. limit=15.0 2023-12-21 18:39:25,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=176466.66666666666, ans=0.125 2023-12-21 18:39:33,957 INFO [train.py:886] (0/4) Epoch 6, batch 2650, loss[loss=0.01651, audio_tagging_loss=0.01651, over 25000.00 frames. ], tot_loss[loss=0.01625, audio_tagging_loss=0.01625, over 4950800.38 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:39:56,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=176666.66666666666, ans=0.125 2023-12-21 18:40:06,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=176733.33333333334, ans=0.125 2023-12-21 18:40:07,084 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.558e+01 2.691e+01 2.831e+01 3.904e+01, threshold=5.381e+01, percent-clipped=0.0 2023-12-21 18:40:24,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2023-12-21 18:40:26,264 INFO [train.py:886] (0/4) Epoch 6, batch 2700, loss[loss=0.01568, audio_tagging_loss=0.01568, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4948709.20 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:40:33,990 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-21 18:40:36,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=176933.33333333334, ans=0.125 2023-12-21 18:40:50,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=177000.0, ans=0.125 2023-12-21 18:40:59,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.97 vs. limit=22.5 2023-12-21 18:41:16,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=177200.0, ans=0.125 2023-12-21 18:41:16,689 INFO [train.py:886] (0/4) Epoch 6, batch 2750, loss[loss=0.01492, audio_tagging_loss=0.01492, over 25000.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4949731.11 frames. 
], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:41:45,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=12.0 2023-12-21 18:41:49,368 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.263e+01 2.557e+01 2.736e+01 2.928e+01 3.710e+01, threshold=5.471e+01, percent-clipped=0.0 2023-12-21 18:41:50,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=12.0 2023-12-21 18:41:50,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.11 vs. limit=15.0 2023-12-21 18:42:02,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=177466.66666666666, ans=0.0 2023-12-21 18:42:06,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=177533.33333333334, ans=0.0 2023-12-21 18:42:07,723 INFO [train.py:886] (0/4) Epoch 6, batch 2800, loss[loss=0.01963, audio_tagging_loss=0.01963, over 24750.00 frames. ], tot_loss[loss=0.01629, audio_tagging_loss=0.01629, over 4954446.56 frames. ], batch size: 99, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:42:14,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=15.0 2023-12-21 18:42:26,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2023-12-21 18:42:33,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=177666.66666666666, ans=0.2 2023-12-21 18:42:34,403 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.925e-02 2023-12-21 18:42:44,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177733.33333333334, ans=0.1 2023-12-21 18:42:45,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=177733.33333333334, ans=0.125 2023-12-21 18:42:55,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=177800.0, ans=0.125 2023-12-21 18:42:59,859 INFO [train.py:886] (0/4) Epoch 6, batch 2850, loss[loss=0.0155, audio_tagging_loss=0.0155, over 24750.00 frames. ], tot_loss[loss=0.01642, audio_tagging_loss=0.01642, over 4954895.19 frames. ], batch size: 99, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:43:00,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=177866.66666666666, ans=0.0 2023-12-21 18:43:13,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=177933.33333333334, ans=0.125 2023-12-21 18:43:15,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.54 vs. 
limit=15.0 2023-12-21 18:43:33,496 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.570e+01 2.729e+01 2.961e+01 3.657e+01, threshold=5.459e+01, percent-clipped=0.0 2023-12-21 18:43:34,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=178066.66666666666, ans=10.0 2023-12-21 18:43:37,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=178066.66666666666, ans=0.125 2023-12-21 18:43:42,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2023-12-21 18:43:45,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.64 vs. limit=10.0 2023-12-21 18:43:46,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=178133.33333333334, ans=0.0 2023-12-21 18:43:51,164 INFO [train.py:886] (0/4) Epoch 6, batch 2900, loss[loss=0.01403, audio_tagging_loss=0.01403, over 24750.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4956694.30 frames. ], batch size: 99, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:44:04,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=178266.66666666666, ans=0.0 2023-12-21 18:44:26,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=178400.0, ans=0.0 2023-12-21 18:44:26,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.20 vs. limit=15.0 2023-12-21 18:44:39,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=178466.66666666666, ans=0.0 2023-12-21 18:44:43,480 INFO [train.py:886] (0/4) Epoch 6, batch 2950, loss[loss=0.01593, audio_tagging_loss=0.01593, over 25000.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4953665.63 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:44:46,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0 2023-12-21 18:44:49,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=178533.33333333334, ans=0.125 2023-12-21 18:44:57,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=178600.0, ans=0.0 2023-12-21 18:45:06,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.02 vs. limit=6.0 2023-12-21 18:45:06,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=178666.66666666666, ans=0.05 2023-12-21 18:45:12,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. 
limit=6.0 2023-12-21 18:45:16,979 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.209e+01 2.523e+01 2.674e+01 2.981e+01 3.708e+01, threshold=5.347e+01, percent-clipped=0.0 2023-12-21 18:45:34,822 INFO [train.py:886] (0/4) Epoch 6, batch 3000, loss[loss=0.0164, audio_tagging_loss=0.0164, over 25000.00 frames. ], tot_loss[loss=0.01605, audio_tagging_loss=0.01605, over 4947378.17 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:45:34,824 INFO [train.py:909] (0/4) Computing validation loss 2023-12-21 18:45:43,889 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0001, 5.7678, 5.7126, 5.9383], device='cuda:0') 2023-12-21 18:45:44,029 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8159, 2.5069, 2.6352, 2.2424, 2.3166, 1.6863, 1.2976, 2.4145], device='cuda:0') 2023-12-21 18:45:56,020 INFO [train.py:917] (0/4) Epoch 6, validation: loss=0.03776, audio_tagging_loss=0.03776, over 3737520.00 frames. 2023-12-21 18:45:56,021 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-21 18:45:58,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=178866.66666666666, ans=0.125 2023-12-21 18:46:03,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=178866.66666666666, ans=0.125 2023-12-21 18:46:18,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.78 vs. limit=22.5 2023-12-21 18:46:20,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179000.0, ans=0.1 2023-12-21 18:46:48,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.71 vs. limit=15.0 2023-12-21 18:46:48,378 INFO [train.py:886] (0/4) Epoch 6, batch 3050, loss[loss=0.01556, audio_tagging_loss=0.01556, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4946084.90 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:46:51,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=15.0 2023-12-21 18:47:06,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=179266.66666666666, ans=0.0 2023-12-21 18:47:14,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=179333.33333333334, ans=0.125 2023-12-21 18:47:21,408 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+01 2.570e+01 2.697e+01 2.943e+01 3.684e+01, threshold=5.394e+01, percent-clipped=0.0 2023-12-21 18:47:23,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=179400.0, ans=0.0 2023-12-21 18:47:25,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. 
limit=6.0 2023-12-21 18:47:31,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=179466.66666666666, ans=0.125 2023-12-21 18:47:40,086 INFO [train.py:886] (0/4) Epoch 6, batch 3100, loss[loss=0.01518, audio_tagging_loss=0.01518, over 24750.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4946491.10 frames. ], batch size: 99, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:48:19,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=179733.33333333334, ans=6.0 2023-12-21 18:48:31,645 INFO [train.py:886] (0/4) Epoch 6, batch 3150, loss[loss=0.01249, audio_tagging_loss=0.01249, over 23997.00 frames. ], tot_loss[loss=0.01632, audio_tagging_loss=0.01632, over 4944816.80 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:48:33,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2023-12-21 18:48:37,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=179866.66666666666, ans=0.2 2023-12-21 18:48:43,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=179933.33333333334, ans=0.125 2023-12-21 18:48:52,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=180000.0, ans=0.0 2023-12-21 18:48:55,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.64 vs. limit=15.0 2023-12-21 18:48:57,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=180000.0, ans=0.1 2023-12-21 18:49:04,095 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+01 2.612e+01 2.785e+01 2.963e+01 3.956e+01, threshold=5.570e+01, percent-clipped=0.0 2023-12-21 18:49:05,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=180066.66666666666, ans=0.0 2023-12-21 18:49:08,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=180066.66666666666, ans=0.125 2023-12-21 18:49:10,786 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.082e-02 2023-12-21 18:49:11,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=180133.33333333334, ans=0.125 2023-12-21 18:49:13,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=180133.33333333334, ans=0.125 2023-12-21 18:49:21,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180133.33333333334, ans=0.1 2023-12-21 18:49:23,228 INFO [train.py:886] (0/4) Epoch 6, batch 3200, loss[loss=0.01444, audio_tagging_loss=0.01444, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4943091.73 frames. 
], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:49:27,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=180200.0, ans=0.125 2023-12-21 18:49:46,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=180333.33333333334, ans=0.1 2023-12-21 18:49:55,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=180400.0, ans=0.125 2023-12-21 18:49:55,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=15.0 2023-12-21 18:50:02,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=180400.0, ans=0.125 2023-12-21 18:50:12,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=180466.66666666666, ans=0.0 2023-12-21 18:50:14,310 INFO [train.py:886] (0/4) Epoch 6, batch 3250, loss[loss=0.01585, audio_tagging_loss=0.01585, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4935935.66 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:50:24,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=180533.33333333334, ans=0.2 2023-12-21 18:50:41,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=180666.66666666666, ans=0.125 2023-12-21 18:50:47,527 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.514e+01 2.742e+01 2.966e+01 4.089e+01, threshold=5.485e+01, percent-clipped=0.0 2023-12-21 18:50:58,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2023-12-21 18:50:59,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180800.0, ans=0.1 2023-12-21 18:51:02,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=180800.0, ans=0.125 2023-12-21 18:51:05,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=180866.66666666666, ans=0.0 2023-12-21 18:51:06,738 INFO [train.py:886] (0/4) Epoch 6, batch 3300, loss[loss=0.01482, audio_tagging_loss=0.01482, over 25000.00 frames. ], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 4940173.32 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:51:32,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181000.0, ans=0.1 2023-12-21 18:51:39,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=181066.66666666666, ans=0.125 2023-12-21 18:51:45,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=181066.66666666666, ans=0.125 2023-12-21 18:51:59,253 INFO [train.py:886] (0/4) Epoch 6, batch 3350, loss[loss=0.01633, audio_tagging_loss=0.01633, over 25000.00 frames. 
], tot_loss[loss=0.01606, audio_tagging_loss=0.01606, over 4950212.78 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:52:01,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181200.0, ans=0.1 2023-12-21 18:52:22,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=181333.33333333334, ans=0.125 2023-12-21 18:52:32,374 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 2.583e+01 2.776e+01 2.913e+01 4.067e+01, threshold=5.551e+01, percent-clipped=0.0 2023-12-21 18:52:37,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=181400.0, ans=0.125 2023-12-21 18:52:43,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=181466.66666666666, ans=0.0 2023-12-21 18:52:50,296 INFO [train.py:886] (0/4) Epoch 6, batch 3400, loss[loss=0.01658, audio_tagging_loss=0.01658, over 25000.00 frames. ], tot_loss[loss=0.0161, audio_tagging_loss=0.0161, over 4951266.09 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:53:12,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-12-21 18:53:17,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=181666.66666666666, ans=0.125 2023-12-21 18:53:39,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=181800.0, ans=0.2 2023-12-21 18:53:42,536 INFO [train.py:886] (0/4) Epoch 6, batch 3450, loss[loss=0.01791, audio_tagging_loss=0.01791, over 24750.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 4952801.84 frames. 
], batch size: 99, lr: 1.75e-02, grad_scale: 64.0 2023-12-21 18:53:44,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=181866.66666666666, ans=0.2 2023-12-21 18:53:46,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=181866.66666666666, ans=0.125 2023-12-21 18:53:47,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=181866.66666666666, ans=0.0 2023-12-21 18:53:58,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181933.33333333334, ans=0.1 2023-12-21 18:53:58,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=181933.33333333334, ans=0.125 2023-12-21 18:53:58,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=181933.33333333334, ans=0.1 2023-12-21 18:54:05,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=182000.0, ans=0.125 2023-12-21 18:54:12,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=182066.66666666666, ans=0.125 2023-12-21 18:54:13,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=182066.66666666666, ans=0.2 2023-12-21 18:54:15,474 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.567e+01 2.759e+01 2.912e+01 3.537e+01, threshold=5.518e+01, percent-clipped=0.0 2023-12-21 18:54:15,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=12.0 2023-12-21 18:54:26,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=182133.33333333334, ans=0.07 2023-12-21 18:54:32,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=182133.33333333334, ans=0.125 2023-12-21 18:54:34,844 INFO [train.py:886] (0/4) Epoch 6, batch 3500, loss[loss=0.01466, audio_tagging_loss=0.01466, over 25000.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4953934.99 frames. ], batch size: 100, lr: 1.75e-02, grad_scale: 64.0 2023-12-21 18:54:42,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=182200.0, ans=0.04949747468305833 2023-12-21 18:55:00,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=182333.33333333334, ans=0.0 2023-12-21 18:55:13,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.04 vs. limit=10.0 2023-12-21 18:55:20,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=182466.66666666666, ans=0.125 2023-12-21 18:55:26,232 INFO [train.py:886] (0/4) Epoch 6, batch 3550, loss[loss=0.01662, audio_tagging_loss=0.01662, over 25000.00 frames. 
], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4945772.20 frames. ], batch size: 100, lr: 1.75e-02, grad_scale: 64.0 2023-12-21 18:55:41,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=182600.0, ans=0.0 2023-12-21 18:55:47,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2023-12-21 18:55:50,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=182666.66666666666, ans=0.125 2023-12-21 18:55:59,188 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.568e+01 2.734e+01 3.047e+01 3.818e+01, threshold=5.468e+01, percent-clipped=0.0 2023-12-21 18:56:12,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=182800.0, ans=0.0 2023-12-21 18:56:18,364 INFO [train.py:886] (0/4) Epoch 6, batch 3600, loss[loss=0.01533, audio_tagging_loss=0.01533, over 25000.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4945956.87 frames. ], batch size: 100, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:56:18,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=182866.66666666666, ans=0.125 2023-12-21 18:56:21,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=182866.66666666666, ans=0.0 2023-12-21 18:56:22,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=182866.66666666666, ans=0.125 2023-12-21 18:56:32,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=182933.33333333334, ans=0.0 2023-12-21 18:56:40,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=183000.0, ans=0.0 2023-12-21 18:56:47,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=183000.0, ans=0.125 2023-12-21 18:56:50,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=183066.66666666666, ans=0.0 2023-12-21 18:56:57,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=183066.66666666666, ans=0.2 2023-12-21 18:56:59,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=183133.33333333334, ans=0.0 2023-12-21 18:57:09,933 INFO [train.py:886] (0/4) Epoch 6, batch 3650, loss[loss=0.01814, audio_tagging_loss=0.01814, over 25000.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4948677.71 frames. ], batch size: 100, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:57:12,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=183200.0, ans=0.125 2023-12-21 18:57:24,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.47 vs. 
limit=22.5 2023-12-21 18:57:25,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.74 vs. limit=15.0 2023-12-21 18:57:37,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=183333.33333333334, ans=0.125 2023-12-21 18:57:42,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=183400.0, ans=0.125 2023-12-21 18:57:43,146 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 2.536e+01 2.775e+01 2.969e+01 4.342e+01, threshold=5.551e+01, percent-clipped=0.0 2023-12-21 18:57:55,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=183466.66666666666, ans=0.125 2023-12-21 18:58:01,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.76 vs. limit=15.0 2023-12-21 18:58:01,814 INFO [train.py:886] (0/4) Epoch 6, batch 3700, loss[loss=0.01556, audio_tagging_loss=0.01556, over 25000.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4951637.38 frames. ], batch size: 100, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:58:24,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=183666.66666666666, ans=0.1 2023-12-21 18:58:41,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=183800.0, ans=0.125 2023-12-21 18:58:54,270 INFO [train.py:886] (0/4) Epoch 6, batch 3750, loss[loss=0.01708, audio_tagging_loss=0.01708, over 24750.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 4943421.31 frames. ], batch size: 99, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:59:01,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.55 vs. limit=15.0 2023-12-21 18:59:02,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2023-12-21 18:59:09,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=183933.33333333334, ans=0.0 2023-12-21 18:59:20,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=184000.0, ans=0.015 2023-12-21 18:59:28,151 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.576e+01 2.747e+01 2.976e+01 3.504e+01, threshold=5.494e+01, percent-clipped=0.0 2023-12-21 18:59:41,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=184133.33333333334, ans=0.0 2023-12-21 18:59:45,100 INFO [train.py:886] (0/4) Epoch 6, batch 3800, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01652, audio_tagging_loss=0.01652, over 4942935.95 frames. 
], batch size: 99, lr: 1.74e-02, grad_scale: 128.0 2023-12-21 19:00:17,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=184400.0, ans=0.0 2023-12-21 19:00:24,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=184400.0, ans=0.1 2023-12-21 19:00:26,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=184466.66666666666, ans=0.09899494936611666 2023-12-21 19:00:37,456 INFO [train.py:886] (0/4) Epoch 6, batch 3850, loss[loss=0.0175, audio_tagging_loss=0.0175, over 25000.00 frames. ], tot_loss[loss=0.01633, audio_tagging_loss=0.01633, over 4947103.50 frames. ], batch size: 100, lr: 1.74e-02, grad_scale: 128.0 2023-12-21 19:00:48,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=184600.0, ans=0.125 2023-12-21 19:00:50,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=184600.0, ans=0.125 2023-12-21 19:01:08,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=184733.33333333334, ans=0.125 2023-12-21 19:01:11,862 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.661e+01 2.816e+01 3.118e+01 3.976e+01, threshold=5.631e+01, percent-clipped=0.0 2023-12-21 19:01:27,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=184800.0, ans=0.07 2023-12-21 19:01:29,358 INFO [train.py:886] (0/4) Epoch 6, batch 3900, loss[loss=0.01869, audio_tagging_loss=0.01869, over 25000.00 frames. ], tot_loss[loss=0.01611, audio_tagging_loss=0.01611, over 4952996.29 frames. ], batch size: 100, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:01:39,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=184933.33333333334, ans=0.125 2023-12-21 19:01:40,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=184933.33333333334, ans=0.125 2023-12-21 19:01:41,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.76 vs. limit=15.0 2023-12-21 19:01:44,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=184933.33333333334, ans=15.0 2023-12-21 19:01:51,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=185000.0, ans=0.0 2023-12-21 19:02:10,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=12.0 2023-12-21 19:02:20,931 INFO [train.py:886] (0/4) Epoch 6, batch 3950, loss[loss=0.01719, audio_tagging_loss=0.01719, over 24750.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4959075.28 frames. ], batch size: 99, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:02:47,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.82 vs. 
limit=22.5 2023-12-21 19:02:49,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185333.33333333334, ans=0.1 2023-12-21 19:02:55,023 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.574e+01 2.731e+01 2.913e+01 3.749e+01, threshold=5.463e+01, percent-clipped=0.0 2023-12-21 19:02:55,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2023-12-21 19:02:58,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=185400.0, ans=10.0 2023-12-21 19:02:58,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=185400.0, ans=0.1 2023-12-21 19:03:00,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=185400.0, ans=0.2 2023-12-21 19:03:01,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2023-12-21 19:03:13,948 INFO [train.py:886] (0/4) Epoch 6, batch 4000, loss[loss=0.01809, audio_tagging_loss=0.01809, over 25000.00 frames. ], tot_loss[loss=0.01602, audio_tagging_loss=0.01602, over 4958998.87 frames. ], batch size: 100, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:03:17,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=185533.33333333334, ans=0.125 2023-12-21 19:03:18,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=185533.33333333334, ans=0.125 2023-12-21 19:03:19,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=185533.33333333334, ans=0.1 2023-12-21 19:03:28,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0 2023-12-21 19:03:36,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=15.0 2023-12-21 19:03:47,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=185733.33333333334, ans=0.1 2023-12-21 19:03:53,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=185733.33333333334, ans=0.04949747468305833 2023-12-21 19:03:54,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-12-21 19:04:04,178 INFO [train.py:886] (0/4) Epoch 6, batch 4050, loss[loss=0.01505, audio_tagging_loss=0.01505, over 25000.00 frames. ], tot_loss[loss=0.01599, audio_tagging_loss=0.01599, over 4951460.17 frames. 
], batch size: 100, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:04:15,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=185933.33333333334, ans=10.0 2023-12-21 19:04:15,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=185933.33333333334, ans=0.125 2023-12-21 19:04:24,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=186000.0, ans=0.0 2023-12-21 19:04:24,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=186000.0, ans=0.0 2023-12-21 19:04:25,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.22 vs. limit=22.5 2023-12-21 19:04:31,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2023-12-21 19:04:38,169 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 2.667e+01 2.852e+01 3.052e+01 4.692e+01, threshold=5.704e+01, percent-clipped=0.0 2023-12-21 19:04:43,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=186066.66666666666, ans=0.2 2023-12-21 19:04:48,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186133.33333333334, ans=0.1 2023-12-21 19:04:56,390 INFO [train.py:886] (0/4) Epoch 6, batch 4100, loss[loss=0.01705, audio_tagging_loss=0.01705, over 24750.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4944966.95 frames. ], batch size: 99, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:04:59,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=12.0 2023-12-21 19:05:10,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=186266.66666666666, ans=0.125 2023-12-21 19:05:28,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=186400.0, ans=0.09899494936611666 2023-12-21 19:05:47,585 INFO [train.py:886] (0/4) Epoch 6, batch 4150, loss[loss=0.01751, audio_tagging_loss=0.01751, over 24932.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4945475.52 frames. 
2023-12-21 19:06:07,193 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-28000.pt
2023-12-21 19:06:10,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=186666.66666666666, ans=0.125
2023-12-21 19:06:16,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=186666.66666666666, ans=0.0
2023-12-21 19:06:23,376 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.572e+01 2.768e+01 2.919e+01 3.427e+01, threshold=5.536e+01, percent-clipped=0.0
2023-12-21 19:06:39,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=186800.0, ans=0.0
2023-12-21 19:06:41,002 INFO [train.py:886] (0/4) Epoch 6, batch 4200, loss[loss=0.01533, audio_tagging_loss=0.01533, over 24750.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4949075.10 frames. ], batch size: 99, lr: 1.73e-02, grad_scale: 64.0
2023-12-21 19:06:43,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=15.0
2023-12-21 19:06:47,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0
2023-12-21 19:06:49,482 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 19:07:07,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=187000.0, ans=0.125
2023-12-21 19:07:33,804 INFO [train.py:886] (0/4) Epoch 6, batch 4250, loss[loss=0.01582, audio_tagging_loss=0.01582, over 25000.00 frames. ], tot_loss[loss=0.01602, audio_tagging_loss=0.01602, over 4953020.52 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 64.0
2023-12-21 19:07:49,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.53 vs. limit=15.0
2023-12-21 19:08:07,998 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.574e+01 2.753e+01 2.984e+01 3.993e+01, threshold=5.507e+01, percent-clipped=0.0
2023-12-21 19:08:11,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=187400.0, ans=0.125
2023-12-21 19:08:11,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=187400.0, ans=0.0
2023-12-21 19:08:16,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187466.66666666666, ans=0.1
2023-12-21 19:08:24,685 INFO [train.py:886] (0/4) Epoch 6, batch 4300, loss[loss=0.01399, audio_tagging_loss=0.01399, over 25000.00 frames. ], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4956864.79 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 64.0
2023-12-21 19:08:35,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0
2023-12-21 19:08:45,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0
2023-12-21 19:09:08,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=187800.0, ans=0.0
2023-12-21 19:09:17,050 INFO [train.py:886] (0/4) Epoch 6, batch 4350, loss[loss=0.0173, audio_tagging_loss=0.0173, over 24750.00 frames. ], tot_loss[loss=0.01606, audio_tagging_loss=0.01606, over 4959525.65 frames. ], batch size: 99, lr: 1.73e-02, grad_scale: 64.0
2023-12-21 19:09:21,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.80 vs. limit=10.0
2023-12-21 19:09:32,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0
2023-12-21 19:09:34,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=187933.33333333334, ans=0.125
2023-12-21 19:09:38,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=188000.0, ans=0.2
2023-12-21 19:09:51,261 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.707e+01 2.868e+01 3.047e+01 3.925e+01, threshold=5.736e+01, percent-clipped=0.0
2023-12-21 19:09:51,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=188066.66666666666, ans=0.125
2023-12-21 19:10:08,771 INFO [train.py:886] (0/4) Epoch 6, batch 4400, loss[loss=0.01439, audio_tagging_loss=0.01439, over 24750.00 frames. ], tot_loss[loss=0.01621, audio_tagging_loss=0.01621, over 4950533.97 frames. ], batch size: 99, lr: 1.73e-02, grad_scale: 64.0
2023-12-21 19:10:11,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=188200.0, ans=0.0
2023-12-21 19:10:27,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0
2023-12-21 19:10:33,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0
2023-12-21 19:10:35,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=188333.33333333334, ans=0.125
2023-12-21 19:10:36,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.47 vs. limit=10.0
2023-12-21 19:10:42,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=188400.0, ans=0.1
2023-12-21 19:10:46,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=188400.0, ans=0.125
2023-12-21 19:10:46,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.50 vs. limit=15.0
2023-12-21 19:10:51,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=188466.66666666666, ans=0.2
2023-12-21 19:11:00,394 INFO [train.py:886] (0/4) Epoch 6, batch 4450, loss[loss=0.01732, audio_tagging_loss=0.01732, over 24750.00 frames. ], tot_loss[loss=0.01629, audio_tagging_loss=0.01629, over 4942055.85 frames. ], batch size: 99, lr: 1.73e-02, grad_scale: 64.0
2023-12-21 19:11:22,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=188666.66666666666, ans=0.0
2023-12-21 19:11:35,075 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.682e+01 2.838e+01 3.055e+01 3.746e+01, threshold=5.675e+01, percent-clipped=0.0
2023-12-21 19:11:39,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=188733.33333333334, ans=0.125
2023-12-21 19:11:41,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0
2023-12-21 19:11:52,449 INFO [train.py:886] (0/4) Epoch 6, batch 4500, loss[loss=0.01825, audio_tagging_loss=0.01825, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4945591.31 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 64.0
2023-12-21 19:11:56,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=188866.66666666666, ans=0.0
2023-12-21 19:11:57,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=188866.66666666666, ans=0.125
2023-12-21 19:12:02,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=188933.33333333334, ans=0.125
2023-12-21 19:12:08,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=188933.33333333334, ans=0.0
2023-12-21 19:12:21,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189000.0, ans=0.1
2023-12-21 19:12:29,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=189066.66666666666, ans=0.125
2023-12-21 19:12:30,542 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=12.0
2023-12-21 19:12:34,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.61 vs. limit=15.0
2023-12-21 19:12:44,063 INFO [train.py:886] (0/4) Epoch 6, batch 4550, loss[loss=0.01665, audio_tagging_loss=0.01665, over 25000.00 frames. ], tot_loss[loss=0.01615, audio_tagging_loss=0.01615, over 4951520.56 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 64.0
2023-12-21 19:13:10,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=189333.33333333334, ans=0.0
2023-12-21 19:13:12,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.78 vs. limit=10.0
2023-12-21 19:13:15,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=189400.0, ans=0.125
2023-12-21 19:13:15,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=189400.0, ans=0.0
2023-12-21 19:13:18,809 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.583e+01 2.791e+01 2.970e+01 3.966e+01, threshold=5.581e+01, percent-clipped=0.0
2023-12-21 19:13:36,223 INFO [train.py:886] (0/4) Epoch 6, batch 4600, loss[loss=0.01762, audio_tagging_loss=0.01762, over 25000.00 frames. ], tot_loss[loss=0.01621, audio_tagging_loss=0.01621, over 4950291.73 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 64.0
2023-12-21 19:13:47,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189600.0, ans=0.1
2023-12-21 19:13:58,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=189666.66666666666, ans=0.95
2023-12-21 19:14:00,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0
2023-12-21 19:14:02,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=15.0
2023-12-21 19:14:03,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=189666.66666666666, ans=0.125
2023-12-21 19:14:05,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=189666.66666666666, ans=0.125
2023-12-21 19:14:08,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.26 vs. limit=15.0
2023-12-21 19:14:27,544 INFO [train.py:886] (0/4) Epoch 6, batch 4650, loss[loss=0.01768, audio_tagging_loss=0.01768, over 25000.00 frames. ], tot_loss[loss=0.01625, audio_tagging_loss=0.01625, over 4959547.25 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 64.0
2023-12-21 19:14:38,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=189933.33333333334, ans=0.125
2023-12-21 19:14:43,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=189933.33333333334, ans=0.2
2023-12-21 19:14:44,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=189933.33333333334, ans=0.125
2023-12-21 19:15:02,397 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 2.615e+01 2.807e+01 2.981e+01 3.491e+01, threshold=5.613e+01, percent-clipped=0.0
2023-12-21 19:15:05,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=190066.66666666666, ans=0.125
2023-12-21 19:15:17,999 INFO [train.py:886] (0/4) Epoch 6, batch 4700, loss[loss=0.01674, audio_tagging_loss=0.01674, over 24750.00 frames. ], tot_loss[loss=0.0164, audio_tagging_loss=0.0164, over 4954660.28 frames. ], batch size: 99, lr: 1.72e-02, grad_scale: 64.0
2023-12-21 19:15:24,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=190200.0, ans=0.125
2023-12-21 19:15:32,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.61 vs. limit=22.5
2023-12-21 19:15:34,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=15.0
2023-12-21 19:15:35,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190266.66666666666, ans=0.1
2023-12-21 19:15:36,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=190333.33333333334, ans=0.0
2023-12-21 19:15:47,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=190400.0, ans=0.125
2023-12-21 19:15:55,443 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 19:16:04,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=190466.66666666666, ans=0.0
2023-12-21 19:16:05,840 INFO [train.py:886] (0/4) Epoch 6, batch 4750, loss[loss=0.01652, audio_tagging_loss=0.01652, over 24750.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 4952736.15 frames. ], batch size: 99, lr: 1.72e-02, grad_scale: 64.0
2023-12-21 19:16:07,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=190533.33333333334, ans=0.035
2023-12-21 19:16:11,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=190533.33333333334, ans=0.0
2023-12-21 19:16:14,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=190600.0, ans=0.125
2023-12-21 19:16:20,879 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-6.pt
2023-12-21 19:16:43,675 INFO [train.py:886] (0/4) Epoch 7, batch 0, loss[loss=0.03619, audio_tagging_loss=0.03619, over 25000.00 frames. ], tot_loss[loss=0.03619, audio_tagging_loss=0.03619, over 25000.00 frames. ], batch size: 100, lr: 1.61e-02, grad_scale: 64.0
2023-12-21 19:16:43,677 INFO [train.py:909] (0/4) Computing validation loss
2023-12-21 19:17:05,426 INFO [train.py:917] (0/4) Epoch 7, validation: loss=0.03667, audio_tagging_loss=0.03667, over 3737520.00 frames.
2023-12-21 19:17:05,427 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-21 19:17:07,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190640.0, ans=0.1
2023-12-21 19:17:23,802 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.615e+01 2.821e+01 3.087e+01 1.022e+02, threshold=5.642e+01, percent-clipped=4.0
2023-12-21 19:17:24,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0
2023-12-21 19:17:29,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0
2023-12-21 19:17:34,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=190773.33333333334, ans=0.0
2023-12-21 19:17:40,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=190840.0, ans=0.125
2023-12-21 19:17:45,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.50 vs. limit=22.5
2023-12-21 19:17:56,687 INFO [train.py:886] (0/4) Epoch 7, batch 50, loss[loss=0.02352, audio_tagging_loss=0.02352, over 25000.00 frames. ], tot_loss[loss=0.02634, audio_tagging_loss=0.02634, over 1109142.43 frames. ], batch size: 100, lr: 1.61e-02, grad_scale: 32.0
2023-12-21 19:18:00,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.32 vs. limit=22.5
2023-12-21 19:18:07,517 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 19:18:13,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.42 vs. limit=22.5
2023-12-21 19:18:19,895 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 19:18:30,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=191173.33333333334, ans=0.0
2023-12-21 19:18:34,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=191173.33333333334, ans=0.125
2023-12-21 19:18:46,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=22.5
2023-12-21 19:18:47,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.96 vs. limit=15.0
2023-12-21 19:18:47,573 INFO [train.py:886] (0/4) Epoch 7, batch 100, loss[loss=0.01738, audio_tagging_loss=0.01738, over 25000.00 frames. ], tot_loss[loss=0.02244, audio_tagging_loss=0.02244, over 1966395.31 frames. ], batch size: 100, lr: 1.61e-02, grad_scale: 32.0
2023-12-21 19:18:58,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191373.33333333334, ans=0.1
2023-12-21 19:19:05,816 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.945e+01 3.158e+01 3.404e+01 4.637e+01, threshold=6.317e+01, percent-clipped=0.0
2023-12-21 19:19:30,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0
2023-12-21 19:19:37,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=191573.33333333334, ans=0.125
2023-12-21 19:19:38,897 INFO [train.py:886] (0/4) Epoch 7, batch 150, loss[loss=0.01727, audio_tagging_loss=0.01727, over 25000.00 frames. ], tot_loss[loss=0.0203, audio_tagging_loss=0.0203, over 2619820.95 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-21 19:19:48,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=191706.66666666666, ans=0.125
2023-12-21 19:20:14,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=191840.0, ans=0.2
2023-12-21 19:20:29,285 INFO [train.py:886] (0/4) Epoch 7, batch 200, loss[loss=0.01662, audio_tagging_loss=0.01662, over 25000.00 frames. ], tot_loss[loss=0.01899, audio_tagging_loss=0.01899, over 3141932.07 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-21 19:20:37,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=191973.33333333334, ans=0.0
2023-12-21 19:20:40,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=192040.0, ans=0.05
2023-12-21 19:20:48,759 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.567e+01 2.755e+01 2.935e+01 3.522e+01, threshold=5.511e+01, percent-clipped=0.0
2023-12-21 19:20:51,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=15.0
2023-12-21 19:21:05,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=192173.33333333334, ans=0.1
2023-12-21 19:21:09,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0
2023-12-21 19:21:13,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=192240.0, ans=0.0
2023-12-21 19:21:16,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=192240.0, ans=0.125
2023-12-21 19:21:19,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=192240.0, ans=0.0
2023-12-21 19:21:22,176 INFO [train.py:886] (0/4) Epoch 7, batch 250, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01808, audio_tagging_loss=0.01808, over 3548295.91 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-21 19:21:24,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.77 vs. limit=15.0
2023-12-21 19:21:38,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=192373.33333333334, ans=0.125
2023-12-21 19:21:55,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.04 vs. limit=22.5
2023-12-21 19:21:56,410 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.449e+00
2023-12-21 19:22:02,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=192573.33333333334, ans=0.125
2023-12-21 19:22:03,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=192573.33333333334, ans=0.0
2023-12-21 19:22:05,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=192573.33333333334, ans=0.125
2023-12-21 19:22:13,472 INFO [train.py:886] (0/4) Epoch 7, batch 300, loss[loss=0.01347, audio_tagging_loss=0.01347, over 23996.00 frames. ], tot_loss[loss=0.01758, audio_tagging_loss=0.01758, over 3859859.70 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-21 19:22:31,652 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 2.537e+01 2.670e+01 2.875e+01 3.479e+01, threshold=5.340e+01, percent-clipped=0.0
2023-12-21 19:22:32,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=192773.33333333334, ans=0.125
2023-12-21 19:22:50,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=192840.0, ans=0.125
2023-12-21 19:22:58,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=192906.66666666666, ans=0.125
2023-12-21 19:23:04,677 INFO [train.py:886] (0/4) Epoch 7, batch 350, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.0173, audio_tagging_loss=0.0173, over 4099132.84 frames. ], batch size: 99, lr: 1.60e-02, grad_scale: 32.0
2023-12-21 19:23:11,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=192973.33333333334, ans=0.0
2023-12-21 19:23:27,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193106.66666666666, ans=0.1
2023-12-21 19:23:56,031 INFO [train.py:886] (0/4) Epoch 7, batch 400, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.01701, audio_tagging_loss=0.01701, over 4289863.41 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-21 19:24:01,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=193306.66666666666, ans=0.125
2023-12-21 19:24:08,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=193373.33333333334, ans=0.125
2023-12-21 19:24:09,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193373.33333333334, ans=0.1
2023-12-21 19:24:15,301 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.543e+01 2.742e+01 2.935e+01 3.819e+01, threshold=5.484e+01, percent-clipped=0.0
2023-12-21 19:24:35,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=193506.66666666666, ans=0.125
2023-12-21 19:24:43,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193573.33333333334, ans=0.1
2023-12-21 19:24:47,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.91 vs. limit=22.5
2023-12-21 19:24:48,531 INFO [train.py:886] (0/4) Epoch 7, batch 450, loss[loss=0.01729, audio_tagging_loss=0.01729, over 25000.00 frames. ], tot_loss[loss=0.0168, audio_tagging_loss=0.0168, over 4437741.46 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-21 19:24:50,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=193640.0, ans=15.0
2023-12-21 19:25:00,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=193706.66666666666, ans=0.1
2023-12-21 19:25:06,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=193706.66666666666, ans=0.0
2023-12-21 19:25:07,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=193706.66666666666, ans=0.04949747468305833
2023-12-21 19:25:12,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=193773.33333333334, ans=0.125
2023-12-21 19:25:14,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=193773.33333333334, ans=0.025
2023-12-21 19:25:19,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=193840.0, ans=0.125
2023-12-21 19:25:21,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=193840.0, ans=0.125
2023-12-21 19:25:26,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=193840.0, ans=0.0
2023-12-21 19:25:27,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.21 vs. limit=15.0
2023-12-21 19:25:27,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=193840.0, ans=0.125
2023-12-21 19:25:40,739 INFO [train.py:886] (0/4) Epoch 7, batch 500, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4554763.18 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-21 19:25:42,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=193973.33333333334, ans=0.0
2023-12-21 19:25:43,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=193973.33333333334, ans=0.07
2023-12-21 19:25:53,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=194040.0, ans=0.125
2023-12-21 19:25:55,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=194040.0, ans=0.125
2023-12-21 19:25:58,610 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.495e+01 2.691e+01 2.855e+01 3.742e+01, threshold=5.381e+01, percent-clipped=0.0
2023-12-21 19:26:00,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=194106.66666666666, ans=0.125
2023-12-21 19:26:15,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=194173.33333333334, ans=0.2
2023-12-21 19:26:16,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.95 vs. limit=15.0
2023-12-21 19:26:31,533 INFO [train.py:886] (0/4) Epoch 7, batch 550, loss[loss=0.01586, audio_tagging_loss=0.01586, over 25000.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4639314.44 frames. ], batch size: 100, lr: 1.59e-02, grad_scale: 32.0
2023-12-21 19:26:32,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=194306.66666666666, ans=0.015
2023-12-21 19:26:43,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=194373.33333333334, ans=0.2
2023-12-21 19:26:47,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=194373.33333333334, ans=15.0
2023-12-21 19:26:50,093 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.036e-02
2023-12-21 19:26:55,622 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 19:26:55,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=194440.0, ans=0.125
2023-12-21 19:27:10,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=194506.66666666666, ans=0.2
2023-12-21 19:27:17,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=194573.33333333334, ans=0.125
2023-12-21 19:27:21,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=194573.33333333334, ans=0.125
2023-12-21 19:27:23,606 INFO [train.py:886] (0/4) Epoch 7, batch 600, loss[loss=0.01404, audio_tagging_loss=0.01404, over 24750.00 frames. ], tot_loss[loss=0.01632, audio_tagging_loss=0.01632, over 4714034.43 frames. ], batch size: 99, lr: 1.59e-02, grad_scale: 32.0
2023-12-21 19:27:32,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.39 vs. limit=15.0
2023-12-21 19:27:42,301 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.620e+01 2.779e+01 2.985e+01 3.932e+01, threshold=5.559e+01, percent-clipped=0.0
2023-12-21 19:27:46,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=194773.33333333334, ans=0.125
2023-12-21 19:27:47,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.79 vs. limit=22.5
2023-12-21 19:27:49,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=194773.33333333334, ans=0.0
2023-12-21 19:28:11,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.43 vs. limit=10.0
2023-12-21 19:28:14,643 INFO [train.py:886] (0/4) Epoch 7, batch 650, loss[loss=0.01782, audio_tagging_loss=0.01782, over 23998.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4760994.08 frames. ], batch size: 100, lr: 1.59e-02, grad_scale: 32.0
2023-12-21 19:28:23,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=194973.33333333334, ans=0.125
2023-12-21 19:28:24,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=195040.0, ans=0.0
2023-12-21 19:29:05,862 INFO [train.py:886] (0/4) Epoch 7, batch 700, loss[loss=0.01461, audio_tagging_loss=0.01461, over 22227.00 frames. ], tot_loss[loss=0.01625, audio_tagging_loss=0.01625, over 4799731.64 frames. ], batch size: 107, lr: 1.59e-02, grad_scale: 32.0
2023-12-21 19:29:11,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0
2023-12-21 19:29:15,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.18 vs. limit=10.0
2023-12-21 19:29:18,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=195373.33333333334, ans=0.2
2023-12-21 19:29:23,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.16 vs. limit=10.0
2023-12-21 19:29:24,333 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.528e+01 2.672e+01 2.888e+01 3.469e+01, threshold=5.344e+01, percent-clipped=0.0
2023-12-21 19:29:30,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=195440.0, ans=0.0
2023-12-21 19:29:44,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.38 vs. limit=15.0
2023-12-21 19:29:49,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=195573.33333333334, ans=22.5
2023-12-21 19:29:56,836 INFO [train.py:886] (0/4) Epoch 7, batch 750, loss[loss=0.01887, audio_tagging_loss=0.01887, over 24750.00 frames. ], tot_loss[loss=0.01607, audio_tagging_loss=0.01607, over 4838340.81 frames. ], batch size: 99, lr: 1.59e-02, grad_scale: 32.0
2023-12-21 19:30:18,927 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=15.0
2023-12-21 19:30:34,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=195840.0, ans=0.2
2023-12-21 19:30:40,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=195906.66666666666, ans=0.2
2023-12-21 19:30:46,829 INFO [train.py:886] (0/4) Epoch 7, batch 800, loss[loss=0.01893, audio_tagging_loss=0.01893, over 24750.00 frames. ], tot_loss[loss=0.01607, audio_tagging_loss=0.01607, over 4858394.66 frames. ], batch size: 99, lr: 1.59e-02, grad_scale: 32.0
2023-12-21 19:30:53,076 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.098e-02
2023-12-21 19:31:05,898 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.570e+01 2.779e+01 3.006e+01 3.610e+01, threshold=5.558e+01, percent-clipped=0.0
2023-12-21 19:31:10,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=15.0
2023-12-21 19:31:20,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=196173.33333333334, ans=0.125
2023-12-21 19:31:39,172 INFO [train.py:886] (0/4) Epoch 7, batch 850, loss[loss=0.01741, audio_tagging_loss=0.01741, over 24750.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4883233.36 frames. ], batch size: 99, lr: 1.59e-02, grad_scale: 32.0
2023-12-21 19:31:40,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196306.66666666666, ans=0.1
2023-12-21 19:31:42,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196306.66666666666, ans=0.1
2023-12-21 19:31:58,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=15.0
2023-12-21 19:31:59,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196440.0, ans=0.1
2023-12-21 19:32:05,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=196440.0, ans=0.125
2023-12-21 19:32:06,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=196440.0, ans=0.125
2023-12-21 19:32:06,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=196440.0, ans=0.125
2023-12-21 19:32:07,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0
2023-12-21 19:32:16,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0
2023-12-21 19:32:16,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.23 vs. limit=22.5
2023-12-21 19:32:24,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=196573.33333333334, ans=0.2
2023-12-21 19:32:31,651 INFO [train.py:886] (0/4) Epoch 7, batch 900, loss[loss=0.01638, audio_tagging_loss=0.01638, over 24750.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4899377.69 frames. ], batch size: 99, lr: 1.59e-02, grad_scale: 32.0
2023-12-21 19:32:34,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=196640.0, ans=0.125
2023-12-21 19:32:41,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=196706.66666666666, ans=0.05
2023-12-21 19:32:43,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=196706.66666666666, ans=0.05
2023-12-21 19:32:50,062 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 2.564e+01 2.733e+01 2.886e+01 3.706e+01, threshold=5.467e+01, percent-clipped=0.0
2023-12-21 19:32:52,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.89 vs. limit=15.0
2023-12-21 19:32:53,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196773.33333333334, ans=0.1
2023-12-21 19:32:59,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=196773.33333333334, ans=0.125
2023-12-21 19:33:02,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=196840.0, ans=0.125
2023-12-21 19:33:07,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196840.0, ans=0.1
2023-12-21 19:33:09,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.45 vs. limit=15.0
2023-12-21 19:33:10,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=196840.0, ans=0.125
2023-12-21 19:33:22,388 INFO [train.py:886] (0/4) Epoch 7, batch 950, loss[loss=0.02094, audio_tagging_loss=0.02094, over 24750.00 frames. ], tot_loss[loss=0.01621, audio_tagging_loss=0.01621, over 4901891.97 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0
2023-12-21 19:33:27,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=12.0
2023-12-21 19:33:44,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=197106.66666666666, ans=0.125
2023-12-21 19:33:44,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=197106.66666666666, ans=0.0
2023-12-21 19:33:55,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=197173.33333333334, ans=0.125
2023-12-21 19:34:01,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=197173.33333333334, ans=0.125
2023-12-21 19:34:11,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=197240.0, ans=0.125
2023-12-21 19:34:14,456 INFO [train.py:886] (0/4) Epoch 7, batch 1000, loss[loss=0.01655, audio_tagging_loss=0.01655, over 24750.00 frames. ], tot_loss[loss=0.01612, audio_tagging_loss=0.01612, over 4913720.08 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0
2023-12-21 19:34:23,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=197373.33333333334, ans=0.2
2023-12-21 19:34:32,233 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.519e+01 2.705e+01 2.941e+01 3.391e+01, threshold=5.409e+01, percent-clipped=0.0
2023-12-21 19:35:01,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=197573.33333333334, ans=0.125
2023-12-21 19:35:05,198 INFO [train.py:886] (0/4) Epoch 7, batch 1050, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.01605, audio_tagging_loss=0.01605, over 4918744.43 frames. ], batch size: 100, lr: 1.58e-02, grad_scale: 32.0
2023-12-21 19:35:14,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=197640.0, ans=0.125
2023-12-21 19:35:18,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=197706.66666666666, ans=0.125
2023-12-21 19:35:22,360 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 19:35:32,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.30 vs. limit=5.0
2023-12-21 19:35:38,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=197840.0, ans=0.1
2023-12-21 19:35:45,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=197840.0, ans=0.0
2023-12-21 19:35:50,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=15.0
2023-12-21 19:35:51,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197906.66666666666, ans=0.1
2023-12-21 19:35:54,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5
2023-12-21 19:35:57,691 INFO [train.py:886] (0/4) Epoch 7, batch 1100, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01608, audio_tagging_loss=0.01608, over 4927405.31 frames. ], batch size: 100, lr: 1.58e-02, grad_scale: 32.0
2023-12-21 19:36:00,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=197973.33333333334, ans=0.125
2023-12-21 19:36:04,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.05 vs. limit=22.5
2023-12-21 19:36:08,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=198040.0, ans=0.0
2023-12-21 19:36:16,000 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.563e+01 2.707e+01 2.877e+01 3.432e+01, threshold=5.414e+01, percent-clipped=0.0
2023-12-21 19:36:21,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=198106.66666666666, ans=0.07
2023-12-21 19:36:23,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=198106.66666666666, ans=0.125
2023-12-21 19:36:28,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=198173.33333333334, ans=0.0
2023-12-21 19:36:37,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=198240.0, ans=0.07
2023-12-21 19:36:37,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=198240.0, ans=0.04949747468305833
2023-12-21 19:36:48,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198306.66666666666, ans=0.1
2023-12-21 19:36:49,284 INFO [train.py:886] (0/4) Epoch 7, batch 1150, loss[loss=0.01375, audio_tagging_loss=0.01375, over 25000.00 frames. ], tot_loss[loss=0.01602, audio_tagging_loss=0.01602, over 4935205.37 frames. ], batch size: 100, lr: 1.58e-02, grad_scale: 32.0
2023-12-21 19:36:53,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=198306.66666666666, ans=0.0
2023-12-21 19:36:57,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=198306.66666666666, ans=0.125
2023-12-21 19:37:00,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=198373.33333333334, ans=0.025
2023-12-21 19:37:01,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=198373.33333333334, ans=0.2
2023-12-21 19:37:02,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=198373.33333333334, ans=0.125
2023-12-21 19:37:03,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0
2023-12-21 19:37:05,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=198373.33333333334, ans=6.0
2023-12-21 19:37:12,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=198440.0, ans=0.125
2023-12-21 19:37:26,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.87 vs. limit=15.0
2023-12-21 19:37:30,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.55 vs. limit=15.0
2023-12-21 19:37:33,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=198573.33333333334, ans=0.125
2023-12-21 19:37:39,453 INFO [train.py:886] (0/4) Epoch 7, batch 1200, loss[loss=0.01674, audio_tagging_loss=0.01674, over 24940.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4942486.20 frames. ], batch size: 100, lr: 1.58e-02, grad_scale: 32.0
2023-12-21 19:37:49,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198706.66666666666, ans=0.1
2023-12-21 19:37:54,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=198706.66666666666, ans=0.125
2023-12-21 19:37:54,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=198706.66666666666, ans=0.05
2023-12-21 19:37:54,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=198706.66666666666, ans=15.0
2023-12-21 19:37:58,030 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.536e+01 2.723e+01 2.860e+01 3.472e+01, threshold=5.446e+01, percent-clipped=0.0
2023-12-21 19:38:00,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=198773.33333333334, ans=0.2
2023-12-21 19:38:02,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=198773.33333333334, ans=0.0
2023-12-21 19:38:11,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198840.0, ans=0.1
2023-12-21 19:38:16,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0
2023-12-21 19:38:21,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=198906.66666666666, ans=0.0
2023-12-21 19:38:28,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=198906.66666666666, ans=0.0
2023-12-21 19:38:31,182 INFO [train.py:886] (0/4) Epoch 7, batch 1250, loss[loss=0.01617, audio_tagging_loss=0.01617, over 24750.00 frames. ], tot_loss[loss=0.01611, audio_tagging_loss=0.01611, over 4942328.97 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0
2023-12-21 19:38:42,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0
2023-12-21 19:39:00,791 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.461e-02
2023-12-21 19:39:23,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=199306.66666666666, ans=0.125
2023-12-21 19:39:23,913 INFO [train.py:886] (0/4) Epoch 7, batch 1300, loss[loss=0.01605, audio_tagging_loss=0.01605, over 21273.00 frames. ], tot_loss[loss=0.01622, audio_tagging_loss=0.01622, over 4938025.86 frames. ], batch size: 107, lr: 1.58e-02, grad_scale: 32.0
2023-12-21 19:39:32,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=199373.33333333334, ans=0.0
2023-12-21 19:39:40,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0
2023-12-21 19:39:42,181 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.626e+01 2.798e+01 3.036e+01 3.776e+01, threshold=5.596e+01, percent-clipped=0.0
2023-12-21 19:39:59,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=199506.66666666666, ans=0.125
2023-12-21 19:40:15,035 INFO [train.py:886] (0/4) Epoch 7, batch 1350, loss[loss=0.01376, audio_tagging_loss=0.01376, over 25000.00 frames. ], tot_loss[loss=0.01607, audio_tagging_loss=0.01607, over 4942114.21 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0
2023-12-21 19:40:19,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=199640.0, ans=0.95
2023-12-21 19:40:24,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=199706.66666666666, ans=0.025
2023-12-21 19:40:34,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=199773.33333333334, ans=0.125
2023-12-21 19:40:39,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=199773.33333333334, ans=0.0
2023-12-21 19:40:51,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=199840.0, ans=0.1
2023-12-21 19:41:06,912 INFO [train.py:886] (0/4) Epoch 7, batch 1400, loss[loss=0.01499, audio_tagging_loss=0.01499, over 25000.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4948699.85 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0
2023-12-21 19:41:07,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0
2023-12-21 19:41:08,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=199973.33333333334, ans=0.0
2023-12-21 19:41:11,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0
2023-12-21 19:41:25,721 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.540e+01 2.768e+01 3.021e+01 3.899e+01, threshold=5.536e+01, percent-clipped=0.0
2023-12-21 19:41:27,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=200106.66666666666, ans=0.0
2023-12-21 19:41:42,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=200173.33333333334, ans=0.125
2023-12-21 19:41:51,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.06 vs. limit=22.5
2023-12-21 19:41:51,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0
2023-12-21 19:41:58,860 INFO [train.py:886] (0/4) Epoch 7, batch 1450, loss[loss=0.01843, audio_tagging_loss=0.01843, over 24750.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4951680.69 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0
2023-12-21 19:42:02,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=200306.66666666666, ans=0.0
2023-12-21 19:42:08,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=200373.33333333334, ans=0.125
2023-12-21 19:42:41,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. limit=10.0
2023-12-21 19:42:48,863 INFO [train.py:886] (0/4) Epoch 7, batch 1500, loss[loss=0.0181, audio_tagging_loss=0.0181, over 24750.00 frames. ], tot_loss[loss=0.01586, audio_tagging_loss=0.01586, over 4953827.40 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0
2023-12-21 19:42:57,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=200640.0, ans=0.2
2023-12-21 19:43:07,816 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.525e+01 2.780e+01 2.987e+01 4.498e+01, threshold=5.560e+01, percent-clipped=0.0
2023-12-21 19:43:13,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=200773.33333333334, ans=0.0
2023-12-21 19:43:20,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=200840.0, ans=0.125
2023-12-21 19:43:40,011 INFO [train.py:886] (0/4) Epoch 7, batch 1550, loss[loss=0.01633, audio_tagging_loss=0.01633, over 24750.00 frames. ], tot_loss[loss=0.01593, audio_tagging_loss=0.01593, over 4951095.82 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0
2023-12-21 19:43:49,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=201040.0, ans=0.2
2023-12-21 19:44:06,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=201106.66666666666, ans=0.025
2023-12-21 19:44:26,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201240.0, ans=0.1
2023-12-21 19:44:29,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.71 vs. limit=22.5
2023-12-21 19:44:29,922 INFO [train.py:886] (0/4) Epoch 7, batch 1600, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.01614, audio_tagging_loss=0.01614, over 4950305.80 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0
2023-12-21 19:44:35,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=201306.66666666666, ans=10.0
2023-12-21 19:44:45,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0
2023-12-21 19:44:49,144 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 2.627e+01 2.765e+01 2.991e+01 3.550e+01, threshold=5.529e+01, percent-clipped=0.0
2023-12-21 19:44:58,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=201440.0, ans=22.5
2023-12-21 19:45:00,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=201506.66666666666, ans=0.04949747468305833
2023-12-21 19:45:08,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201506.66666666666, ans=0.1
2023-12-21 19:45:11,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=22.08 vs. limit=22.5
2023-12-21 19:45:17,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.81 vs. limit=10.0
2023-12-21 19:45:18,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=201573.33333333334, ans=0.0
2023-12-21 19:45:21,527 INFO [train.py:886] (0/4) Epoch 7, batch 1650, loss[loss=0.01675, audio_tagging_loss=0.01675, over 24750.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4949972.96 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0
2023-12-21 19:45:22,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=201640.0, ans=0.025
2023-12-21 19:45:29,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=12.0
2023-12-21 19:45:34,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.44 vs. limit=22.5
2023-12-21 19:45:42,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0
2023-12-21 19:45:42,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=201773.33333333334, ans=0.2
2023-12-21 19:45:43,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=201773.33333333334, ans=0.125
2023-12-21 19:46:00,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=201840.0, ans=0.2
2023-12-21 19:46:12,795 INFO [train.py:886] (0/4) Epoch 7, batch 1700, loss[loss=0.01548, audio_tagging_loss=0.01548, over 25000.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 4952457.49 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0
2023-12-21 19:46:13,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=201973.33333333334, ans=0.2
2023-12-21 19:46:16,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=201973.33333333334, ans=0.125
2023-12-21 19:46:30,527 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.547e+01 2.700e+01 2.895e+01 3.451e+01, threshold=5.401e+01, percent-clipped=0.0
2023-12-21 19:46:39,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=202106.66666666666, ans=0.125
2023-12-21 19:46:51,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=202173.33333333334, ans=0.0
2023-12-21 19:47:03,149 INFO [train.py:886] (0/4) Epoch 7, batch 1750, loss[loss=0.01971, audio_tagging_loss=0.01971, over 25000.00 frames. ], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 4955075.09 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 32.0
2023-12-21 19:47:16,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=202373.33333333334, ans=0.2
2023-12-21 19:47:33,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=202506.66666666666, ans=0.125
2023-12-21 19:47:52,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202573.33333333334, ans=0.1
2023-12-21 19:47:53,954 INFO [train.py:886] (0/4) Epoch 7, batch 1800, loss[loss=0.01541, audio_tagging_loss=0.01541, over 25000.00 frames. ], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 4954664.71 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 32.0
2023-12-21 19:47:54,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.53 vs. limit=22.5
2023-12-21 19:48:00,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=202640.0, ans=0.125
2023-12-21 19:48:12,066 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.560e+01 2.744e+01 2.950e+01 3.454e+01, threshold=5.487e+01, percent-clipped=0.0
2023-12-21 19:48:14,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=202773.33333333334, ans=0.125
2023-12-21 19:48:19,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.97 vs. limit=22.5
2023-12-21 19:48:22,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=202773.33333333334, ans=0.0
2023-12-21 19:48:28,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=202840.0, ans=0.0
2023-12-21 19:48:45,170 INFO [train.py:886] (0/4) Epoch 7, batch 1850, loss[loss=0.01555, audio_tagging_loss=0.01555, over 24750.00 frames. ], tot_loss[loss=0.01584, audio_tagging_loss=0.01584, over 4952869.90 frames. ], batch size: 99, lr: 1.56e-02, grad_scale: 32.0
2023-12-21 19:48:45,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0
2023-12-21 19:49:01,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=203040.0, ans=0.2
2023-12-21 19:49:02,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=203040.0, ans=0.125
2023-12-21 19:49:19,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=203173.33333333334, ans=0.2
2023-12-21 19:49:30,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=203240.0, ans=0.0
2023-12-21 19:49:31,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=203240.0, ans=0.125
2023-12-21 19:49:31,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=203240.0, ans=0.125
2023-12-21 19:49:37,309 INFO [train.py:886] (0/4) Epoch 7, batch 1900, loss[loss=0.01635, audio_tagging_loss=0.01635, over 24750.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4944023.32 frames. ], batch size: 99, lr: 1.56e-02, grad_scale: 32.0
2023-12-21 19:49:37,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=203306.66666666666, ans=0.2
2023-12-21 19:49:46,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=203373.33333333334, ans=0.0
2023-12-21 19:49:56,099 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.649e+01 2.834e+01 2.981e+01 3.501e+01, threshold=5.668e+01, percent-clipped=0.0
2023-12-21 19:49:59,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=203440.0, ans=0.125
2023-12-21 19:50:21,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=203573.33333333334, ans=0.2
2023-12-21 19:50:26,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=203573.33333333334, ans=0.0
2023-12-21 19:50:29,004 INFO [train.py:886] (0/4) Epoch 7, batch 1950, loss[loss=0.01479, audio_tagging_loss=0.01479, over 25000.00 frames. ], tot_loss[loss=0.01593, audio_tagging_loss=0.01593, over 4941150.88 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 32.0
2023-12-21 19:50:32,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.52 vs. limit=15.0
2023-12-21 19:50:40,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=12.0
2023-12-21 19:50:47,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=203706.66666666666, ans=0.125
2023-12-21 19:51:18,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0
2023-12-21 19:51:20,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=203973.33333333334, ans=0.0
2023-12-21 19:51:20,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.01 vs. limit=22.5
2023-12-21 19:51:20,726 INFO [train.py:886] (0/4) Epoch 7, batch 2000, loss[loss=0.01514, audio_tagging_loss=0.01514, over 25000.00 frames. ], tot_loss[loss=0.01574, audio_tagging_loss=0.01574, over 4946006.14 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 32.0
2023-12-21 19:51:26,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=203973.33333333334, ans=0.2
2023-12-21 19:51:40,087 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 2.631e+01 2.743e+01 2.994e+01 3.542e+01, threshold=5.486e+01, percent-clipped=0.0
2023-12-21 19:51:52,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=15.0
2023-12-21 19:51:56,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204173.33333333334, ans=0.1
2023-12-21 19:52:03,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=204240.0, ans=0.125
2023-12-21 19:52:12,970 INFO [train.py:886] (0/4) Epoch 7, batch 2050, loss[loss=0.01516, audio_tagging_loss=0.01516, over 25000.00 frames. ], tot_loss[loss=0.01583, audio_tagging_loss=0.01583, over 4945943.18 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 64.0
2023-12-21 19:52:22,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204373.33333333334, ans=0.1
2023-12-21 19:52:25,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=204373.33333333334, ans=0.125
2023-12-21 19:52:32,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=204440.0, ans=0.0
2023-12-21 19:52:56,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=204573.33333333334, ans=0.125
2023-12-21 19:53:02,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=204640.0, ans=0.125
2023-12-21 19:53:03,695 INFO [train.py:886] (0/4) Epoch 7, batch 2100, loss[loss=0.01383, audio_tagging_loss=0.01383, over 25000.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4951453.76 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 64.0
2023-12-21 19:53:07,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=204640.0, ans=0.0
2023-12-21 19:53:09,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0
2023-12-21 19:53:23,306 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.557e+01 2.714e+01 2.947e+01 3.593e+01, threshold=5.429e+01, percent-clipped=0.0
2023-12-21 19:53:30,253 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.041e-03
2023-12-21 19:53:31,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=204773.33333333334, ans=0.1
2023-12-21 19:53:34,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=204840.0, ans=0.125
2023-12-21 19:53:48,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.63 vs. limit=22.5
2023-12-21 19:53:56,454 INFO [train.py:886] (0/4) Epoch 7, batch 2150, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4959983.72 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0
2023-12-21 19:54:45,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=205240.0, ans=0.2
2023-12-21 19:54:45,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=205240.0, ans=0.125
2023-12-21 19:54:47,850 INFO [train.py:886] (0/4) Epoch 7, batch 2200, loss[loss=0.02004, audio_tagging_loss=0.02004, over 24949.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 4957523.87 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0
2023-12-21 19:54:59,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=205373.33333333334, ans=0.0
2023-12-21 19:55:06,280 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.584e+01 2.750e+01 3.020e+01 3.487e+01, threshold=5.500e+01, percent-clipped=0.0
2023-12-21 19:55:09,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=205440.0, ans=0.125
2023-12-21 19:55:09,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=205440.0, ans=0.0
2023-12-21 19:55:20,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=205506.66666666666, ans=0.125
2023-12-21 19:55:23,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=205506.66666666666, ans=0.125
2023-12-21 19:55:25,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=205506.66666666666, ans=0.125
2023-12-21 19:55:33,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=205573.33333333334, ans=0.0
2023-12-21 19:55:36,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=205573.33333333334, ans=0.5
2023-12-21 19:55:38,660 INFO [train.py:886] (0/4) Epoch 7, batch 2250, loss[loss=0.01763, audio_tagging_loss=0.01763, over 24750.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 4954130.83 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0
2023-12-21 19:55:42,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=205640.0, ans=0.0
2023-12-21 19:55:45,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=205640.0, ans=0.125
2023-12-21 19:56:02,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=205773.33333333334, ans=0.125
2023-12-21 19:56:19,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=205906.66666666666, ans=0.0
2023-12-21 19:56:29,776 INFO [train.py:886] (0/4) Epoch 7, batch 2300, loss[loss=0.01673, audio_tagging_loss=0.01673, over 25000.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4948840.16 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0
2023-12-21 19:56:34,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=205973.33333333334, ans=0.0
2023-12-21 19:56:36,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=205973.33333333334, ans=0.125
2023-12-21 19:56:48,195 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.578e+01 2.756e+01 2.990e+01 3.667e+01, threshold=5.511e+01, percent-clipped=0.0
2023-12-21 19:56:50,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=206106.66666666666, ans=0.5
2023-12-21 19:56:53,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=206106.66666666666, ans=0.125
2023-12-21 19:57:02,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=206173.33333333334, ans=0.1
2023-12-21 19:57:04,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=12.0
2023-12-21 19:57:14,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=206240.0, ans=0.125
2023-12-21 19:57:21,956 INFO [train.py:886] (0/4) Epoch 7, batch 2350, loss[loss=0.01589, audio_tagging_loss=0.01589, over 24750.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4948498.96 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0
2023-12-21 19:57:31,602 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 19:57:32,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0
2023-12-21 19:57:33,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=206373.33333333334, ans=0.0
2023-12-21 19:57:43,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=206440.0, ans=0.0
2023-12-21 19:57:46,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.10 vs. limit=15.0
2023-12-21 19:57:58,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=15.0
2023-12-21 19:58:09,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.68 vs. limit=22.5
2023-12-21 19:58:13,395 INFO [train.py:886] (0/4) Epoch 7, batch 2400, loss[loss=0.01287, audio_tagging_loss=0.01287, over 24750.00 frames. ], tot_loss[loss=0.01586, audio_tagging_loss=0.01586, over 4944489.45 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0
2023-12-21 19:58:26,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206706.66666666666, ans=0.1
2023-12-21 19:58:31,990 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.547e+01 2.719e+01 2.913e+01 3.717e+01, threshold=5.437e+01, percent-clipped=0.0
2023-12-21 19:58:45,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=206840.0, ans=0.125
2023-12-21 19:58:46,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=206840.0, ans=0.0
2023-12-21 19:58:50,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=206840.0, ans=0.0
2023-12-21 19:58:57,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=206906.66666666666, ans=0.125
2023-12-21 19:59:05,238 INFO [train.py:886] (0/4) Epoch 7, batch 2450, loss[loss=0.01721, audio_tagging_loss=0.01721, over 25000.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 4945751.38 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0
2023-12-21 19:59:11,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=206973.33333333334, ans=0.125
2023-12-21 19:59:15,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=207040.0, ans=0.125
2023-12-21 19:59:29,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=207106.66666666666, ans=0.0
2023-12-21 19:59:43,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.90 vs. limit=12.0
2023-12-21 19:59:46,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=207240.0, ans=0.0
2023-12-21 19:59:56,863 INFO [train.py:886] (0/4) Epoch 7, batch 2500, loss[loss=0.01801, audio_tagging_loss=0.01801, over 24750.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4941296.65 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0
2023-12-21 20:00:05,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=207306.66666666666, ans=0.0
2023-12-21 20:00:10,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=207373.33333333334, ans=0.125
2023-12-21 20:00:15,327 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.623e+01 2.777e+01 2.967e+01 3.606e+01, threshold=5.553e+01, percent-clipped=0.0
2023-12-21 20:00:39,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=207573.33333333334, ans=0.0
2023-12-21 20:00:48,965 INFO [train.py:886] (0/4) Epoch 7, batch 2550, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4940724.75 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0
2023-12-21 20:01:01,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.17 vs. limit=12.0
2023-12-21 20:01:25,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=207840.0, ans=0.125
2023-12-21 20:01:28,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=207840.0, ans=0.0
2023-12-21 20:01:41,311 INFO [train.py:886] (0/4) Epoch 7, batch 2600, loss[loss=0.01835, audio_tagging_loss=0.01835, over 25000.00 frames. ], tot_loss[loss=0.01599, audio_tagging_loss=0.01599, over 4942733.98 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0
2023-12-21 20:01:41,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0
2023-12-21 20:01:42,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=207973.33333333334, ans=0.07
2023-12-21 20:01:51,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=208040.0, ans=0.125
2023-12-21 20:01:51,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=208040.0, ans=0.5
2023-12-21 20:01:59,603 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.546e+01 2.741e+01 2.943e+01 4.018e+01, threshold=5.482e+01, percent-clipped=0.0
2023-12-21 20:02:08,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=208106.66666666666, ans=0.0
2023-12-21 20:02:32,844 INFO [train.py:886] (0/4) Epoch 7, batch 2650, loss[loss=0.01672, audio_tagging_loss=0.01672, over 24750.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4947248.57 frames. ], batch size: 99, lr: 1.54e-02, grad_scale: 64.0
2023-12-21 20:02:33,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=15.0
2023-12-21 20:02:38,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.08 vs. limit=6.0
2023-12-21 20:02:52,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0
2023-12-21 20:02:55,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.37 vs. limit=15.0
2023-12-21 20:03:07,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0
2023-12-21 20:03:24,907 INFO [train.py:886] (0/4) Epoch 7, batch 2700, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4947738.33 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0
2023-12-21 20:03:26,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=208640.0, ans=0.035
2023-12-21 20:03:27,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=208640.0, ans=0.1
2023-12-21 20:03:30,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=15.0
2023-12-21 20:03:32,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=208640.0, ans=0.125
2023-12-21 20:03:32,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=208640.0, ans=0.0
2023-12-21 20:03:43,479 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.523e+01 2.665e+01 2.872e+01 3.649e+01, threshold=5.330e+01, percent-clipped=0.0
2023-12-21 20:03:52,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=208773.33333333334, ans=0.0
2023-12-21 20:04:01,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0
2023-12-21 20:04:05,028 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.842e-02
2023-12-21 20:04:07,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=208906.66666666666, ans=0.0
2023-12-21 20:04:11,661 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.546e-03
2023-12-21 20:04:11,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0
2023-12-21 20:04:16,736 INFO [train.py:886] (0/4) Epoch 7, batch 2750, loss[loss=0.01631, audio_tagging_loss=0.01631, over 25000.00 frames. ], tot_loss[loss=0.01588, audio_tagging_loss=0.01588, over 4948222.34 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0
2023-12-21 20:04:16,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=208973.33333333334, ans=0.0
2023-12-21 20:04:40,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=209106.66666666666, ans=0.0
2023-12-21 20:04:45,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=22.5
2023-12-21 20:04:59,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0
2023-12-21 20:05:06,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=209240.0, ans=22.5
2023-12-21 20:05:08,640 INFO [train.py:886] (0/4) Epoch 7, batch 2800, loss[loss=0.01632, audio_tagging_loss=0.01632, over 24750.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4950496.15 frames. ], batch size: 99, lr: 1.54e-02, grad_scale: 64.0
2023-12-21 20:05:15,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=209306.66666666666, ans=0.0
2023-12-21 20:05:24,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=209373.33333333334, ans=0.2
2023-12-21 20:05:28,168 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.667e+01 2.782e+01 2.958e+01 3.854e+01, threshold=5.564e+01, percent-clipped=0.0
2023-12-21 20:05:43,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=15.0
2023-12-21 20:06:00,778 INFO [train.py:886] (0/4) Epoch 7, batch 2850, loss[loss=0.01781, audio_tagging_loss=0.01781, over 24750.00 frames. ], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4945437.50 frames. ], batch size: 99, lr: 1.54e-02, grad_scale: 64.0
2023-12-21 20:06:51,798 INFO [train.py:886] (0/4) Epoch 7, batch 2900, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24750.00 frames. ], tot_loss[loss=0.01586, audio_tagging_loss=0.01586, over 4943872.40 frames. ], batch size: 99, lr: 1.54e-02, grad_scale: 64.0
2023-12-21 20:07:10,770 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.581e+01 2.783e+01 3.000e+01 3.656e+01, threshold=5.566e+01, percent-clipped=0.0
2023-12-21 20:07:16,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0
2023-12-21 20:07:26,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=210173.33333333334, ans=0.2
2023-12-21 20:07:43,854 INFO [train.py:886] (0/4) Epoch 7, batch 2950, loss[loss=0.01543, audio_tagging_loss=0.01543, over 25000.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4946118.30 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0
2023-12-21 20:07:55,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=210373.33333333334, ans=0.125
2023-12-21 20:08:26,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.04 vs. limit=10.0
2023-12-21 20:08:36,043 INFO [train.py:886] (0/4) Epoch 7, batch 3000, loss[loss=0.01302, audio_tagging_loss=0.01302, over 24750.00 frames. ], tot_loss[loss=0.01566, audio_tagging_loss=0.01566, over 4947994.11 frames. ], batch size: 99, lr: 1.54e-02, grad_scale: 64.0
2023-12-21 20:08:36,045 INFO [train.py:909] (0/4) Computing validation loss
2023-12-21 20:08:43,984 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7173, 2.0690, 3.3722, 3.1793], device='cuda:0')
2023-12-21 20:08:57,466 INFO [train.py:917] (0/4) Epoch 7, validation: loss=0.03818, audio_tagging_loss=0.03818, over 3737520.00 frames.
2023-12-21 20:08:57,467 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-21 20:09:03,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.38 vs. limit=22.5
2023-12-21 20:09:10,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.14 vs. limit=15.0
2023-12-21 20:09:12,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0
2023-12-21 20:09:15,949 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.521e+01 2.655e+01 2.830e+01 3.730e+01, threshold=5.311e+01, percent-clipped=0.0
2023-12-21 20:09:19,084 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.108e+01
2023-12-21 20:09:25,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=210773.33333333334, ans=0.125
2023-12-21 20:09:28,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=210840.0, ans=0.5
2023-12-21 20:09:35,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=210840.0, ans=0.2
2023-12-21 20:09:36,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=210840.0, ans=0.125
2023-12-21 20:09:45,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0
2023-12-21 20:09:49,766 INFO [train.py:886] (0/4) Epoch 7, batch 3050, loss[loss=0.01768, audio_tagging_loss=0.01768, over 25000.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4945327.00 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0
2023-12-21 20:09:54,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=210973.33333333334, ans=0.125
2023-12-21 20:10:12,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=211106.66666666666, ans=0.0
2023-12-21 20:10:15,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=211106.66666666666, ans=0.0
2023-12-21 20:10:23,148 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 20:10:27,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.49 vs. limit=22.5
2023-12-21 20:10:31,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=211240.0, ans=0.0
2023-12-21 20:10:36,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=15.0
2023-12-21 20:10:41,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=211306.66666666666, ans=0.125
2023-12-21 20:10:42,207 INFO [train.py:886] (0/4) Epoch 7, batch 3100, loss[loss=0.01549, audio_tagging_loss=0.01549, over 24750.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4946565.36 frames. ], batch size: 99, lr: 1.53e-02, grad_scale: 64.0
2023-12-21 20:11:00,331 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.611e+01 2.756e+01 2.922e+01 4.082e+01, threshold=5.512e+01, percent-clipped=0.0
2023-12-21 20:11:28,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=211573.33333333334, ans=0.125
2023-12-21 20:11:33,192 INFO [train.py:886] (0/4) Epoch 7, batch 3150, loss[loss=0.01769, audio_tagging_loss=0.01769, over 24750.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4948156.67 frames. ], batch size: 99, lr: 1.53e-02, grad_scale: 64.0
2023-12-21 20:11:34,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=211640.0, ans=0.0
2023-12-21 20:11:38,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=15.0
2023-12-21 20:11:44,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0
2023-12-21 20:11:46,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=211706.66666666666, ans=0.2
2023-12-21 20:11:49,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=211706.66666666666, ans=0.125
2023-12-21 20:12:21,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=211906.66666666666, ans=0.2
2023-12-21 20:12:25,085 INFO [train.py:886] (0/4) Epoch 7, batch 3200, loss[loss=0.01576, audio_tagging_loss=0.01576, over 25000.00 frames. ], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4947344.69 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0
2023-12-21 20:12:42,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0
2023-12-21 20:12:43,123 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.209e+01 2.609e+01 2.784e+01 2.998e+01 3.894e+01, threshold=5.569e+01, percent-clipped=0.0
2023-12-21 20:13:17,064 INFO [train.py:886] (0/4) Epoch 7, batch 3250, loss[loss=0.01635, audio_tagging_loss=0.01635, over 24750.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4952102.89 frames. ], batch size: 99, lr: 1.53e-02, grad_scale: 64.0
2023-12-21 20:13:25,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.93 vs. limit=10.0
2023-12-21 20:13:32,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=212373.33333333334, ans=0.05
2023-12-21 20:13:42,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=212440.0, ans=0.0
2023-12-21 20:13:46,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=212506.66666666666, ans=0.125
2023-12-21 20:13:53,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=212506.66666666666, ans=0.0
2023-12-21 20:13:56,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=212506.66666666666, ans=0.125
2023-12-21 20:13:58,609 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 20:14:08,614 INFO [train.py:886] (0/4) Epoch 7, batch 3300, loss[loss=0.01783, audio_tagging_loss=0.01783, over 22515.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4952502.64 frames. ], batch size: 107, lr: 1.53e-02, grad_scale: 64.0
2023-12-21 20:14:11,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=212640.0, ans=0.0
2023-12-21 20:14:15,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=212640.0, ans=0.125
2023-12-21 20:14:15,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=212640.0, ans=0.0
2023-12-21 20:14:27,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0
2023-12-21 20:14:27,799 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.135e+01 2.590e+01 2.789e+01 2.998e+01 3.537e+01, threshold=5.578e+01, percent-clipped=0.0
2023-12-21 20:14:28,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=212706.66666666666, ans=0.125
2023-12-21 20:14:29,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=212773.33333333334, ans=0.09899494936611666
2023-12-21 20:14:31,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=212773.33333333334, ans=0.0
2023-12-21 20:14:46,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=212840.0, ans=0.2
2023-12-21 20:14:56,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.76 vs. limit=15.0
2023-12-21 20:14:58,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=212906.66666666666, ans=0.05
2023-12-21 20:15:01,340 INFO [train.py:886] (0/4) Epoch 7, batch 3350, loss[loss=0.01667, audio_tagging_loss=0.01667, over 25000.00 frames. ], tot_loss[loss=0.01591, audio_tagging_loss=0.01591, over 4959342.16 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0
2023-12-21 20:15:05,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.75 vs. limit=22.5
2023-12-21 20:15:08,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=212973.33333333334, ans=0.05
2023-12-21 20:15:11,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.16 vs. limit=15.0
2023-12-21 20:15:32,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0
2023-12-21 20:15:53,143 INFO [train.py:886] (0/4) Epoch 7, batch 3400, loss[loss=0.01971, audio_tagging_loss=0.01971, over 24946.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4956485.96 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0
2023-12-21 20:15:56,168 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-32000.pt
2023-12-21 20:16:08,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=213373.33333333334, ans=0.125
2023-12-21 20:16:13,540 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.594e+01 2.749e+01 2.971e+01 3.801e+01, threshold=5.499e+01, percent-clipped=0.0
2023-12-21 20:16:14,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213440.0, ans=0.1
2023-12-21 20:16:33,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.05 vs. limit=15.0
2023-12-21 20:16:41,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=213573.33333333334, ans=0.125
2023-12-21 20:16:46,479 INFO [train.py:886] (0/4) Epoch 7, batch 3450, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24067.00 frames. ], tot_loss[loss=0.01602, audio_tagging_loss=0.01602, over 4949099.96 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 64.0
2023-12-21 20:17:25,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=213840.0, ans=0.2
2023-12-21 20:17:28,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.55 vs. limit=15.0
2023-12-21 20:17:35,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=15.0
2023-12-21 20:17:38,620 INFO [train.py:886] (0/4) Epoch 7, batch 3500, loss[loss=0.01475, audio_tagging_loss=0.01475, over 24750.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4940706.36 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0
2023-12-21 20:17:45,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=213973.33333333334, ans=0.0
2023-12-21 20:17:53,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=214040.0, ans=0.035
2023-12-21 20:17:56,290 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.629e+01 2.792e+01 2.990e+01 3.562e+01, threshold=5.583e+01, percent-clipped=0.0
2023-12-21 20:17:59,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=214106.66666666666, ans=0.1
2023-12-21 20:18:06,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=214106.66666666666, ans=22.5
2023-12-21 20:18:14,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=214173.33333333334, ans=0.1
2023-12-21 20:18:29,576 INFO [train.py:886] (0/4) Epoch 7, batch 3550, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4942129.95 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 64.0
2023-12-21 20:18:31,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.40 vs. limit=15.0
2023-12-21 20:18:33,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=214306.66666666666, ans=0.125
2023-12-21 20:18:56,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=214440.0, ans=0.125
2023-12-21 20:19:16,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=214573.33333333334, ans=15.0
2023-12-21 20:19:20,731 INFO [train.py:886] (0/4) Epoch 7, batch 3600, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4948349.40 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 64.0
2023-12-21 20:19:21,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=214640.0, ans=0.125
2023-12-21 20:19:31,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=214706.66666666666, ans=0.125
2023-12-21 20:19:32,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0
2023-12-21 20:19:38,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.00 vs. limit=10.0
2023-12-21 20:19:39,452 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.510e+01 2.669e+01 2.897e+01 3.609e+01, threshold=5.338e+01, percent-clipped=0.0
2023-12-21 20:19:44,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=214773.33333333334, ans=0.2
2023-12-21 20:19:54,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=214840.0, ans=0.2
2023-12-21 20:20:04,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=214906.66666666666, ans=0.125
2023-12-21 20:20:05,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=214906.66666666666, ans=0.2
2023-12-21 20:20:05,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=214906.66666666666, ans=0.2
2023-12-21 20:20:12,777 INFO [train.py:886] (0/4) Epoch 7, batch 3650, loss[loss=0.01437, audio_tagging_loss=0.01437, over 25000.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4946332.96 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 64.0
2023-12-21 20:20:16,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=214973.33333333334, ans=0.125
2023-12-21 20:20:18,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=214973.33333333334, ans=0.0
2023-12-21 20:20:20,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=214973.33333333334, ans=0.125
2023-12-21 20:20:21,885 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.354e+00
2023-12-21 20:20:46,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.21 vs. limit=15.0
2023-12-21 20:20:59,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. limit=6.0
2023-12-21 20:21:01,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=215240.0, ans=0.0
2023-12-21 20:21:02,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=215240.0, ans=0.0
2023-12-21 20:21:04,227 INFO [train.py:886] (0/4) Epoch 7, batch 3700, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4951483.62 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 64.0
2023-12-21 20:21:05,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=215306.66666666666, ans=0.125
2023-12-21 20:21:08,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2023-12-21 20:21:10,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=215306.66666666666, ans=0.1
2023-12-21 20:21:13,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=215373.33333333334, ans=0.0
2023-12-21 20:21:17,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=215373.33333333334, ans=0.125
2023-12-21 20:21:20,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=215373.33333333334, ans=0.0
2023-12-21 20:21:24,015 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.509e+01 2.714e+01 2.873e+01 3.497e+01, threshold=5.429e+01, percent-clipped=0.0
2023-12-21 20:21:29,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.83 vs. limit=22.5
2023-12-21 20:21:43,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0
2023-12-21 20:21:56,908 INFO [train.py:886] (0/4) Epoch 7, batch 3750, loss[loss=0.01991, audio_tagging_loss=0.01991, over 24750.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4951554.11 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0
2023-12-21 20:22:06,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.15 vs. limit=22.5
2023-12-21 20:22:47,569 INFO [train.py:886] (0/4) Epoch 7, batch 3800, loss[loss=0.01981, audio_tagging_loss=0.01981, over 24750.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4944310.16 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0
2023-12-21 20:22:55,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0
2023-12-21 20:23:06,292 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.355e+01 2.628e+01 2.789e+01 3.025e+01 3.707e+01, threshold=5.578e+01, percent-clipped=0.0
2023-12-21 20:23:07,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=216106.66666666666, ans=0.0
2023-12-21 20:23:15,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0
2023-12-21 20:23:16,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=216106.66666666666, ans=0.125 2023-12-21 20:23:28,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=216240.0, ans=0.125 2023-12-21 20:23:38,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=12.0 2023-12-21 20:23:39,551 INFO [train.py:886] (0/4) Epoch 7, batch 3850, loss[loss=0.01889, audio_tagging_loss=0.01889, over 24923.00 frames. ], tot_loss[loss=0.01586, audio_tagging_loss=0.01586, over 4942073.05 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:23:41,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216306.66666666666, ans=0.1 2023-12-21 20:23:48,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=216306.66666666666, ans=0.1 2023-12-21 20:23:53,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=216373.33333333334, ans=0.125 2023-12-21 20:24:11,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0 2023-12-21 20:24:30,980 INFO [train.py:886] (0/4) Epoch 7, batch 3900, loss[loss=0.01647, audio_tagging_loss=0.01647, over 25000.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4944132.18 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:24:34,791 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.466e+00 2023-12-21 20:24:37,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.64 vs. limit=15.0 2023-12-21 20:24:48,636 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.626e+01 2.763e+01 2.975e+01 3.870e+01, threshold=5.525e+01, percent-clipped=0.0 2023-12-21 20:25:21,838 INFO [train.py:886] (0/4) Epoch 7, batch 3950, loss[loss=0.01644, audio_tagging_loss=0.01644, over 25000.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 4952069.52 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:25:27,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=216973.33333333334, ans=0.125 2023-12-21 20:25:28,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.25 vs.
limit=22.5 2023-12-21 20:25:29,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=216973.33333333334, ans=0.0 2023-12-21 20:25:44,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=217106.66666666666, ans=15.0 2023-12-21 20:26:01,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=217173.33333333334, ans=0.125 2023-12-21 20:26:14,573 INFO [train.py:886] (0/4) Epoch 7, batch 4000, loss[loss=0.01771, audio_tagging_loss=0.01771, over 25000.00 frames. ], tot_loss[loss=0.01577, audio_tagging_loss=0.01577, over 4955591.07 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:26:20,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=217306.66666666666, ans=0.125 2023-12-21 20:26:21,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=217306.66666666666, ans=0.125 2023-12-21 20:26:23,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=217373.33333333334, ans=0.125 2023-12-21 20:26:24,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=217373.33333333334, ans=0.05 2023-12-21 20:26:32,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-12-21 20:26:32,674 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 2.642e+01 2.775e+01 3.007e+01 3.915e+01, threshold=5.549e+01, percent-clipped=0.0 2023-12-21 20:26:39,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217440.0, ans=0.1 2023-12-21 20:27:05,536 INFO [train.py:886] (0/4) Epoch 7, batch 4050, loss[loss=0.0152, audio_tagging_loss=0.0152, over 23993.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 4959878.83 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:27:10,982 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:27:17,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217706.66666666666, ans=0.1 2023-12-21 20:27:19,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=217706.66666666666, ans=0.0 2023-12-21 20:27:19,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217706.66666666666, ans=0.1 2023-12-21 20:27:24,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=217706.66666666666, ans=0.2 2023-12-21 20:27:28,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.96 vs. 
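limit=22.5

Note on the "INFO [scaling.py:213] ScheduledFloat: name=..., batch_count=..., ans=..." entries that dominate this log: they report hyperparameters (dropout probabilities, skip rates, balancer limits, bypass scales) whose values are scheduled as a function of how many batches have been processed, with "ans" being the value currently in effect. A minimal sketch of such a piecewise-linear schedule follows; the constructor signature and the example breakpoints are assumptions for illustration, not the real zipformer ScheduledFloat class.

```python
class ScheduledFloat:
    """Hedged sketch: a float that is piecewise-linear in batch_count."""

    def __init__(self, *points):  # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        (x0, y0) = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)  # linear interpolation
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # constant after the last breakpoint

# e.g. a skip rate that decays from 0.5 to 0.0 over the first 20k batches
# and then stays there, matching the many "ans=0.0" entries late in training:
conv_skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
print(conv_skip_rate.value(217840.0))  # -> 0.0
```
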
2023-12-21 20:27:35,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=217840.0, ans=0.125 2023-12-21 20:27:37,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=217840.0, ans=0.0 2023-12-21 20:27:37,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=217840.0, ans=0.0 2023-12-21 20:27:54,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=217906.66666666666, ans=0.125 2023-12-21 20:27:54,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.90 vs. limit=15.0 2023-12-21 20:27:58,019 INFO [train.py:886] (0/4) Epoch 7, batch 4100, loss[loss=0.01467, audio_tagging_loss=0.01467, over 24750.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4956382.33 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:28:18,076 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.263e+01 2.585e+01 2.755e+01 2.921e+01 3.340e+01, threshold=5.510e+01, percent-clipped=0.0 2023-12-21 20:28:21,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=12.0 2023-12-21 20:28:36,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5 2023-12-21 20:28:39,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218240.0, ans=0.1 2023-12-21 20:28:41,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=15.0 2023-12-21 20:28:47,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218240.0, ans=0.1 2023-12-21 20:28:50,131 INFO [train.py:886] (0/4) Epoch 7, batch 4150, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4954721.80 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:28:55,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=218306.66666666666, ans=0.125 2023-12-21 20:29:09,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=218440.0, ans=0.125 2023-12-21 20:29:21,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2023-12-21 20:29:23,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=218506.66666666666, ans=0.125 2023-12-21 20:29:36,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=218573.33333333334, ans=0.125 2023-12-21 20:29:41,472 INFO [train.py:886] (0/4) Epoch 7, batch 4200, loss[loss=0.01451, audio_tagging_loss=0.01451, over 24750.00 frames.
], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4949832.36 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:29:41,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=218640.0, ans=0.1 2023-12-21 20:29:58,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=218706.66666666666, ans=0.0 2023-12-21 20:30:01,126 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.567e+01 2.744e+01 2.993e+01 3.872e+01, threshold=5.489e+01, percent-clipped=0.0 2023-12-21 20:30:12,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218840.0, ans=0.1 2023-12-21 20:30:15,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=218840.0, ans=0.125 2023-12-21 20:30:18,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=218840.0, ans=0.125 2023-12-21 20:30:19,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=218840.0, ans=0.125 2023-12-21 20:30:33,455 INFO [train.py:886] (0/4) Epoch 7, batch 4250, loss[loss=0.01547, audio_tagging_loss=0.01547, over 23993.00 frames. ], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 4952144.80 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:31:12,116 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=6.606e-02 2023-12-21 20:31:15,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=219240.0, ans=0.2 2023-12-21 20:31:23,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.21 vs. limit=12.0 2023-12-21 20:31:25,683 INFO [train.py:886] (0/4) Epoch 7, batch 4300, loss[loss=0.01512, audio_tagging_loss=0.01512, over 25000.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4955182.16 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:31:32,175 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.510e-03 2023-12-21 20:31:33,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=219306.66666666666, ans=0.0 2023-12-21 20:31:34,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=219306.66666666666, ans=0.04949747468305833 2023-12-21 20:31:35,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=219373.33333333334, ans=0.05 2023-12-21 20:31:39,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0 2023-12-21 20:31:45,085 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+01 2.678e+01 2.884e+01 3.041e+01 3.867e+01, threshold=5.769e+01, percent-clipped=0.0 2023-12-21 20:31:56,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. 
limit=15.0 2023-12-21 20:31:59,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=219506.66666666666, ans=0.1 2023-12-21 20:32:01,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=219506.66666666666, ans=0.125 2023-12-21 20:32:01,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=219506.66666666666, ans=0.0 2023-12-21 20:32:06,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=219573.33333333334, ans=0.125 2023-12-21 20:32:06,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2023-12-21 20:32:07,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=219573.33333333334, ans=0.125 2023-12-21 20:32:13,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=219573.33333333334, ans=0.125 2023-12-21 20:32:13,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.13 vs. limit=15.0 2023-12-21 20:32:16,665 INFO [train.py:886] (0/4) Epoch 7, batch 4350, loss[loss=0.01691, audio_tagging_loss=0.01691, over 25000.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4955249.80 frames. ], batch size: 100, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:32:18,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=219640.0, ans=0.1 2023-12-21 20:32:34,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=219706.66666666666, ans=0.0 2023-12-21 20:32:44,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.37 vs. limit=22.5 2023-12-21 20:32:46,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=219773.33333333334, ans=0.2 2023-12-21 20:32:53,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219840.0, ans=0.1 2023-12-21 20:32:57,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. limit=10.0 2023-12-21 20:33:02,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.63 vs. limit=15.0 2023-12-21 20:33:09,187 INFO [train.py:886] (0/4) Epoch 7, batch 4400, loss[loss=0.01543, audio_tagging_loss=0.01543, over 25000.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4955085.05 frames. 
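], batch size: 100, lr: 1.50e-02, grad_scale: 64.0

Note on the "INFO [scaling.py:1022] Whitening: name=..., metric=X vs. limit=Y" entries: these come from modules that penalize feature covariances drifting too far from white (identity-like), and each line records the measured metric against the limit above which a corrective gradient term kicks in (e.g. "metric=16.25 vs. limit=15.0"). A plausible form of that metric, sketched under the assumption that it is normalized to 1.0 for a perfectly white covariance, is:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """Hedged sketch: ~1.0 when each channel group's covariance is a
    multiple of the identity, larger as the eigenvalue spectrum becomes
    less uniform (the quantity logged as `metric` above)."""
    n, c = x.shape  # (num_frames, num_channels)
    d = c // num_groups
    x = x.reshape(n, num_groups, d).transpose(0, 1)   # (groups, frames, d)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                   # per-group covariance
    # sum(eig^2) * d / sum(eig)^2 >= 1, with equality iff cov = s * I
    num = (cov * cov).sum(dim=(1, 2)) * d
    den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2
    return (num / den).mean().item()

print(whitening_metric(torch.randn(10000, 512), num_groups=1))  # ~1.0 for white noise
```
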
2023-12-21 20:33:13,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=219973.33333333334, ans=0.125 2023-12-21 20:33:14,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=219973.33333333334, ans=0.125 2023-12-21 20:33:26,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=220040.0, ans=0.125 2023-12-21 20:33:27,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.46 vs. limit=22.5 2023-12-21 20:33:28,579 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.625e+01 2.823e+01 3.033e+01 3.448e+01, threshold=5.646e+01, percent-clipped=0.0 2023-12-21 20:33:28,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=220106.66666666666, ans=0.125 2023-12-21 20:33:30,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=220106.66666666666, ans=0.5 2023-12-21 20:33:37,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=220106.66666666666, ans=0.0 2023-12-21 20:33:40,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=220173.33333333334, ans=0.125 2023-12-21 20:33:40,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.20 vs. limit=22.5 2023-12-21 20:33:59,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2023-12-21 20:34:00,661 INFO [train.py:886] (0/4) Epoch 7, batch 4450, loss[loss=0.0146, audio_tagging_loss=0.0146, over 25000.00 frames. ], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4949394.06 frames. ], batch size: 100, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:34:17,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=220373.33333333334, ans=0.0 2023-12-21 20:34:23,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=220440.0, ans=0.125 2023-12-21 20:34:28,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=220440.0, ans=0.0 2023-12-21 20:34:35,723 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:34:52,048 INFO [train.py:886] (0/4) Epoch 7, batch 4500, loss[loss=0.01737, audio_tagging_loss=0.01737, over 25000.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4949197.30 frames. ], batch size: 100, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:34:58,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2023-12-21 20:35:05,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.63 vs.
limit=15.0 2023-12-21 20:35:12,164 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.651e+01 2.803e+01 3.008e+01 3.714e+01, threshold=5.606e+01, percent-clipped=0.0 2023-12-21 20:35:25,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=220840.0, ans=0.125 2023-12-21 20:35:25,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=12.0 2023-12-21 20:35:28,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=220840.0, ans=0.025 2023-12-21 20:35:35,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=220906.66666666666, ans=0.2 2023-12-21 20:35:43,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220973.33333333334, ans=0.1 2023-12-21 20:35:44,516 INFO [train.py:886] (0/4) Epoch 7, batch 4550, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 4947027.28 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:36:00,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.34 vs. limit=15.0 2023-12-21 20:36:00,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0 2023-12-21 20:36:07,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=221106.66666666666, ans=0.1 2023-12-21 20:36:09,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=221106.66666666666, ans=0.1 2023-12-21 20:36:10,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.86 vs. limit=22.5 2023-12-21 20:36:20,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=221173.33333333334, ans=0.125 2023-12-21 20:36:24,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=221173.33333333334, ans=0.0 2023-12-21 20:36:28,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=221240.0, ans=0.125 2023-12-21 20:36:28,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=221240.0, ans=0.0 2023-12-21 20:36:30,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2023-12-21 20:36:31,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=221240.0, ans=0.125 2023-12-21 20:36:36,197 INFO [train.py:886] (0/4) Epoch 7, batch 4600, loss[loss=0.01834, audio_tagging_loss=0.01834, over 24750.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4949486.97 frames. 
], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:36:44,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=221306.66666666666, ans=0.125 2023-12-21 20:36:49,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=221373.33333333334, ans=0.125 2023-12-21 20:36:56,444 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.617e+01 2.767e+01 2.974e+01 3.555e+01, threshold=5.535e+01, percent-clipped=0.0 2023-12-21 20:37:20,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=221573.33333333334, ans=0.07 2023-12-21 20:37:25,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=221573.33333333334, ans=0.07 2023-12-21 20:37:28,707 INFO [train.py:886] (0/4) Epoch 7, batch 4650, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01586, audio_tagging_loss=0.01586, over 4957261.46 frames. ], batch size: 100, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:37:40,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2023-12-21 20:37:42,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=221706.66666666666, ans=0.07 2023-12-21 20:38:07,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=221840.0, ans=0.07 2023-12-21 20:38:11,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=221906.66666666666, ans=0.0 2023-12-21 20:38:15,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=221906.66666666666, ans=0.125 2023-12-21 20:38:19,053 INFO [train.py:886] (0/4) Epoch 7, batch 4700, loss[loss=0.01921, audio_tagging_loss=0.01921, over 24750.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4958290.84 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:38:22,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=221973.33333333334, ans=0.2 2023-12-21 20:38:24,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.74 vs. 
limit=10.0 2023-12-21 20:38:30,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=222040.0, ans=0.125 2023-12-21 20:38:33,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=222040.0, ans=0.125 2023-12-21 20:38:36,962 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.697e+01 2.843e+01 3.036e+01 3.752e+01, threshold=5.686e+01, percent-clipped=0.0 2023-12-21 20:38:48,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222173.33333333334, ans=0.125 2023-12-21 20:39:00,389 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2023-12-21 20:39:04,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=222240.0, ans=0.125 2023-12-21 20:39:05,635 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:39:06,413 INFO [train.py:886] (0/4) Epoch 7, batch 4750, loss[loss=0.014, audio_tagging_loss=0.014, over 24750.00 frames. ], tot_loss[loss=0.01612, audio_tagging_loss=0.01612, over 4955131.51 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:39:17,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=222373.33333333334, ans=0.125 2023-12-21 20:39:21,460 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-7.pt 2023-12-21 20:39:42,213 INFO [train.py:886] (0/4) Epoch 8, batch 0, loss[loss=0.03649, audio_tagging_loss=0.03649, over 24048.00 frames. ], tot_loss[loss=0.03649, audio_tagging_loss=0.03649, over 24048.00 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 64.0 2023-12-21 20:39:42,215 INFO [train.py:909] (0/4) Computing validation loss 2023-12-21 20:39:59,436 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3537, 2.3715, 3.3816, 3.5156], device='cuda:0') 2023-12-21 20:40:00,128 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3701, 2.3438, 3.3598, 3.5326], device='cuda:0') 2023-12-21 20:40:03,501 INFO [train.py:917] (0/4) Epoch 8, validation: loss=0.0357, audio_tagging_loss=0.0357, over 3737520.00 frames. 2023-12-21 20:40:03,501 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-21 20:40:21,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=222480.0, ans=0.125 2023-12-21 20:40:35,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-12-21 20:40:55,244 INFO [train.py:886] (0/4) Epoch 8, batch 50, loss[loss=0.02202, audio_tagging_loss=0.02202, over 25000.00 frames. ], tot_loss[loss=0.02533, audio_tagging_loss=0.02533, over 1119314.70 frames. 
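], batch size: 100, lr: 1.41e-02, grad_scale: 32.0

This is the epoch boundary: epoch-7.pt is saved to zipformer/exp_at_as_full, a validation pass runs (the attn_weights_entropy tensors are a diagnostic of how peaked the self-attention distributions are), and the running tot_loss restarts, which is why Epoch 8, batch 50 is averaged over only ~1.1M frames while late Epoch 7 entries sit near ~5M. The fractional frame counts (e.g. 1119314.70) suggest a decayed, frame-weighted average rather than a plain sum; a minimal sketch under that assumption, with reset_interval=200 taken from the run's config and the class itself purely illustrative:

```python
class DecayingLoss:
    """Hedged sketch of the `tot_loss[...] over N frames` bookkeeping:
    an exponentially decayed, frame-weighted loss average. At ~25k
    frames per batch this saturates near 25000 * 200 = 5M frames,
    matching the steady-state counts logged above."""

    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval  # per-batch decay factor
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + loss * num_frames
        self.frames = self.frames * self.decay + num_frames

    def __str__(self) -> str:
        return (f"tot_loss[loss={self.loss_sum / self.frames:.4g}, "
                f"over {self.frames:.2f} frames.]")
```

Note also that grad_scale drops from 64.0 to 32.0 here: with use_fp16 enabled, the AMP loss scaler halves its scale when it encounters non-finite gradients, and the first post-boundary clipping report ("threshold=6.676e+01, percent-clipped=6.0") is the only one in this stretch with a nonzero clipped fraction.
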
2023-12-21 20:40:55,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=222746.66666666666, ans=0.0 2023-12-21 20:40:59,553 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.852e+01 3.338e+01 3.973e+01 1.217e+02, threshold=6.676e+01, percent-clipped=6.0 2023-12-21 20:41:02,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=222746.66666666666, ans=0.125 2023-12-21 20:41:06,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=22.5 2023-12-21 20:41:22,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2023-12-21 20:41:28,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.04 vs. limit=22.5 2023-12-21 20:41:31,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222946.66666666666, ans=0.1 2023-12-21 20:41:47,035 INFO [train.py:886] (0/4) Epoch 8, batch 100, loss[loss=0.01759, audio_tagging_loss=0.01759, over 25000.00 frames. ], tot_loss[loss=0.02155, audio_tagging_loss=0.02155, over 1972504.06 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 32.0 2023-12-21 20:41:57,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=223146.66666666666, ans=0.0 2023-12-21 20:42:06,581 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.999e-01 2023-12-21 20:42:09,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5 2023-12-21 20:42:23,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.88 vs. limit=10.0 2023-12-21 20:42:26,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=223280.0, ans=0.125 2023-12-21 20:42:31,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=223346.66666666666, ans=0.125 2023-12-21 20:42:33,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=223346.66666666666, ans=0.125 2023-12-21 20:42:37,953 INFO [train.py:886] (0/4) Epoch 8, batch 150, loss[loss=0.01683, audio_tagging_loss=0.01683, over 25000.00 frames. ], tot_loss[loss=0.01966, audio_tagging_loss=0.01966, over 2640852.59 frames.
], batch size: 100, lr: 1.41e-02, grad_scale: 32.0 2023-12-21 20:42:39,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=223413.33333333334, ans=0.09899494936611666 2023-12-21 20:42:41,687 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.729e+01 2.893e+01 3.075e+01 3.731e+01, threshold=5.785e+01, percent-clipped=0.0 2023-12-21 20:42:49,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=223480.0, ans=0.125 2023-12-21 20:42:56,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=223480.0, ans=0.0 2023-12-21 20:43:00,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.83 vs. limit=10.0 2023-12-21 20:43:04,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=223546.66666666666, ans=0.125 2023-12-21 20:43:11,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2023-12-21 20:43:20,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2023-12-21 20:43:24,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=223680.0, ans=0.125 2023-12-21 20:43:28,864 INFO [train.py:886] (0/4) Epoch 8, batch 200, loss[loss=0.01869, audio_tagging_loss=0.01869, over 25000.00 frames. ], tot_loss[loss=0.01864, audio_tagging_loss=0.01864, over 3157999.09 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 32.0 2023-12-21 20:43:49,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=223880.0, ans=0.125 2023-12-21 20:44:21,444 INFO [train.py:886] (0/4) Epoch 8, batch 250, loss[loss=0.01513, audio_tagging_loss=0.01513, over 25000.00 frames. ], tot_loss[loss=0.0178, audio_tagging_loss=0.0178, over 3562339.36 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:44:25,226 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.594e+01 2.804e+01 3.008e+01 3.563e+01, threshold=5.609e+01, percent-clipped=0.0 2023-12-21 20:44:58,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=224280.0, ans=0.125 2023-12-21 20:45:11,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=224413.33333333334, ans=0.125 2023-12-21 20:45:11,813 INFO [train.py:886] (0/4) Epoch 8, batch 300, loss[loss=0.01455, audio_tagging_loss=0.01455, over 24750.00 frames. ], tot_loss[loss=0.01737, audio_tagging_loss=0.01737, over 3866531.24 frames. 
], batch size: 99, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:45:14,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=224413.33333333334, ans=0.125 2023-12-21 20:45:20,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=224413.33333333334, ans=0.025 2023-12-21 20:45:33,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0 2023-12-21 20:45:34,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=224546.66666666666, ans=0.2 2023-12-21 20:45:40,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0 2023-12-21 20:45:41,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=224546.66666666666, ans=0.2 2023-12-21 20:45:47,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=224613.33333333334, ans=0.125 2023-12-21 20:45:51,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=224613.33333333334, ans=0.0 2023-12-21 20:45:52,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=224680.0, ans=0.125 2023-12-21 20:45:55,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2023-12-21 20:46:04,447 INFO [train.py:886] (0/4) Epoch 8, batch 350, loss[loss=0.01503, audio_tagging_loss=0.01503, over 25000.00 frames. ], tot_loss[loss=0.01704, audio_tagging_loss=0.01704, over 4107079.43 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:46:08,267 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.559e+01 2.704e+01 2.867e+01 3.346e+01, threshold=5.408e+01, percent-clipped=0.0 2023-12-21 20:46:24,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2023-12-21 20:46:33,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=224880.0, ans=0.125 2023-12-21 20:46:56,022 INFO [train.py:886] (0/4) Epoch 8, batch 400, loss[loss=0.01872, audio_tagging_loss=0.01872, over 22099.00 frames. ], tot_loss[loss=0.0166, audio_tagging_loss=0.0166, over 4294602.52 frames. ], batch size: 107, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:47:01,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=225080.0, ans=0.0 2023-12-21 20:47:02,582 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.565e-03 2023-12-21 20:47:08,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=225146.66666666666, ans=0.0 2023-12-21 20:47:21,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.35 vs. 
limit=22.5 2023-12-21 20:47:28,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=225280.0, ans=0.0 2023-12-21 20:47:31,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2023-12-21 20:47:35,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-21 20:47:40,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=225346.66666666666, ans=0.125 2023-12-21 20:47:48,116 INFO [train.py:886] (0/4) Epoch 8, batch 450, loss[loss=0.02041, audio_tagging_loss=0.02041, over 25000.00 frames. ], tot_loss[loss=0.01636, audio_tagging_loss=0.01636, over 4437472.66 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:47:51,844 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.539e+01 2.733e+01 2.950e+01 3.447e+01, threshold=5.466e+01, percent-clipped=0.0 2023-12-21 20:48:00,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=225480.0, ans=0.0 2023-12-21 20:48:09,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=225546.66666666666, ans=0.0 2023-12-21 20:48:22,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=225613.33333333334, ans=0.0 2023-12-21 20:48:33,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=225680.0, ans=0.125 2023-12-21 20:48:38,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-12-21 20:48:40,727 INFO [train.py:886] (0/4) Epoch 8, batch 500, loss[loss=0.01323, audio_tagging_loss=0.01323, over 25000.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4554247.81 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:48:41,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=225746.66666666666, ans=0.125 2023-12-21 20:48:46,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=225746.66666666666, ans=0.125 2023-12-21 20:48:48,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=225746.66666666666, ans=0.0 2023-12-21 20:48:54,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225813.33333333334, ans=0.1 2023-12-21 20:49:07,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=225880.0, ans=0.2 2023-12-21 20:49:26,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=226013.33333333334, ans=0.125 2023-12-21 20:49:28,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.43 vs. 
limit=22.5 2023-12-21 20:49:31,729 INFO [train.py:886] (0/4) Epoch 8, batch 550, loss[loss=0.01643, audio_tagging_loss=0.01643, over 25000.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4647283.10 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:49:32,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=226080.0, ans=0.1 2023-12-21 20:49:35,532 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.537e+01 2.667e+01 2.842e+01 3.334e+01, threshold=5.333e+01, percent-clipped=0.0 2023-12-21 20:49:52,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=226213.33333333334, ans=10.0 2023-12-21 20:49:55,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.58 vs. limit=15.0 2023-12-21 20:50:00,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=226213.33333333334, ans=0.125 2023-12-21 20:50:02,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.39 vs. limit=15.0 2023-12-21 20:50:07,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.05 vs. limit=15.0 2023-12-21 20:50:24,146 INFO [train.py:886] (0/4) Epoch 8, batch 600, loss[loss=0.01759, audio_tagging_loss=0.01759, over 24750.00 frames. ], tot_loss[loss=0.01614, audio_tagging_loss=0.01614, over 4716913.38 frames. ], batch size: 99, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:50:32,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0 2023-12-21 20:50:36,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2023-12-21 20:51:14,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2023-12-21 20:51:15,688 INFO [train.py:886] (0/4) Epoch 8, batch 650, loss[loss=0.01605, audio_tagging_loss=0.01605, over 24750.00 frames. ], tot_loss[loss=0.01619, audio_tagging_loss=0.01619, over 4763827.70 frames. ], batch size: 99, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:51:16,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-12-21 20:51:20,086 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.706e+01 2.885e+01 3.077e+01 3.988e+01, threshold=5.770e+01, percent-clipped=0.0 2023-12-21 20:51:20,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2023-12-21 20:51:35,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. 
limit=15.0 2023-12-21 20:51:51,024 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.182e-01 2023-12-21 20:51:51,256 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=12.0 2023-12-21 20:51:54,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=226946.66666666666, ans=0.0 2023-12-21 20:52:06,619 INFO [train.py:886] (0/4) Epoch 8, batch 700, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4801943.74 frames. ], batch size: 99, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:52:22,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=227146.66666666666, ans=0.2 2023-12-21 20:52:41,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=227280.0, ans=15.0 2023-12-21 20:52:59,420 INFO [train.py:886] (0/4) Epoch 8, batch 750, loss[loss=0.01383, audio_tagging_loss=0.01383, over 25000.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 4840034.69 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:53:03,144 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.577e+01 2.736e+01 2.943e+01 3.509e+01, threshold=5.472e+01, percent-clipped=0.0 2023-12-21 20:53:20,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=227546.66666666666, ans=0.0 2023-12-21 20:53:33,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=227613.33333333334, ans=0.0 2023-12-21 20:53:36,675 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:53:37,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=227613.33333333334, ans=0.0 2023-12-21 20:53:44,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=227680.0, ans=0.1 2023-12-21 20:53:50,469 INFO [train.py:886] (0/4) Epoch 8, batch 800, loss[loss=0.0168, audio_tagging_loss=0.0168, over 24750.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4862194.71 frames. ], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:53:50,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.25 vs. 
limit=15.0 2023-12-21 20:53:57,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=227746.66666666666, ans=0.125 2023-12-21 20:54:10,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=227880.0, ans=0.0 2023-12-21 20:54:29,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=227946.66666666666, ans=0.0 2023-12-21 20:54:35,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=228013.33333333334, ans=0.125 2023-12-21 20:54:43,045 INFO [train.py:886] (0/4) Epoch 8, batch 850, loss[loss=0.01915, audio_tagging_loss=0.01915, over 25000.00 frames. ], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 4883855.64 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:54:46,741 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 2.587e+01 2.731e+01 2.941e+01 3.321e+01, threshold=5.462e+01, percent-clipped=0.0 2023-12-21 20:55:01,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=228146.66666666666, ans=0.125 2023-12-21 20:55:03,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=228213.33333333334, ans=0.0 2023-12-21 20:55:20,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=228280.0, ans=0.2 2023-12-21 20:55:29,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.72 vs. limit=15.0 2023-12-21 20:55:31,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=228346.66666666666, ans=0.125 2023-12-21 20:55:34,594 INFO [train.py:886] (0/4) Epoch 8, batch 900, loss[loss=0.01357, audio_tagging_loss=0.01357, over 24750.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4900482.17 frames. ], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:55:36,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=228413.33333333334, ans=0.02 2023-12-21 20:55:38,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=228413.33333333334, ans=0.0 2023-12-21 20:55:48,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=228480.0, ans=0.07 2023-12-21 20:55:51,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=228480.0, ans=0.2 2023-12-21 20:56:01,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=228546.66666666666, ans=0.0 2023-12-21 20:56:11,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.39 vs. 
limit=15.0 2023-12-21 20:56:15,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228680.0, ans=0.1 2023-12-21 20:56:26,471 INFO [train.py:886] (0/4) Epoch 8, batch 950, loss[loss=0.0161, audio_tagging_loss=0.0161, over 24750.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4901721.32 frames. ], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:56:30,921 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 2.642e+01 2.776e+01 2.984e+01 3.977e+01, threshold=5.552e+01, percent-clipped=0.0 2023-12-21 20:56:44,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=228813.33333333334, ans=0.0 2023-12-21 20:56:58,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=228946.66666666666, ans=0.2 2023-12-21 20:57:13,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=229013.33333333334, ans=0.125 2023-12-21 20:57:15,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=229013.33333333334, ans=0.125 2023-12-21 20:57:18,749 INFO [train.py:886] (0/4) Epoch 8, batch 1000, loss[loss=0.01349, audio_tagging_loss=0.01349, over 22572.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 4904440.80 frames. ], batch size: 107, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:57:25,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=229080.0, ans=0.0 2023-12-21 20:57:58,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229280.0, ans=0.1 2023-12-21 20:58:02,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2023-12-21 20:58:05,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=229346.66666666666, ans=0.125 2023-12-21 20:58:09,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=229413.33333333334, ans=0.125 2023-12-21 20:58:10,992 INFO [train.py:886] (0/4) Epoch 8, batch 1050, loss[loss=0.01747, audio_tagging_loss=0.01747, over 24750.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4918316.87 frames. 
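], batch size: 99, lr: 1.39e-02, grad_scale: 32.0

Note on the lr field: it decays smoothly within an epoch (1.52e-02 down to 1.38e-02 across this stretch) and steps down at epoch boundaries (1.50e-02 to 1.41e-02 at the start of Epoch 8), consistent with a schedule that discounts both the step count and the epoch count. A sketch in the spirit of icefall's Eden scheduler, using base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the run's config; the exact formula and the step count in the example are my reconstruction, not values taken from the log:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Both factors start near 1.0 and decay like an inverse fourth root.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With a guessed ~30k optimizer steps by epoch 8 this lands close to the
# logged 1.41e-02:
print(eden_lr(0.045, batch=30_000, epoch=8))  # ~1.40e-02
```
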
2023-12-21 20:58:14,762 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.565e+01 2.750e+01 2.956e+01 3.659e+01, threshold=5.501e+01, percent-clipped=0.0 2023-12-21 20:58:14,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229413.33333333334, ans=0.1 2023-12-21 20:58:29,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=229480.0, ans=0.125 2023-12-21 20:58:39,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=229546.66666666666, ans=0.125 2023-12-21 20:58:45,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=229613.33333333334, ans=0.1 2023-12-21 20:58:51,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=229680.0, ans=0.2 2023-12-21 20:59:02,600 INFO [train.py:886] (0/4) Epoch 8, batch 1100, loss[loss=0.01343, audio_tagging_loss=0.01343, over 24750.00 frames. ], tot_loss[loss=0.01566, audio_tagging_loss=0.01566, over 4918632.97 frames. ], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:59:04,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=229746.66666666666, ans=0.125 2023-12-21 20:59:15,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.31 vs. limit=15.0 2023-12-21 20:59:18,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=229813.33333333334, ans=0.125 2023-12-21 20:59:19,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=229813.33333333334, ans=0.1 2023-12-21 20:59:28,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=229880.0, ans=0.0 2023-12-21 20:59:44,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230013.33333333334, ans=0.1 2023-12-21 20:59:44,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=230013.33333333334, ans=0.125 2023-12-21 20:59:54,269 INFO [train.py:886] (0/4) Epoch 8, batch 1150, loss[loss=0.01639, audio_tagging_loss=0.01639, over 25000.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4925894.77 frames.
], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:59:58,777 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.593e+01 2.758e+01 2.932e+01 3.936e+01, threshold=5.517e+01, percent-clipped=0.0 2023-12-21 21:00:04,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=230146.66666666666, ans=0.0 2023-12-21 21:00:05,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=230146.66666666666, ans=0.1 2023-12-21 21:00:24,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=15.0 2023-12-21 21:00:32,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=230280.0, ans=0.0 2023-12-21 21:00:46,103 INFO [train.py:886] (0/4) Epoch 8, batch 1200, loss[loss=0.01753, audio_tagging_loss=0.01753, over 25000.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4936607.03 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 21:00:58,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=230480.0, ans=0.04949747468305833 2023-12-21 21:01:03,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=230480.0, ans=0.125 2023-12-21 21:01:21,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=230613.33333333334, ans=0.05 2023-12-21 21:01:32,372 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.578e+00 2023-12-21 21:01:35,033 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:01:35,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2023-12-21 21:01:38,501 INFO [train.py:886] (0/4) Epoch 8, batch 1250, loss[loss=0.01478, audio_tagging_loss=0.01478, over 25000.00 frames. ], tot_loss[loss=0.01578, audio_tagging_loss=0.01578, over 4935610.74 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:01:42,236 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.655e+01 2.780e+01 2.996e+01 3.618e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-21 21:01:58,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=230880.0, ans=0.125 2023-12-21 21:02:00,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=230880.0, ans=0.1 2023-12-21 21:02:21,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=231013.33333333334, ans=0.125 2023-12-21 21:02:29,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=231080.0, ans=0.125 2023-12-21 21:02:30,641 INFO [train.py:886] (0/4) Epoch 8, batch 1300, loss[loss=0.01457, audio_tagging_loss=0.01457, over 25000.00 frames. 
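
The scaling.py:213 ScheduledFloat records track hyperparameters (dropout probabilities, skip rates, balancer constraints) whose value is a function of batch_count, so the regularizers can be strong early in training and relax later; "ans" is the value in effect at that batch count. A plausible reading is a piecewise-linear schedule between (batch_count, value) breakpoints, sketched below as an assumption rather than the icefall class itself; the breakpoints in the example are made up.

    # Piecewise-linear schedule keyed on batch count (assumed semantics of the
    # ScheduledFloat records above).
    def scheduled_float(batch_count, points):
        """points: list of (batch_count, value) pairs sorted by batch_count."""
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)  # linear interpolation
            x0, y0 = x1, y1
        return y0  # past the last breakpoint, hold the final value

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20000 batches
    # would read ans=0.1 by batch_count=228680.0:
    p = scheduled_float(228680.0, [(0.0, 0.3), (20000.0, 0.1)])
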
2023-12-21 21:02:44,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=231146.66666666666, ans=0.0
2023-12-21 21:03:18,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=231346.66666666666, ans=0.125
2023-12-21 21:03:20,609 INFO [train.py:886] (0/4) Epoch 8, batch 1350, loss[loss=0.01584, audio_tagging_loss=0.01584, over 24750.00 frames. ], tot_loss[loss=0.01586, audio_tagging_loss=0.01586, over 4935044.55 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0
2023-12-21 21:03:25,733 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.556e+01 2.752e+01 2.967e+01 4.307e+01, threshold=5.505e+01, percent-clipped=0.0
2023-12-21 21:03:41,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0
2023-12-21 21:03:42,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231546.66666666666, ans=0.1
2023-12-21 21:03:45,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=231546.66666666666, ans=0.125
2023-12-21 21:03:54,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.06 vs. limit=10.0
2023-12-21 21:04:01,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=231613.33333333334, ans=0.1
2023-12-21 21:04:13,935 INFO [train.py:886] (0/4) Epoch 8, batch 1400, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01577, audio_tagging_loss=0.01577, over 4942109.06 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0
2023-12-21 21:04:15,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=231746.66666666666, ans=0.0
2023-12-21 21:04:16,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=231746.66666666666, ans=0.125
2023-12-21 21:04:19,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0
2023-12-21 21:04:20,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=231746.66666666666, ans=0.0
2023-12-21 21:04:50,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=231946.66666666666, ans=0.04949747468305833
2023-12-21 21:05:05,548 INFO [train.py:886] (0/4) Epoch 8, batch 1450, loss[loss=0.01383, audio_tagging_loss=0.01383, over 24750.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4944752.13 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0
2023-12-21 21:05:09,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=232080.0, ans=0.2
2023-12-21 21:05:09,934 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.540e+01 2.719e+01 2.872e+01 3.689e+01, threshold=5.439e+01, percent-clipped=0.0
2023-12-21 21:05:12,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=232080.0, ans=0.125
2023-12-21 21:05:14,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=232146.66666666666, ans=0.0
2023-12-21 21:05:16,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=232146.66666666666, ans=0.0
2023-12-21 21:05:25,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=232213.33333333334, ans=0.0
2023-12-21 21:05:31,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=232213.33333333334, ans=0.125
2023-12-21 21:05:38,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=232280.0, ans=15.0
2023-12-21 21:05:40,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=232280.0, ans=0.1
2023-12-21 21:05:54,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0
2023-12-21 21:05:57,191 INFO [train.py:886] (0/4) Epoch 8, batch 1500, loss[loss=0.01629, audio_tagging_loss=0.01629, over 25000.00 frames. ], tot_loss[loss=0.01569, audio_tagging_loss=0.01569, over 4946704.89 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0
2023-12-21 21:05:59,292 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.734e-01
2023-12-21 21:06:12,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=232480.0, ans=0.07
2023-12-21 21:06:13,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=232480.0, ans=0.125
2023-12-21 21:06:16,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0
2023-12-21 21:06:19,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0
2023-12-21 21:06:32,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=232613.33333333334, ans=0.125
2023-12-21 21:06:41,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0
2023-12-21 21:06:48,538 INFO [train.py:886] (0/4) Epoch 8, batch 1550, loss[loss=0.01627, audio_tagging_loss=0.01627, over 24750.00 frames. ], tot_loss[loss=0.01584, audio_tagging_loss=0.01584, over 4949386.45 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0
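
The scaling.py:1022 Whitening records compare a per-module statistic against a limit ("metric=... vs. limit=..."); the metric grows as the module's output covariance departs from white (variance concentrated in a few directions), and the records that get logged are the ones near or over the limit. The exact formula is internal to scaling.py; the function below shows one conventional eigenvalue-based way to quantify whiteness, as an illustration only, not the library's code.

    import torch

    # One way to measure how "non-white" a set of activations is; a stand-in
    # for the logged Whitening metric (the scaling.py formula may differ).
    def whitening_metric(x):
        # x: (num_frames, num_channels) activations for one whitening group
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]     # channel covariance
        eigs = torch.linalg.eigvalsh(cov)  # eigenvalue spectrum
        # 1.0 when all eigenvalues are equal (perfectly white); grows as the
        # spectrum becomes more concentrated in a few directions
        return float((eigs ** 2).mean() / eigs.mean() ** 2)
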
2023-12-21 21:06:48,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=232746.66666666666, ans=0.1
2023-12-21 21:06:52,988 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+01 2.641e+01 2.774e+01 2.948e+01 3.544e+01, threshold=5.547e+01, percent-clipped=0.0
2023-12-21 21:07:19,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.52 vs. limit=22.5
2023-12-21 21:07:21,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=232946.66666666666, ans=0.125
2023-12-21 21:07:30,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=233013.33333333334, ans=0.0
2023-12-21 21:07:39,860 INFO [train.py:886] (0/4) Epoch 8, batch 1600, loss[loss=0.01734, audio_tagging_loss=0.01734, over 24750.00 frames. ], tot_loss[loss=0.01584, audio_tagging_loss=0.01584, over 4945885.53 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0
2023-12-21 21:07:47,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0
2023-12-21 21:07:51,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=233146.66666666666, ans=0.125
2023-12-21 21:08:01,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=233213.33333333334, ans=0.0
2023-12-21 21:08:21,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=233346.66666666666, ans=0.125
2023-12-21 21:08:32,013 INFO [train.py:886] (0/4) Epoch 8, batch 1650, loss[loss=0.01591, audio_tagging_loss=0.01591, over 25000.00 frames. ], tot_loss[loss=0.01583, audio_tagging_loss=0.01583, over 4947483.99 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0
2023-12-21 21:08:33,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=233413.33333333334, ans=0.125
2023-12-21 21:08:35,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=233413.33333333334, ans=0.2
2023-12-21 21:08:35,750 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.707e+01 2.843e+01 2.988e+01 3.739e+01, threshold=5.687e+01, percent-clipped=0.0
2023-12-21 21:08:46,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=233480.0, ans=0.125
2023-12-21 21:08:48,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=233480.0, ans=0.125
2023-12-21 21:08:49,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0
2023-12-21 21:09:02,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=233613.33333333334, ans=0.0
2023-12-21 21:09:23,663 INFO [train.py:886] (0/4) Epoch 8, batch 1700, loss[loss=0.01688, audio_tagging_loss=0.01688, over 25000.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4943971.39 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0
2023-12-21 21:09:24,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.60 vs. limit=15.0
2023-12-21 21:09:48,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=233880.0, ans=0.05
2023-12-21 21:09:52,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=233880.0, ans=0.125
2023-12-21 21:10:06,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=234013.33333333334, ans=0.0
2023-12-21 21:10:07,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=234013.33333333334, ans=0.2
2023-12-21 21:10:10,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234013.33333333334, ans=0.1
2023-12-21 21:10:15,209 INFO [train.py:886] (0/4) Epoch 8, batch 1750, loss[loss=0.01785, audio_tagging_loss=0.01785, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4949617.07 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0
2023-12-21 21:10:19,034 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.526e+01 2.688e+01 2.921e+01 3.740e+01, threshold=5.376e+01, percent-clipped=0.0
2023-12-21 21:11:08,777 INFO [train.py:886] (0/4) Epoch 8, batch 1800, loss[loss=0.01592, audio_tagging_loss=0.01592, over 25000.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4951474.37 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 32.0
2023-12-21 21:11:14,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=234413.33333333334, ans=0.0
2023-12-21 21:11:19,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=234480.0, ans=0.0
2023-12-21 21:11:35,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=234546.66666666666, ans=0.125
2023-12-21 21:11:51,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=234680.0, ans=0.5
2023-12-21 21:11:59,299 INFO [train.py:886] (0/4) Epoch 8, batch 1850, loss[loss=0.01561, audio_tagging_loss=0.01561, over 24750.00 frames. ], tot_loss[loss=0.01584, audio_tagging_loss=0.01584, over 4956216.89 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 32.0
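
Each train.py:886 record pairs loss[...] for the single batch just processed with tot_loss[...], a running average weighted by how many frames it covers (the "over N frames" counts, which hover around 4.9M here while per-batch counts are roughly 22000-25000). A sketch of that bookkeeping follows; the decay constant is invented and stands in for whatever windowing train.py actually applies.

    # Frame-weighted running loss (assumed reading of the "tot_loss ... over
    # N frames" bookkeeping; not icefall's actual metrics tracker).
    class RunningLoss:
        def __init__(self, decay=0.999):  # hypothetical smoothing constant
            self.decay = decay
            self.loss_sum = 0.0           # decayed sum of loss * frames
            self.frames = 0.0             # decayed sum of frames

        def update(self, batch_loss, batch_frames):
            # decay old statistics so the average tracks recent batches
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames  # reported as tot_loss
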
2023-12-21 21:12:04,460 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 2.642e+01 2.796e+01 3.049e+01 4.216e+01, threshold=5.592e+01, percent-clipped=0.0
2023-12-21 21:12:09,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=234813.33333333334, ans=0.95
2023-12-21 21:12:13,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=234813.33333333334, ans=0.0
2023-12-21 21:12:16,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=234813.33333333334, ans=0.125
2023-12-21 21:12:29,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=234946.66666666666, ans=0.125
2023-12-21 21:12:33,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0
2023-12-21 21:12:50,825 INFO [train.py:886] (0/4) Epoch 8, batch 1900, loss[loss=0.01604, audio_tagging_loss=0.01604, over 24750.00 frames. ], tot_loss[loss=0.01596, audio_tagging_loss=0.01596, over 4947269.43 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 32.0
2023-12-21 21:12:53,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=235080.0, ans=0.0
2023-12-21 21:12:55,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=235080.0, ans=0.125
2023-12-21 21:12:56,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=235080.0, ans=0.0
2023-12-21 21:12:58,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5
2023-12-21 21:13:15,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=235213.33333333334, ans=0.1
2023-12-21 21:13:23,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=235280.0, ans=0.2
2023-12-21 21:13:40,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=235346.66666666666, ans=0.125
2023-12-21 21:13:42,960 INFO [train.py:886] (0/4) Epoch 8, batch 1950, loss[loss=0.01392, audio_tagging_loss=0.01392, over 21977.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4941061.48 frames. ], batch size: 107, lr: 1.37e-02, grad_scale: 32.0
2023-12-21 21:13:45,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=235413.33333333334, ans=0.0
2023-12-21 21:13:47,358 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 2.599e+01 2.763e+01 2.892e+01 3.548e+01, threshold=5.526e+01, percent-clipped=0.0
2023-12-21 21:13:56,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=235480.0, ans=0.04949747468305833
2023-12-21 21:14:00,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=235480.0, ans=0.0
2023-12-21 21:14:11,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=235546.66666666666, ans=0.0
2023-12-21 21:14:11,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0
2023-12-21 21:14:22,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=235613.33333333334, ans=0.0
2023-12-21 21:14:27,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=235680.0, ans=0.0
2023-12-21 21:14:34,265 INFO [train.py:886] (0/4) Epoch 8, batch 2000, loss[loss=0.0138, audio_tagging_loss=0.0138, over 25000.00 frames. ], tot_loss[loss=0.01576, audio_tagging_loss=0.01576, over 4945598.89 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 32.0
2023-12-21 21:14:40,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.79 vs. limit=22.5
2023-12-21 21:14:43,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=235746.66666666666, ans=0.0
2023-12-21 21:14:44,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.03 vs. limit=15.0
2023-12-21 21:14:45,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=235813.33333333334, ans=0.2
2023-12-21 21:14:49,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=235813.33333333334, ans=0.1
2023-12-21 21:14:55,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.37 vs. limit=15.0
2023-12-21 21:14:59,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=235880.0, ans=0.125
2023-12-21 21:14:59,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5
2023-12-21 21:15:01,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=235880.0, ans=0.125
2023-12-21 21:15:02,035 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.427e-02
2023-12-21 21:15:02,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.04 vs. limit=6.0
2023-12-21 21:15:24,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=236013.33333333334, ans=0.1
2023-12-21 21:15:25,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=236013.33333333334, ans=10.0
2023-12-21 21:15:26,877 INFO [train.py:886] (0/4) Epoch 8, batch 2050, loss[loss=0.01689, audio_tagging_loss=0.01689, over 24750.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4950066.67 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 64.0
2023-12-21 21:15:31,344 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.542e+01 2.684e+01 2.827e+01 3.551e+01, threshold=5.367e+01, percent-clipped=0.0
2023-12-21 21:15:34,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.59 vs. limit=15.0
2023-12-21 21:15:35,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=22.5
2023-12-21 21:15:41,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=236146.66666666666, ans=0.125
2023-12-21 21:15:48,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=236213.33333333334, ans=0.125
2023-12-21 21:15:55,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=236213.33333333334, ans=0.025
2023-12-21 21:15:58,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=236280.0, ans=0.2
2023-12-21 21:16:04,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=236280.0, ans=0.125
2023-12-21 21:16:18,465 INFO [train.py:886] (0/4) Epoch 8, batch 2100, loss[loss=0.01655, audio_tagging_loss=0.01655, over 25000.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 4949128.53 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 64.0
2023-12-21 21:16:28,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236480.0, ans=0.1
2023-12-21 21:16:41,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=236546.66666666666, ans=0.125
2023-12-21 21:16:43,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=236546.66666666666, ans=0.125
2023-12-21 21:16:58,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=236680.0, ans=0.0
2023-12-21 21:17:06,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=236680.0, ans=0.0
2023-12-21 21:17:09,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=236746.66666666666, ans=0.125
2023-12-21 21:17:10,431 INFO [train.py:886] (0/4) Epoch 8, batch 2150, loss[loss=0.01667, audio_tagging_loss=0.01667, over 24750.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4949985.31 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 64.0
2023-12-21 21:17:14,090 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 2.652e+01 2.733e+01 2.907e+01 3.375e+01, threshold=5.466e+01, percent-clipped=0.0
2023-12-21 21:17:28,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=236813.33333333334, ans=0.125
2023-12-21 21:18:01,943 INFO [train.py:886] (0/4) Epoch 8, batch 2200, loss[loss=0.01468, audio_tagging_loss=0.01468, over 24750.00 frames. ], tot_loss[loss=0.01566, audio_tagging_loss=0.01566, over 4950571.01 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 64.0
2023-12-21 21:18:06,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=237080.0, ans=0.125
2023-12-21 21:18:20,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=237146.66666666666, ans=0.125
2023-12-21 21:18:27,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=237213.33333333334, ans=0.125
2023-12-21 21:18:54,005 INFO [train.py:886] (0/4) Epoch 8, batch 2250, loss[loss=0.01556, audio_tagging_loss=0.01556, over 24750.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4947819.68 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 64.0
2023-12-21 21:18:59,106 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.643e+01 2.757e+01 2.915e+01 3.398e+01, threshold=5.513e+01, percent-clipped=0.0
2023-12-21 21:19:25,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=237613.33333333334, ans=0.0
2023-12-21 21:19:27,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0
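
grad_scale is the fp16 loss-scaling factor: it reads 32.0 through batch 2000 above and 64.0 from batch 2050 onward, the signature of dynamic loss scaling doubling after a long enough run of overflow-free steps. A standard PyTorch AMP loop behaves the same way; the sketch below is illustrative only (init_scale and growth_interval are assumptions, and icefall wires the equivalent logic into its own training loop).

    import torch

    # Dynamic fp16 loss scaling: the scale doubles after `growth_interval`
    # consecutive steps without overflow, matching the 32.0 -> 64.0 jump above.
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def training_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on overflow
        scaler.update()                # halve on overflow, grow when stable
        return loss.detach()
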
2023-12-21 21:19:41,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237680.0, ans=0.1
2023-12-21 21:19:44,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0
2023-12-21 21:19:46,178 INFO [train.py:886] (0/4) Epoch 8, batch 2300, loss[loss=0.01625, audio_tagging_loss=0.01625, over 25000.00 frames. ], tot_loss[loss=0.01569, audio_tagging_loss=0.01569, over 4951813.16 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 64.0
2023-12-21 21:19:47,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=237746.66666666666, ans=0.07
2023-12-21 21:20:01,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237813.33333333334, ans=0.1
2023-12-21 21:20:07,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=237880.0, ans=0.1
2023-12-21 21:20:08,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0
2023-12-21 21:20:13,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=237880.0, ans=0.09899494936611666
2023-12-21 21:20:21,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=237946.66666666666, ans=0.125
2023-12-21 21:20:29,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=238013.33333333334, ans=0.09899494936611666
2023-12-21 21:20:37,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=15.0
2023-12-21 21:20:38,774 INFO [train.py:886] (0/4) Epoch 8, batch 2350, loss[loss=0.0155, audio_tagging_loss=0.0155, over 24750.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4951984.00 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0
2023-12-21 21:20:40,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.75 vs. limit=10.0
2023-12-21 21:20:42,601 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.503e+01 2.677e+01 2.839e+01 3.914e+01, threshold=5.353e+01, percent-clipped=0.0
2023-12-21 21:20:50,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=22.5
2023-12-21 21:21:15,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=238280.0, ans=0.0
2023-12-21 21:21:29,915 INFO [train.py:886] (0/4) Epoch 8, batch 2400, loss[loss=0.01648, audio_tagging_loss=0.01648, over 25000.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4952028.67 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0
2023-12-21 21:21:36,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=238413.33333333334, ans=0.2
2023-12-21 21:22:00,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=12.0
2023-12-21 21:22:09,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=238613.33333333334, ans=0.0
2023-12-21 21:22:15,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=238680.0, ans=0.0
2023-12-21 21:22:18,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=238680.0, ans=0.125
2023-12-21 21:22:22,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0
2023-12-21 21:22:22,518 INFO [train.py:886] (0/4) Epoch 8, batch 2450, loss[loss=0.01407, audio_tagging_loss=0.01407, over 24750.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 4960331.11 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0
2023-12-21 21:22:26,969 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.604e+01 2.785e+01 2.949e+01 3.842e+01, threshold=5.569e+01, percent-clipped=0.0
2023-12-21 21:22:32,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=238813.33333333334, ans=0.125
2023-12-21 21:22:38,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=238813.33333333334, ans=0.2
2023-12-21 21:23:14,634 INFO [train.py:886] (0/4) Epoch 8, batch 2500, loss[loss=0.01742, audio_tagging_loss=0.01742, over 24750.00 frames. ], tot_loss[loss=0.01583, audio_tagging_loss=0.01583, over 4955951.72 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0
2023-12-21 21:23:14,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=239080.0, ans=0.2
2023-12-21 21:23:36,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=239213.33333333334, ans=0.0
2023-12-21 21:23:56,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=239346.66666666666, ans=0.2
2023-12-21 21:24:03,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=239346.66666666666, ans=15.0
2023-12-21 21:24:06,251 INFO [train.py:886] (0/4) Epoch 8, batch 2550, loss[loss=0.01624, audio_tagging_loss=0.01624, over 25000.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4953051.91 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0
2023-12-21 21:24:09,935 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.703e+01 2.852e+01 3.044e+01 3.567e+01, threshold=5.704e+01, percent-clipped=0.0
2023-12-21 21:24:11,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239413.33333333334, ans=0.1
2023-12-21 21:24:32,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=239546.66666666666, ans=0.0
2023-12-21 21:24:46,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=239680.0, ans=0.125
2023-12-21 21:24:51,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=239680.0, ans=0.0
2023-12-21 21:24:54,413 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.69 vs. limit=6.0
2023-12-21 21:24:58,483 INFO [train.py:886] (0/4) Epoch 8, batch 2600, loss[loss=0.01525, audio_tagging_loss=0.01525, over 25000.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4948749.39 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0
2023-12-21 21:25:24,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=239880.0, ans=0.125
2023-12-21 21:25:35,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.56 vs. limit=22.5
2023-12-21 21:25:36,715 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-36000.pt
2023-12-21 21:25:41,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0
2023-12-21 21:25:43,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=240013.33333333334, ans=0.125
2023-12-21 21:25:47,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=240013.33333333334, ans=0.0
2023-12-21 21:25:50,961 INFO [train.py:886] (0/4) Epoch 8, batch 2650, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4944989.88 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0
2023-12-21 21:25:55,392 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.630e+01 2.789e+01 2.971e+01 3.583e+01, threshold=5.578e+01, percent-clipped=0.0
2023-12-21 21:26:25,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=240280.0, ans=0.0
2023-12-21 21:26:30,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=240280.0, ans=0.0
2023-12-21 21:26:33,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=240346.66666666666, ans=0.0
2023-12-21 21:26:40,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.69 vs. limit=6.0
2023-12-21 21:26:42,489 INFO [train.py:886] (0/4) Epoch 8, batch 2700, loss[loss=0.0136, audio_tagging_loss=0.0136, over 25000.00 frames. ], tot_loss[loss=0.01561, audio_tagging_loss=0.01561, over 4944135.41 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0
2023-12-21 21:26:44,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=240413.33333333334, ans=0.0
2023-12-21 21:26:52,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=240480.0, ans=0.0
2023-12-21 21:27:00,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=240480.0, ans=0.125
2023-12-21 21:27:19,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=240613.33333333334, ans=0.1
2023-12-21 21:27:33,949 INFO [train.py:886] (0/4) Epoch 8, batch 2750, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4945018.00 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0
2023-12-21 21:27:37,717 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.574e+01 2.741e+01 2.911e+01 3.788e+01, threshold=5.483e+01, percent-clipped=0.0
2023-12-21 21:27:40,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0
2023-12-21 21:27:41,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=240746.66666666666, ans=0.1
2023-12-21 21:27:42,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=240813.33333333334, ans=0.125
2023-12-21 21:27:49,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=240813.33333333334, ans=0.0
2023-12-21 21:27:49,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=240813.33333333334, ans=0.04949747468305833
2023-12-21 21:28:01,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=240880.0, ans=0.125
2023-12-21 21:28:01,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=240880.0, ans=0.0
2023-12-21 21:28:02,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=15.0
2023-12-21 21:28:03,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=240946.66666666666, ans=0.0
2023-12-21 21:28:21,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0
2023-12-21 21:28:25,016 INFO [train.py:886] (0/4) Epoch 8, batch 2800, loss[loss=0.0173, audio_tagging_loss=0.0173, over 24750.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4945482.20 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0
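
The checkpoint.py:75 record above writes a batch-indexed checkpoint (checkpoint-36000.pt) into the experiment directory; the -36000 suffix matches the global batch index, which suggests a fixed save interval. A sketch of that convention follows; the interval and the exact contents of the saved dict are assumptions, not icefall's checkpoint.py.

    import torch

    def maybe_save_checkpoint(model, optimizer, batch_idx, exp_dir,
                              save_every_n=4000):  # interval assumed
        # save checkpoint-<batch_idx>.pt at fixed batch intervals
        if batch_idx == 0 or batch_idx % save_every_n != 0:
            return
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx,
            },
            f"{exp_dir}/checkpoint-{batch_idx}.pt",
        )
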
2023-12-21 21:28:27,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=241080.0, ans=0.125
2023-12-21 21:28:36,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=241146.66666666666, ans=0.125
2023-12-21 21:28:36,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0
2023-12-21 21:28:54,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0
2023-12-21 21:28:56,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=241280.0, ans=0.0
2023-12-21 21:29:15,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=241346.66666666666, ans=0.0
2023-12-21 21:29:16,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241413.33333333334, ans=0.1
2023-12-21 21:29:17,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=241413.33333333334, ans=0.2
2023-12-21 21:29:17,689 INFO [train.py:886] (0/4) Epoch 8, batch 2850, loss[loss=0.01566, audio_tagging_loss=0.01566, over 24750.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4943772.10 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0
2023-12-21 21:29:22,197 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+01 2.608e+01 2.780e+01 2.965e+01 3.474e+01, threshold=5.560e+01, percent-clipped=0.0
2023-12-21 21:29:45,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.43 vs. limit=22.5
2023-12-21 21:29:46,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=241546.66666666666, ans=0.125
2023-12-21 21:29:52,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=241613.33333333334, ans=0.125
2023-12-21 21:29:54,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.23 vs. limit=15.0
2023-12-21 21:30:02,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241680.0, ans=0.1
2023-12-21 21:30:08,468 INFO [train.py:886] (0/4) Epoch 8, batch 2900, loss[loss=0.01639, audio_tagging_loss=0.01639, over 25000.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4940640.44 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:30:15,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=241746.66666666666, ans=0.125
2023-12-21 21:30:22,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=241813.33333333334, ans=0.0
2023-12-21 21:30:23,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=241813.33333333334, ans=10.0
2023-12-21 21:30:59,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=242013.33333333334, ans=0.125
2023-12-21 21:31:01,401 INFO [train.py:886] (0/4) Epoch 8, batch 2950, loss[loss=0.01583, audio_tagging_loss=0.01583, over 24750.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4941608.01 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:31:05,141 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.582e+01 2.728e+01 2.928e+01 3.566e+01, threshold=5.455e+01, percent-clipped=0.0
2023-12-21 21:31:23,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=242213.33333333334, ans=0.125
2023-12-21 21:31:38,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0
2023-12-21 21:31:53,109 INFO [train.py:886] (0/4) Epoch 8, batch 3000, loss[loss=0.01479, audio_tagging_loss=0.01479, over 25000.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4949423.16 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:31:53,110 INFO [train.py:909] (0/4) Computing validation loss
2023-12-21 21:32:14,411 INFO [train.py:917] (0/4) Epoch 8, validation: loss=0.03648, audio_tagging_loss=0.03648, over 3737520.00 frames.
2023-12-21 21:32:14,412 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-21 21:32:18,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=242413.33333333334, ans=0.0
2023-12-21 21:32:42,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242546.66666666666, ans=0.1
2023-12-21 21:33:06,505 INFO [train.py:886] (0/4) Epoch 8, batch 3050, loss[loss=0.0184, audio_tagging_loss=0.0184, over 25000.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4946325.83 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:33:10,398 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.603e+01 2.768e+01 2.944e+01 3.581e+01, threshold=5.536e+01, percent-clipped=0.0
2023-12-21 21:33:12,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=242746.66666666666, ans=0.0
2023-12-21 21:33:16,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=242813.33333333334, ans=0.125
2023-12-21 21:33:30,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.66 vs. limit=15.0
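
At batch 3000 above, the trainer pauses to run a full validation pass (3737520.00 frames) and then reports the peak GPU memory. Audio tagging is a multi-label task, so a per-class binary cross-entropy normalized by the total frame count is the natural loss here; whether train.py uses exactly this reduction is an assumption. A sketch of such a validation pass:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def validate(model, dev_loader, device):
        model.eval()
        total_loss, total_frames = 0.0, 0.0
        for batch in dev_loader:
            feats = batch["inputs"].to(device)            # (N, T, 80) fbank
            labels = batch["targets"].to(device).float()  # (N, num_events)
            logits = model(feats)                         # multi-label scores
            loss = F.binary_cross_entropy_with_logits(
                logits, labels, reduction="sum")
            total_loss += float(loss)
            total_frames += feats.shape[0] * feats.shape[1]
        model.train()
        return total_loss / total_frames  # loss normalized per frame
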
2023-12-21 21:33:41,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=242946.66666666666, ans=0.0
2023-12-21 21:33:42,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=242946.66666666666, ans=0.1
2023-12-21 21:33:57,599 INFO [train.py:886] (0/4) Epoch 8, batch 3100, loss[loss=0.01497, audio_tagging_loss=0.01497, over 25000.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4951658.70 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:33:58,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=243080.0, ans=0.125
2023-12-21 21:34:17,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=243213.33333333334, ans=0.09899494936611666
2023-12-21 21:34:19,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=243213.33333333334, ans=0.125
2023-12-21 21:34:27,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=243213.33333333334, ans=0.0
2023-12-21 21:34:31,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=243280.0, ans=0.125
2023-12-21 21:34:45,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0
2023-12-21 21:34:49,032 INFO [train.py:886] (0/4) Epoch 8, batch 3150, loss[loss=0.01554, audio_tagging_loss=0.01554, over 24750.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4949551.36 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:34:49,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.37 vs. limit=22.5
2023-12-21 21:34:52,855 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.676e+01 2.832e+01 3.035e+01 3.552e+01, threshold=5.664e+01, percent-clipped=0.0
2023-12-21 21:35:02,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=243480.0, ans=0.125
2023-12-21 21:35:03,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=243480.0, ans=0.0
2023-12-21 21:35:03,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5
2023-12-21 21:35:03,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=243480.0, ans=0.1
2023-12-21 21:35:42,043 INFO [train.py:886] (0/4) Epoch 8, batch 3200, loss[loss=0.01749, audio_tagging_loss=0.01749, over 25000.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4946394.77 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:35:54,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=243813.33333333334, ans=0.125
2023-12-21 21:35:54,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=243813.33333333334, ans=0.125
2023-12-21 21:35:55,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=243813.33333333334, ans=0.0
2023-12-21 21:36:00,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=243813.33333333334, ans=0.5
2023-12-21 21:36:03,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0
2023-12-21 21:36:33,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.36 vs. limit=15.0
2023-12-21 21:36:33,916 INFO [train.py:886] (0/4) Epoch 8, batch 3250, loss[loss=0.01661, audio_tagging_loss=0.01661, over 25000.00 frames. ], tot_loss[loss=0.01547, audio_tagging_loss=0.01547, over 4947781.53 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:36:37,710 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.581e+01 2.751e+01 2.965e+01 3.733e+01, threshold=5.502e+01, percent-clipped=0.0
2023-12-21 21:36:45,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=244146.66666666666, ans=0.05
2023-12-21 21:37:18,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=244346.66666666666, ans=0.125
2023-12-21 21:37:23,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=244346.66666666666, ans=0.125
2023-12-21 21:37:25,270 INFO [train.py:886] (0/4) Epoch 8, batch 3300, loss[loss=0.01681, audio_tagging_loss=0.01681, over 25000.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4947051.00 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:37:31,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0
2023-12-21 21:37:33,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244413.33333333334, ans=0.1
2023-12-21 21:37:38,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0
2023-12-21 21:37:45,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=244546.66666666666, ans=0.125
2023-12-21 21:37:47,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0
2023-12-21 21:37:53,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=244546.66666666666, ans=0.125
2023-12-21 21:38:17,048 INFO [train.py:886] (0/4) Epoch 8, batch 3350, loss[loss=0.01651, audio_tagging_loss=0.01651, over 25000.00 frames. ], tot_loss[loss=0.01548, audio_tagging_loss=0.01548, over 4947103.82 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:38:17,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=244746.66666666666, ans=0.125
2023-12-21 21:38:18,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=244746.66666666666, ans=0.125
2023-12-21 21:38:21,612 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+01 2.538e+01 2.708e+01 2.913e+01 3.391e+01, threshold=5.415e+01, percent-clipped=0.0
2023-12-21 21:38:32,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=244813.33333333334, ans=0.1
2023-12-21 21:38:40,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0
2023-12-21 21:38:47,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=244946.66666666666, ans=0.0
2023-12-21 21:38:47,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=244946.66666666666, ans=0.2
2023-12-21 21:38:54,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=244946.66666666666, ans=0.0
2023-12-21 21:38:54,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=244946.66666666666, ans=0.0
2023-12-21 21:39:08,642 INFO [train.py:886] (0/4) Epoch 8, batch 3400, loss[loss=0.01666, audio_tagging_loss=0.01666, over 25000.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4951204.06 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:39:12,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=15.0
2023-12-21 21:39:13,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=245080.0, ans=0.125
2023-12-21 21:39:26,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=245146.66666666666, ans=0.125
2023-12-21 21:39:30,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0
2023-12-21 21:39:59,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=245346.66666666666, ans=0.125
2023-12-21 21:40:01,070 INFO [train.py:886] (0/4) Epoch 8, batch 3450, loss[loss=0.01481, audio_tagging_loss=0.01481, over 21925.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4937390.07 frames. ], batch size: 107, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:40:04,818 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.354e+01 2.638e+01 2.791e+01 3.014e+01 3.775e+01, threshold=5.582e+01, percent-clipped=0.0
2023-12-21 21:40:24,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=245546.66666666666, ans=0.0
2023-12-21 21:40:27,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245546.66666666666, ans=0.1
2023-12-21 21:40:37,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=245613.33333333334, ans=0.125
2023-12-21 21:40:38,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=245613.33333333334, ans=0.1
2023-12-21 21:40:45,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=245680.0, ans=0.0
2023-12-21 21:40:50,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=245680.0, ans=0.015
2023-12-21 21:40:53,256 INFO [train.py:886] (0/4) Epoch 8, batch 3500, loss[loss=0.01467, audio_tagging_loss=0.01467, over 24750.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4936162.18 frames. ], batch size: 99, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:41:06,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=245813.33333333334, ans=0.125
2023-12-21 21:41:11,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=245813.33333333334, ans=0.125
2023-12-21 21:41:14,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=245880.0, ans=0.0
2023-12-21 21:41:19,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245880.0, ans=0.1
2023-12-21 21:41:38,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=246013.33333333334, ans=0.125
2023-12-21 21:41:44,049 INFO [train.py:886] (0/4) Epoch 8, batch 3550, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4935752.92 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:41:48,502 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.549e+01 2.709e+01 2.899e+01 3.607e+01, threshold=5.417e+01, percent-clipped=0.0
2023-12-21 21:42:12,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0
2023-12-21 21:42:30,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=246346.66666666666, ans=0.0
2023-12-21 21:42:37,278 INFO [train.py:886] (0/4) Epoch 8, batch 3600, loss[loss=0.0168, audio_tagging_loss=0.0168, over 25000.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 4943250.62 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:42:39,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.92 vs. limit=15.0
2023-12-21 21:42:48,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=246480.0, ans=0.0
2023-12-21 21:42:52,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=246480.0, ans=0.125
2023-12-21 21:43:02,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=246546.66666666666, ans=0.07
2023-12-21 21:43:06,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.88 vs. limit=22.5
2023-12-21 21:43:07,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.98 vs. limit=6.0
2023-12-21 21:43:09,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0
2023-12-21 21:43:17,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=246680.0, ans=0.07
2023-12-21 21:43:18,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=246680.0, ans=0.025
2023-12-21 21:43:29,308 INFO [train.py:886] (0/4) Epoch 8, batch 3650, loss[loss=0.01473, audio_tagging_loss=0.01473, over 25000.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4949717.68 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:43:29,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.04 vs. limit=22.5
2023-12-21 21:43:30,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.05 vs. limit=15.0
2023-12-21 21:43:33,745 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.621e+01 2.795e+01 3.047e+01 3.909e+01, threshold=5.590e+01, percent-clipped=0.0
2023-12-21 21:43:44,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=246813.33333333334, ans=0.1
2023-12-21 21:43:49,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=246880.0, ans=0.125
2023-12-21 21:44:06,113 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.411e-01
2023-12-21 21:44:09,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=247013.33333333334, ans=0.125
2023-12-21 21:44:17,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=247013.33333333334, ans=0.0
2023-12-21 21:44:20,649 INFO [train.py:886] (0/4) Epoch 8, batch 3700, loss[loss=0.01415, audio_tagging_loss=0.01415, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4953014.96 frames.
], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:44:22,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=247080.0, ans=0.125 2023-12-21 21:44:34,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.55 vs. limit=15.0 2023-12-21 21:44:58,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=247280.0, ans=0.2 2023-12-21 21:45:00,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=247346.66666666666, ans=0.0 2023-12-21 21:45:03,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=247346.66666666666, ans=0.0 2023-12-21 21:45:05,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=15.0 2023-12-21 21:45:11,559 INFO [train.py:886] (0/4) Epoch 8, batch 3750, loss[loss=0.01496, audio_tagging_loss=0.01496, over 24750.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4952318.85 frames. ], batch size: 99, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:45:15,994 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.638e+01 2.813e+01 3.001e+01 3.548e+01, threshold=5.627e+01, percent-clipped=0.0 2023-12-21 21:45:28,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=12.0 2023-12-21 21:45:29,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.73 vs. limit=22.5 2023-12-21 21:45:30,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=247546.66666666666, ans=0.0 2023-12-21 21:46:02,581 INFO [train.py:886] (0/4) Epoch 8, batch 3800, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24750.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4945138.03 frames. ], batch size: 99, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:46:19,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=247813.33333333334, ans=0.05 2023-12-21 21:46:32,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=247880.0, ans=0.125 2023-12-21 21:46:38,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=247946.66666666666, ans=0.1 2023-12-21 21:46:42,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=247946.66666666666, ans=0.125 2023-12-21 21:46:55,691 INFO [train.py:886] (0/4) Epoch 8, batch 3850, loss[loss=0.01557, audio_tagging_loss=0.01557, over 25000.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4939400.59 frames. 
], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:46:59,388 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.593e+01 2.766e+01 2.914e+01 3.476e+01, threshold=5.531e+01, percent-clipped=0.0 2023-12-21 21:47:00,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=248080.0, ans=0.1 2023-12-21 21:47:11,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=248146.66666666666, ans=0.0 2023-12-21 21:47:16,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.54 vs. limit=10.0 2023-12-21 21:47:27,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=248280.0, ans=0.125 2023-12-21 21:47:29,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=248280.0, ans=0.07 2023-12-21 21:47:36,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=248346.66666666666, ans=0.0 2023-12-21 21:47:46,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.27 vs. limit=12.0 2023-12-21 21:47:47,208 INFO [train.py:886] (0/4) Epoch 8, batch 3900, loss[loss=0.0143, audio_tagging_loss=0.0143, over 25000.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4941861.49 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:47:56,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=248480.0, ans=0.0 2023-12-21 21:47:56,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=248480.0, ans=0.0 2023-12-21 21:47:56,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-12-21 21:48:00,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=248480.0, ans=0.2 2023-12-21 21:48:15,495 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:48:17,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.02 vs. limit=10.0 2023-12-21 21:48:22,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=248613.33333333334, ans=0.0 2023-12-21 21:48:22,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=248613.33333333334, ans=0.0 2023-12-21 21:48:38,315 INFO [train.py:886] (0/4) Epoch 8, batch 3950, loss[loss=0.01457, audio_tagging_loss=0.01457, over 25000.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4946555.28 frames. 
], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:48:42,028 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.574e+01 2.671e+01 2.865e+01 3.681e+01, threshold=5.342e+01, percent-clipped=0.0 2023-12-21 21:49:03,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=248880.0, ans=0.125 2023-12-21 21:49:05,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2023-12-21 21:49:30,365 INFO [train.py:886] (0/4) Epoch 8, batch 4000, loss[loss=0.01596, audio_tagging_loss=0.01596, over 25000.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 4948606.52 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:49:38,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=249080.0, ans=0.125 2023-12-21 21:49:47,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=249146.66666666666, ans=0.125 2023-12-21 21:50:04,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=249280.0, ans=0.125 2023-12-21 21:50:21,744 INFO [train.py:886] (0/4) Epoch 8, batch 4050, loss[loss=0.0158, audio_tagging_loss=0.0158, over 24750.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4941981.17 frames. ], batch size: 99, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:50:27,058 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.602e+01 2.760e+01 2.948e+01 3.649e+01, threshold=5.519e+01, percent-clipped=0.0 2023-12-21 21:50:31,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=249480.0, ans=0.125 2023-12-21 21:50:39,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=249480.0, ans=0.0 2023-12-21 21:50:46,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=249546.66666666666, ans=0.125 2023-12-21 21:50:48,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=249546.66666666666, ans=0.125 2023-12-21 21:50:52,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=249613.33333333334, ans=0.125 2023-12-21 21:51:09,278 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.709e+00 2023-12-21 21:51:13,766 INFO [train.py:886] (0/4) Epoch 8, batch 4100, loss[loss=0.01858, audio_tagging_loss=0.01858, over 24750.00 frames. ], tot_loss[loss=0.01577, audio_tagging_loss=0.01577, over 4938764.41 frames. ], batch size: 99, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:51:41,332 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2023-12-21 21:51:50,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.92 vs. 
limit=10.0 2023-12-21 21:51:59,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=250013.33333333334, ans=0.125 2023-12-21 21:52:05,000 INFO [train.py:886] (0/4) Epoch 8, batch 4150, loss[loss=0.01437, audio_tagging_loss=0.01437, over 24045.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4933303.08 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:52:05,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.58 vs. limit=12.0 2023-12-21 21:52:08,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=250080.0, ans=0.5 2023-12-21 21:52:10,483 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+01 2.710e+01 2.835e+01 2.954e+01 3.813e+01, threshold=5.670e+01, percent-clipped=0.0 2023-12-21 21:52:11,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=250080.0, ans=0.125 2023-12-21 21:52:13,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=250080.0, ans=15.0 2023-12-21 21:52:24,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=250213.33333333334, ans=0.2 2023-12-21 21:52:45,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.17 vs. limit=22.5 2023-12-21 21:52:56,694 INFO [train.py:886] (0/4) Epoch 8, batch 4200, loss[loss=0.01575, audio_tagging_loss=0.01575, over 25000.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4935931.07 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:52:56,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=250413.33333333334, ans=0.2 2023-12-21 21:53:22,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=250546.66666666666, ans=0.035 2023-12-21 21:53:27,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=250613.33333333334, ans=0.125 2023-12-21 21:53:37,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=250680.0, ans=0.125 2023-12-21 21:53:44,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=250680.0, ans=0.2 2023-12-21 21:53:44,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=250680.0, ans=0.0 2023-12-21 21:53:45,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-12-21 21:53:49,345 INFO [train.py:886] (0/4) Epoch 8, batch 4250, loss[loss=0.01503, audio_tagging_loss=0.01503, over 25000.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4939399.71 frames. 
], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:53:53,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.48 vs. limit=15.0 2023-12-21 21:53:54,769 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.560e+01 2.714e+01 2.924e+01 4.261e+01, threshold=5.428e+01, percent-clipped=0.0 2023-12-21 21:54:07,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=250813.33333333334, ans=0.1 2023-12-21 21:54:14,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=250880.0, ans=0.125 2023-12-21 21:54:29,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=250946.66666666666, ans=0.1 2023-12-21 21:54:29,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=250946.66666666666, ans=0.125 2023-12-21 21:54:41,044 INFO [train.py:886] (0/4) Epoch 8, batch 4300, loss[loss=0.01632, audio_tagging_loss=0.01632, over 24750.00 frames. ], tot_loss[loss=0.01548, audio_tagging_loss=0.01548, over 4948544.23 frames. ], batch size: 99, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:54:43,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=12.0 2023-12-21 21:54:44,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=251080.0, ans=0.0 2023-12-21 21:54:54,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=251146.66666666666, ans=0.125 2023-12-21 21:55:10,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=251280.0, ans=0.125 2023-12-21 21:55:18,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=251280.0, ans=0.2 2023-12-21 21:55:27,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=251346.66666666666, ans=0.125 2023-12-21 21:55:31,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.93 vs. limit=22.5 2023-12-21 21:55:33,253 INFO [train.py:886] (0/4) Epoch 8, batch 4350, loss[loss=0.01297, audio_tagging_loss=0.01297, over 24750.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4957911.65 frames. 
], batch size: 99, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:55:37,905 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 2.680e+01 2.895e+01 3.105e+01 4.359e+01, threshold=5.791e+01, percent-clipped=0.0 2023-12-21 21:55:37,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=251413.33333333334, ans=0.015 2023-12-21 21:55:38,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=251413.33333333334, ans=0.2 2023-12-21 21:55:52,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=251546.66666666666, ans=0.0 2023-12-21 21:55:54,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0 2023-12-21 21:55:55,767 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0 2023-12-21 21:56:03,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.94 vs. limit=22.5 2023-12-21 21:56:09,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=251613.33333333334, ans=0.2 2023-12-21 21:56:24,817 INFO [train.py:886] (0/4) Epoch 8, batch 4400, loss[loss=0.01685, audio_tagging_loss=0.01685, over 24750.00 frames. ], tot_loss[loss=0.01574, audio_tagging_loss=0.01574, over 4956877.87 frames. ], batch size: 99, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:56:29,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=251746.66666666666, ans=0.05 2023-12-21 21:57:10,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=252013.33333333334, ans=10.0 2023-12-21 21:57:11,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.79 vs. limit=15.0 2023-12-21 21:57:16,303 INFO [train.py:886] (0/4) Epoch 8, batch 4450, loss[loss=0.01594, audio_tagging_loss=0.01594, over 24750.00 frames. ], tot_loss[loss=0.01574, audio_tagging_loss=0.01574, over 4950030.02 frames. ], batch size: 99, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:57:16,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.00 vs. limit=15.0 2023-12-21 21:57:21,713 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+01 2.660e+01 2.808e+01 3.032e+01 4.377e+01, threshold=5.615e+01, percent-clipped=0.0 2023-12-21 21:57:37,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.85 vs. 
limit=15.0 2023-12-21 21:57:44,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=252213.33333333334, ans=0.125 2023-12-21 21:57:51,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=252280.0, ans=0.0 2023-12-21 21:57:58,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.96 vs. limit=5.0 2023-12-21 21:58:08,660 INFO [train.py:886] (0/4) Epoch 8, batch 4500, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 4955347.03 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:58:13,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=252413.33333333334, ans=0.1 2023-12-21 21:58:18,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=15.0 2023-12-21 21:58:26,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.31 vs. limit=10.0 2023-12-21 21:58:56,266 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.723e-01 2023-12-21 21:59:00,798 INFO [train.py:886] (0/4) Epoch 8, batch 4550, loss[loss=0.01376, audio_tagging_loss=0.01376, over 25000.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4947028.18 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:59:06,127 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.575e+01 2.765e+01 2.949e+01 3.693e+01, threshold=5.529e+01, percent-clipped=0.0 2023-12-21 21:59:08,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=252746.66666666666, ans=0.1 2023-12-21 21:59:20,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=252880.0, ans=0.125 2023-12-21 21:59:21,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=252880.0, ans=0.0 2023-12-21 21:59:52,614 INFO [train.py:886] (0/4) Epoch 8, batch 4600, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4948334.31 frames. ], batch size: 99, lr: 1.32e-02, grad_scale: 64.0 2023-12-21 22:00:07,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.16 vs. 
limit=15.0 2023-12-21 22:00:08,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=253146.66666666666, ans=0.05 2023-12-21 22:00:09,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=253146.66666666666, ans=0.125 2023-12-21 22:00:14,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=253213.33333333334, ans=0.125 2023-12-21 22:00:19,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.27 vs. limit=10.0 2023-12-21 22:00:42,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=19.96 vs. limit=15.0 2023-12-21 22:00:45,210 INFO [train.py:886] (0/4) Epoch 8, batch 4650, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.01564, audio_tagging_loss=0.01564, over 4952218.67 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 64.0 2023-12-21 22:00:50,676 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.620e+01 2.747e+01 2.927e+01 3.887e+01, threshold=5.494e+01, percent-clipped=0.0 2023-12-21 22:01:03,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=253480.0, ans=0.125 2023-12-21 22:01:20,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-12-21 22:01:35,714 INFO [train.py:886] (0/4) Epoch 8, batch 4700, loss[loss=0.0159, audio_tagging_loss=0.0159, over 24750.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4954491.53 frames. ], batch size: 99, lr: 1.32e-02, grad_scale: 64.0 2023-12-21 22:01:39,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=253746.66666666666, ans=0.015 2023-12-21 22:01:41,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.89 vs. limit=15.0 2023-12-21 22:01:47,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=253813.33333333334, ans=0.0 2023-12-21 22:01:57,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.13 vs. limit=6.0 2023-12-21 22:02:01,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=253880.0, ans=0.125 2023-12-21 22:02:03,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=30.91 vs. limit=22.5 2023-12-21 22:02:14,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=254013.33333333334, ans=0.2 2023-12-21 22:02:18,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.85 vs. 
limit=12.0 2023-12-21 22:02:23,287 INFO [train.py:886] (0/4) Epoch 8, batch 4750, loss[loss=0.01525, audio_tagging_loss=0.01525, over 24750.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4944741.61 frames. ], batch size: 99, lr: 1.32e-02, grad_scale: 64.0 2023-12-21 22:02:23,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=254080.0, ans=0.0 2023-12-21 22:02:27,780 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 2.666e+01 2.796e+01 2.993e+01 3.759e+01, threshold=5.593e+01, percent-clipped=0.0 2023-12-21 22:02:29,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5 2023-12-21 22:02:31,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254146.66666666666, ans=0.1 2023-12-21 22:02:36,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=254146.66666666666, ans=0.07 2023-12-21 22:02:38,401 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-8.pt 2023-12-21 22:03:00,065 INFO [train.py:886] (0/4) Epoch 9, batch 0, loss[loss=0.04684, audio_tagging_loss=0.04684, over 21222.00 frames. ], tot_loss[loss=0.04684, audio_tagging_loss=0.04684, over 21222.00 frames. ], batch size: 107, lr: 1.25e-02, grad_scale: 64.0 2023-12-21 22:03:00,067 INFO [train.py:909] (0/4) Computing validation loss 2023-12-21 22:03:21,366 INFO [train.py:917] (0/4) Epoch 9, validation: loss=0.03498, audio_tagging_loss=0.03498, over 3737520.00 frames. 2023-12-21 22:03:21,367 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-21 22:03:23,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=254186.66666666666, ans=0.0 2023-12-21 22:03:35,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=254253.33333333334, ans=0.2 2023-12-21 22:03:39,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=254253.33333333334, ans=0.0 2023-12-21 22:04:01,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=254453.33333333334, ans=0.125 2023-12-21 22:04:05,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0 2023-12-21 22:04:06,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.45 vs. limit=10.0 2023-12-21 22:04:12,821 INFO [train.py:886] (0/4) Epoch 9, batch 50, loss[loss=0.02012, audio_tagging_loss=0.02012, over 25000.00 frames. ], tot_loss[loss=0.02474, audio_tagging_loss=0.02474, over 1121311.21 frames. 
], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:04:17,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=254520.0, ans=0.2 2023-12-21 22:04:20,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=254520.0, ans=0.025 2023-12-21 22:04:23,108 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 22:04:35,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=254653.33333333334, ans=0.125 2023-12-21 22:04:37,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=254653.33333333334, ans=0.125 2023-12-21 22:04:41,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=254653.33333333334, ans=0.125 2023-12-21 22:04:54,282 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.647e+01 3.011e+01 3.284e+01 3.905e+01 1.113e+02, threshold=6.568e+01, percent-clipped=8.0 2023-12-21 22:04:54,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=254786.66666666666, ans=0.125 2023-12-21 22:04:59,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=254786.66666666666, ans=0.2 2023-12-21 22:05:00,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=254786.66666666666, ans=0.1 2023-12-21 22:05:04,575 INFO [train.py:886] (0/4) Epoch 9, batch 100, loss[loss=0.01722, audio_tagging_loss=0.01722, over 25000.00 frames. ], tot_loss[loss=0.02153, audio_tagging_loss=0.02153, over 1973445.80 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:05:24,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=254986.66666666666, ans=0.0 2023-12-21 22:05:24,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.95 vs. limit=15.0 2023-12-21 22:05:31,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=254986.66666666666, ans=0.125 2023-12-21 22:05:32,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=254986.66666666666, ans=12.0 2023-12-21 22:05:35,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=255053.33333333334, ans=0.125 2023-12-21 22:05:46,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=255120.0, ans=0.125 2023-12-21 22:05:50,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=255120.0, ans=0.125 2023-12-21 22:05:55,432 INFO [train.py:886] (0/4) Epoch 9, batch 150, loss[loss=0.01851, audio_tagging_loss=0.01851, over 25000.00 frames. ], tot_loss[loss=0.0194, audio_tagging_loss=0.0194, over 2630632.03 frames. 
], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:06:37,347 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.677e+01 2.811e+01 2.976e+01 3.484e+01, threshold=5.622e+01, percent-clipped=0.0 2023-12-21 22:06:47,042 INFO [train.py:886] (0/4) Epoch 9, batch 200, loss[loss=0.01572, audio_tagging_loss=0.01572, over 25000.00 frames. ], tot_loss[loss=0.01823, audio_tagging_loss=0.01823, over 3149500.02 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:06:58,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=255586.66666666666, ans=0.0 2023-12-21 22:07:10,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=255653.33333333334, ans=0.125 2023-12-21 22:07:21,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=255720.0, ans=0.04949747468305833 2023-12-21 22:07:24,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=255720.0, ans=0.125 2023-12-21 22:07:25,941 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 22:07:39,167 INFO [train.py:886] (0/4) Epoch 9, batch 250, loss[loss=0.01369, audio_tagging_loss=0.01369, over 23994.00 frames. ], tot_loss[loss=0.01759, audio_tagging_loss=0.01759, over 3555371.82 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:07:51,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=255920.0, ans=0.1 2023-12-21 22:07:57,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=8.0 2023-12-21 22:08:04,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=255986.66666666666, ans=0.2 2023-12-21 22:08:05,565 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.738e-02 2023-12-21 22:08:20,987 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.572e+01 2.690e+01 2.867e+01 4.305e+01, threshold=5.380e+01, percent-clipped=0.0 2023-12-21 22:08:21,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=256120.0, ans=0.125 2023-12-21 22:08:30,616 INFO [train.py:886] (0/4) Epoch 9, batch 300, loss[loss=0.01533, audio_tagging_loss=0.01533, over 24750.00 frames. ], tot_loss[loss=0.01695, audio_tagging_loss=0.01695, over 3863825.38 frames. 
], batch size: 99, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:08:30,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=256186.66666666666, ans=0.125 2023-12-21 22:09:02,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=256386.66666666666, ans=0.0 2023-12-21 22:09:16,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=256453.33333333334, ans=0.0 2023-12-21 22:09:16,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=256453.33333333334, ans=0.125 2023-12-21 22:09:17,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=256453.33333333334, ans=0.125 2023-12-21 22:09:18,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=256453.33333333334, ans=0.0 2023-12-21 22:09:18,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=256453.33333333334, ans=0.125 2023-12-21 22:09:18,358 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.273e-01 2023-12-21 22:09:21,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=256453.33333333334, ans=0.125 2023-12-21 22:09:23,826 INFO [train.py:886] (0/4) Epoch 9, batch 350, loss[loss=0.0124, audio_tagging_loss=0.0124, over 24750.00 frames. ], tot_loss[loss=0.01668, audio_tagging_loss=0.01668, over 4094418.28 frames. ], batch size: 99, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:09:27,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=256520.0, ans=0.125 2023-12-21 22:09:34,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5 2023-12-21 22:09:51,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=256653.33333333334, ans=0.125 2023-12-21 22:10:04,507 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.569e+01 2.774e+01 2.957e+01 3.605e+01, threshold=5.548e+01, percent-clipped=0.0 2023-12-21 22:10:15,404 INFO [train.py:886] (0/4) Epoch 9, batch 400, loss[loss=0.01709, audio_tagging_loss=0.01709, over 25000.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4280982.55 frames. 
], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:10:19,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=256853.33333333334, ans=0.125 2023-12-21 22:10:21,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256853.33333333334, ans=0.1 2023-12-21 22:10:33,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=256920.0, ans=0.2 2023-12-21 22:10:40,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=256986.66666666666, ans=0.125 2023-12-21 22:10:55,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=15.0 2023-12-21 22:11:07,307 INFO [train.py:886] (0/4) Epoch 9, batch 450, loss[loss=0.01767, audio_tagging_loss=0.01767, over 25000.00 frames. ], tot_loss[loss=0.01606, audio_tagging_loss=0.01606, over 4433745.12 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:11:11,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=257186.66666666666, ans=0.05 2023-12-21 22:11:22,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=257253.33333333334, ans=0.125 2023-12-21 22:11:22,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=257253.33333333334, ans=10.0 2023-12-21 22:11:26,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=257253.33333333334, ans=0.0 2023-12-21 22:11:41,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=257386.66666666666, ans=0.0 2023-12-21 22:11:42,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257386.66666666666, ans=0.1 2023-12-21 22:11:47,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=257386.66666666666, ans=0.125 2023-12-21 22:11:48,939 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.608e+01 2.793e+01 2.951e+01 3.727e+01, threshold=5.585e+01, percent-clipped=0.0 2023-12-21 22:12:00,516 INFO [train.py:886] (0/4) Epoch 9, batch 500, loss[loss=0.01462, audio_tagging_loss=0.01462, over 25000.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4552469.26 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:12:25,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=257653.33333333334, ans=0.0 2023-12-21 22:12:31,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.91 vs. limit=22.5 2023-12-21 22:12:51,066 INFO [train.py:886] (0/4) Epoch 9, batch 550, loss[loss=0.01661, audio_tagging_loss=0.01661, over 25000.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4648272.58 frames. 
], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:12:51,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=257853.33333333334, ans=0.125 2023-12-21 22:13:02,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=257920.0, ans=0.125 2023-12-21 22:13:18,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=257986.66666666666, ans=0.0 2023-12-21 22:13:22,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2023-12-21 22:13:32,266 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 2.530e+01 2.671e+01 2.882e+01 3.618e+01, threshold=5.343e+01, percent-clipped=0.0 2023-12-21 22:13:38,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0 2023-12-21 22:13:43,174 INFO [train.py:886] (0/4) Epoch 9, batch 600, loss[loss=0.01553, audio_tagging_loss=0.01553, over 24750.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4718526.43 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:13:48,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=258186.66666666666, ans=0.2 2023-12-21 22:14:22,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=258386.66666666666, ans=0.125 2023-12-21 22:14:27,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=258453.33333333334, ans=0.125 2023-12-21 22:14:28,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=258453.33333333334, ans=0.0 2023-12-21 22:14:29,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=258453.33333333334, ans=0.0 2023-12-21 22:14:30,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=258453.33333333334, ans=0.0 2023-12-21 22:14:34,961 INFO [train.py:886] (0/4) Epoch 9, batch 650, loss[loss=0.01645, audio_tagging_loss=0.01645, over 24750.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4761886.68 frames. 
], batch size: 99, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:14:39,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=258520.0, ans=0.125 2023-12-21 22:14:51,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=258586.66666666666, ans=0.025 2023-12-21 22:15:17,029 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 2.595e+01 2.763e+01 2.915e+01 3.436e+01, threshold=5.525e+01, percent-clipped=0.0 2023-12-21 22:15:25,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=258853.33333333334, ans=0.125 2023-12-21 22:15:26,592 INFO [train.py:886] (0/4) Epoch 9, batch 700, loss[loss=0.01569, audio_tagging_loss=0.01569, over 22524.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4796341.15 frames. ], batch size: 107, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:15:31,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.15 vs. limit=15.0 2023-12-21 22:15:33,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=15.0 2023-12-21 22:15:48,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0 2023-12-21 22:15:52,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2023-12-21 22:16:04,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=259053.33333333334, ans=0.125 2023-12-21 22:16:06,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=259053.33333333334, ans=0.07 2023-12-21 22:16:19,300 INFO [train.py:886] (0/4) Epoch 9, batch 750, loss[loss=0.01655, audio_tagging_loss=0.01655, over 24750.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4831929.04 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:16:35,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=259253.33333333334, ans=0.125 2023-12-21 22:16:41,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2023-12-21 22:16:44,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=259320.0, ans=0.125 2023-12-21 22:16:52,862 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.279e+00 2023-12-21 22:16:58,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=259386.66666666666, ans=0.125 2023-12-21 22:17:01,047 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.547e+01 2.724e+01 2.898e+01 3.402e+01, threshold=5.448e+01, percent-clipped=0.0 2023-12-21 22:17:11,291 INFO [train.py:886] (0/4) Epoch 9, batch 800, loss[loss=0.01547, audio_tagging_loss=0.01547, over 25000.00 frames. 
], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4860584.81 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:17:12,414 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 22:17:44,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=259720.0, ans=0.1 2023-12-21 22:18:03,813 INFO [train.py:886] (0/4) Epoch 9, batch 850, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 4884647.35 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:18:06,910 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.228e+00 2023-12-21 22:18:16,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.33 vs. limit=22.5 2023-12-21 22:18:34,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=260053.33333333334, ans=0.125 2023-12-21 22:18:35,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=260053.33333333334, ans=0.125 2023-12-21 22:18:43,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=260053.33333333334, ans=0.125 2023-12-21 22:18:45,098 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.246e+01 2.660e+01 2.834e+01 3.014e+01 3.541e+01, threshold=5.667e+01, percent-clipped=0.0 2023-12-21 22:18:56,045 INFO [train.py:886] (0/4) Epoch 9, batch 900, loss[loss=0.01462, audio_tagging_loss=0.01462, over 24750.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4900708.32 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:18:57,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=260186.66666666666, ans=0.125 2023-12-21 22:19:09,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=260253.33333333334, ans=0.0 2023-12-21 22:19:36,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=260453.33333333334, ans=0.2 2023-12-21 22:19:48,418 INFO [train.py:886] (0/4) Epoch 9, batch 950, loss[loss=0.01629, audio_tagging_loss=0.01629, over 24037.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4907101.58 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:20:18,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=260720.0, ans=0.125 2023-12-21 22:20:20,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=260720.0, ans=0.0 2023-12-21 22:20:29,896 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.673e+01 2.825e+01 3.002e+01 3.457e+01, threshold=5.650e+01, percent-clipped=0.0 2023-12-21 22:20:30,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.90 vs. 
limit=15.0 2023-12-21 22:20:34,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=260786.66666666666, ans=0.125 2023-12-21 22:20:35,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=260786.66666666666, ans=0.125 2023-12-21 22:20:39,988 INFO [train.py:886] (0/4) Epoch 9, batch 1000, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24750.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4912648.19 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:20:54,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=260920.0, ans=0.125 2023-12-21 22:21:09,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=260986.66666666666, ans=15.0 2023-12-21 22:21:30,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=261120.0, ans=0.2 2023-12-21 22:21:32,234 INFO [train.py:886] (0/4) Epoch 9, batch 1050, loss[loss=0.01029, audio_tagging_loss=0.01029, over 24062.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4916815.23 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:21:49,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=261253.33333333334, ans=0.2 2023-12-21 22:22:06,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=261386.66666666666, ans=0.0 2023-12-21 22:22:07,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.78 vs. limit=10.0 2023-12-21 22:22:11,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261386.66666666666, ans=0.1 2023-12-21 22:22:13,860 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.578e+01 2.720e+01 2.898e+01 3.660e+01, threshold=5.440e+01, percent-clipped=0.0 2023-12-21 22:22:17,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=261453.33333333334, ans=0.07 2023-12-21 22:22:18,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=261453.33333333334, ans=0.125 2023-12-21 22:22:23,388 INFO [train.py:886] (0/4) Epoch 9, batch 1100, loss[loss=0.01504, audio_tagging_loss=0.01504, over 24104.00 frames. ], tot_loss[loss=0.01524, audio_tagging_loss=0.01524, over 4930242.01 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:22:35,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2023-12-21 22:22:41,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=261586.66666666666, ans=0.125 2023-12-21 22:22:42,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.10 vs. 
limit=22.5 2023-12-21 22:22:51,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=261653.33333333334, ans=0.125 2023-12-21 22:22:51,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=261653.33333333334, ans=0.125 2023-12-21 22:23:10,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=261786.66666666666, ans=0.125 2023-12-21 22:23:14,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=261786.66666666666, ans=0.125 2023-12-21 22:23:16,837 INFO [train.py:886] (0/4) Epoch 9, batch 1150, loss[loss=0.0181, audio_tagging_loss=0.0181, over 25000.00 frames. ], tot_loss[loss=0.01529, audio_tagging_loss=0.01529, over 4939786.62 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:23:21,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.94 vs. limit=22.5 2023-12-21 22:23:22,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2023-12-21 22:23:23,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=261853.33333333334, ans=0.125 2023-12-21 22:23:34,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.32 vs. limit=15.0 2023-12-21 22:23:54,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=262053.33333333334, ans=0.0 2023-12-21 22:23:55,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=262053.33333333334, ans=0.0 2023-12-21 22:23:57,573 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+01 2.612e+01 2.792e+01 2.985e+01 3.661e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-21 22:24:07,923 INFO [train.py:886] (0/4) Epoch 9, batch 1200, loss[loss=0.01619, audio_tagging_loss=0.01619, over 25000.00 frames. ], tot_loss[loss=0.01541, audio_tagging_loss=0.01541, over 4944002.67 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:24:10,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2023-12-21 22:24:24,911 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2023-12-21 22:24:40,420 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.23 vs. limit=22.5 2023-12-21 22:24:43,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=262386.6666666667, ans=0.1 2023-12-21 22:24:49,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=262453.3333333333, ans=0.125 2023-12-21 22:24:59,129 INFO [train.py:886] (0/4) Epoch 9, batch 1250, loss[loss=0.01692, audio_tagging_loss=0.01692, over 25000.00 frames. 
], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4948089.84 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:24:59,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.09 vs. limit=6.0 2023-12-21 22:25:07,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=262586.6666666667, ans=0.125 2023-12-21 22:25:09,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=262586.6666666667, ans=0.125 2023-12-21 22:25:11,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=262586.6666666667, ans=0.0 2023-12-21 22:25:21,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=262653.3333333333, ans=0.125 2023-12-21 22:25:35,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=262720.0, ans=0.2 2023-12-21 22:25:39,890 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+01 2.627e+01 2.793e+01 3.069e+01 3.716e+01, threshold=5.586e+01, percent-clipped=0.0 2023-12-21 22:25:52,095 INFO [train.py:886] (0/4) Epoch 9, batch 1300, loss[loss=0.01767, audio_tagging_loss=0.01767, over 24750.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4941922.01 frames. ], batch size: 99, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:25:58,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.06 vs. limit=15.0 2023-12-21 22:26:00,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=262920.0, ans=0.125 2023-12-21 22:26:03,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=262920.0, ans=0.125 2023-12-21 22:26:04,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=262920.0, ans=10.0 2023-12-21 22:26:10,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=262986.6666666667, ans=0.125 2023-12-21 22:26:24,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2023-12-21 22:26:26,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=263053.3333333333, ans=0.125 2023-12-21 22:26:29,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.55 vs. 
limit=10.0 2023-12-21 22:26:30,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263053.3333333333, ans=0.1 2023-12-21 22:26:37,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=263120.0, ans=0.125 2023-12-21 22:26:41,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2023-12-21 22:26:42,324 INFO [train.py:886] (0/4) Epoch 9, batch 1350, loss[loss=0.0161, audio_tagging_loss=0.0161, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4946530.26 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:26:54,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-12-21 22:27:20,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=8.0 2023-12-21 22:27:25,414 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.597e+01 2.714e+01 2.927e+01 3.624e+01, threshold=5.427e+01, percent-clipped=0.0 2023-12-21 22:27:28,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=263453.3333333333, ans=0.025 2023-12-21 22:27:35,541 INFO [train.py:886] (0/4) Epoch 9, batch 1400, loss[loss=0.0125, audio_tagging_loss=0.0125, over 25000.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4956140.48 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:27:35,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=263520.0, ans=0.125 2023-12-21 22:27:48,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=263586.6666666667, ans=0.2 2023-12-21 22:27:48,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=263586.6666666667, ans=0.0 2023-12-21 22:27:52,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2023-12-21 22:27:53,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=263586.6666666667, ans=0.0 2023-12-21 22:28:07,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=263720.0, ans=0.125 2023-12-21 22:28:12,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=263720.0, ans=0.125 2023-12-21 22:28:22,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=263786.6666666667, ans=0.0 2023-12-21 22:28:22,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-12-21 22:28:26,537 INFO [train.py:886] (0/4) Epoch 9, batch 1450, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. 
], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4955169.07 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:28:42,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=263920.0, ans=0.125 2023-12-21 22:28:55,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=263986.6666666667, ans=22.5 2023-12-21 22:29:00,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=264053.3333333333, ans=0.125 2023-12-21 22:29:08,305 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+01 2.534e+01 2.742e+01 2.913e+01 3.478e+01, threshold=5.484e+01, percent-clipped=0.0 2023-12-21 22:29:13,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=264120.0, ans=0.125 2023-12-21 22:29:13,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=264120.0, ans=0.0 2023-12-21 22:29:17,813 INFO [train.py:886] (0/4) Epoch 9, batch 1500, loss[loss=0.01564, audio_tagging_loss=0.01564, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4960239.24 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:29:41,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-12-21 22:29:42,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=264320.0, ans=0.0 2023-12-21 22:29:54,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=264386.6666666667, ans=0.125 2023-12-21 22:29:57,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=264453.3333333333, ans=0.125 2023-12-21 22:30:02,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=264453.3333333333, ans=0.2 2023-12-21 22:30:10,259 INFO [train.py:886] (0/4) Epoch 9, batch 1550, loss[loss=0.01665, audio_tagging_loss=0.01665, over 24956.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4959666.23 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:30:36,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=264653.3333333333, ans=0.0 2023-12-21 22:30:38,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2023-12-21 22:30:50,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=264786.6666666667, ans=0.2 2023-12-21 22:30:51,441 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+01 2.715e+01 2.913e+01 3.058e+01 3.707e+01, threshold=5.826e+01, percent-clipped=0.0 2023-12-21 22:30:53,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.80 vs. 
limit=15.0 2023-12-21 22:31:00,773 INFO [train.py:886] (0/4) Epoch 9, batch 1600, loss[loss=0.0163, audio_tagging_loss=0.0163, over 24938.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4948097.17 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:31:06,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=264853.3333333333, ans=0.125 2023-12-21 22:31:14,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=15.0 2023-12-21 22:31:21,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2023-12-21 22:31:30,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=265053.3333333333, ans=0.125 2023-12-21 22:31:43,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=265120.0, ans=0.125 2023-12-21 22:31:54,073 INFO [train.py:886] (0/4) Epoch 9, batch 1650, loss[loss=0.0145, audio_tagging_loss=0.0145, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4947957.73 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:31:58,063 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 22:32:07,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=265253.3333333333, ans=0.125 2023-12-21 22:32:16,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=265320.0, ans=0.1 2023-12-21 22:32:29,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.05 vs. limit=22.5 2023-12-21 22:32:30,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=265386.6666666667, ans=0.2 2023-12-21 22:32:30,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.53 vs. limit=22.5 2023-12-21 22:32:33,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2023-12-21 22:32:35,015 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 2.651e+01 2.793e+01 3.063e+01 3.586e+01, threshold=5.586e+01, percent-clipped=0.0 2023-12-21 22:32:35,272 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.383e-01 2023-12-21 22:32:42,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=15.0 2023-12-21 22:32:46,500 INFO [train.py:886] (0/4) Epoch 9, batch 1700, loss[loss=0.01688, audio_tagging_loss=0.01688, over 25000.00 frames. ], tot_loss[loss=0.01548, audio_tagging_loss=0.01548, over 4946257.31 frames. 
], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:33:09,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265653.3333333333, ans=0.1 2023-12-21 22:33:14,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=265653.3333333333, ans=0.125 2023-12-21 22:33:26,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=265720.0, ans=0.125 2023-12-21 22:33:34,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=265786.6666666667, ans=0.0 2023-12-21 22:33:37,680 INFO [train.py:886] (0/4) Epoch 9, batch 1750, loss[loss=0.01455, audio_tagging_loss=0.01455, over 25000.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4952752.60 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:33:39,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=265853.3333333333, ans=0.0 2023-12-21 22:33:42,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.42 vs. limit=22.5 2023-12-21 22:33:42,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=265853.3333333333, ans=0.0 2023-12-21 22:33:44,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=265853.3333333333, ans=10.0 2023-12-21 22:33:50,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=265920.0, ans=0.0 2023-12-21 22:33:52,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=10.0 2023-12-21 22:33:56,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=265920.0, ans=0.0 2023-12-21 22:34:18,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=266120.0, ans=0.07 2023-12-21 22:34:18,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=266120.0, ans=0.125 2023-12-21 22:34:19,238 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.611e+01 2.763e+01 2.973e+01 3.566e+01, threshold=5.527e+01, percent-clipped=0.0 2023-12-21 22:34:30,208 INFO [train.py:886] (0/4) Epoch 9, batch 1800, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4950634.37 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:34:36,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=27.02 vs. limit=22.5 2023-12-21 22:34:37,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. 
limit=10.0 2023-12-21 22:34:43,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=266253.3333333333, ans=0.125 2023-12-21 22:34:45,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2023-12-21 22:34:46,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=266253.3333333333, ans=0.0 2023-12-21 22:34:46,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=266253.3333333333, ans=0.125 2023-12-21 22:34:50,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=266320.0, ans=0.0 2023-12-21 22:34:52,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-12-21 22:35:03,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=266386.6666666667, ans=0.0 2023-12-21 22:35:04,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=266386.6666666667, ans=0.2 2023-12-21 22:35:21,918 INFO [train.py:886] (0/4) Epoch 9, batch 1850, loss[loss=0.01401, audio_tagging_loss=0.01401, over 25000.00 frames. ], tot_loss[loss=0.01546, audio_tagging_loss=0.01546, over 4953417.53 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:35:22,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.46 vs. limit=15.0 2023-12-21 22:35:44,098 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-40000.pt 2023-12-21 22:35:57,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=266720.0, ans=0.0 2023-12-21 22:35:59,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=266720.0, ans=0.125 2023-12-21 22:36:05,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=266786.6666666667, ans=0.0 2023-12-21 22:36:05,786 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.271e+01 2.665e+01 2.807e+01 3.010e+01 3.699e+01, threshold=5.613e+01, percent-clipped=0.0 2023-12-21 22:36:13,508 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.491e-02 2023-12-21 22:36:16,020 INFO [train.py:886] (0/4) Epoch 9, batch 1900, loss[loss=0.01685, audio_tagging_loss=0.01685, over 24750.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4945983.87 frames. 
], batch size: 99, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:36:21,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=266853.3333333333, ans=0.2 2023-12-21 22:36:27,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=266920.0, ans=0.125 2023-12-21 22:36:35,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=266920.0, ans=0.125 2023-12-21 22:36:43,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=266986.6666666667, ans=0.125 2023-12-21 22:36:57,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=22.5 2023-12-21 22:37:02,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2023-12-21 22:37:08,003 INFO [train.py:886] (0/4) Epoch 9, batch 1950, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4944761.59 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:37:08,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2023-12-21 22:37:13,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=267186.6666666667, ans=0.2 2023-12-21 22:37:21,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267253.3333333333, ans=0.1 2023-12-21 22:37:47,984 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.613e+01 2.746e+01 2.938e+01 3.371e+01, threshold=5.492e+01, percent-clipped=0.0 2023-12-21 22:37:58,870 INFO [train.py:886] (0/4) Epoch 9, batch 2000, loss[loss=0.0152, audio_tagging_loss=0.0152, over 22546.00 frames. ], tot_loss[loss=0.01536, audio_tagging_loss=0.01536, over 4946834.94 frames. ], batch size: 107, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:38:00,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.88 vs. limit=15.0 2023-12-21 22:38:14,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=267586.6666666667, ans=0.1 2023-12-21 22:38:25,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=267653.3333333333, ans=0.2 2023-12-21 22:38:34,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2023-12-21 22:38:37,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=267720.0, ans=0.1 2023-12-21 22:38:41,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.15 vs. 
limit=15.0 2023-12-21 22:38:50,008 INFO [train.py:886] (0/4) Epoch 9, batch 2050, loss[loss=0.01486, audio_tagging_loss=0.01486, over 25000.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4946935.57 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:38:51,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.27 vs. limit=15.0 2023-12-21 22:38:54,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=267853.3333333333, ans=0.125 2023-12-21 22:38:56,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2023-12-21 22:39:06,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=267920.0, ans=0.125 2023-12-21 22:39:08,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=267920.0, ans=0.1 2023-12-21 22:39:30,913 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+01 2.585e+01 2.736e+01 2.898e+01 3.379e+01, threshold=5.472e+01, percent-clipped=0.0 2023-12-21 22:39:39,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=268120.0, ans=0.2 2023-12-21 22:39:41,180 INFO [train.py:886] (0/4) Epoch 9, batch 2100, loss[loss=0.01709, audio_tagging_loss=0.01709, over 24750.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4949071.36 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:39:51,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268253.3333333333, ans=0.1 2023-12-21 22:40:32,296 INFO [train.py:886] (0/4) Epoch 9, batch 2150, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4951613.43 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:40:32,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.01 vs. limit=15.0 2023-12-21 22:40:41,618 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.081e-03 2023-12-21 22:40:53,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=268653.3333333333, ans=0.09899494936611666 2023-12-21 22:41:05,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=268720.0, ans=0.0 2023-12-21 22:41:14,767 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.638e+01 2.787e+01 2.989e+01 3.388e+01, threshold=5.573e+01, percent-clipped=0.0 2023-12-21 22:41:21,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=268786.6666666667, ans=0.125 2023-12-21 22:41:25,792 INFO [train.py:886] (0/4) Epoch 9, batch 2200, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4943290.63 frames. 
], batch size: 99, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:41:27,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=268853.3333333333, ans=0.125 2023-12-21 22:41:35,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=268920.0, ans=0.125 2023-12-21 22:41:35,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=268920.0, ans=0.0 2023-12-21 22:41:45,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=268986.6666666667, ans=0.125 2023-12-21 22:42:06,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=269120.0, ans=0.0 2023-12-21 22:42:10,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=269120.0, ans=0.125 2023-12-21 22:42:17,189 INFO [train.py:886] (0/4) Epoch 9, batch 2250, loss[loss=0.01546, audio_tagging_loss=0.01546, over 24750.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4941944.02 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:42:17,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=269186.6666666667, ans=0.05 2023-12-21 22:42:26,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=269186.6666666667, ans=0.2 2023-12-21 22:42:53,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=269386.6666666667, ans=0.0 2023-12-21 22:42:58,351 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.625e+01 2.827e+01 2.998e+01 3.600e+01, threshold=5.653e+01, percent-clipped=0.0 2023-12-21 22:43:07,842 INFO [train.py:886] (0/4) Epoch 9, batch 2300, loss[loss=0.01368, audio_tagging_loss=0.01368, over 25000.00 frames. ], tot_loss[loss=0.01528, audio_tagging_loss=0.01528, over 4948836.39 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:43:08,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269520.0, ans=0.1 2023-12-21 22:43:16,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.39 vs. limit=22.5 2023-12-21 22:43:23,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=269586.6666666667, ans=0.125 2023-12-21 22:43:29,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=269653.3333333333, ans=0.125 2023-12-21 22:43:30,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=269653.3333333333, ans=0.125 2023-12-21 22:43:46,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2023-12-21 22:43:55,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.43 vs. 
limit=6.0 2023-12-21 22:44:01,086 INFO [train.py:886] (0/4) Epoch 9, batch 2350, loss[loss=0.01303, audio_tagging_loss=0.01303, over 25000.00 frames. ], tot_loss[loss=0.01524, audio_tagging_loss=0.01524, over 4951813.25 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:44:34,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=270053.3333333333, ans=0.125 2023-12-21 22:44:41,224 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.563e+01 2.767e+01 2.942e+01 3.408e+01, threshold=5.535e+01, percent-clipped=0.0 2023-12-21 22:44:41,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.39 vs. limit=22.5 2023-12-21 22:44:47,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270120.0, ans=0.1 2023-12-21 22:44:50,826 INFO [train.py:886] (0/4) Epoch 9, batch 2400, loss[loss=0.01567, audio_tagging_loss=0.01567, over 24750.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4955139.37 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:44:51,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=270186.6666666667, ans=0.125 2023-12-21 22:45:01,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=270253.3333333333, ans=0.125 2023-12-21 22:45:02,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0 2023-12-21 22:45:05,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2023-12-21 22:45:14,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=270320.0, ans=0.2 2023-12-21 22:45:21,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=270386.6666666667, ans=0.125 2023-12-21 22:45:22,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270386.6666666667, ans=0.1 2023-12-21 22:45:28,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270386.6666666667, ans=0.1 2023-12-21 22:45:31,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2023-12-21 22:45:42,466 INFO [train.py:886] (0/4) Epoch 9, batch 2450, loss[loss=0.018, audio_tagging_loss=0.018, over 25000.00 frames. ], tot_loss[loss=0.01529, audio_tagging_loss=0.01529, over 4957516.99 frames. 
], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:45:53,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=270586.6666666667, ans=0.125 2023-12-21 22:46:03,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=270653.3333333333, ans=0.125 2023-12-21 22:46:04,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270653.3333333333, ans=0.1 2023-12-21 22:46:13,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=15.0 2023-12-21 22:46:22,743 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.728e+01 2.875e+01 2.985e+01 3.809e+01, threshold=5.751e+01, percent-clipped=0.0 2023-12-21 22:46:33,013 INFO [train.py:886] (0/4) Epoch 9, batch 2500, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. ], tot_loss[loss=0.01543, audio_tagging_loss=0.01543, over 4956161.38 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:46:44,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270920.0, ans=0.1 2023-12-21 22:46:54,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=270986.6666666667, ans=0.1 2023-12-21 22:47:04,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=271053.3333333333, ans=0.07 2023-12-21 22:47:07,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=271053.3333333333, ans=0.125 2023-12-21 22:47:09,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=271053.3333333333, ans=0.125 2023-12-21 22:47:25,475 INFO [train.py:886] (0/4) Epoch 9, batch 2550, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 4943099.62 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:47:49,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.63 vs. 
limit=22.5 2023-12-21 22:47:55,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=271386.6666666667, ans=0.125 2023-12-21 22:48:02,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=271386.6666666667, ans=0.0 2023-12-21 22:48:02,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=271386.6666666667, ans=0.125 2023-12-21 22:48:07,011 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.663e+01 2.770e+01 2.998e+01 3.753e+01, threshold=5.540e+01, percent-clipped=0.0 2023-12-21 22:48:11,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=271453.3333333333, ans=0.2 2023-12-21 22:48:17,922 INFO [train.py:886] (0/4) Epoch 9, batch 2600, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4946557.68 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:48:25,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=271520.0, ans=0.125 2023-12-21 22:48:31,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=271586.6666666667, ans=0.125 2023-12-21 22:48:34,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=12.0 2023-12-21 22:48:50,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=271720.0, ans=0.0 2023-12-21 22:48:54,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=271720.0, ans=0.0 2023-12-21 22:49:09,002 INFO [train.py:886] (0/4) Epoch 9, batch 2650, loss[loss=0.01304, audio_tagging_loss=0.01304, over 24750.00 frames. ], tot_loss[loss=0.01539, audio_tagging_loss=0.01539, over 4946968.69 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:49:10,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=271853.3333333333, ans=0.125 2023-12-21 22:49:12,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2023-12-21 22:49:19,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.00 vs. limit=10.0 2023-12-21 22:49:23,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=271920.0, ans=0.125 2023-12-21 22:49:27,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.42 vs. 
limit=15.0 2023-12-21 22:49:34,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=271986.6666666667, ans=0.1 2023-12-21 22:49:44,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=272053.3333333333, ans=0.125 2023-12-21 22:49:45,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2023-12-21 22:49:51,073 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.584e+01 2.691e+01 2.849e+01 3.428e+01, threshold=5.381e+01, percent-clipped=0.0 2023-12-21 22:50:00,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2023-12-21 22:50:00,589 INFO [train.py:886] (0/4) Epoch 9, batch 2700, loss[loss=0.0155, audio_tagging_loss=0.0155, over 25000.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4941425.37 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:50:03,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=272186.6666666667, ans=0.125 2023-12-21 22:50:05,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2023-12-21 22:50:06,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=8.0 2023-12-21 22:50:10,210 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.816e-02 2023-12-21 22:50:27,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=272320.0, ans=0.125 2023-12-21 22:50:30,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=272386.6666666667, ans=0.2 2023-12-21 22:50:34,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=272386.6666666667, ans=0.05 2023-12-21 22:50:39,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=272453.3333333333, ans=0.125 2023-12-21 22:50:50,703 INFO [train.py:886] (0/4) Epoch 9, batch 2750, loss[loss=0.01707, audio_tagging_loss=0.01707, over 25000.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 4948512.27 frames. 
], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:50:51,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=272520.0, ans=0.125 2023-12-21 22:50:53,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=272520.0, ans=0.0 2023-12-21 22:51:12,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=272653.3333333333, ans=0.2 2023-12-21 22:51:33,393 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.590e+01 2.730e+01 2.864e+01 3.266e+01, threshold=5.459e+01, percent-clipped=0.0 2023-12-21 22:51:43,038 INFO [train.py:886] (0/4) Epoch 9, batch 2800, loss[loss=0.01471, audio_tagging_loss=0.01471, over 24750.00 frames. ], tot_loss[loss=0.01543, audio_tagging_loss=0.01543, over 4949592.99 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:51:51,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=272853.3333333333, ans=12.0 2023-12-21 22:51:53,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=272920.0, ans=0.0 2023-12-21 22:51:54,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. limit=10.0 2023-12-21 22:51:58,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=272920.0, ans=0.2 2023-12-21 22:52:11,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=272986.6666666667, ans=0.1 2023-12-21 22:52:22,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2023-12-21 22:52:36,148 INFO [train.py:886] (0/4) Epoch 9, batch 2850, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24750.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4950219.29 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:52:46,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=273253.3333333333, ans=0.125 2023-12-21 22:52:53,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=273253.3333333333, ans=0.0 2023-12-21 22:52:58,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=273320.0, ans=0.125 2023-12-21 22:53:17,488 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 2.634e+01 2.788e+01 2.936e+01 3.853e+01, threshold=5.577e+01, percent-clipped=0.0 2023-12-21 22:53:23,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=273453.3333333333, ans=0.125 2023-12-21 22:53:27,609 INFO [train.py:886] (0/4) Epoch 9, batch 2900, loss[loss=0.01475, audio_tagging_loss=0.01475, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4948439.63 frames. 
], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:53:34,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=273520.0, ans=0.125 2023-12-21 22:53:49,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=273653.3333333333, ans=0.0 2023-12-21 22:54:02,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=273720.0, ans=0.125 2023-12-21 22:54:07,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=273720.0, ans=0.125 2023-12-21 22:54:20,007 INFO [train.py:886] (0/4) Epoch 9, batch 2950, loss[loss=0.01271, audio_tagging_loss=0.01271, over 24750.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4950550.09 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:54:44,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.50 vs. limit=15.0 2023-12-21 22:55:00,640 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+01 2.696e+01 2.843e+01 2.959e+01 3.331e+01, threshold=5.685e+01, percent-clipped=0.0 2023-12-21 22:55:01,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=274120.0, ans=0.125 2023-12-21 22:55:03,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=274120.0, ans=0.125 2023-12-21 22:55:05,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=274120.0, ans=0.125 2023-12-21 22:55:11,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=274186.6666666667, ans=0.125 2023-12-21 22:55:12,160 INFO [train.py:886] (0/4) Epoch 9, batch 3000, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4948585.20 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:55:12,162 INFO [train.py:909] (0/4) Computing validation loss 2023-12-21 22:55:33,475 INFO [train.py:917] (0/4) Epoch 9, validation: loss=0.03523, audio_tagging_loss=0.03523, over 3737520.00 frames. 2023-12-21 22:55:33,476 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-21 22:55:45,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=274253.3333333333, ans=0.035 2023-12-21 22:55:58,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.70 vs. limit=15.0 2023-12-21 22:56:02,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=274386.6666666667, ans=0.0 2023-12-21 22:56:21,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-12-21 22:56:25,471 INFO [train.py:886] (0/4) Epoch 9, batch 3050, loss[loss=0.01701, audio_tagging_loss=0.01701, over 25000.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4955393.64 frames. 
], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:56:26,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2023-12-21 22:56:32,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=274520.0, ans=0.125 2023-12-21 22:56:37,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=274586.6666666667, ans=0.0 2023-12-21 22:56:57,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.62 vs. limit=10.0 2023-12-21 22:57:04,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=274720.0, ans=0.2 2023-12-21 22:57:06,218 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.681e+01 2.830e+01 3.002e+01 4.084e+01, threshold=5.659e+01, percent-clipped=0.0 2023-12-21 22:57:08,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=274786.6666666667, ans=0.025 2023-12-21 22:57:17,843 INFO [train.py:886] (0/4) Epoch 9, batch 3100, loss[loss=0.0154, audio_tagging_loss=0.0154, over 25000.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4956523.46 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:57:30,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274920.0, ans=0.1 2023-12-21 22:57:36,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=274986.6666666667, ans=0.125 2023-12-21 22:57:40,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=274986.6666666667, ans=0.0 2023-12-21 22:57:42,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=274986.6666666667, ans=0.2 2023-12-21 22:57:44,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.18 vs. limit=12.0 2023-12-21 22:57:57,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=275120.0, ans=0.125 2023-12-21 22:57:59,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=275120.0, ans=0.0 2023-12-21 22:58:08,822 INFO [train.py:886] (0/4) Epoch 9, batch 3150, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24750.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4952088.62 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 22:58:24,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=275253.3333333333, ans=0.2 2023-12-21 22:58:29,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=275320.0, ans=0.2 2023-12-21 22:58:37,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.52 vs. 
limit=22.5 2023-12-21 22:58:50,924 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.623e+01 2.777e+01 2.975e+01 3.499e+01, threshold=5.555e+01, percent-clipped=0.0 2023-12-21 22:58:56,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2023-12-21 22:59:00,380 INFO [train.py:886] (0/4) Epoch 9, batch 3200, loss[loss=0.01411, audio_tagging_loss=0.01411, over 24750.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4949922.40 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 22:59:02,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0 2023-12-21 22:59:13,868 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-21 22:59:29,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.78 vs. limit=15.0 2023-12-21 22:59:30,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=275720.0, ans=0.1 2023-12-21 22:59:35,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0 2023-12-21 22:59:48,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0 2023-12-21 22:59:53,483 INFO [train.py:886] (0/4) Epoch 9, batch 3250, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4949581.77 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:00:14,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=275986.6666666667, ans=0.0 2023-12-21 23:00:14,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=275986.6666666667, ans=0.0 2023-12-21 23:00:18,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=275986.6666666667, ans=0.125 2023-12-21 23:00:25,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=276053.3333333333, ans=0.04949747468305833 2023-12-21 23:00:33,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=276120.0, ans=0.125 2023-12-21 23:00:34,339 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.607e+01 2.752e+01 2.914e+01 3.525e+01, threshold=5.504e+01, percent-clipped=0.0 2023-12-21 23:00:39,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=276120.0, ans=0.125 2023-12-21 23:00:43,805 INFO [train.py:886] (0/4) Epoch 9, batch 3300, loss[loss=0.01581, audio_tagging_loss=0.01581, over 25000.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4953210.04 frames. 
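
Each optim.py WARNING prints five quantiles (min, 25%, median, 75%, max) of recent gradient norms together with a threshold, and the threshold tracks Clipping_scale=2.0 times the reported median (5.555e+01 against a median of 2.777e+01 in the entry above). A sketch of a clipper with that behavior, assuming a sliding window of recent norms; the actual rule lives inside icefall's optim.py, which folds these statistics into the optimizer update rather than using a standalone helper:

    import torch
    from collections import deque

    class QuartileClipper:
        """Sketch: clip the global grad norm to clipping_scale x running median."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def __call__(self, parameters) -> torch.Tensor:
            params = [p for p in parameters if p.grad is not None]
            # Global gradient norm of this batch.
            norm = torch.stack(
                [p.grad.detach().float().norm() for p in params]
            ).norm()
            self.norms.append(norm)
            quartiles = torch.quantile(
                torch.stack(tuple(self.norms)),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
            )
            threshold = self.clipping_scale * quartiles[2]  # 2.0 x median
            if norm > threshold:  # would count toward percent-clipped
                for p in params:
                    p.grad.mul_(threshold / norm)
            return quartiles

With gradient norms hovering in the mid-20s and a threshold near 55, clipping is rarely triggered, which is why percent-clipped reads 0.0 in the entries above.
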
], batch size: 100, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:00:57,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=276253.3333333333, ans=0.0 2023-12-21 23:01:03,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=276253.3333333333, ans=0.2 2023-12-21 23:01:10,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0 2023-12-21 23:01:31,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=276453.3333333333, ans=0.0 2023-12-21 23:01:36,160 INFO [train.py:886] (0/4) Epoch 9, batch 3350, loss[loss=0.01461, audio_tagging_loss=0.01461, over 24085.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4956283.89 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:01:56,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-12-21 23:01:59,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=276653.3333333333, ans=0.0 2023-12-21 23:02:09,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=276720.0, ans=0.125 2023-12-21 23:02:16,744 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.658e+01 2.793e+01 2.957e+01 4.433e+01, threshold=5.585e+01, percent-clipped=0.0 2023-12-21 23:02:21,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2023-12-21 23:02:26,907 INFO [train.py:886] (0/4) Epoch 9, batch 3400, loss[loss=0.02065, audio_tagging_loss=0.02065, over 24953.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4955979.69 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:02:30,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=276853.3333333333, ans=0.125 2023-12-21 23:02:40,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-12-21 23:02:43,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=276920.0, ans=0.0 2023-12-21 23:02:52,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=276986.6666666667, ans=0.125 2023-12-21 23:03:00,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.91 vs. 
limit=15.0 2023-12-21 23:03:08,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=277053.3333333333, ans=0.125 2023-12-21 23:03:13,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=277120.0, ans=0.0 2023-12-21 23:03:14,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=277120.0, ans=0.2 2023-12-21 23:03:19,105 INFO [train.py:886] (0/4) Epoch 9, batch 3450, loss[loss=0.01495, audio_tagging_loss=0.01495, over 24750.00 frames. ], tot_loss[loss=0.01524, audio_tagging_loss=0.01524, over 4953501.45 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:03:25,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=277186.6666666667, ans=0.125 2023-12-21 23:03:49,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2023-12-21 23:03:52,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=277386.6666666667, ans=0.125 2023-12-21 23:03:59,346 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.682e+01 2.834e+01 2.983e+01 3.671e+01, threshold=5.669e+01, percent-clipped=0.0 2023-12-21 23:04:07,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=277453.3333333333, ans=0.125 2023-12-21 23:04:10,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=277520.0, ans=0.125 2023-12-21 23:04:10,942 INFO [train.py:886] (0/4) Epoch 9, batch 3500, loss[loss=0.01905, audio_tagging_loss=0.01905, over 25000.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4950102.89 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:04:31,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=277653.3333333333, ans=0.1 2023-12-21 23:04:35,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=277653.3333333333, ans=15.0 2023-12-21 23:04:40,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=277720.0, ans=0.125 2023-12-21 23:04:53,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=277786.6666666667, ans=0.1 2023-12-21 23:04:57,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=277786.6666666667, ans=0.125 2023-12-21 23:05:00,643 INFO [train.py:886] (0/4) Epoch 9, batch 3550, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4954565.00 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:05:18,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.80 vs. 
limit=15.0 2023-12-21 23:05:33,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=278053.3333333333, ans=0.2 2023-12-21 23:05:33,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=278053.3333333333, ans=0.125 2023-12-21 23:05:34,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2023-12-21 23:05:42,779 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.633e+01 2.767e+01 2.971e+01 3.918e+01, threshold=5.534e+01, percent-clipped=0.0 2023-12-21 23:05:45,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.11 vs. limit=10.0 2023-12-21 23:05:50,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=278120.0, ans=0.0 2023-12-21 23:05:52,247 INFO [train.py:886] (0/4) Epoch 9, batch 3600, loss[loss=0.01958, audio_tagging_loss=0.01958, over 25000.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4957662.54 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:05:56,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=278186.6666666667, ans=0.125 2023-12-21 23:05:56,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278186.6666666667, ans=0.1 2023-12-21 23:05:57,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=278186.6666666667, ans=0.09899494936611666 2023-12-21 23:06:18,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=278320.0, ans=0.125 2023-12-21 23:06:39,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=278453.3333333333, ans=0.0 2023-12-21 23:06:40,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2023-12-21 23:06:41,927 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2023-12-21 23:06:42,379 INFO [train.py:886] (0/4) Epoch 9, batch 3650, loss[loss=0.01164, audio_tagging_loss=0.01164, over 21927.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4956584.79 frames. 
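
The Whitening lines compare a per-module statistic against a limit (21.33 vs. limit=22.5 for a 512-channel self-attention block just above). One plausible formulation of such a metric, assuming it measures the eigenvalue dispersion of the feature covariance per channel group: it is 1.0 for a perfectly white (isotropic) covariance and at most channels/num_groups for a rank-one covariance, a range consistent with the values logged here. The function name and normalization are assumptions, not icefall's exact code:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        """Sketch: eigenvalue dispersion of the per-group feature covariance."""
        x = x.reshape(-1, x.shape[-1]).float()
        num_frames, num_channels = x.shape
        d = num_channels // num_groups
        xg = x.reshape(num_frames, num_groups, d).permute(1, 0, 2)  # (G, N, d)
        cov = xg.transpose(1, 2) @ xg / num_frames                  # (G, d, d)
        trace = cov.diagonal(dim1=1, dim2=2).sum(dim=-1)   # sum of eigenvalues
        sq_sum = (cov * cov).sum(dim=(1, 2))               # sum of squared eigenvalues
        return (d * sq_sum / trace.clamp(min=1e-20) ** 2).mean()

A module can then penalize its activations with an extra gradient term whenever the metric exceeds its scheduled limit, pushing the features back toward an isotropic covariance.
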
], batch size: 107, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:06:47,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=278520.0, ans=0.125 2023-12-21 23:07:07,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=278653.3333333333, ans=0.0 2023-12-21 23:07:11,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=278653.3333333333, ans=0.125 2023-12-21 23:07:25,669 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 2.660e+01 2.835e+01 3.011e+01 3.673e+01, threshold=5.670e+01, percent-clipped=0.0 2023-12-21 23:07:27,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=278786.6666666667, ans=0.0 2023-12-21 23:07:28,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=278786.6666666667, ans=0.125 2023-12-21 23:07:35,193 INFO [train.py:886] (0/4) Epoch 9, batch 3700, loss[loss=0.021, audio_tagging_loss=0.021, over 25000.00 frames. ], tot_loss[loss=0.01528, audio_tagging_loss=0.01528, over 4961396.24 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:07:38,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=278853.3333333333, ans=0.0 2023-12-21 23:07:47,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.25 vs. limit=12.0 2023-12-21 23:08:16,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=279120.0, ans=0.125 2023-12-21 23:08:28,124 INFO [train.py:886] (0/4) Epoch 9, batch 3750, loss[loss=0.01505, audio_tagging_loss=0.01505, over 21880.00 frames. ], tot_loss[loss=0.01543, audio_tagging_loss=0.01543, over 4959061.58 frames. ], batch size: 107, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:08:35,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=279186.6666666667, ans=0.125 2023-12-21 23:08:52,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279320.0, ans=0.1 2023-12-21 23:08:56,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=279320.0, ans=0.125 2023-12-21 23:09:00,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=279386.6666666667, ans=0.2 2023-12-21 23:09:09,039 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.692e+01 2.838e+01 2.996e+01 3.691e+01, threshold=5.675e+01, percent-clipped=0.0 2023-12-21 23:09:18,849 INFO [train.py:886] (0/4) Epoch 9, batch 3800, loss[loss=0.01427, audio_tagging_loss=0.01427, over 24750.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4951226.77 frames. 
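
Note how the batch size drifts between 99, 100, and 107 while the frame count moves the other way (25,000 frames for 100 cuts, 21,927 for 107): the sampler packs cuts up to a total duration budget rather than to a fixed count, so batches of shorter clips simply hold more of them. A sketch of duration-capped batching; the cut objects and the 1000-second cap are assumptions for illustration:

    def batches_by_duration(cuts, max_duration: float = 1000.0):
        """Sketch: yield batches whose summed duration stays under a budget."""
        batch, batch_secs = [], 0.0
        for cut in cuts:  # each cut is assumed to expose .duration in seconds
            if batch and batch_secs + cut.duration > max_duration:
                yield batch
                batch, batch_secs = [], 0.0
            batch.append(cut)
            batch_secs += cut.duration
        if batch:
            yield batch

A real sampler additionally buckets cuts of similar duration before packing, so the clips inside one batch need little padding.
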
], batch size: 99, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:09:26,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=279520.0, ans=0.1 2023-12-21 23:09:42,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=15.0 2023-12-21 23:10:01,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=279786.6666666667, ans=0.125 2023-12-21 23:10:12,071 INFO [train.py:886] (0/4) Epoch 9, batch 3850, loss[loss=0.01669, audio_tagging_loss=0.01669, over 25000.00 frames. ], tot_loss[loss=0.01536, audio_tagging_loss=0.01536, over 4948721.93 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:10:12,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=279853.3333333333, ans=0.015 2023-12-21 23:10:13,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=279853.3333333333, ans=0.125 2023-12-21 23:10:36,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=279986.6666666667, ans=0.0 2023-12-21 23:10:49,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.11 vs. limit=22.5 2023-12-21 23:10:52,251 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.650e+01 2.787e+01 2.948e+01 3.554e+01, threshold=5.574e+01, percent-clipped=0.0 2023-12-21 23:10:52,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280120.0, ans=0.1 2023-12-21 23:11:03,183 INFO [train.py:886] (0/4) Epoch 9, batch 3900, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01524, audio_tagging_loss=0.01524, over 4945841.89 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:11:03,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=280186.6666666667, ans=0.2 2023-12-21 23:11:20,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=280253.3333333333, ans=0.125 2023-12-21 23:11:45,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=280453.3333333333, ans=0.125 2023-12-21 23:11:53,518 INFO [train.py:886] (0/4) Epoch 9, batch 3950, loss[loss=0.01528, audio_tagging_loss=0.01528, over 25000.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4953131.13 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:11:54,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=280520.0, ans=0.0 2023-12-21 23:11:55,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.95 vs. 
limit=10.0 2023-12-21 23:12:13,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=280653.3333333333, ans=0.125 2023-12-21 23:12:14,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=280653.3333333333, ans=0.1 2023-12-21 23:12:18,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280653.3333333333, ans=0.1 2023-12-21 23:12:19,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=280653.3333333333, ans=0.0 2023-12-21 23:12:23,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=280653.3333333333, ans=0.125 2023-12-21 23:12:25,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.70 vs. limit=15.0 2023-12-21 23:12:32,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=280720.0, ans=0.125 2023-12-21 23:12:35,027 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.621e+01 2.776e+01 2.902e+01 4.085e+01, threshold=5.551e+01, percent-clipped=0.0 2023-12-21 23:12:36,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=280786.6666666667, ans=0.125 2023-12-21 23:12:40,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=15.0 2023-12-21 23:12:42,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=280786.6666666667, ans=0.5 2023-12-21 23:12:45,134 INFO [train.py:886] (0/4) Epoch 9, batch 4000, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4952378.96 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:12:51,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.82 vs. limit=22.5 2023-12-21 23:12:54,865 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=15.0 2023-12-21 23:12:56,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. limit=10.0 2023-12-21 23:13:08,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=280986.6666666667, ans=0.125 2023-12-21 23:13:19,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=281053.3333333333, ans=0.125 2023-12-21 23:13:31,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=281120.0, ans=0.125 2023-12-21 23:13:35,304 INFO [train.py:886] (0/4) Epoch 9, batch 4050, loss[loss=0.01431, audio_tagging_loss=0.01431, over 21864.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4951697.45 frames. 
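
Each train.py:886 entry pairs the current batch's loss (over that batch's frames) with a tot_loss over roughly 4.95 million frames, and that window barely moves from batch to batch. This is consistent with a frame-weighted running average that decays old batches instead of accumulating forever; the decay constant below is an assumption, chosen so the steady-state window (25,000 / (1/200) = 5.0M frames) lands near the logged figure:

    class RunningLoss:
        """Sketch: exponentially-decayed, frame-weighted running loss."""

        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.weighted_sum = 0.0  # decayed sum of loss * frames
            self.frames = 0.0        # effective frame count of the window

        def update(self, batch_loss: float, batch_frames: float):
            self.weighted_sum = self.decay * self.weighted_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.weighted_sum / self.frames, self.frames

    tracker = RunningLoss()
    for _ in range(2000):
        tot_loss, frames = tracker.update(0.0152, 25000.0)
    print(frames)  # ~5.0e6, near the ~4.95e6 windows in this log

The same shape explains the tot_loss window refilling from scratch after the epoch boundary further down (1.1M frames 50 batches into epoch 10, 2.0M after 100).
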
], batch size: 107, lr: 1.19e-02, grad_scale: 128.0 2023-12-21 23:13:38,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=281186.6666666667, ans=0.125 2023-12-21 23:13:43,586 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.790e-02 2023-12-21 23:13:44,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=281186.6666666667, ans=0.125 2023-12-21 23:13:56,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2023-12-21 23:14:16,847 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.645e+01 2.783e+01 2.979e+01 3.494e+01, threshold=5.566e+01, percent-clipped=0.0 2023-12-21 23:14:19,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=281453.3333333333, ans=0.07 2023-12-21 23:14:26,220 INFO [train.py:886] (0/4) Epoch 9, batch 4100, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4944812.67 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 128.0 2023-12-21 23:14:31,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=281520.0, ans=0.1 2023-12-21 23:14:43,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0 2023-12-21 23:14:49,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=281653.3333333333, ans=0.0 2023-12-21 23:15:06,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=15.0 2023-12-21 23:15:09,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=281786.6666666667, ans=0.0 2023-12-21 23:15:19,284 INFO [train.py:886] (0/4) Epoch 9, batch 4150, loss[loss=0.01407, audio_tagging_loss=0.01407, over 24750.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4945847.76 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:15:27,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=281853.3333333333, ans=0.125 2023-12-21 23:15:28,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=281920.0, ans=0.07 2023-12-21 23:15:54,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2023-12-21 23:16:00,690 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.634e+01 2.756e+01 2.939e+01 3.432e+01, threshold=5.512e+01, percent-clipped=0.0 2023-12-21 23:16:09,854 INFO [train.py:886] (0/4) Epoch 9, batch 4200, loss[loss=0.01563, audio_tagging_loss=0.01563, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4948780.76 frames. 
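
grad_scale is the fp16 loss-scaling factor, and it only ever moves in powers of two: it sits at 64.0 through batch 4000, reads 128.0 at batches 4050 and 4100, and is back to 64.0 by batch 4150. That is the usual dynamic loss-scaling pattern: grow after a run of finite gradients, halve when an overflow is detected. The standard PyTorch shape of that loop is below; the model and scaler settings are illustrative, and icefall's training loop wraps this in its own bookkeeping:

    import torch
    from torch.cuda.amp import GradScaler, autocast

    model = torch.nn.Linear(80, 10).cuda()  # stand-in model for the sketch
    optimizer = torch.optim.Adam(model.parameters())
    scaler = GradScaler(
        init_scale=64.0,       # the grad_scale this stretch of the log starts at
        growth_factor=2.0,     # 64 -> 128, as at batch 4050
        backoff_factor=0.5,    # 128 -> 64 after an inf/nan batch
        growth_interval=2000,  # finite-grad batches required before growing
    )

    def train_step(features, targets):
        optimizer.zero_grad()
        with autocast():  # forward in fp16 where safe
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                model(features), targets
            )
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # grows or backs off the scale
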
], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:16:29,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=282253.3333333333, ans=0.125 2023-12-21 23:16:34,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=282320.0, ans=0.1 2023-12-21 23:16:37,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.04 vs. limit=6.0 2023-12-21 23:16:59,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=282453.3333333333, ans=0.0 2023-12-21 23:17:02,308 INFO [train.py:886] (0/4) Epoch 9, batch 4250, loss[loss=0.01538, audio_tagging_loss=0.01538, over 24750.00 frames. ], tot_loss[loss=0.01529, audio_tagging_loss=0.01529, over 4952410.27 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:17:20,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=282586.6666666667, ans=0.0 2023-12-21 23:17:23,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=282653.3333333333, ans=0.2 2023-12-21 23:17:26,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=282653.3333333333, ans=0.125 2023-12-21 23:17:33,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=282720.0, ans=0.125 2023-12-21 23:17:37,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=282720.0, ans=0.125 2023-12-21 23:17:43,723 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.596e+01 2.839e+01 2.954e+01 3.918e+01, threshold=5.679e+01, percent-clipped=0.0 2023-12-21 23:17:54,379 INFO [train.py:886] (0/4) Epoch 9, batch 4300, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4960721.61 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:18:12,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. limit=15.0 2023-12-21 23:18:17,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=282986.6666666667, ans=0.125 2023-12-21 23:18:38,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=283120.0, ans=0.04949747468305833 2023-12-21 23:18:45,888 INFO [train.py:886] (0/4) Epoch 9, batch 4350, loss[loss=0.01678, audio_tagging_loss=0.01678, over 24750.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4957107.63 frames. 
], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:19:12,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=283320.0, ans=0.04949747468305833 2023-12-21 23:19:16,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=283386.6666666667, ans=0.2 2023-12-21 23:19:16,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=283386.6666666667, ans=0.2 2023-12-21 23:19:18,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=283386.6666666667, ans=0.125 2023-12-21 23:19:19,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=283386.6666666667, ans=0.04949747468305833 2023-12-21 23:19:29,164 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.360e+01 2.656e+01 2.784e+01 2.913e+01 3.411e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-21 23:19:39,184 INFO [train.py:886] (0/4) Epoch 9, batch 4400, loss[loss=0.01749, audio_tagging_loss=0.01749, over 24750.00 frames. ], tot_loss[loss=0.01541, audio_tagging_loss=0.01541, over 4956696.62 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:19:48,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=283586.6666666667, ans=0.125 2023-12-21 23:19:50,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.41 vs. limit=15.0 2023-12-21 23:19:51,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0 2023-12-21 23:19:55,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=283586.6666666667, ans=0.125 2023-12-21 23:20:21,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=283786.6666666667, ans=0.125 2023-12-21 23:20:29,338 INFO [train.py:886] (0/4) Epoch 9, batch 4450, loss[loss=0.01995, audio_tagging_loss=0.01995, over 24750.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4956871.39 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:20:31,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=283853.3333333333, ans=0.125 2023-12-21 23:21:12,820 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 2.640e+01 2.772e+01 2.975e+01 3.500e+01, threshold=5.544e+01, percent-clipped=0.0 2023-12-21 23:21:19,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=284120.0, ans=0.125 2023-12-21 23:21:21,362 INFO [train.py:886] (0/4) Epoch 9, batch 4500, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4952489.85 frames. 
], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:21:24,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=284186.6666666667, ans=0.1 2023-12-21 23:21:26,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.95 vs. limit=10.0 2023-12-21 23:21:26,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=15.0 2023-12-21 23:21:53,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=284386.6666666667, ans=0.125 2023-12-21 23:21:54,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=284386.6666666667, ans=0.1 2023-12-21 23:21:55,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=284386.6666666667, ans=0.125 2023-12-21 23:22:12,492 INFO [train.py:886] (0/4) Epoch 9, batch 4550, loss[loss=0.01592, audio_tagging_loss=0.01592, over 25000.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4958368.63 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:22:29,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=284586.6666666667, ans=0.0 2023-12-21 23:22:50,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=284720.0, ans=0.125 2023-12-21 23:22:53,254 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+01 2.599e+01 2.790e+01 2.989e+01 3.611e+01, threshold=5.581e+01, percent-clipped=0.0 2023-12-21 23:22:58,654 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2023-12-21 23:23:01,875 INFO [train.py:886] (0/4) Epoch 9, batch 4600, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4955741.75 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0 2023-12-21 23:23:24,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=284986.6666666667, ans=0.125 2023-12-21 23:23:36,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=12.0 2023-12-21 23:23:37,109 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 23:23:41,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=285053.3333333333, ans=0.0 2023-12-21 23:23:54,820 INFO [train.py:886] (0/4) Epoch 9, batch 4650, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4956937.44 frames. 
], batch size: 100, lr: 1.18e-02, grad_scale: 64.0 2023-12-21 23:23:54,954 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 23:23:55,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2023-12-21 23:24:00,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=285186.6666666667, ans=0.125 2023-12-21 23:24:05,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285253.3333333333, ans=0.1 2023-12-21 23:24:09,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=285253.3333333333, ans=0.2 2023-12-21 23:24:35,613 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.629e+01 2.821e+01 3.013e+01 3.684e+01, threshold=5.641e+01, percent-clipped=0.0 2023-12-21 23:24:37,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=285453.3333333333, ans=0.05 2023-12-21 23:24:43,870 INFO [train.py:886] (0/4) Epoch 9, batch 4700, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24750.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4955440.08 frames. ], batch size: 99, lr: 1.18e-02, grad_scale: 64.0 2023-12-21 23:24:46,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=285520.0, ans=0.125 2023-12-21 23:24:55,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.53 vs. limit=15.0 2023-12-21 23:24:56,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=285586.6666666667, ans=0.2 2023-12-21 23:25:00,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=285586.6666666667, ans=0.0 2023-12-21 23:25:03,165 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.425e-01 2023-12-21 23:25:11,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=285720.0, ans=0.2 2023-12-21 23:25:13,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=285720.0, ans=0.1 2023-12-21 23:25:13,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=285720.0, ans=0.2 2023-12-21 23:25:14,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=285720.0, ans=0.0 2023-12-21 23:25:19,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=285720.0, ans=0.125 2023-12-21 23:25:28,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=285786.6666666667, ans=0.2 2023-12-21 23:25:31,405 INFO [train.py:886] (0/4) Epoch 9, batch 4750, loss[loss=0.01649, audio_tagging_loss=0.01649, over 24750.00 frames. 
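
The scaling.py:1118 WithLoss entries report an auxiliary penalty attached to particular attention-weight tensors as loss-sum (0.0 when nothing is penalized, up to roughly 1.0 on some self_attn_weights above). A common way to attach such a penalty is a function that acts as the identity in the forward pass and injects the penalty's gradient in the backward pass; the penalty below is hypothetical, and only the attach-in-backward mechanism is the point:

    import torch

    class IdentityWithAuxLoss(torch.autograd.Function):
        """Identity in forward; adds d(penalty)/dx to the gradient in backward."""

        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return x

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            with torch.enable_grad():
                xd = x.detach().requires_grad_()
                # Hypothetical penalty: discourage attention weights that
                # saturate near 1.0. Its value is what loss-sum would print.
                penalty = ((xd - 0.95).clamp(min=0.0) ** 2).sum()
                (aux_grad,) = torch.autograd.grad(penalty, xd)
            return grad_output + aux_grad

    attn = torch.softmax(torch.randn(4, 10, requires_grad=True), dim=-1)
    attn = IdentityWithAuxLoss.apply(attn)  # same values, extra grad attached
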
], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4952316.76 frames. ], batch size: 99, lr: 1.18e-02, grad_scale: 64.0 2023-12-21 23:25:36,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=285853.3333333333, ans=0.125 2023-12-21 23:25:41,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=15.0 2023-12-21 23:25:46,506 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-9.pt 2023-12-21 23:26:08,773 INFO [train.py:886] (0/4) Epoch 10, batch 0, loss[loss=0.03402, audio_tagging_loss=0.03402, over 23975.00 frames. ], tot_loss[loss=0.03402, audio_tagging_loss=0.03402, over 23975.00 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:26:08,774 INFO [train.py:909] (0/4) Computing validation loss 2023-12-21 23:26:30,373 INFO [train.py:917] (0/4) Epoch 10, validation: loss=0.03426, audio_tagging_loss=0.03426, over 3737520.00 frames. 2023-12-21 23:26:30,373 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-21 23:26:34,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=285960.0, ans=0.0 2023-12-21 23:26:40,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=286026.6666666667, ans=0.09899494936611666 2023-12-21 23:26:45,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.33 vs. limit=22.5 2023-12-21 23:26:55,362 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 2.685e+01 2.858e+01 3.839e+01 9.905e+01, threshold=5.715e+01, percent-clipped=6.0 2023-12-21 23:26:56,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=286093.3333333333, ans=0.2 2023-12-21 23:27:21,831 INFO [train.py:886] (0/4) Epoch 10, batch 50, loss[loss=0.02044, audio_tagging_loss=0.02044, over 25000.00 frames. ], tot_loss[loss=0.02436, audio_tagging_loss=0.02436, over 1117242.89 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:27:41,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=286426.6666666667, ans=0.125 2023-12-21 23:27:41,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=286426.6666666667, ans=0.0 2023-12-21 23:27:56,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=286493.3333333333, ans=0.125 2023-12-21 23:27:57,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=286493.3333333333, ans=0.125 2023-12-21 23:27:58,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. limit=6.0 2023-12-21 23:27:59,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. 
limit=15.0 2023-12-21 23:28:11,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=286626.6666666667, ans=0.125 2023-12-21 23:28:12,477 INFO [train.py:886] (0/4) Epoch 10, batch 100, loss[loss=0.0203, audio_tagging_loss=0.0203, over 25000.00 frames. ], tot_loss[loss=0.02119, audio_tagging_loss=0.02119, over 1968236.10 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:28:19,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=286626.6666666667, ans=0.0 2023-12-21 23:28:37,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=286760.0, ans=0.125 2023-12-21 23:28:38,471 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.924e+01 3.109e+01 3.428e+01 4.349e+01, threshold=6.218e+01, percent-clipped=0.0 2023-12-21 23:28:53,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=286893.3333333333, ans=0.125 2023-12-21 23:28:56,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=286893.3333333333, ans=0.0 2023-12-21 23:29:04,981 INFO [train.py:886] (0/4) Epoch 10, batch 150, loss[loss=0.01631, audio_tagging_loss=0.01631, over 25000.00 frames. ], tot_loss[loss=0.0193, audio_tagging_loss=0.0193, over 2635212.66 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:29:20,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=287026.6666666667, ans=0.0 2023-12-21 23:29:55,925 INFO [train.py:886] (0/4) Epoch 10, batch 200, loss[loss=0.01575, audio_tagging_loss=0.01575, over 25000.00 frames. ], tot_loss[loss=0.01798, audio_tagging_loss=0.01798, over 3149012.62 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:30:06,929 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=9.951e-01 2023-12-21 23:30:12,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=287360.0, ans=0.025 2023-12-21 23:30:17,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=287426.6666666667, ans=0.1 2023-12-21 23:30:17,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=287426.6666666667, ans=0.0 2023-12-21 23:30:20,754 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.628e+01 2.764e+01 2.956e+01 4.225e+01, threshold=5.527e+01, percent-clipped=0.0 2023-12-21 23:30:29,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=287493.3333333333, ans=0.2 2023-12-21 23:30:30,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=287493.3333333333, ans=0.0 2023-12-21 23:30:47,387 INFO [train.py:886] (0/4) Epoch 10, batch 250, loss[loss=0.01694, audio_tagging_loss=0.01694, over 24750.00 frames. ], tot_loss[loss=0.01731, audio_tagging_loss=0.01731, over 3553556.66 frames. 
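
The stretch above is the epoch boundary: checkpoint.py saves epoch-9.pt, epoch 10 begins, a full validation pass runs over the same 3,737,520 frames as every other validation in this log (the dev set is fixed), the running tot_loss restarts from a single batch, and the lr steps down from 1.18e-02 to 1.12e-02 because the schedule also decays with epoch. A sketch of that boundary logic, assuming plain torch.save, a multi-label BCE objective, and batch keys invented for illustration; icefall's own checkpoint helper stores more state (sampler, grad scaler, best-loss bookkeeping):

    import torch

    def validate(model, valid_dl, device) -> float:
        """Frame-weighted validation loss, as in the train.py:917 lines."""
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                feats = batch["features"].to(device)  # assumed batch layout
                labels = batch["labels"].to(device)
                loss = torch.nn.functional.binary_cross_entropy_with_logits(
                    model(feats), labels, reduction="sum"
                )
                tot_loss += loss.item()
                tot_frames += feats.size(0) * feats.size(1)
        model.train()
        return tot_loss / tot_frames

    def save_epoch_checkpoint(model, optimizer, scheduler, epoch, exp_dir):
        torch.save(
            {
                "epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
            },
            f"{exp_dir}/epoch-{epoch}.pt",  # e.g. .../epoch-9.pt as above
        )
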
], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:30:47,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=287626.6666666667, ans=0.2 2023-12-21 23:30:58,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=287693.3333333333, ans=0.0 2023-12-21 23:31:01,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=287693.3333333333, ans=0.0 2023-12-21 23:31:03,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=287693.3333333333, ans=0.0 2023-12-21 23:31:05,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=287693.3333333333, ans=0.125 2023-12-21 23:31:13,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=287760.0, ans=0.0 2023-12-21 23:31:13,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=287760.0, ans=0.125 2023-12-21 23:31:16,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=287760.0, ans=0.125 2023-12-21 23:31:38,985 INFO [train.py:886] (0/4) Epoch 10, batch 300, loss[loss=0.01583, audio_tagging_loss=0.01583, over 24750.00 frames. ], tot_loss[loss=0.01686, audio_tagging_loss=0.01686, over 3864621.86 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:31:39,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=287960.0, ans=0.125 2023-12-21 23:31:50,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2023-12-21 23:32:03,470 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.631e+01 2.795e+01 2.956e+01 3.518e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 23:32:04,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=288093.3333333333, ans=0.125 2023-12-21 23:32:04,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288093.3333333333, ans=0.1 2023-12-21 23:32:11,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=288160.0, ans=0.125 2023-12-21 23:32:20,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=288226.6666666667, ans=0.125 2023-12-21 23:32:26,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=288226.6666666667, ans=0.125 2023-12-21 23:32:28,477 INFO [train.py:886] (0/4) Epoch 10, batch 350, loss[loss=0.01285, audio_tagging_loss=0.01285, over 24750.00 frames. ], tot_loss[loss=0.01646, audio_tagging_loss=0.01646, over 4101836.95 frames. 
], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:32:34,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=288293.3333333333, ans=0.125 2023-12-21 23:32:48,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2023-12-21 23:32:55,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=288426.6666666667, ans=0.0 2023-12-21 23:33:02,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=288493.3333333333, ans=0.125 2023-12-21 23:33:10,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=288560.0, ans=0.0 2023-12-21 23:33:20,889 INFO [train.py:886] (0/4) Epoch 10, batch 400, loss[loss=0.01628, audio_tagging_loss=0.01628, over 25000.00 frames. ], tot_loss[loss=0.01606, audio_tagging_loss=0.01606, over 4282190.80 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:33:24,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-12-21 23:33:38,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.18 vs. limit=22.5 2023-12-21 23:33:47,213 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.616e+01 2.753e+01 2.907e+01 3.389e+01, threshold=5.507e+01, percent-clipped=0.0 2023-12-21 23:33:48,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=288760.0, ans=0.07 2023-12-21 23:33:51,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2023-12-21 23:33:52,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=288826.6666666667, ans=0.0 2023-12-21 23:33:53,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=288826.6666666667, ans=0.125 2023-12-21 23:34:04,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. limit=6.0 2023-12-21 23:34:11,440 INFO [train.py:886] (0/4) Epoch 10, batch 450, loss[loss=0.01831, audio_tagging_loss=0.01831, over 24750.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4430268.40 frames. 
], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:34:12,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=288960.0, ans=0.1 2023-12-21 23:34:16,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=288960.0, ans=0.125 2023-12-21 23:34:28,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=289026.6666666667, ans=0.2 2023-12-21 23:34:37,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.61 vs. limit=22.5 2023-12-21 23:35:01,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=289226.6666666667, ans=0.125 2023-12-21 23:35:02,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=289226.6666666667, ans=0.1 2023-12-21 23:35:03,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=289293.3333333333, ans=0.125 2023-12-21 23:35:03,880 INFO [train.py:886] (0/4) Epoch 10, batch 500, loss[loss=0.0167, audio_tagging_loss=0.0167, over 25000.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4553333.30 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:35:11,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=289293.3333333333, ans=0.0 2023-12-21 23:35:30,831 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.565e+01 2.709e+01 2.854e+01 3.600e+01, threshold=5.419e+01, percent-clipped=0.0 2023-12-21 23:35:34,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=289493.3333333333, ans=0.125 2023-12-21 23:35:35,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289493.3333333333, ans=0.1 2023-12-21 23:35:37,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=289493.3333333333, ans=0.2 2023-12-21 23:35:55,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.66 vs. limit=15.0 2023-12-21 23:35:56,484 INFO [train.py:886] (0/4) Epoch 10, batch 550, loss[loss=0.01548, audio_tagging_loss=0.01548, over 25000.00 frames. ], tot_loss[loss=0.01547, audio_tagging_loss=0.01547, over 4644164.05 frames. 
], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:36:05,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=289693.3333333333, ans=0.015 2023-12-21 23:36:06,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=289693.3333333333, ans=0.125 2023-12-21 23:36:12,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=289693.3333333333, ans=0.125 2023-12-21 23:36:19,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=289760.0, ans=0.0 2023-12-21 23:36:31,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=289826.6666666667, ans=0.125 2023-12-21 23:36:36,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=289893.3333333333, ans=0.125 2023-12-21 23:36:42,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=289893.3333333333, ans=0.125 2023-12-21 23:36:45,282 INFO [train.py:886] (0/4) Epoch 10, batch 600, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.01547, audio_tagging_loss=0.01547, over 4719251.28 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:37:10,374 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.605e+01 2.757e+01 2.995e+01 3.479e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 23:37:12,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=290093.3333333333, ans=0.2 2023-12-21 23:37:36,072 INFO [train.py:886] (0/4) Epoch 10, batch 650, loss[loss=0.01364, audio_tagging_loss=0.01364, over 24750.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4769971.59 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:37:39,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=290293.3333333333, ans=0.0 2023-12-21 23:37:39,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=290293.3333333333, ans=0.2 2023-12-21 23:38:03,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=290426.6666666667, ans=0.0 2023-12-21 23:38:15,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=290493.3333333333, ans=0.125 2023-12-21 23:38:23,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290560.0, ans=0.1 2023-12-21 23:38:26,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=290626.6666666667, ans=0.125 2023-12-21 23:38:27,730 INFO [train.py:886] (0/4) Epoch 10, batch 700, loss[loss=0.0129, audio_tagging_loss=0.0129, over 24750.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4806527.95 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:38:51,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.69 vs. 
limit=12.0 2023-12-21 23:38:52,701 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.403e+01 2.671e+01 2.861e+01 3.072e+01 3.885e+01, threshold=5.722e+01, percent-clipped=0.0 2023-12-21 23:38:56,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=290826.6666666667, ans=0.09899494936611666 2023-12-21 23:39:18,731 INFO [train.py:886] (0/4) Epoch 10, batch 750, loss[loss=0.01519, audio_tagging_loss=0.01519, over 25000.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4834060.33 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:39:21,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=290960.0, ans=0.0 2023-12-21 23:39:26,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=290960.0, ans=0.125 2023-12-21 23:39:38,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=291093.3333333333, ans=0.035 2023-12-21 23:39:39,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=291093.3333333333, ans=0.125 2023-12-21 23:39:40,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2023-12-21 23:39:40,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=291093.3333333333, ans=0.125 2023-12-21 23:39:52,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.55 vs. limit=10.0 2023-12-21 23:40:05,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=291226.6666666667, ans=0.0 2023-12-21 23:40:10,524 INFO [train.py:886] (0/4) Epoch 10, batch 800, loss[loss=0.01614, audio_tagging_loss=0.01614, over 25000.00 frames. ], tot_loss[loss=0.01546, audio_tagging_loss=0.01546, over 4857387.32 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:40:13,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=291293.3333333333, ans=0.125 2023-12-21 23:40:23,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=291360.0, ans=0.2 2023-12-21 23:40:34,602 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 2.609e+01 2.792e+01 2.929e+01 3.584e+01, threshold=5.584e+01, percent-clipped=0.0 2023-12-21 23:40:50,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.78 vs. limit=10.0 2023-12-21 23:40:51,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=291560.0, ans=0.125 2023-12-21 23:40:59,711 INFO [train.py:886] (0/4) Epoch 10, batch 850, loss[loss=0.01727, audio_tagging_loss=0.01727, over 25000.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4881351.28 frames. 
], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:41:10,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=291693.3333333333, ans=0.05 2023-12-21 23:41:15,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=291693.3333333333, ans=0.2 2023-12-21 23:41:23,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=291760.0, ans=0.125 2023-12-21 23:41:51,243 INFO [train.py:886] (0/4) Epoch 10, batch 900, loss[loss=0.0154, audio_tagging_loss=0.0154, over 24750.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 4899874.04 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:41:57,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=291960.0, ans=0.2 2023-12-21 23:42:01,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=292026.6666666667, ans=0.125 2023-12-21 23:42:08,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=292026.6666666667, ans=0.1 2023-12-21 23:42:13,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=292093.3333333333, ans=0.1 2023-12-21 23:42:15,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=15.40 vs. limit=15.0 2023-12-21 23:42:15,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=292093.3333333333, ans=0.125 2023-12-21 23:42:16,469 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.658e+01 2.802e+01 2.964e+01 3.575e+01, threshold=5.603e+01, percent-clipped=0.0 2023-12-21 23:42:23,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=292160.0, ans=0.0 2023-12-21 23:42:30,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0 2023-12-21 23:42:38,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292226.6666666667, ans=0.1 2023-12-21 23:42:42,908 INFO [train.py:886] (0/4) Epoch 10, batch 950, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4909470.73 frames. 
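Note: the Whitening lines compare a per-module statistic (metric) against a scheduled limit, keeping each module's activations close to white (isotropic covariance). One plausible definition of such a metric, assumed here rather than taken from scaling.py, is the mean squared eigenvalue of the feature covariance divided by the squared mean eigenvalue; it equals 1.0 for perfectly white features and grows as variance concentrates in a few directions.

```python
# Sketch of a whitening metric under the assumption described above; this is
# not necessarily the exact formula behind the "metric=... vs. limit=..." logs.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0)                          # center the features
    cov = (x.t() @ x) / x.shape[0]                 # (C, C) covariance estimate
    eigs = torch.linalg.eigvalsh(cov)              # eigenvalues of symmetric cov
    return (eigs ** 2).mean() / eigs.mean() ** 2   # 1.0 when cov is isotropic

x = torch.randn(1000, 192)                         # near-white input: metric ~ 1
print(whitening_metric(x))
```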
], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:42:45,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=292293.3333333333, ans=0.025 2023-12-21 23:42:48,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=292293.3333333333, ans=0.125 2023-12-21 23:42:48,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=292293.3333333333, ans=0.125 2023-12-21 23:43:07,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=292426.6666666667, ans=0.0 2023-12-21 23:43:32,950 INFO [train.py:886] (0/4) Epoch 10, batch 1000, loss[loss=0.01642, audio_tagging_loss=0.01642, over 24750.00 frames. ], tot_loss[loss=0.01525, audio_tagging_loss=0.01525, over 4920515.63 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:43:59,056 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.595e+01 2.757e+01 2.971e+01 3.553e+01, threshold=5.515e+01, percent-clipped=0.0 2023-12-21 23:44:09,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=292826.6666666667, ans=0.125 2023-12-21 23:44:24,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=292960.0, ans=0.0 2023-12-21 23:44:24,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=292960.0, ans=0.125 2023-12-21 23:44:24,847 INFO [train.py:886] (0/4) Epoch 10, batch 1050, loss[loss=0.01512, audio_tagging_loss=0.01512, over 25000.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4929732.21 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:44:27,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=292960.0, ans=0.125 2023-12-21 23:44:29,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2023-12-21 23:44:34,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2023-12-21 23:44:43,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=293026.6666666667, ans=0.125 2023-12-21 23:44:54,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=293093.3333333333, ans=0.2 2023-12-21 23:45:04,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=293160.0, ans=0.125 2023-12-21 23:45:09,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=293226.6666666667, ans=0.2 2023-12-21 23:45:11,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=293226.6666666667, ans=0.125 2023-12-21 23:45:16,148 INFO [train.py:886] (0/4) Epoch 10, batch 1100, loss[loss=0.01407, audio_tagging_loss=0.01407, over 25000.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4934143.39 frames. 
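Note: in the Clipping_scale=2.0 warnings, the reported threshold is twice the median quartile (e.g. 5.515e+01 is 2 x 2.757e+01 just above), which suggests the clip threshold is clipping_scale times the median of recently observed gradient norms. A sketch under that assumption; the history size and the exact bookkeeping are guesses, not the actual optim.py logic.

```python
# Quartile-based gradient clipping consistent with the logged warnings:
# threshold = clipping_scale * median of recent grad norms (assumed policy).
import torch

class QuartileClipper:
    def __init__(self, clipping_scale=2.0, history=1024):
        self.scale = clipping_scale
        self.history = history
        self.norms = []                # recent global gradient norms

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        self.norms.append(norm.item())
        self.norms = self.norms[-self.history:]
        # min / 25% / median / 75% / max, as printed in the warnings
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()       # 2.0 x median
        clipped = norm.item() > threshold
        if clipped:
            for g in grads:
                g.mul_(threshold / norm.item())    # rescale to the threshold
        return q, threshold, clipped
```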
], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:45:16,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=293293.3333333333, ans=0.0 2023-12-21 23:45:20,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=293293.3333333333, ans=0.125 2023-12-21 23:45:21,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=293293.3333333333, ans=0.2 2023-12-21 23:45:23,183 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-44000.pt 2023-12-21 23:45:32,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=293360.0, ans=0.0 2023-12-21 23:45:44,969 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.604e+01 2.781e+01 2.959e+01 3.663e+01, threshold=5.562e+01, percent-clipped=0.0 2023-12-21 23:46:04,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=12.0 2023-12-21 23:46:10,262 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 23:46:11,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=293626.6666666667, ans=0.0 2023-12-21 23:46:12,037 INFO [train.py:886] (0/4) Epoch 10, batch 1150, loss[loss=0.01493, audio_tagging_loss=0.01493, over 25000.00 frames. ], tot_loss[loss=0.01513, audio_tagging_loss=0.01513, over 4936942.62 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:46:12,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=293626.6666666667, ans=0.125 2023-12-21 23:46:29,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=293693.3333333333, ans=0.07 2023-12-21 23:46:53,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.48 vs. limit=22.5 2023-12-21 23:47:01,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.16 vs. limit=15.0 2023-12-21 23:47:03,890 INFO [train.py:886] (0/4) Epoch 10, batch 1200, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4938264.31 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:47:22,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294093.3333333333, ans=0.1 2023-12-21 23:47:28,070 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.576e+01 2.725e+01 2.862e+01 3.373e+01, threshold=5.450e+01, percent-clipped=0.0 2023-12-21 23:47:54,681 INFO [train.py:886] (0/4) Epoch 10, batch 1250, loss[loss=0.01555, audio_tagging_loss=0.01555, over 24750.00 frames. ], tot_loss[loss=0.01528, audio_tagging_loss=0.01528, over 4935238.68 frames. 
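Note: the "Saving checkpoint to zipformer/exp_at_as_full/checkpoint-44000.pt" line fires on a global batch-index interval rather than at an epoch boundary (44000 is the cumulative training batch count). A hypothetical sketch of that cadence; the save_every_n value and the keep-last-k pruning policy are assumptions.

```python
# Hypothetical batch-interval checkpointing; names and policy are illustrative.
from pathlib import Path
import torch

def maybe_save(model, optimizer, batch_idx_train, exp_dir,
               save_every_n=4000, keep_last_k=30):
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    path = Path(exp_dir) / f"checkpoint-{batch_idx_train}.pt"
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train}, path)
    # prune old checkpoints, keeping the newest keep_last_k (assumed policy)
    ckpts = sorted(Path(exp_dir).glob("checkpoint-*.pt"),
                   key=lambda p: int(p.stem.split("-")[1]))
    for old in ckpts[:-keep_last_k]:
        old.unlink()
```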
], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:48:29,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=294493.3333333333, ans=0.125 2023-12-21 23:48:42,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=12.0 2023-12-21 23:48:43,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=294560.0, ans=0.125 2023-12-21 23:48:43,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.38 vs. limit=22.5 2023-12-21 23:48:46,963 INFO [train.py:886] (0/4) Epoch 10, batch 1300, loss[loss=0.01631, audio_tagging_loss=0.01631, over 25000.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4934048.76 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:49:02,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=294693.3333333333, ans=0.125 2023-12-21 23:49:13,634 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.699e+01 2.817e+01 2.948e+01 3.406e+01, threshold=5.634e+01, percent-clipped=0.0 2023-12-21 23:49:20,405 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=5.203e-03 2023-12-21 23:49:39,366 INFO [train.py:886] (0/4) Epoch 10, batch 1350, loss[loss=0.01508, audio_tagging_loss=0.01508, over 25000.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4938450.35 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 128.0 2023-12-21 23:49:45,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=294960.0, ans=0.0 2023-12-21 23:49:57,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=295026.6666666667, ans=0.2 2023-12-21 23:50:00,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=295093.3333333333, ans=0.0 2023-12-21 23:50:18,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=295160.0, ans=0.125 2023-12-21 23:50:22,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=295226.6666666667, ans=0.125 2023-12-21 23:50:23,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=295226.6666666667, ans=0.035 2023-12-21 23:50:30,335 INFO [train.py:886] (0/4) Epoch 10, batch 1400, loss[loss=0.01482, audio_tagging_loss=0.01482, over 25000.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4942744.74 frames. 
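Note: grad_scale in the train lines briefly doubles from 64.0 to 128.0 at batch 1350 and is back at 64.0 by batch 1400, matching torch.cuda.amp.GradScaler behaviour: the scale doubles after a long run of overflow-free fp16 steps and halves when a step overflows. A generic fp16 training step under that machinery; model, optimizer, and criterion are placeholders, not this recipe's objects.

```python
# Generic fp16 step with dynamic loss scaling via the real GradScaler API.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=64.0)

def fp16_step(model, optimizer, features, targets, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # forward in mixed precision
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()              # backward on the scaled loss
    scaler.step(optimizer)                     # skipped internally on overflow
    scaler.update()                            # grows or shrinks the scale
    return loss.detach(), scaler.get_scale()   # the value logged as grad_scale
```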
], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:50:57,860 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.578e+01 2.757e+01 2.921e+01 3.435e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 23:51:11,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=295560.0, ans=0.07 2023-12-21 23:51:14,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=295560.0, ans=0.125 2023-12-21 23:51:23,342 INFO [train.py:886] (0/4) Epoch 10, batch 1450, loss[loss=0.01439, audio_tagging_loss=0.01439, over 22535.00 frames. ], tot_loss[loss=0.01516, audio_tagging_loss=0.01516, over 4939068.33 frames. ], batch size: 107, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:51:24,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=295626.6666666667, ans=0.0 2023-12-21 23:52:04,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. limit=10.0 2023-12-21 23:52:05,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=295893.3333333333, ans=0.125 2023-12-21 23:52:08,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=295893.3333333333, ans=0.1 2023-12-21 23:52:14,520 INFO [train.py:886] (0/4) Epoch 10, batch 1500, loss[loss=0.01616, audio_tagging_loss=0.01616, over 25000.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4943895.06 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:52:31,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=296026.6666666667, ans=0.125 2023-12-21 23:52:41,513 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.355e+01 2.575e+01 2.789e+01 2.982e+01 3.364e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-21 23:52:42,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=296093.3333333333, ans=0.0 2023-12-21 23:52:47,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.86 vs. limit=15.0 2023-12-21 23:52:48,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=296160.0, ans=0.125 2023-12-21 23:53:06,793 INFO [train.py:886] (0/4) Epoch 10, batch 1550, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24750.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4945215.02 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:53:17,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=296360.0, ans=0.0 2023-12-21 23:53:44,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=296493.3333333333, ans=0.0 2023-12-21 23:53:53,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.15 vs. 
limit=12.0 2023-12-21 23:53:59,454 INFO [train.py:886] (0/4) Epoch 10, batch 1600, loss[loss=0.01657, audio_tagging_loss=0.01657, over 24005.00 frames. ], tot_loss[loss=0.01529, audio_tagging_loss=0.01529, over 4939957.74 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:54:00,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=296626.6666666667, ans=0.2 2023-12-21 23:54:09,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=296693.3333333333, ans=0.125 2023-12-21 23:54:18,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=296760.0, ans=0.125 2023-12-21 23:54:19,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=296760.0, ans=0.1 2023-12-21 23:54:23,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=296760.0, ans=0.04949747468305833 2023-12-21 23:54:25,587 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.673e+01 2.852e+01 3.027e+01 3.338e+01, threshold=5.705e+01, percent-clipped=0.0 2023-12-21 23:54:25,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=296760.0, ans=0.95 2023-12-21 23:54:38,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5 2023-12-21 23:54:41,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=296893.3333333333, ans=0.125 2023-12-21 23:54:49,797 INFO [train.py:886] (0/4) Epoch 10, batch 1650, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4938931.93 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:54:50,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=296960.0, ans=0.125 2023-12-21 23:55:07,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=297026.6666666667, ans=0.035 2023-12-21 23:55:07,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=297026.6666666667, ans=0.2 2023-12-21 23:55:13,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=297093.3333333333, ans=0.125 2023-12-21 23:55:22,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2023-12-21 23:55:28,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=15.0 2023-12-21 23:55:42,885 INFO [train.py:886] (0/4) Epoch 10, batch 1700, loss[loss=0.01549, audio_tagging_loss=0.01549, over 24750.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4941917.62 frames. 
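Note: the printed lr decays smoothly both within and across epochs (1.12e-02 down to 1.08e-02 over this span). icefall's Zipformer recipes use an Eden-style schedule with roughly the shape below; the constants, the warmup factor (omitted here), and the exact step counting are assumptions, so treat this as the shape of the decay rather than a reproduction of the logged values.

```python
# Assumed Eden-like learning-rate shape; not verified against this run's logs.
def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```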
], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:55:44,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=297293.3333333333, ans=0.125 2023-12-21 23:55:44,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=297293.3333333333, ans=0.125 2023-12-21 23:55:53,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=15.0 2023-12-21 23:55:55,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5 2023-12-21 23:55:55,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=297360.0, ans=0.0 2023-12-21 23:56:10,397 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.376e+01 2.587e+01 2.708e+01 2.875e+01 3.627e+01, threshold=5.416e+01, percent-clipped=0.0 2023-12-21 23:56:15,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=297493.3333333333, ans=0.125 2023-12-21 23:56:16,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=297493.3333333333, ans=22.5 2023-12-21 23:56:26,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297560.0, ans=0.1 2023-12-21 23:56:33,969 INFO [train.py:886] (0/4) Epoch 10, batch 1750, loss[loss=0.01861, audio_tagging_loss=0.01861, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4947540.15 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:56:47,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297693.3333333333, ans=0.1 2023-12-21 23:56:47,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.29 vs. limit=10.0 2023-12-21 23:56:53,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=297760.0, ans=0.125 2023-12-21 23:57:01,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=297760.0, ans=0.0 2023-12-21 23:57:06,573 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.677e-03 2023-12-21 23:57:24,857 INFO [train.py:886] (0/4) Epoch 10, batch 1800, loss[loss=0.0194, audio_tagging_loss=0.0194, over 25000.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4943807.97 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:57:26,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=297960.0, ans=0.125 2023-12-21 23:57:38,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=298026.6666666667, ans=0.125 2023-12-21 23:57:44,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.41 vs. 
limit=22.5 2023-12-21 23:57:45,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=15.0 2023-12-21 23:57:48,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.77 vs. limit=15.0 2023-12-21 23:57:49,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=298093.3333333333, ans=0.0 2023-12-21 23:57:52,158 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.629e+01 2.799e+01 2.963e+01 3.676e+01, threshold=5.597e+01, percent-clipped=0.0 2023-12-21 23:58:06,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=298226.6666666667, ans=0.125 2023-12-21 23:58:09,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=298226.6666666667, ans=0.0 2023-12-21 23:58:10,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.07 vs. limit=12.0 2023-12-21 23:58:17,339 INFO [train.py:886] (0/4) Epoch 10, batch 1850, loss[loss=0.01528, audio_tagging_loss=0.01528, over 24750.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4945248.11 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:58:52,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=298493.3333333333, ans=0.125 2023-12-21 23:58:54,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.87 vs. limit=10.0 2023-12-21 23:59:07,477 INFO [train.py:886] (0/4) Epoch 10, batch 1900, loss[loss=0.01504, audio_tagging_loss=0.01504, over 24750.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4937642.37 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:59:34,344 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.713e+01 2.863e+01 3.091e+01 4.533e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-21 23:59:38,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=298826.6666666667, ans=0.0 2023-12-21 23:59:46,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=298826.6666666667, ans=0.125 2023-12-21 23:59:56,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=298893.3333333333, ans=0.0 2023-12-21 23:59:59,046 INFO [train.py:886] (0/4) Epoch 10, batch 1950, loss[loss=0.01446, audio_tagging_loss=0.01446, over 22427.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4935985.18 frames. ], batch size: 107, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:00:04,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=298960.0, ans=0.125 2023-12-22 00:00:10,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. 
limit=15.0 2023-12-22 00:00:10,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=299026.6666666667, ans=0.125 2023-12-22 00:00:26,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=299093.3333333333, ans=10.0 2023-12-22 00:00:27,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=299093.3333333333, ans=0.125 2023-12-22 00:00:48,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=299226.6666666667, ans=0.0 2023-12-22 00:00:50,509 INFO [train.py:886] (0/4) Epoch 10, batch 2000, loss[loss=0.01533, audio_tagging_loss=0.01533, over 24750.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4940789.91 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:00:50,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299293.3333333333, ans=0.1 2023-12-22 00:00:51,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=299293.3333333333, ans=0.0 2023-12-22 00:01:08,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=299360.0, ans=0.0 2023-12-22 00:01:16,429 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.618e+01 2.723e+01 2.914e+01 3.556e+01, threshold=5.446e+01, percent-clipped=0.0 2023-12-22 00:01:22,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=299493.3333333333, ans=0.125 2023-12-22 00:01:30,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=299493.3333333333, ans=15.0 2023-12-22 00:01:35,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=299560.0, ans=0.125 2023-12-22 00:01:42,016 INFO [train.py:886] (0/4) Epoch 10, batch 2050, loss[loss=0.0165, audio_tagging_loss=0.0165, over 24750.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4948079.02 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:02:03,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=299760.0, ans=0.2 2023-12-22 00:02:10,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=299760.0, ans=0.0 2023-12-22 00:02:13,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=299826.6666666667, ans=0.125 2023-12-22 00:02:18,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=12.0 2023-12-22 00:02:18,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0 2023-12-22 00:02:18,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.34 vs. 
limit=10.0 2023-12-22 00:02:29,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2023-12-22 00:02:33,761 INFO [train.py:886] (0/4) Epoch 10, batch 2100, loss[loss=0.01492, audio_tagging_loss=0.01492, over 22097.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4949979.01 frames. ], batch size: 107, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:02:46,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=300026.6666666667, ans=0.2 2023-12-22 00:02:47,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-12-22 00:02:59,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=300093.3333333333, ans=0.125 2023-12-22 00:03:00,778 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+01 2.587e+01 2.718e+01 2.864e+01 3.394e+01, threshold=5.437e+01, percent-clipped=0.0 2023-12-22 00:03:24,898 INFO [train.py:886] (0/4) Epoch 10, batch 2150, loss[loss=0.01523, audio_tagging_loss=0.01523, over 25000.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4953704.26 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:03:26,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=300293.3333333333, ans=0.125 2023-12-22 00:03:43,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=300360.0, ans=0.125 2023-12-22 00:03:45,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=300426.6666666667, ans=0.125 2023-12-22 00:03:53,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=300426.6666666667, ans=0.0 2023-12-22 00:04:17,933 INFO [train.py:886] (0/4) Epoch 10, batch 2200, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01525, audio_tagging_loss=0.01525, over 4948606.07 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:04:42,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=300760.0, ans=0.2 2023-12-22 00:04:42,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=300760.0, ans=0.0 2023-12-22 00:04:43,835 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 2.615e+01 2.771e+01 2.942e+01 3.456e+01, threshold=5.541e+01, percent-clipped=0.0 2023-12-22 00:05:06,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=300893.3333333333, ans=0.1 2023-12-22 00:05:09,186 INFO [train.py:886] (0/4) Epoch 10, batch 2250, loss[loss=0.01349, audio_tagging_loss=0.01349, over 24750.00 frames. ], tot_loss[loss=0.01529, audio_tagging_loss=0.01529, over 4942655.92 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:05:29,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.06 vs. 
limit=22.5 2023-12-22 00:05:30,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=301093.3333333333, ans=0.2 2023-12-22 00:05:31,135 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.550e-01 2023-12-22 00:05:38,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301160.0, ans=0.1 2023-12-22 00:05:40,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=301160.0, ans=0.0 2023-12-22 00:05:44,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=301160.0, ans=0.125 2023-12-22 00:05:51,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2023-12-22 00:05:53,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=301226.6666666667, ans=0.0 2023-12-22 00:05:54,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=301226.6666666667, ans=0.0 2023-12-22 00:05:59,831 INFO [train.py:886] (0/4) Epoch 10, batch 2300, loss[loss=0.01571, audio_tagging_loss=0.01571, over 25000.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4946139.53 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:06:00,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=301293.3333333333, ans=0.2 2023-12-22 00:06:03,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=301293.3333333333, ans=0.0 2023-12-22 00:06:08,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=301293.3333333333, ans=0.125 2023-12-22 00:06:10,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.91 vs. limit=22.5 2023-12-22 00:06:13,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.75 vs. limit=10.0 2023-12-22 00:06:14,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=301360.0, ans=0.0 2023-12-22 00:06:20,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2023-12-22 00:06:26,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=301426.6666666667, ans=0.125 2023-12-22 00:06:26,821 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.627e+01 2.751e+01 2.919e+01 3.578e+01, threshold=5.503e+01, percent-clipped=0.0 2023-12-22 00:06:35,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301493.3333333333, ans=0.1 2023-12-22 00:06:50,942 INFO [train.py:886] (0/4) Epoch 10, batch 2350, loss[loss=0.01547, audio_tagging_loss=0.01547, over 25000.00 frames. 
], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4949425.88 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:06:53,091 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=8.101e-03 2023-12-22 00:06:59,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=301626.6666666667, ans=0.0 2023-12-22 00:07:00,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=301693.3333333333, ans=0.125 2023-12-22 00:07:05,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=301693.3333333333, ans=0.125 2023-12-22 00:07:11,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=301760.0, ans=0.0 2023-12-22 00:07:12,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=301760.0, ans=0.125 2023-12-22 00:07:19,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=301760.0, ans=0.0 2023-12-22 00:07:20,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.79 vs. limit=15.0 2023-12-22 00:07:24,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=301826.6666666667, ans=0.125 2023-12-22 00:07:42,772 INFO [train.py:886] (0/4) Epoch 10, batch 2400, loss[loss=0.01582, audio_tagging_loss=0.01582, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4947412.63 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:08:07,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302093.3333333333, ans=0.1 2023-12-22 00:08:07,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.60 vs. limit=12.0 2023-12-22 00:08:08,763 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+01 2.611e+01 2.780e+01 2.950e+01 3.631e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-22 00:08:10,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=302093.3333333333, ans=22.5 2023-12-22 00:08:23,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.81 vs. limit=10.0 2023-12-22 00:08:33,181 INFO [train.py:886] (0/4) Epoch 10, batch 2450, loss[loss=0.01246, audio_tagging_loss=0.01246, over 23986.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4950199.73 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:08:49,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. 
limit=6.0 2023-12-22 00:09:04,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=302493.3333333333, ans=15.0 2023-12-22 00:09:08,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=302493.3333333333, ans=0.125 2023-12-22 00:09:13,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302560.0, ans=0.1 2023-12-22 00:09:23,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=302560.0, ans=0.125 2023-12-22 00:09:25,544 INFO [train.py:886] (0/4) Epoch 10, batch 2500, loss[loss=0.01363, audio_tagging_loss=0.01363, over 24750.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4950651.78 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:09:36,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=302693.3333333333, ans=0.0 2023-12-22 00:09:45,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2023-12-22 00:09:45,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=302760.0, ans=0.0 2023-12-22 00:09:52,765 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 2.737e+01 2.858e+01 3.085e+01 3.601e+01, threshold=5.717e+01, percent-clipped=0.0 2023-12-22 00:09:57,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=302826.6666666667, ans=0.125 2023-12-22 00:09:57,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0 2023-12-22 00:10:16,828 INFO [train.py:886] (0/4) Epoch 10, batch 2550, loss[loss=0.01288, audio_tagging_loss=0.01288, over 24750.00 frames. ], tot_loss[loss=0.01528, audio_tagging_loss=0.01528, over 4945296.52 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:10:36,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=303026.6666666667, ans=0.02 2023-12-22 00:10:56,050 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.111e-01 2023-12-22 00:10:59,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=303226.6666666667, ans=0.2 2023-12-22 00:11:03,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2023-12-22 00:11:08,867 INFO [train.py:886] (0/4) Epoch 10, batch 2600, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24043.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4944890.70 frames. 
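Note: each train line reports a per-batch loss over roughly 25000 frames, while the tot_loss frame counts plateau near 4.95M, about 200 times one batch. That is consistent with tot_loss being an exponentially decayed running sum of (weighted loss, frames) with decay 1 - 1/200, reported as a per-frame average; the decay constant and the names below are assumptions.

```python
# Sketch of a decayed running loss matching the logged tot_loss frame counts:
# steady-state frames = 25000 / (1/200) = 5.0M, close to the ~4.95M printed.
class RunningLoss:
    def __init__(self, decay=1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        # batch_loss is the per-frame average for this batch
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames   # the value printed as tot_loss
```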
], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:11:11,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=303293.3333333333, ans=0.0 2023-12-22 00:11:14,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=303293.3333333333, ans=0.125 2023-12-22 00:11:29,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=12.0 2023-12-22 00:11:35,999 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.384e+01 2.658e+01 2.842e+01 3.002e+01 3.670e+01, threshold=5.683e+01, percent-clipped=0.0 2023-12-22 00:11:36,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=303426.6666666667, ans=0.125 2023-12-22 00:11:44,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2023-12-22 00:11:49,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2023-12-22 00:12:00,774 INFO [train.py:886] (0/4) Epoch 10, batch 2650, loss[loss=0.01554, audio_tagging_loss=0.01554, over 25000.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4946594.82 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:12:10,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=303693.3333333333, ans=0.125 2023-12-22 00:12:11,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=15.0 2023-12-22 00:12:42,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=303893.3333333333, ans=0.0 2023-12-22 00:12:51,223 INFO [train.py:886] (0/4) Epoch 10, batch 2700, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4949059.28 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:12:52,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.17 vs. limit=15.0 2023-12-22 00:12:59,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303960.0, ans=0.1 2023-12-22 00:13:03,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=304026.6666666667, ans=0.125 2023-12-22 00:13:11,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2023-12-22 00:13:18,950 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.625e+01 2.781e+01 2.947e+01 3.660e+01, threshold=5.563e+01, percent-clipped=0.0 2023-12-22 00:13:24,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.95 vs. 
limit=22.5 2023-12-22 00:13:24,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=304160.0, ans=0.125 2023-12-22 00:13:38,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=304226.6666666667, ans=0.0 2023-12-22 00:13:40,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=304226.6666666667, ans=0.2 2023-12-22 00:13:44,514 INFO [train.py:886] (0/4) Epoch 10, batch 2750, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4953769.01 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:13:53,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=304360.0, ans=0.125 2023-12-22 00:13:59,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.53 vs. limit=10.0 2023-12-22 00:14:10,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=304426.6666666667, ans=0.0 2023-12-22 00:14:11,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=304426.6666666667, ans=0.125 2023-12-22 00:14:24,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=304560.0, ans=0.125 2023-12-22 00:14:35,051 INFO [train.py:886] (0/4) Epoch 10, batch 2800, loss[loss=0.01701, audio_tagging_loss=0.01701, over 24750.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4953736.67 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:14:47,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=304693.3333333333, ans=0.0 2023-12-22 00:14:51,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304693.3333333333, ans=0.1 2023-12-22 00:14:53,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=304693.3333333333, ans=0.125 2023-12-22 00:15:01,759 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.658e+01 2.805e+01 2.937e+01 3.602e+01, threshold=5.609e+01, percent-clipped=0.0 2023-12-22 00:15:27,326 INFO [train.py:886] (0/4) Epoch 10, batch 2850, loss[loss=0.01362, audio_tagging_loss=0.01362, over 24750.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4949666.08 frames. 
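Note: every train line reports loss == audio_tagging_loss, i.e. the objective is a single multi-label tagging criterion. Binary cross-entropy over a multi-hot event vector (527 classes for AudioSet) is the standard choice for this task and is assumed in the sketch below; the shapes and reduction are illustrative.

```python
# Assumed multi-label tagging objective; not confirmed to be this recipe's code.
import torch
import torch.nn.functional as F

def audio_tagging_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits, targets: (batch, num_events); targets are multi-hot in {0, 1}
    return F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")

logits = torch.randn(4, 527)                       # 527 AudioSet event classes
targets = (torch.rand(4, 527) > 0.99).float()      # sparse multi-hot labels
print(audio_tagging_loss(logits, targets))
```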
], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:15:39,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305026.6666666667, ans=0.1 2023-12-22 00:15:44,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=305026.6666666667, ans=0.125 2023-12-22 00:15:52,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=305093.3333333333, ans=0.125 2023-12-22 00:16:14,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2023-12-22 00:16:15,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305226.6666666667, ans=0.1 2023-12-22 00:16:19,451 INFO [train.py:886] (0/4) Epoch 10, batch 2900, loss[loss=0.01774, audio_tagging_loss=0.01774, over 24750.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4946685.10 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:16:22,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=305293.3333333333, ans=0.125 2023-12-22 00:16:24,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.60 vs. limit=22.5 2023-12-22 00:16:35,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=305360.0, ans=0.09899494936611666 2023-12-22 00:16:45,547 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.645e+01 2.819e+01 2.957e+01 3.644e+01, threshold=5.638e+01, percent-clipped=0.0 2023-12-22 00:16:45,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=15.0 2023-12-22 00:16:48,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0 2023-12-22 00:16:58,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=305493.3333333333, ans=0.0 2023-12-22 00:17:05,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=12.0 2023-12-22 00:17:09,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.21 vs. limit=15.0 2023-12-22 00:17:10,126 INFO [train.py:886] (0/4) Epoch 10, batch 2950, loss[loss=0.01221, audio_tagging_loss=0.01221, over 24750.00 frames. ], tot_loss[loss=0.01516, audio_tagging_loss=0.01516, over 4949940.66 frames. 
], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:17:12,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=305626.6666666667, ans=0.125 2023-12-22 00:17:19,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=305626.6666666667, ans=0.2 2023-12-22 00:17:20,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=305693.3333333333, ans=0.2 2023-12-22 00:17:26,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=305693.3333333333, ans=0.035 2023-12-22 00:17:28,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=305693.3333333333, ans=0.05 2023-12-22 00:17:32,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=305760.0, ans=10.0 2023-12-22 00:17:49,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=305826.6666666667, ans=0.0 2023-12-22 00:17:49,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=305826.6666666667, ans=0.0 2023-12-22 00:18:03,086 INFO [train.py:886] (0/4) Epoch 10, batch 3000, loss[loss=0.01509, audio_tagging_loss=0.01509, over 24750.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4955220.34 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:18:03,087 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 00:18:14,837 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9830, 2.7748, 3.6633, 3.8101], device='cuda:0') 2023-12-22 00:18:24,625 INFO [train.py:917] (0/4) Epoch 10, validation: loss=0.03417, audio_tagging_loss=0.03417, over 3737520.00 frames. 2023-12-22 00:18:24,626 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 00:18:35,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=306026.6666666667, ans=0.0 2023-12-22 00:18:43,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=306093.3333333333, ans=0.0 2023-12-22 00:18:49,711 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+01 2.598e+01 2.703e+01 2.864e+01 3.269e+01, threshold=5.407e+01, percent-clipped=0.0 2023-12-22 00:18:56,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=306160.0, ans=0.125 2023-12-22 00:19:14,560 INFO [train.py:886] (0/4) Epoch 10, batch 3050, loss[loss=0.01503, audio_tagging_loss=0.01503, over 25000.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4961156.64 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:19:16,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=306293.3333333333, ans=0.125 2023-12-22 00:19:17,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. 
limit=15.0 2023-12-22 00:19:42,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=306426.6666666667, ans=0.125 2023-12-22 00:19:46,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306493.3333333333, ans=0.1 2023-12-22 00:19:49,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=306493.3333333333, ans=15.0 2023-12-22 00:20:03,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=306560.0, ans=0.125 2023-12-22 00:20:07,341 INFO [train.py:886] (0/4) Epoch 10, batch 3100, loss[loss=0.01689, audio_tagging_loss=0.01689, over 24750.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4963255.06 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:20:16,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=306693.3333333333, ans=0.2 2023-12-22 00:20:18,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.86 vs. limit=22.5 2023-12-22 00:20:28,198 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.510e-03 2023-12-22 00:20:33,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=306760.0, ans=0.0 2023-12-22 00:20:34,051 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.376e+01 2.718e+01 2.843e+01 3.043e+01 4.179e+01, threshold=5.686e+01, percent-clipped=0.0 2023-12-22 00:20:55,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306893.3333333333, ans=0.1 2023-12-22 00:20:56,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-12-22 00:20:58,237 INFO [train.py:886] (0/4) Epoch 10, batch 3150, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4956622.39 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:21:09,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=307026.6666666667, ans=0.04949747468305833 2023-12-22 00:21:30,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=307160.0, ans=0.0 2023-12-22 00:21:31,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0 2023-12-22 00:21:49,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=307293.3333333333, ans=0.125 2023-12-22 00:21:50,530 INFO [train.py:886] (0/4) Epoch 10, batch 3200, loss[loss=0.01863, audio_tagging_loss=0.01863, over 24750.00 frames. ], tot_loss[loss=0.01524, audio_tagging_loss=0.01524, over 4949852.17 frames. 
], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:22:11,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=307426.6666666667, ans=0.125 2023-12-22 00:22:18,053 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.605e+01 2.808e+01 3.019e+01 3.529e+01, threshold=5.615e+01, percent-clipped=0.0 2023-12-22 00:22:40,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=12.0 2023-12-22 00:22:42,804 INFO [train.py:886] (0/4) Epoch 10, batch 3250, loss[loss=0.01494, audio_tagging_loss=0.01494, over 25000.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4950078.08 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:22:43,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=15.0 2023-12-22 00:23:06,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=307760.0, ans=0.125 2023-12-22 00:23:13,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=307826.6666666667, ans=0.125 2023-12-22 00:23:14,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=307826.6666666667, ans=0.1 2023-12-22 00:23:16,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=307826.6666666667, ans=0.2 2023-12-22 00:23:18,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2023-12-22 00:23:23,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307893.3333333333, ans=0.1 2023-12-22 00:23:28,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=307893.3333333333, ans=0.0 2023-12-22 00:23:34,262 INFO [train.py:886] (0/4) Epoch 10, batch 3300, loss[loss=0.0147, audio_tagging_loss=0.0147, over 25000.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4952159.95 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:23:46,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=308026.6666666667, ans=0.125 2023-12-22 00:24:01,367 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.643e+01 2.728e+01 2.904e+01 3.479e+01, threshold=5.455e+01, percent-clipped=0.0 2023-12-22 00:24:04,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308160.0, ans=0.1 2023-12-22 00:24:10,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=308160.0, ans=0.125 2023-12-22 00:24:10,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=308160.0, ans=0.0 2023-12-22 00:24:26,091 INFO [train.py:886] (0/4) Epoch 10, batch 3350, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. 
], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4949350.08 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:24:44,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=12.0 2023-12-22 00:24:53,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2023-12-22 00:25:17,544 INFO [train.py:886] (0/4) Epoch 10, batch 3400, loss[loss=0.0171, audio_tagging_loss=0.0171, over 25000.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4950616.57 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 128.0 2023-12-22 00:25:39,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=308760.0, ans=0.125 2023-12-22 00:25:44,946 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.624e+01 2.786e+01 2.955e+01 3.614e+01, threshold=5.572e+01, percent-clipped=0.0 2023-12-22 00:25:45,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=308760.0, ans=0.0 2023-12-22 00:25:52,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308826.6666666667, ans=0.1 2023-12-22 00:25:58,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.69 vs. limit=22.5 2023-12-22 00:26:02,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=308893.3333333333, ans=0.125 2023-12-22 00:26:09,915 INFO [train.py:886] (0/4) Epoch 10, batch 3450, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4951763.89 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:26:16,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2023-12-22 00:26:28,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=309026.6666666667, ans=0.125 2023-12-22 00:26:32,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2023-12-22 00:26:43,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=309160.0, ans=0.0 2023-12-22 00:26:58,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=309226.6666666667, ans=0.125 2023-12-22 00:26:59,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.44 vs. limit=10.0 2023-12-22 00:27:02,319 INFO [train.py:886] (0/4) Epoch 10, batch 3500, loss[loss=0.01596, audio_tagging_loss=0.01596, over 24750.00 frames. ], tot_loss[loss=0.01529, audio_tagging_loss=0.01529, over 4943664.69 frames. 
], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:27:15,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=309360.0, ans=15.0 2023-12-22 00:27:17,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=309360.0, ans=0.125 2023-12-22 00:27:30,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=309426.6666666667, ans=0.0 2023-12-22 00:27:30,740 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.274e+01 2.637e+01 2.779e+01 3.001e+01 4.121e+01, threshold=5.558e+01, percent-clipped=0.0 2023-12-22 00:27:32,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=309493.3333333333, ans=0.125 2023-12-22 00:27:36,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=309493.3333333333, ans=0.07 2023-12-22 00:27:54,245 INFO [train.py:886] (0/4) Epoch 10, batch 3550, loss[loss=0.017, audio_tagging_loss=0.017, over 24750.00 frames. ], tot_loss[loss=0.01525, audio_tagging_loss=0.01525, over 4943056.70 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:28:04,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=309693.3333333333, ans=0.0 2023-12-22 00:28:12,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.31 vs. limit=15.0 2023-12-22 00:28:15,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=309760.0, ans=0.2 2023-12-22 00:28:18,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=309760.0, ans=0.125 2023-12-22 00:28:34,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=309826.6666666667, ans=0.07 2023-12-22 00:28:35,582 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:28:45,834 INFO [train.py:886] (0/4) Epoch 10, batch 3600, loss[loss=0.01692, audio_tagging_loss=0.01692, over 24750.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4944962.58 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:29:08,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0 2023-12-22 00:29:14,192 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.609e+01 2.736e+01 2.893e+01 3.548e+01, threshold=5.471e+01, percent-clipped=0.0 2023-12-22 00:29:20,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.20 vs. 
limit=15.0 2023-12-22 00:29:32,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=310226.6666666667, ans=0.5 2023-12-22 00:29:33,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=310226.6666666667, ans=0.125 2023-12-22 00:29:37,924 INFO [train.py:886] (0/4) Epoch 10, batch 3650, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4952598.22 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:30:05,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.49 vs. limit=22.5 2023-12-22 00:30:09,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=310493.3333333333, ans=0.2 2023-12-22 00:30:21,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=310560.0, ans=0.0 2023-12-22 00:30:21,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=310560.0, ans=0.0 2023-12-22 00:30:27,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=310560.0, ans=0.0 2023-12-22 00:30:28,845 INFO [train.py:886] (0/4) Epoch 10, batch 3700, loss[loss=0.01428, audio_tagging_loss=0.01428, over 25000.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 4957918.55 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:30:57,719 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.651e+01 2.800e+01 2.999e+01 3.516e+01, threshold=5.600e+01, percent-clipped=0.0 2023-12-22 00:31:00,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=310826.6666666667, ans=0.07 2023-12-22 00:31:10,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=310893.3333333333, ans=0.1 2023-12-22 00:31:16,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=310893.3333333333, ans=0.125 2023-12-22 00:31:22,354 INFO [train.py:886] (0/4) Epoch 10, batch 3750, loss[loss=0.01572, audio_tagging_loss=0.01572, over 24750.00 frames. ], tot_loss[loss=0.01513, audio_tagging_loss=0.01513, over 4959101.20 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:31:27,253 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.592e+00 2023-12-22 00:31:30,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.15 vs. 
limit=15.0 2023-12-22 00:31:32,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=311026.6666666667, ans=0.1 2023-12-22 00:31:38,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=311026.6666666667, ans=0.2 2023-12-22 00:31:59,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=311160.0, ans=0.125 2023-12-22 00:32:00,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=311160.0, ans=0.125 2023-12-22 00:32:13,549 INFO [train.py:886] (0/4) Epoch 10, batch 3800, loss[loss=0.01759, audio_tagging_loss=0.01759, over 24750.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4954768.66 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:32:16,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=311293.3333333333, ans=0.1 2023-12-22 00:32:17,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0 2023-12-22 00:32:30,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=311360.0, ans=0.125 2023-12-22 00:32:41,151 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+01 2.644e+01 2.770e+01 2.975e+01 3.634e+01, threshold=5.540e+01, percent-clipped=0.0 2023-12-22 00:33:01,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.76 vs. limit=22.5 2023-12-22 00:33:05,103 INFO [train.py:886] (0/4) Epoch 10, batch 3850, loss[loss=0.01564, audio_tagging_loss=0.01564, over 24750.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4948699.47 frames. 
], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:33:06,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=311626.6666666667, ans=0.0 2023-12-22 00:33:12,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=311626.6666666667, ans=0.125 2023-12-22 00:33:27,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=311760.0, ans=0.125 2023-12-22 00:33:29,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=311760.0, ans=0.2 2023-12-22 00:33:30,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=311760.0, ans=0.0 2023-12-22 00:33:35,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=311826.6666666667, ans=0.125 2023-12-22 00:33:36,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=311826.6666666667, ans=0.125 2023-12-22 00:33:49,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=311893.3333333333, ans=0.125 2023-12-22 00:33:58,127 INFO [train.py:886] (0/4) Epoch 10, batch 3900, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4945475.60 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:34:06,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=311960.0, ans=0.0 2023-12-22 00:34:08,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2023-12-22 00:34:21,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=312093.3333333333, ans=0.125 2023-12-22 00:34:25,853 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 2.618e+01 2.786e+01 2.971e+01 3.570e+01, threshold=5.572e+01, percent-clipped=0.0 2023-12-22 00:34:32,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312160.0, ans=0.1 2023-12-22 00:34:35,478 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. limit=6.0 2023-12-22 00:34:46,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=312226.6666666667, ans=0.0 2023-12-22 00:34:47,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2023-12-22 00:34:49,000 INFO [train.py:886] (0/4) Epoch 10, batch 3950, loss[loss=0.01479, audio_tagging_loss=0.01479, over 25000.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4947765.82 frames. 
], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:34:54,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312293.3333333333, ans=0.1 2023-12-22 00:34:59,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=15.0 2023-12-22 00:34:59,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=312360.0, ans=15.0 2023-12-22 00:35:32,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=312560.0, ans=0.2 2023-12-22 00:35:40,661 INFO [train.py:886] (0/4) Epoch 10, batch 4000, loss[loss=0.01411, audio_tagging_loss=0.01411, over 25000.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4951312.31 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:36:07,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=312760.0, ans=0.035 2023-12-22 00:36:09,003 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.644e+01 2.805e+01 2.924e+01 3.481e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 00:36:14,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=312826.6666666667, ans=0.125 2023-12-22 00:36:31,676 INFO [train.py:886] (0/4) Epoch 10, batch 4050, loss[loss=0.01389, audio_tagging_loss=0.01389, over 24750.00 frames. ], tot_loss[loss=0.01525, audio_tagging_loss=0.01525, over 4955755.29 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:36:49,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=313026.6666666667, ans=0.0 2023-12-22 00:36:53,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=313093.3333333333, ans=0.125 2023-12-22 00:36:57,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=313093.3333333333, ans=0.0 2023-12-22 00:37:11,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=313160.0, ans=0.0 2023-12-22 00:37:23,795 INFO [train.py:886] (0/4) Epoch 10, batch 4100, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01536, audio_tagging_loss=0.01536, over 4947478.77 frames. 
], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:37:26,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=313293.3333333333, ans=0.015 2023-12-22 00:37:26,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=313293.3333333333, ans=0.125 2023-12-22 00:37:34,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=313360.0, ans=0.0 2023-12-22 00:37:36,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=313360.0, ans=0.125 2023-12-22 00:37:41,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=313360.0, ans=0.0 2023-12-22 00:37:43,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=313360.0, ans=0.2 2023-12-22 00:37:48,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=313426.6666666667, ans=0.125 2023-12-22 00:37:51,848 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.404e+01 2.742e+01 2.860e+01 3.064e+01 3.458e+01, threshold=5.720e+01, percent-clipped=0.0 2023-12-22 00:37:54,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313493.3333333333, ans=0.1 2023-12-22 00:37:56,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=313493.3333333333, ans=0.0 2023-12-22 00:38:04,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=313560.0, ans=0.0 2023-12-22 00:38:13,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=313560.0, ans=0.0 2023-12-22 00:38:15,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2023-12-22 00:38:16,638 INFO [train.py:886] (0/4) Epoch 10, batch 4150, loss[loss=0.0152, audio_tagging_loss=0.0152, over 25000.00 frames. ], tot_loss[loss=0.01524, audio_tagging_loss=0.01524, over 4946342.38 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:38:33,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.16 vs. limit=15.0 2023-12-22 00:38:44,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.92 vs. limit=22.5 2023-12-22 00:39:05,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=313893.3333333333, ans=0.2 2023-12-22 00:39:07,567 INFO [train.py:886] (0/4) Epoch 10, batch 4200, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01513, audio_tagging_loss=0.01513, over 4949752.64 frames. 
], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:39:16,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=313960.0, ans=0.0 2023-12-22 00:39:21,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314026.6666666667, ans=0.1 2023-12-22 00:39:25,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=314026.6666666667, ans=0.1 2023-12-22 00:39:32,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=314093.3333333333, ans=0.125 2023-12-22 00:39:35,998 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 2.632e+01 2.766e+01 2.962e+01 3.622e+01, threshold=5.532e+01, percent-clipped=0.0 2023-12-22 00:39:37,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-12-22 00:39:49,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=314226.6666666667, ans=0.125 2023-12-22 00:39:58,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=314226.6666666667, ans=0.2 2023-12-22 00:40:00,007 INFO [train.py:886] (0/4) Epoch 10, batch 4250, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4953619.86 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:40:00,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314293.3333333333, ans=0.1 2023-12-22 00:40:02,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=314293.3333333333, ans=0.125 2023-12-22 00:40:09,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=15.0 2023-12-22 00:40:10,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2023-12-22 00:40:19,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.42 vs. 
limit=12.0 2023-12-22 00:40:24,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=314426.6666666667, ans=0.2 2023-12-22 00:40:30,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=314493.3333333333, ans=0.125 2023-12-22 00:40:38,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=314493.3333333333, ans=0.125 2023-12-22 00:40:39,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=314493.3333333333, ans=0.125 2023-12-22 00:40:40,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=314560.0, ans=0.0 2023-12-22 00:40:41,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=314560.0, ans=0.125 2023-12-22 00:40:48,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=314560.0, ans=0.0 2023-12-22 00:40:48,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=314560.0, ans=0.125 2023-12-22 00:40:51,773 INFO [train.py:886] (0/4) Epoch 10, batch 4300, loss[loss=0.01022, audio_tagging_loss=0.01022, over 24061.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4960193.25 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:41:10,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=314693.3333333333, ans=6.0 2023-12-22 00:41:19,554 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.660e+01 2.835e+01 2.994e+01 3.565e+01, threshold=5.669e+01, percent-clipped=0.0 2023-12-22 00:41:21,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=314826.6666666667, ans=0.0 2023-12-22 00:41:35,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-12-22 00:41:43,493 INFO [train.py:886] (0/4) Epoch 10, batch 4350, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4960017.23 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:41:44,660 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.701e-01 2023-12-22 00:41:55,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.77 vs. limit=22.5 2023-12-22 00:42:20,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. 
limit=10.0 2023-12-22 00:42:22,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=315160.0, ans=0.125 2023-12-22 00:42:29,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=315226.6666666667, ans=0.125 2023-12-22 00:42:33,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=315226.6666666667, ans=0.125 2023-12-22 00:42:35,517 INFO [train.py:886] (0/4) Epoch 10, batch 4400, loss[loss=0.01631, audio_tagging_loss=0.01631, over 24750.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4955140.09 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:42:39,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2023-12-22 00:42:52,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=315360.0, ans=0.0 2023-12-22 00:42:53,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=315360.0, ans=0.0 2023-12-22 00:42:57,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2023-12-22 00:43:04,207 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.662e+01 2.809e+01 2.979e+01 4.012e+01, threshold=5.619e+01, percent-clipped=0.0 2023-12-22 00:43:07,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=315493.3333333333, ans=0.0 2023-12-22 00:43:20,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=315560.0, ans=0.0 2023-12-22 00:43:20,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.73 vs. limit=15.0 2023-12-22 00:43:23,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=315560.0, ans=0.125 2023-12-22 00:43:27,635 INFO [train.py:886] (0/4) Epoch 10, batch 4450, loss[loss=0.01553, audio_tagging_loss=0.01553, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4952860.31 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:43:44,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=315693.3333333333, ans=0.0 2023-12-22 00:44:01,717 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.260e+00 2023-12-22 00:44:04,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=315826.6666666667, ans=0.1 2023-12-22 00:44:11,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.48 vs. 
limit=15.0 2023-12-22 00:44:13,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=315893.3333333333, ans=0.125 2023-12-22 00:44:18,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.93 vs. limit=22.5 2023-12-22 00:44:19,731 INFO [train.py:886] (0/4) Epoch 10, batch 4500, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4954174.92 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:44:31,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=316026.6666666667, ans=0.125 2023-12-22 00:44:40,105 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:44:42,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=316093.3333333333, ans=0.05 2023-12-22 00:44:45,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=316093.3333333333, ans=0.2 2023-12-22 00:44:47,476 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+01 2.672e+01 2.824e+01 2.974e+01 3.593e+01, threshold=5.647e+01, percent-clipped=0.0 2023-12-22 00:44:51,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=316160.0, ans=0.2 2023-12-22 00:45:12,137 INFO [train.py:886] (0/4) Epoch 10, batch 4550, loss[loss=0.01948, audio_tagging_loss=0.01948, over 25000.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4956305.03 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:45:13,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=316293.3333333333, ans=0.125 2023-12-22 00:45:18,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2023-12-22 00:46:04,000 INFO [train.py:886] (0/4) Epoch 10, batch 4600, loss[loss=0.01555, audio_tagging_loss=0.01555, over 25000.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4962286.36 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:46:04,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=316626.6666666667, ans=0.125 2023-12-22 00:46:18,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=316693.3333333333, ans=0.125 2023-12-22 00:46:24,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.00 vs. limit=10.0 2023-12-22 00:46:30,858 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.537e+01 2.757e+01 2.967e+01 3.317e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-22 00:46:55,675 INFO [train.py:886] (0/4) Epoch 10, batch 4650, loss[loss=0.01425, audio_tagging_loss=0.01425, over 25000.00 frames. ], tot_loss[loss=0.01506, audio_tagging_loss=0.01506, over 4967869.99 frames. 
], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:47:27,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=317160.0, ans=0.0 2023-12-22 00:47:30,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=317160.0, ans=0.0 2023-12-22 00:47:30,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=317160.0, ans=0.1 2023-12-22 00:47:33,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=317160.0, ans=0.2 2023-12-22 00:47:41,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=317226.6666666667, ans=0.125 2023-12-22 00:47:45,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317293.3333333333, ans=0.1 2023-12-22 00:47:46,027 INFO [train.py:886] (0/4) Epoch 10, batch 4700, loss[loss=0.01627, audio_tagging_loss=0.01627, over 24750.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4964097.38 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:47:49,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=317293.3333333333, ans=0.0 2023-12-22 00:47:49,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.60 vs. limit=15.0 2023-12-22 00:48:12,609 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.720e+01 2.864e+01 3.016e+01 3.730e+01, threshold=5.728e+01, percent-clipped=0.0 2023-12-22 00:48:29,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=15.0 2023-12-22 00:48:33,400 INFO [train.py:886] (0/4) Epoch 10, batch 4750, loss[loss=0.01533, audio_tagging_loss=0.01533, over 24750.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4953399.75 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:48:38,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=317626.6666666667, ans=0.125 2023-12-22 00:48:48,749 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-10.pt 2023-12-22 00:49:10,296 INFO [train.py:886] (0/4) Epoch 11, batch 0, loss[loss=0.03143, audio_tagging_loss=0.03143, over 25000.00 frames. ], tot_loss[loss=0.03143, audio_tagging_loss=0.03143, over 25000.00 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:49:10,300 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 00:49:23,946 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6511, 2.9102, 3.3995, 3.3559], device='cuda:0') 2023-12-22 00:49:30,817 INFO [train.py:917] (0/4) Epoch 11, validation: loss=0.03405, audio_tagging_loss=0.03405, over 3737520.00 frames. 
2023-12-22 00:49:30,818 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 00:49:50,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=317866.6666666667, ans=0.0 2023-12-22 00:49:54,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=317866.6666666667, ans=0.2 2023-12-22 00:49:55,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=15.0 2023-12-22 00:50:04,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.87 vs. limit=15.0 2023-12-22 00:50:04,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=12.0 2023-12-22 00:50:05,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=317933.3333333333, ans=0.0 2023-12-22 00:50:13,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.41 vs. limit=22.5 2023-12-22 00:50:22,919 INFO [train.py:886] (0/4) Epoch 11, batch 50, loss[loss=0.01952, audio_tagging_loss=0.01952, over 25000.00 frames. ], tot_loss[loss=0.02415, audio_tagging_loss=0.02415, over 1116331.72 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:50:28,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318066.6666666667, ans=0.1 2023-12-22 00:50:33,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318133.3333333333, ans=0.1 2023-12-22 00:50:33,997 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.550e+01 2.913e+01 3.271e+01 4.041e+01 1.011e+02, threshold=6.542e+01, percent-clipped=6.0 2023-12-22 00:50:39,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=318133.3333333333, ans=0.09899494936611666 2023-12-22 00:50:43,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.93 vs. limit=22.5 2023-12-22 00:50:48,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5 2023-12-22 00:50:52,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318200.0, ans=0.125 2023-12-22 00:51:03,362 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:51:14,407 INFO [train.py:886] (0/4) Epoch 11, batch 100, loss[loss=0.01522, audio_tagging_loss=0.01522, over 25000.00 frames. ], tot_loss[loss=0.02073, audio_tagging_loss=0.02073, over 1973970.24 frames. 
], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:51:17,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=318400.0, ans=0.0 2023-12-22 00:51:31,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=318466.6666666667, ans=0.035 2023-12-22 00:51:52,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=318600.0, ans=0.2 2023-12-22 00:51:59,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=318666.6666666667, ans=0.0 2023-12-22 00:52:06,788 INFO [train.py:886] (0/4) Epoch 11, batch 150, loss[loss=0.01433, audio_tagging_loss=0.01433, over 24094.00 frames. ], tot_loss[loss=0.01877, audio_tagging_loss=0.01877, over 2636875.26 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:52:07,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=318733.3333333333, ans=0.2 2023-12-22 00:52:10,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=15.0 2023-12-22 00:52:13,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2023-12-22 00:52:17,108 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.830e+01 2.997e+01 3.217e+01 3.667e+01, threshold=5.994e+01, percent-clipped=0.0 2023-12-22 00:52:40,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=318933.3333333333, ans=0.1 2023-12-22 00:52:43,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318933.3333333333, ans=0.1 2023-12-22 00:52:45,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=318933.3333333333, ans=0.0 2023-12-22 00:52:55,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=319000.0, ans=0.0 2023-12-22 00:52:56,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=319000.0, ans=0.125 2023-12-22 00:52:58,312 INFO [train.py:886] (0/4) Epoch 11, batch 200, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01744, audio_tagging_loss=0.01744, over 3147699.37 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:53:11,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=15.0 2023-12-22 00:53:29,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=319266.6666666667, ans=0.125 2023-12-22 00:53:37,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-12-22 00:53:43,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.86 vs. 
limit=5.0 2023-12-22 00:53:46,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.02 vs. limit=22.5 2023-12-22 00:53:49,999 INFO [train.py:886] (0/4) Epoch 11, batch 250, loss[loss=0.01435, audio_tagging_loss=0.01435, over 25000.00 frames. ], tot_loss[loss=0.01679, audio_tagging_loss=0.01679, over 3547962.61 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:53:51,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2023-12-22 00:53:52,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-12-22 00:53:57,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=319400.0, ans=0.07 2023-12-22 00:54:01,135 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.668e+01 2.780e+01 2.958e+01 3.295e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-22 00:54:02,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=319466.6666666667, ans=0.125 2023-12-22 00:54:07,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-12-22 00:54:12,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=319533.3333333333, ans=0.05 2023-12-22 00:54:22,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=319600.0, ans=0.125 2023-12-22 00:54:23,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=319600.0, ans=0.0 2023-12-22 00:54:38,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2023-12-22 00:54:42,154 INFO [train.py:886] (0/4) Epoch 11, batch 300, loss[loss=0.01859, audio_tagging_loss=0.01859, over 24940.00 frames. ], tot_loss[loss=0.0164, audio_tagging_loss=0.0164, over 3853632.81 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:54:58,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.15 vs. limit=10.0 2023-12-22 00:55:00,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=319800.0, ans=0.2 2023-12-22 00:55:04,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319866.6666666667, ans=0.1 2023-12-22 00:55:11,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.30 vs. 
limit=12.0 2023-12-22 00:55:12,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=319933.3333333333, ans=0.125 2023-12-22 00:55:22,832 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-48000.pt 2023-12-22 00:55:33,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=12.0 2023-12-22 00:55:36,165 INFO [train.py:886] (0/4) Epoch 11, batch 350, loss[loss=0.01869, audio_tagging_loss=0.01869, over 24750.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 4089310.42 frames. ], batch size: 99, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:55:48,024 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 2.599e+01 2.795e+01 2.968e+01 3.574e+01, threshold=5.590e+01, percent-clipped=0.0 2023-12-22 00:55:59,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320200.0, ans=0.1 2023-12-22 00:56:04,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=320200.0, ans=0.0 2023-12-22 00:56:21,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2023-12-22 00:56:23,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320333.3333333333, ans=0.1 2023-12-22 00:56:28,464 INFO [train.py:886] (0/4) Epoch 11, batch 400, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4279528.66 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:56:28,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0 2023-12-22 00:56:33,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=320400.0, ans=0.0 2023-12-22 00:56:38,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=320466.6666666667, ans=0.0 2023-12-22 00:56:40,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=320466.6666666667, ans=0.125 2023-12-22 00:56:47,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=320466.6666666667, ans=0.02 2023-12-22 00:57:08,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=12.0 2023-12-22 00:57:12,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0 2023-12-22 00:57:20,458 INFO [train.py:886] (0/4) Epoch 11, batch 450, loss[loss=0.01691, audio_tagging_loss=0.01691, over 25000.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4421635.98 frames. 
], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:57:32,227 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.613e+01 2.762e+01 2.922e+01 3.563e+01, threshold=5.524e+01, percent-clipped=0.0 2023-12-22 00:57:42,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=320866.6666666667, ans=0.2 2023-12-22 00:57:54,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.83 vs. limit=12.0 2023-12-22 00:57:54,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=320933.3333333333, ans=0.125 2023-12-22 00:58:12,056 INFO [train.py:886] (0/4) Epoch 11, batch 500, loss[loss=0.01366, audio_tagging_loss=0.01366, over 24750.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4537126.02 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 00:58:15,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=321066.6666666667, ans=0.2 2023-12-22 00:58:18,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2023-12-22 00:58:36,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=321200.0, ans=0.125 2023-12-22 00:59:04,097 INFO [train.py:886] (0/4) Epoch 11, batch 550, loss[loss=0.01638, audio_tagging_loss=0.01638, over 25000.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4625353.80 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 00:59:12,706 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.462e-02 2023-12-22 00:59:15,293 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.605e+01 2.796e+01 2.937e+01 3.436e+01, threshold=5.591e+01, percent-clipped=0.0 2023-12-22 00:59:18,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=321466.6666666667, ans=0.2 2023-12-22 00:59:20,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=321466.6666666667, ans=0.125 2023-12-22 00:59:23,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=321533.3333333333, ans=0.125 2023-12-22 00:59:24,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=321533.3333333333, ans=0.0 2023-12-22 00:59:40,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=321600.0, ans=0.0 2023-12-22 00:59:44,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.47 vs. limit=22.5 2023-12-22 00:59:55,424 INFO [train.py:886] (0/4) Epoch 11, batch 600, loss[loss=0.01247, audio_tagging_loss=0.01247, over 22146.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4694586.24 frames. 
], batch size: 107, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 00:59:56,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=321733.3333333333, ans=0.125 2023-12-22 01:00:30,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=321933.3333333333, ans=0.1 2023-12-22 01:00:47,580 INFO [train.py:886] (0/4) Epoch 11, batch 650, loss[loss=0.01817, audio_tagging_loss=0.01817, over 24750.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4745288.68 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:00:47,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=322066.6666666667, ans=0.2 2023-12-22 01:00:58,779 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.635e+01 2.798e+01 2.937e+01 3.276e+01, threshold=5.597e+01, percent-clipped=0.0 2023-12-22 01:01:09,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=322200.0, ans=0.1 2023-12-22 01:01:33,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=322333.3333333333, ans=0.0 2023-12-22 01:01:39,230 INFO [train.py:886] (0/4) Epoch 11, batch 700, loss[loss=0.01429, audio_tagging_loss=0.01429, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4783540.29 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:02:02,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=322533.3333333333, ans=0.1 2023-12-22 01:02:05,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=322533.3333333333, ans=0.125 2023-12-22 01:02:11,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=322600.0, ans=0.0 2023-12-22 01:02:14,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=322600.0, ans=0.125 2023-12-22 01:02:31,624 INFO [train.py:886] (0/4) Epoch 11, batch 750, loss[loss=0.01703, audio_tagging_loss=0.01703, over 25000.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4822567.53 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:02:39,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=322733.3333333333, ans=0.125 2023-12-22 01:02:39,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.35 vs. limit=22.5 2023-12-22 01:02:44,360 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.637e+01 2.770e+01 2.926e+01 3.754e+01, threshold=5.541e+01, percent-clipped=0.0 2023-12-22 01:03:14,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=323000.0, ans=0.0 2023-12-22 01:03:24,093 INFO [train.py:886] (0/4) Epoch 11, batch 800, loss[loss=0.01562, audio_tagging_loss=0.01562, over 25000.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 4853982.09 frames. 
], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:03:28,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=15.0 2023-12-22 01:03:30,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=323066.6666666667, ans=0.2 2023-12-22 01:03:32,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=323066.6666666667, ans=0.2 2023-12-22 01:03:35,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=323133.3333333333, ans=0.2 2023-12-22 01:03:38,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.18 vs. limit=15.0 2023-12-22 01:03:42,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=323133.3333333333, ans=0.125 2023-12-22 01:03:55,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=323266.6666666667, ans=0.125 2023-12-22 01:04:11,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0 2023-12-22 01:04:15,558 INFO [train.py:886] (0/4) Epoch 11, batch 850, loss[loss=0.01604, audio_tagging_loss=0.01604, over 25000.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 4877544.91 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:04:25,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=12.0 2023-12-22 01:04:28,273 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.659e+01 2.776e+01 2.937e+01 3.524e+01, threshold=5.552e+01, percent-clipped=0.0 2023-12-22 01:04:32,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=323466.6666666667, ans=0.0 2023-12-22 01:04:44,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=323533.3333333333, ans=0.0 2023-12-22 01:04:49,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=323600.0, ans=0.1 2023-12-22 01:04:53,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=323600.0, ans=0.125 2023-12-22 01:04:54,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=323600.0, ans=0.0 2023-12-22 01:05:01,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.21 vs. limit=15.0 2023-12-22 01:05:02,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=323666.6666666667, ans=0.125 2023-12-22 01:05:07,993 INFO [train.py:886] (0/4) Epoch 11, batch 900, loss[loss=0.01372, audio_tagging_loss=0.01372, over 24750.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4899413.71 frames. 
], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:05:09,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=323733.3333333333, ans=10.0 2023-12-22 01:05:15,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=323733.3333333333, ans=0.125 2023-12-22 01:05:27,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=323866.6666666667, ans=0.125 2023-12-22 01:05:48,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.06 vs. limit=6.0 2023-12-22 01:05:55,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-12-22 01:06:00,100 INFO [train.py:886] (0/4) Epoch 11, batch 950, loss[loss=0.01607, audio_tagging_loss=0.01607, over 24750.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4905714.34 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:06:07,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=324066.6666666667, ans=0.2 2023-12-22 01:06:09,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2023-12-22 01:06:12,721 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+01 2.727e+01 2.870e+01 3.011e+01 3.522e+01, threshold=5.740e+01, percent-clipped=0.0 2023-12-22 01:06:22,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=12.0 2023-12-22 01:06:33,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=324266.6666666667, ans=0.0 2023-12-22 01:06:51,531 INFO [train.py:886] (0/4) Epoch 11, batch 1000, loss[loss=0.01329, audio_tagging_loss=0.01329, over 23954.00 frames. ], tot_loss[loss=0.01506, audio_tagging_loss=0.01506, over 4910365.90 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:06:54,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324400.0, ans=0.1 2023-12-22 01:07:27,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=324600.0, ans=0.125 2023-12-22 01:07:44,399 INFO [train.py:886] (0/4) Epoch 11, batch 1050, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4917041.03 frames. 
], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:07:50,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=324733.3333333333, ans=0.5 2023-12-22 01:07:56,580 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+01 2.626e+01 2.739e+01 2.887e+01 3.384e+01, threshold=5.477e+01, percent-clipped=0.0 2023-12-22 01:08:07,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=324866.6666666667, ans=10.0 2023-12-22 01:08:09,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=324866.6666666667, ans=0.2 2023-12-22 01:08:16,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=324933.3333333333, ans=0.0 2023-12-22 01:08:19,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.60 vs. limit=10.0 2023-12-22 01:08:27,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=325000.0, ans=0.125 2023-12-22 01:08:28,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=325000.0, ans=0.2 2023-12-22 01:08:36,756 INFO [train.py:886] (0/4) Epoch 11, batch 1100, loss[loss=0.01592, audio_tagging_loss=0.01592, over 24750.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4930376.03 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:08:51,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2023-12-22 01:09:02,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=325200.0, ans=0.125 2023-12-22 01:09:03,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=325200.0, ans=0.125 2023-12-22 01:09:15,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=325266.6666666667, ans=0.0 2023-12-22 01:09:23,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0 2023-12-22 01:09:27,741 INFO [train.py:886] (0/4) Epoch 11, batch 1150, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4939811.20 frames. 
], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:09:30,611 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.563e-03 2023-12-22 01:09:34,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=325400.0, ans=0.07 2023-12-22 01:09:34,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=325400.0, ans=0.0 2023-12-22 01:09:35,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=325400.0, ans=0.125 2023-12-22 01:09:41,019 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.635e+01 2.806e+01 2.956e+01 3.723e+01, threshold=5.612e+01, percent-clipped=0.0 2023-12-22 01:09:54,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=325533.3333333333, ans=0.1 2023-12-22 01:10:19,965 INFO [train.py:886] (0/4) Epoch 11, batch 1200, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4947221.32 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:10:34,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=325800.0, ans=0.125 2023-12-22 01:10:54,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=325933.3333333333, ans=0.125 2023-12-22 01:11:05,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=326000.0, ans=0.125 2023-12-22 01:11:12,393 INFO [train.py:886] (0/4) Epoch 11, batch 1250, loss[loss=0.01658, audio_tagging_loss=0.01658, over 24750.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4945808.91 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:11:13,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326066.6666666667, ans=0.1 2023-12-22 01:11:17,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=326066.6666666667, ans=22.5 2023-12-22 01:11:25,126 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.715e+01 2.889e+01 3.133e+01 4.404e+01, threshold=5.779e+01, percent-clipped=0.0 2023-12-22 01:11:26,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=326133.3333333333, ans=0.95 2023-12-22 01:11:52,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=326333.3333333333, ans=0.0 2023-12-22 01:11:58,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=326333.3333333333, ans=0.1 2023-12-22 01:12:03,784 INFO [train.py:886] (0/4) Epoch 11, batch 1300, loss[loss=0.01691, audio_tagging_loss=0.01691, over 24750.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4946623.18 frames. 
], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:12:06,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=326400.0, ans=0.125 2023-12-22 01:12:09,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2023-12-22 01:12:17,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=326466.6666666667, ans=0.1 2023-12-22 01:12:22,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=326466.6666666667, ans=0.1 2023-12-22 01:12:25,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.39 vs. limit=22.5 2023-12-22 01:12:29,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=326533.3333333333, ans=0.0 2023-12-22 01:12:56,093 INFO [train.py:886] (0/4) Epoch 11, batch 1350, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4951209.22 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:12:57,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=326733.3333333333, ans=0.0 2023-12-22 01:13:02,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=326733.3333333333, ans=0.125 2023-12-22 01:13:02,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=326733.3333333333, ans=0.0 2023-12-22 01:13:04,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2023-12-22 01:13:07,971 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.643e+01 2.800e+01 2.940e+01 3.448e+01, threshold=5.600e+01, percent-clipped=0.0 2023-12-22 01:13:27,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=326933.3333333333, ans=0.125 2023-12-22 01:13:32,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=326933.3333333333, ans=0.125 2023-12-22 01:13:37,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2023-12-22 01:13:46,077 INFO [train.py:886] (0/4) Epoch 11, batch 1400, loss[loss=0.01612, audio_tagging_loss=0.01612, over 25000.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4955718.77 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:13:46,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=327066.6666666667, ans=0.07 2023-12-22 01:13:46,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.27 vs. 
limit=15.0 2023-12-22 01:13:47,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.40 vs. limit=15.0 2023-12-22 01:13:55,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=327066.6666666667, ans=0.2 2023-12-22 01:13:56,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=327133.3333333333, ans=0.125 2023-12-22 01:14:00,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.08 vs. limit=22.5 2023-12-22 01:14:07,056 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.558e-01 2023-12-22 01:14:10,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=327200.0, ans=0.0 2023-12-22 01:14:13,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=327200.0, ans=0.125 2023-12-22 01:14:25,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=327266.6666666667, ans=0.0 2023-12-22 01:14:38,945 INFO [train.py:886] (0/4) Epoch 11, batch 1450, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4958566.94 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:14:42,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327400.0, ans=0.1 2023-12-22 01:14:49,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=327466.6666666667, ans=0.0 2023-12-22 01:14:50,217 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.361e+01 2.629e+01 2.758e+01 2.923e+01 4.200e+01, threshold=5.515e+01, percent-clipped=0.0 2023-12-22 01:14:57,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.29 vs. limit=22.5 2023-12-22 01:15:16,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=327600.0, ans=0.07 2023-12-22 01:15:19,805 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 01:15:21,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=327666.6666666667, ans=0.2 2023-12-22 01:15:28,972 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.802e-02 2023-12-22 01:15:29,686 INFO [train.py:886] (0/4) Epoch 11, batch 1500, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4959443.45 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:15:40,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=327800.0, ans=0.2 2023-12-22 01:15:48,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. 
limit=15.0 2023-12-22 01:15:51,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327866.6666666667, ans=0.1 2023-12-22 01:16:04,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=327933.3333333333, ans=0.125 2023-12-22 01:16:21,548 INFO [train.py:886] (0/4) Epoch 11, batch 1550, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4950149.94 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:16:27,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=328066.6666666667, ans=0.2 2023-12-22 01:16:33,808 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.672e+01 2.837e+01 3.019e+01 3.569e+01, threshold=5.673e+01, percent-clipped=0.0 2023-12-22 01:16:37,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=15.0 2023-12-22 01:16:45,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2023-12-22 01:16:46,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=328200.0, ans=0.125 2023-12-22 01:16:55,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=328266.6666666667, ans=0.125 2023-12-22 01:17:05,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2023-12-22 01:17:08,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.58 vs. limit=10.0 2023-12-22 01:17:14,541 INFO [train.py:886] (0/4) Epoch 11, batch 1600, loss[loss=0.01567, audio_tagging_loss=0.01567, over 24750.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4942695.01 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:17:15,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=328400.0, ans=15.0 2023-12-22 01:17:20,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=328400.0, ans=0.0 2023-12-22 01:17:43,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=328533.3333333333, ans=0.125 2023-12-22 01:17:46,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.19 vs. limit=12.0 2023-12-22 01:17:50,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=328600.0, ans=0.5 2023-12-22 01:18:05,211 INFO [train.py:886] (0/4) Epoch 11, batch 1650, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4945767.28 frames. 
], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:18:05,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=328733.3333333333, ans=10.0 2023-12-22 01:18:18,583 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 2.598e+01 2.768e+01 2.940e+01 3.602e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-22 01:18:23,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0 2023-12-22 01:18:25,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=328866.6666666667, ans=0.125 2023-12-22 01:18:35,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=328933.3333333333, ans=0.125 2023-12-22 01:18:45,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=328933.3333333333, ans=0.125 2023-12-22 01:18:49,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.05 vs. limit=15.0 2023-12-22 01:18:57,177 INFO [train.py:886] (0/4) Epoch 11, batch 1700, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4939532.73 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:18:59,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=329066.6666666667, ans=0.1 2023-12-22 01:19:04,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=329066.6666666667, ans=0.0 2023-12-22 01:19:12,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=329133.3333333333, ans=0.1 2023-12-22 01:19:25,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2023-12-22 01:19:35,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=329266.6666666667, ans=0.0 2023-12-22 01:19:48,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=329400.0, ans=0.0 2023-12-22 01:19:49,207 INFO [train.py:886] (0/4) Epoch 11, batch 1750, loss[loss=0.01505, audio_tagging_loss=0.01505, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4949065.47 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:20:01,330 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.366e+01 2.668e+01 2.806e+01 3.006e+01 3.774e+01, threshold=5.613e+01, percent-clipped=0.0 2023-12-22 01:20:08,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329533.3333333333, ans=0.1 2023-12-22 01:20:33,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=329666.6666666667, ans=0.0 2023-12-22 01:20:40,637 INFO [train.py:886] (0/4) Epoch 11, batch 1800, loss[loss=0.01198, audio_tagging_loss=0.01198, over 25000.00 frames. 
], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 4952666.99 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:21:01,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0 2023-12-22 01:21:03,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=15.0 2023-12-22 01:21:07,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329866.6666666667, ans=0.1 2023-12-22 01:21:15,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=329933.3333333333, ans=0.125 2023-12-22 01:21:26,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=330000.0, ans=0.125 2023-12-22 01:21:33,103 INFO [train.py:886] (0/4) Epoch 11, batch 1850, loss[loss=0.017, audio_tagging_loss=0.017, over 24750.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4959368.57 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:21:40,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=330066.6666666667, ans=0.125 2023-12-22 01:21:45,286 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+01 2.664e+01 2.766e+01 2.942e+01 3.434e+01, threshold=5.531e+01, percent-clipped=0.0 2023-12-22 01:21:46,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=330133.3333333333, ans=0.05 2023-12-22 01:21:52,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=330200.0, ans=0.2 2023-12-22 01:21:54,872 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 01:21:55,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5 2023-12-22 01:21:57,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2023-12-22 01:22:02,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=330200.0, ans=0.0 2023-12-22 01:22:04,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2023-12-22 01:22:24,764 INFO [train.py:886] (0/4) Epoch 11, batch 1900, loss[loss=0.01291, audio_tagging_loss=0.01291, over 24750.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4958341.12 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:22:35,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.85 vs. 
limit=15.0 2023-12-22 01:22:48,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=330533.3333333333, ans=0.125 2023-12-22 01:23:16,940 INFO [train.py:886] (0/4) Epoch 11, batch 1950, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4957664.43 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:23:18,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2023-12-22 01:23:29,049 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.676e+01 2.806e+01 2.987e+01 3.356e+01, threshold=5.613e+01, percent-clipped=0.0 2023-12-22 01:23:29,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2023-12-22 01:23:39,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=330866.6666666667, ans=0.0 2023-12-22 01:24:06,988 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 01:24:09,404 INFO [train.py:886] (0/4) Epoch 11, batch 2000, loss[loss=0.01432, audio_tagging_loss=0.01432, over 25000.00 frames. ], tot_loss[loss=0.01494, audio_tagging_loss=0.01494, over 4959031.73 frames. ], batch size: 100, lr: 9.99e-03, grad_scale: 64.0 2023-12-22 01:24:16,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=331066.6666666667, ans=0.125 2023-12-22 01:24:18,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.68 vs. limit=22.5 2023-12-22 01:24:40,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5 2023-12-22 01:24:46,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=331266.6666666667, ans=0.125 2023-12-22 01:24:56,867 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.285e-02 2023-12-22 01:25:00,493 INFO [train.py:886] (0/4) Epoch 11, batch 2050, loss[loss=0.0175, audio_tagging_loss=0.0175, over 25000.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4959215.85 frames. 
], batch size: 100, lr: 9.99e-03, grad_scale: 64.0 2023-12-22 01:25:13,295 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.569e+01 2.759e+01 2.903e+01 3.847e+01, threshold=5.517e+01, percent-clipped=0.0 2023-12-22 01:25:16,204 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=9.481e-02 2023-12-22 01:25:17,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=331466.6666666667, ans=0.125 2023-12-22 01:25:25,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=331533.3333333333, ans=0.0 2023-12-22 01:25:29,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=331533.3333333333, ans=0.125 2023-12-22 01:25:31,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=331600.0, ans=0.125 2023-12-22 01:25:36,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=331600.0, ans=0.125 2023-12-22 01:25:53,372 INFO [train.py:886] (0/4) Epoch 11, batch 2100, loss[loss=0.01121, audio_tagging_loss=0.01121, over 24750.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4953786.41 frames. ], batch size: 99, lr: 9.98e-03, grad_scale: 64.0 2023-12-22 01:26:04,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331800.0, ans=0.1 2023-12-22 01:26:05,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=331800.0, ans=0.0 2023-12-22 01:26:18,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=331866.6666666667, ans=0.125 2023-12-22 01:26:22,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=331866.6666666667, ans=0.2 2023-12-22 01:26:35,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=332000.0, ans=0.125 2023-12-22 01:26:36,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332000.0, ans=0.1 2023-12-22 01:26:40,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=15.0 2023-12-22 01:26:45,349 INFO [train.py:886] (0/4) Epoch 11, batch 2150, loss[loss=0.0137, audio_tagging_loss=0.0137, over 24750.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4953717.97 frames. 
], batch size: 99, lr: 9.98e-03, grad_scale: 64.0 2023-12-22 01:26:51,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=332066.6666666667, ans=0.0 2023-12-22 01:26:57,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=332133.3333333333, ans=0.125 2023-12-22 01:26:58,040 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 2.658e+01 2.791e+01 2.942e+01 3.654e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-22 01:27:00,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=332133.3333333333, ans=0.125 2023-12-22 01:27:03,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=332133.3333333333, ans=0.125 2023-12-22 01:27:05,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=332200.0, ans=0.07 2023-12-22 01:27:09,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=332200.0, ans=0.125 2023-12-22 01:27:15,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=12.0 2023-12-22 01:27:17,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=332266.6666666667, ans=0.125 2023-12-22 01:27:23,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=332266.6666666667, ans=0.0 2023-12-22 01:27:24,002 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.947e-02 2023-12-22 01:27:34,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=332333.3333333333, ans=0.0 2023-12-22 01:27:37,462 INFO [train.py:886] (0/4) Epoch 11, batch 2200, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24750.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4947668.20 frames. ], batch size: 99, lr: 9.98e-03, grad_scale: 64.0 2023-12-22 01:27:37,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332400.0, ans=0.1 2023-12-22 01:27:37,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=332400.0, ans=0.0 2023-12-22 01:28:03,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=12.0 2023-12-22 01:28:18,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=332666.6666666667, ans=0.125 2023-12-22 01:28:29,515 INFO [train.py:886] (0/4) Epoch 11, batch 2250, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4942352.05 frames. ], batch size: 99, lr: 9.97e-03, grad_scale: 64.0 2023-12-22 01:28:41,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. 
limit=15.0 2023-12-22 01:28:41,536 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.628e+01 2.793e+01 2.962e+01 3.343e+01, threshold=5.585e+01, percent-clipped=0.0 2023-12-22 01:28:43,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332800.0, ans=0.1 2023-12-22 01:29:08,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=332933.3333333333, ans=0.1 2023-12-22 01:29:10,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-12-22 01:29:21,142 INFO [train.py:886] (0/4) Epoch 11, batch 2300, loss[loss=0.01479, audio_tagging_loss=0.01479, over 24750.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4948776.49 frames. ], batch size: 99, lr: 9.97e-03, grad_scale: 64.0 2023-12-22 01:29:21,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=333066.6666666667, ans=0.125 2023-12-22 01:29:26,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=333066.6666666667, ans=0.125 2023-12-22 01:29:33,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=333133.3333333333, ans=0.0 2023-12-22 01:29:33,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.97 vs. limit=22.5 2023-12-22 01:29:50,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=333200.0, ans=0.125 2023-12-22 01:30:05,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=333333.3333333333, ans=0.0 2023-12-22 01:30:11,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=333333.3333333333, ans=0.125 2023-12-22 01:30:12,973 INFO [train.py:886] (0/4) Epoch 11, batch 2350, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4951600.67 frames. ], batch size: 100, lr: 9.96e-03, grad_scale: 64.0 2023-12-22 01:30:15,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. 
limit=6.0 2023-12-22 01:30:22,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=333466.6666666667, ans=0.035 2023-12-22 01:30:25,752 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 2.666e+01 2.800e+01 2.976e+01 3.525e+01, threshold=5.600e+01, percent-clipped=0.0 2023-12-22 01:30:28,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=333466.6666666667, ans=0.125 2023-12-22 01:30:51,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=333600.0, ans=0.125 2023-12-22 01:30:58,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=333666.6666666667, ans=0.125 2023-12-22 01:31:05,346 INFO [train.py:886] (0/4) Epoch 11, batch 2400, loss[loss=0.01365, audio_tagging_loss=0.01365, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4953439.19 frames. ], batch size: 100, lr: 9.96e-03, grad_scale: 64.0 2023-12-22 01:31:13,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=333733.3333333333, ans=0.2 2023-12-22 01:31:22,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=333800.0, ans=0.09899494936611666 2023-12-22 01:31:26,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=333866.6666666667, ans=0.0 2023-12-22 01:31:29,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=333866.6666666667, ans=0.0 2023-12-22 01:31:35,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. limit=10.0 2023-12-22 01:31:46,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2023-12-22 01:31:54,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=334000.0, ans=0.0 2023-12-22 01:31:56,639 INFO [train.py:886] (0/4) Epoch 11, batch 2450, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01494, audio_tagging_loss=0.01494, over 4959737.85 frames. ], batch size: 100, lr: 9.95e-03, grad_scale: 64.0 2023-12-22 01:31:58,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2023-12-22 01:32:07,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=334133.3333333333, ans=0.125 2023-12-22 01:32:09,952 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.647e+01 2.778e+01 2.930e+01 3.813e+01, threshold=5.556e+01, percent-clipped=0.0 2023-12-22 01:32:15,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=334133.3333333333, ans=0.125 2023-12-22 01:32:31,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.33 vs. 
limit=10.0 2023-12-22 01:32:40,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2023-12-22 01:32:45,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=334333.3333333333, ans=0.0 2023-12-22 01:32:49,605 INFO [train.py:886] (0/4) Epoch 11, batch 2500, loss[loss=0.01675, audio_tagging_loss=0.01675, over 24750.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4951188.91 frames. ], batch size: 99, lr: 9.95e-03, grad_scale: 64.0 2023-12-22 01:32:53,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2023-12-22 01:33:14,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2023-12-22 01:33:22,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=334600.0, ans=0.1 2023-12-22 01:33:23,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=334600.0, ans=0.125 2023-12-22 01:33:27,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=334600.0, ans=0.2 2023-12-22 01:33:28,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. limit=6.0 2023-12-22 01:33:41,389 INFO [train.py:886] (0/4) Epoch 11, batch 2550, loss[loss=0.01451, audio_tagging_loss=0.01451, over 22819.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4946451.70 frames. ], batch size: 107, lr: 9.94e-03, grad_scale: 64.0 2023-12-22 01:33:50,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.40 vs. limit=22.5 2023-12-22 01:33:54,292 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.329e+01 2.667e+01 2.808e+01 2.946e+01 3.351e+01, threshold=5.616e+01, percent-clipped=0.0 2023-12-22 01:33:54,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=334800.0, ans=0.125 2023-12-22 01:34:33,207 INFO [train.py:886] (0/4) Epoch 11, batch 2600, loss[loss=0.01378, audio_tagging_loss=0.01378, over 24750.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4940850.07 frames. ], batch size: 99, lr: 9.94e-03, grad_scale: 64.0 2023-12-22 01:35:01,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=335200.0, ans=0.125 2023-12-22 01:35:24,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=335400.0, ans=0.0 2023-12-22 01:35:25,027 INFO [train.py:886] (0/4) Epoch 11, batch 2650, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4943408.19 frames. 
], batch size: 100, lr: 9.93e-03, grad_scale: 64.0 2023-12-22 01:35:28,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=335400.0, ans=0.2 2023-12-22 01:35:36,557 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.354e+01 2.628e+01 2.801e+01 2.924e+01 4.214e+01, threshold=5.602e+01, percent-clipped=0.0 2023-12-22 01:36:15,997 INFO [train.py:886] (0/4) Epoch 11, batch 2700, loss[loss=0.01514, audio_tagging_loss=0.01514, over 25000.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4942865.66 frames. ], batch size: 100, lr: 9.93e-03, grad_scale: 128.0 2023-12-22 01:36:26,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=335800.0, ans=0.2 2023-12-22 01:36:30,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=335800.0, ans=0.125 2023-12-22 01:36:41,463 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0 2023-12-22 01:36:42,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=335866.6666666667, ans=0.125 2023-12-22 01:37:00,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. limit=10.0 2023-12-22 01:37:02,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336000.0, ans=0.1 2023-12-22 01:37:03,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=336000.0, ans=0.125 2023-12-22 01:37:05,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=336000.0, ans=0.025 2023-12-22 01:37:08,146 INFO [train.py:886] (0/4) Epoch 11, batch 2750, loss[loss=0.01752, audio_tagging_loss=0.01752, over 25000.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4946612.62 frames. ], batch size: 100, lr: 9.92e-03, grad_scale: 64.0 2023-12-22 01:37:09,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=336066.6666666667, ans=0.125 2023-12-22 01:37:14,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=336066.6666666667, ans=0.125 2023-12-22 01:37:18,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.90 vs. 
limit=22.5 2023-12-22 01:37:21,137 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.681e+01 2.814e+01 2.995e+01 3.460e+01, threshold=5.627e+01, percent-clipped=0.0 2023-12-22 01:37:27,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336200.0, ans=0.1 2023-12-22 01:37:37,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=336200.0, ans=0.0 2023-12-22 01:37:39,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=336266.6666666667, ans=0.125 2023-12-22 01:37:44,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=336266.6666666667, ans=0.125 2023-12-22 01:37:50,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=336333.3333333333, ans=0.125 2023-12-22 01:37:54,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=336333.3333333333, ans=0.2 2023-12-22 01:37:59,150 INFO [train.py:886] (0/4) Epoch 11, batch 2800, loss[loss=0.01646, audio_tagging_loss=0.01646, over 24750.00 frames. ], tot_loss[loss=0.01494, audio_tagging_loss=0.01494, over 4946282.52 frames. ], batch size: 99, lr: 9.92e-03, grad_scale: 64.0 2023-12-22 01:38:06,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2023-12-22 01:38:13,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=336466.6666666667, ans=0.125 2023-12-22 01:38:16,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.64 vs. limit=22.5 2023-12-22 01:38:19,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2023-12-22 01:38:32,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=336600.0, ans=0.125 2023-12-22 01:38:42,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=336666.6666666667, ans=0.2 2023-12-22 01:38:52,011 INFO [train.py:886] (0/4) Epoch 11, batch 2850, loss[loss=0.01711, audio_tagging_loss=0.01711, over 24750.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4943339.54 frames. 
2023-12-22 01:39:04,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=336800.0, ans=0.2
2023-12-22 01:39:05,710 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.677e+01 2.844e+01 3.022e+01 3.570e+01, threshold=5.688e+01, percent-clipped=0.0
2023-12-22 01:39:31,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=336933.3333333333, ans=0.0
2023-12-22 01:39:40,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=337000.0, ans=0.07
2023-12-22 01:39:45,149 INFO [train.py:886] (0/4) Epoch 11, batch 2900, loss[loss=0.01406, audio_tagging_loss=0.01406, over 25000.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4946618.25 frames. ], batch size: 100, lr: 9.91e-03, grad_scale: 64.0
2023-12-22 01:39:51,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=337066.6666666667, ans=0.125
2023-12-22 01:39:53,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=337066.6666666667, ans=0.125
2023-12-22 01:40:02,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.85 vs. limit=22.5
2023-12-22 01:40:11,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=337200.0, ans=0.0
2023-12-22 01:40:17,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=337266.6666666667, ans=0.0
2023-12-22 01:40:36,304 INFO [train.py:886] (0/4) Epoch 11, batch 2950, loss[loss=0.01698, audio_tagging_loss=0.01698, over 25000.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4947831.78 frames. ], batch size: 100, lr: 9.90e-03, grad_scale: 64.0
2023-12-22 01:40:49,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337466.6666666667, ans=0.1
2023-12-22 01:40:50,447 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.642e+01 2.776e+01 2.943e+01 5.115e+01, threshold=5.551e+01, percent-clipped=0.0
2023-12-22 01:41:05,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=337533.3333333333, ans=22.5
2023-12-22 01:41:28,791 INFO [train.py:886] (0/4) Epoch 11, batch 3000, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24750.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4949529.77 frames. ], batch size: 99, lr: 9.90e-03, grad_scale: 64.0
2023-12-22 01:41:28,792 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 01:41:49,997 INFO [train.py:917] (0/4) Epoch 11, validation: loss=0.03489, audio_tagging_loss=0.03489, over 3737520.00 frames.
2023-12-22 01:41:49,998 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-22 01:41:52,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=337733.3333333333, ans=0.0
2023-12-22 01:41:53,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=337733.3333333333, ans=0.125
2023-12-22 01:41:56,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=337733.3333333333, ans=22.5
2023-12-22 01:42:09,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=337866.6666666667, ans=0.05
2023-12-22 01:42:10,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=337866.6666666667, ans=0.0
2023-12-22 01:42:19,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337933.3333333333, ans=0.1
2023-12-22 01:42:19,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=337933.3333333333, ans=0.0
2023-12-22 01:42:30,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=12.0
2023-12-22 01:42:38,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338000.0, ans=0.1
2023-12-22 01:42:38,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=338000.0, ans=0.0
2023-12-22 01:42:41,458 INFO [train.py:886] (0/4) Epoch 11, batch 3050, loss[loss=0.01416, audio_tagging_loss=0.01416, over 25000.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4948230.02 frames. ], batch size: 100, lr: 9.89e-03, grad_scale: 64.0
2023-12-22 01:42:48,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=338066.6666666667, ans=0.0
2023-12-22 01:42:55,340 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.694e+01 2.782e+01 2.959e+01 3.737e+01, threshold=5.564e+01, percent-clipped=0.0
2023-12-22 01:42:59,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.17 vs. limit=6.0
2023-12-22 01:43:02,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=338200.0, ans=0.0
2023-12-22 01:43:08,550 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 01:43:33,634 INFO [train.py:886] (0/4) Epoch 11, batch 3100, loss[loss=0.01701, audio_tagging_loss=0.01701, over 24750.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4949839.47 frames. ], batch size: 99, lr: 9.89e-03, grad_scale: 64.0
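The train.py:909-918 block above is the periodic validation pass: every valid_interval batches the loop switches to the dev loader, averages the tagging loss over the full 3737520-frame dev set, and reports peak CUDA memory. A minimal sketch of that step, with hypothetical batch keys and model interface (not the actual train.py code):

    import torch

    def compute_validation_loss(model, dev_loader, device):
        """Sketch of the validation pass logged as 'Computing validation loss'."""
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in dev_loader:
                feats = batch["features"].to(device)   # (N, T, 80) fbank, assumed key
                labels = batch["labels"].to(device)    # multi-hot over 527 events
                loss, num_frames = model(feats, labels)  # assumed interface
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4g}; max memory {mem_mb}MB")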
2023-12-22 01:43:37,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=338400.0, ans=15.0
2023-12-22 01:43:46,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=338466.6666666667, ans=0.125
2023-12-22 01:43:50,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.60 vs. limit=22.5
2023-12-22 01:43:57,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=338533.3333333333, ans=0.125
2023-12-22 01:44:09,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=338600.0, ans=0.125
2023-12-22 01:44:11,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=338600.0, ans=0.0
2023-12-22 01:44:11,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0
2023-12-22 01:44:24,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=338733.3333333333, ans=0.125
2023-12-22 01:44:24,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0
2023-12-22 01:44:24,958 INFO [train.py:886] (0/4) Epoch 11, batch 3150, loss[loss=0.01551, audio_tagging_loss=0.01551, over 24750.00 frames. ], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4948085.93 frames. ], batch size: 99, lr: 9.88e-03, grad_scale: 64.0
2023-12-22 01:44:25,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0
2023-12-22 01:44:38,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=338800.0, ans=0.0
2023-12-22 01:44:38,755 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.475e+01 2.710e+01 2.835e+01 3.003e+01 4.249e+01, threshold=5.670e+01, percent-clipped=0.0
2023-12-22 01:44:48,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=338866.6666666667, ans=0.025
2023-12-22 01:45:13,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=339000.0, ans=0.125
2023-12-22 01:45:16,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=339066.6666666667, ans=0.2
2023-12-22 01:45:16,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=339066.6666666667, ans=10.0
2023-12-22 01:45:17,428 INFO [train.py:886] (0/4) Epoch 11, batch 3200, loss[loss=0.01383, audio_tagging_loss=0.01383, over 25000.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4943421.13 frames. ], batch size: 100, lr: 9.88e-03, grad_scale: 64.0
2023-12-22 01:45:18,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0
2023-12-22 01:45:32,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=339133.3333333333, ans=0.09899494936611666
2023-12-22 01:46:06,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=339333.3333333333, ans=0.04949747468305833
2023-12-22 01:46:09,463 INFO [train.py:886] (0/4) Epoch 11, batch 3250, loss[loss=0.0162, audio_tagging_loss=0.0162, over 22346.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4942809.13 frames. ], batch size: 107, lr: 9.87e-03, grad_scale: 64.0
2023-12-22 01:46:13,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=339400.0, ans=0.1
2023-12-22 01:46:13,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=12.0
2023-12-22 01:46:23,060 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.400e+01 2.628e+01 2.776e+01 2.963e+01 3.661e+01, threshold=5.553e+01, percent-clipped=0.0
2023-12-22 01:46:29,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.78 vs. limit=22.5
2023-12-22 01:46:35,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=339533.3333333333, ans=0.1
2023-12-22 01:46:36,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=339533.3333333333, ans=0.125
2023-12-22 01:46:45,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=339600.0, ans=0.0
2023-12-22 01:46:45,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=339600.0, ans=0.025
2023-12-22 01:46:53,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.12 vs. limit=22.5
2023-12-22 01:47:01,036 INFO [train.py:886] (0/4) Epoch 11, batch 3300, loss[loss=0.01722, audio_tagging_loss=0.01722, over 25000.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4950341.00 frames. ], batch size: 100, lr: 9.87e-03, grad_scale: 64.0
2023-12-22 01:47:04,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.09 vs. limit=15.0
2023-12-22 01:47:07,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=339733.3333333333, ans=0.0
2023-12-22 01:47:16,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.01 vs. limit=10.0
2023-12-22 01:47:28,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.39 vs. limit=8.0
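The scaling.py:1022 "Whitening" lines fire when a Whiten module's metric exceeds its (scheduled) limit. The metric equals 1.0 when the per-group feature covariance is proportional to the identity and grows as the covariance becomes ill-conditioned; a hedged reconstruction of the measurement is below (the real module also applies a gradient penalty when over the limit, omitted here):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        """Sketch: ratio <diag(C^2)> / <diag(C)>^2 over the per-group
        covariance C; 1.0 for perfectly 'white' features, larger otherwise."""
        num_channels = x.shape[-1]
        cpg = num_channels // num_groups                  # channels per group
        x = x.reshape(-1, num_groups, cpg).transpose(0, 1)  # (groups, N, cpg)
        covar = torch.matmul(x.transpose(1, 2), x) / x.shape[1]
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_sq = (covar ** 2).sum(dim=(1, 2)).mean() / cpg
        return mean_sq / (mean_diag ** 2 + 1e-20)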
2023-12-22 01:47:33,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=339933.3333333333, ans=0.0
2023-12-22 01:47:41,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=15.0
2023-12-22 01:47:46,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=340000.0, ans=0.125
2023-12-22 01:47:52,829 INFO [train.py:886] (0/4) Epoch 11, batch 3350, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24040.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4955021.05 frames. ], batch size: 100, lr: 9.86e-03, grad_scale: 64.0
2023-12-22 01:48:03,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=340133.3333333333, ans=0.2
2023-12-22 01:48:04,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=12.0
2023-12-22 01:48:06,416 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.626e+01 2.784e+01 2.981e+01 3.630e+01, threshold=5.569e+01, percent-clipped=0.0
2023-12-22 01:48:12,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=340133.3333333333, ans=0.0
2023-12-22 01:48:21,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=340200.0, ans=0.125
2023-12-22 01:48:45,043 INFO [train.py:886] (0/4) Epoch 11, batch 3400, loss[loss=0.01517, audio_tagging_loss=0.01517, over 25000.00 frames. ], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4957001.78 frames. ], batch size: 100, lr: 9.86e-03, grad_scale: 64.0
2023-12-22 01:48:48,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=340400.0, ans=0.05
2023-12-22 01:49:03,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=340466.6666666667, ans=0.125
2023-12-22 01:49:11,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.68 vs. limit=22.5
2023-12-22 01:49:19,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=340600.0, ans=0.1
2023-12-22 01:49:36,292 INFO [train.py:886] (0/4) Epoch 11, batch 3450, loss[loss=0.01464, audio_tagging_loss=0.01464, over 24750.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4953294.20 frames. ], batch size: 99, lr: 9.86e-03, grad_scale: 64.0
2023-12-22 01:49:37,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=12.0
2023-12-22 01:49:48,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0
2023-12-22 01:49:49,460 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.658e+01 2.786e+01 2.914e+01 3.460e+01, threshold=5.572e+01, percent-clipped=0.0
2023-12-22 01:49:50,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0
2023-12-22 01:50:05,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=340866.6666666667, ans=0.125
2023-12-22 01:50:10,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=340933.3333333333, ans=0.2
2023-12-22 01:50:11,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340933.3333333333, ans=0.1
2023-12-22 01:50:17,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0
2023-12-22 01:50:25,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=341000.0, ans=0.0
2023-12-22 01:50:27,736 INFO [train.py:886] (0/4) Epoch 11, batch 3500, loss[loss=0.01362, audio_tagging_loss=0.01362, over 24750.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4944270.06 frames. ], batch size: 99, lr: 9.85e-03, grad_scale: 64.0
2023-12-22 01:50:50,164 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.704e-02
2023-12-22 01:50:55,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=341200.0, ans=0.125
2023-12-22 01:50:58,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=341266.6666666667, ans=0.125
2023-12-22 01:51:05,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=341266.6666666667, ans=0.125
2023-12-22 01:51:07,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=341266.6666666667, ans=0.0
2023-12-22 01:51:10,510 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.100e-01
2023-12-22 01:51:14,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=341333.3333333333, ans=0.0
2023-12-22 01:51:14,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=15.0
2023-12-22 01:51:18,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=341333.3333333333, ans=0.0
2023-12-22 01:51:20,406 INFO [train.py:886] (0/4) Epoch 11, batch 3550, loss[loss=0.01533, audio_tagging_loss=0.01533, over 25000.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4944625.68 frames. ], batch size: 100, lr: 9.85e-03, grad_scale: 64.0
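The bulk of the scaling.py:213 lines trace ScheduledFloat values: hyper-parameters such as dropout_p, balancer prob, and bypass scale_min that are piecewise-linear functions of batch_count rather than constants, so the regularizers can be strong early in training and relax later. A plausible minimal reimplementation of the scheduling (the actual class in scaling.py supports more features) is:

    class ScheduledFloat:
        """Sketch: piecewise-linear schedule over batch_count, e.g.
        ScheduledFloat((0.0, 0.3), (20000.0, 0.1), default=0.1)."""
        def __init__(self, *points, default=0.0):
            self.points = sorted(points)   # (batch_count, value) pairs
            self.default = default
            self.batch_count = None        # set by the training loop
        def __float__(self):
            if self.batch_count is None:
                return float(self.default)
            p = self.points
            if self.batch_count <= p[0][0]:
                return float(p[0][1])
            if self.batch_count >= p[-1][0]:
                return float(p[-1][1])
            for (x0, y0), (x1, y1) in zip(p[:-1], p[1:]):
                if x0 <= self.batch_count <= x1:
                    t = (self.batch_count - x0) / (x1 - x0)
                    return float(y0 + t * (y1 - y0))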
2023-12-22 01:51:21,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=341400.0, ans=0.1
2023-12-22 01:51:26,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.64 vs. limit=6.0
2023-12-22 01:51:31,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341466.6666666667, ans=0.1
2023-12-22 01:51:33,441 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.655e+01 2.795e+01 3.045e+01 3.842e+01, threshold=5.591e+01, percent-clipped=0.0
2023-12-22 01:51:50,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=341600.0, ans=0.0
2023-12-22 01:52:00,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=341600.0, ans=0.0
2023-12-22 01:52:06,057 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0
2023-12-22 01:52:09,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=341666.6666666667, ans=0.125
2023-12-22 01:52:12,042 INFO [train.py:886] (0/4) Epoch 11, batch 3600, loss[loss=0.01564, audio_tagging_loss=0.01564, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4949783.27 frames. ], batch size: 100, lr: 9.84e-03, grad_scale: 64.0
2023-12-22 01:52:12,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=341733.3333333333, ans=0.1
2023-12-22 01:52:19,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341733.3333333333, ans=0.1
2023-12-22 01:52:20,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=341733.3333333333, ans=0.95
2023-12-22 01:52:24,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=341800.0, ans=0.125
2023-12-22 01:52:35,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0
2023-12-22 01:52:40,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=341866.6666666667, ans=0.2
2023-12-22 01:53:02,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=342000.0, ans=0.125
2023-12-22 01:53:02,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=342000.0, ans=0.1
2023-12-22 01:53:03,697 INFO [train.py:886] (0/4) Epoch 11, batch 3650, loss[loss=0.01695, audio_tagging_loss=0.01695, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4954368.37 frames. ], batch size: 100, lr: 9.84e-03, grad_scale: 64.0
2023-12-22 01:53:06,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=342066.6666666667, ans=0.0
2023-12-22 01:53:13,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0
2023-12-22 01:53:17,502 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.641e+01 2.779e+01 2.904e+01 3.423e+01, threshold=5.558e+01, percent-clipped=0.0
2023-12-22 01:53:27,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0
2023-12-22 01:53:33,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=342200.0, ans=0.125
2023-12-22 01:53:41,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.33 vs. limit=22.5
2023-12-22 01:53:55,474 INFO [train.py:886] (0/4) Epoch 11, batch 3700, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4957238.87 frames. ], batch size: 100, lr: 9.83e-03, grad_scale: 64.0
2023-12-22 01:53:56,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=342400.0, ans=0.0
2023-12-22 01:54:13,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342466.6666666667, ans=0.1
2023-12-22 01:54:37,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=342666.6666666667, ans=0.125
2023-12-22 01:54:39,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=342666.6666666667, ans=0.0
2023-12-22 01:54:41,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=342666.6666666667, ans=0.125
2023-12-22 01:54:48,090 INFO [train.py:886] (0/4) Epoch 11, batch 3750, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4957697.34 frames. ], batch size: 99, lr: 9.83e-03, grad_scale: 64.0
2023-12-22 01:54:59,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=342800.0, ans=0.0
2023-12-22 01:55:01,195 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+01 2.750e+01 2.894e+01 3.049e+01 3.560e+01, threshold=5.788e+01, percent-clipped=0.0
2023-12-22 01:55:12,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=342866.6666666667, ans=0.0
2023-12-22 01:55:13,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=24.13 vs. limit=22.5
2023-12-22 01:55:23,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. limit=15.0
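The balancer entries (min_positive, max_positive, min_abs, max_abs, prob) come from Balancer modules that keep per-channel activation statistics in a healthy range: the forward output is unchanged, but with probability prob the backward pass adds a small gradient nudge to channels whose fraction of positive values or RMS magnitude is outside the configured bounds. A very rough sketch of that forward-identity/backward-nudge pattern follows; the details (nudge size, smoothing) differ from the real Balancer in scaling.py:

    import torch

    class BalancerFunction(torch.autograd.Function):
        """Rough sketch; use as BalancerFunction.apply(x, 0.05, 10.0)
        on an (N, T, C) activation tensor."""
        @staticmethod
        def forward(ctx, x, min_positive, max_abs):
            ctx.save_for_backward(x)
            ctx.bounds = (min_positive, max_abs)
            return x  # identity in the forward pass

        @staticmethod
        def backward(ctx, grad):
            (x,) = ctx.saved_tensors
            min_positive, max_abs = ctx.bounds
            eps = 1e-4 * grad.abs().mean()
            pos_frac = (x > 0).float().mean(dim=(0, 1))   # per-channel stats
            rms = x.pow(2).mean(dim=(0, 1)).sqrt()
            extra = torch.zeros_like(grad)
            # raise channels with too few positive activations
            extra += (pos_frac < min_positive).float() * (-eps)
            # shrink channels whose RMS exceeds max_abs
            extra += (rms > max_abs).float() * eps * x.sign()
            return grad + extra, None, None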
2023-12-22 01:55:26,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=342933.3333333333, ans=0.0
2023-12-22 01:55:30,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=343000.0, ans=0.125
2023-12-22 01:55:33,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=343000.0, ans=0.5
2023-12-22 01:55:35,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=343000.0, ans=0.07
2023-12-22 01:55:39,607 INFO [train.py:886] (0/4) Epoch 11, batch 3800, loss[loss=0.01408, audio_tagging_loss=0.01408, over 24750.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4952408.62 frames. ], batch size: 99, lr: 9.82e-03, grad_scale: 64.0
2023-12-22 01:55:43,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.09 vs. limit=15.0
2023-12-22 01:55:47,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=343066.6666666667, ans=0.5
2023-12-22 01:55:51,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=343133.3333333333, ans=0.125
2023-12-22 01:55:59,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=343200.0, ans=0.5
2023-12-22 01:56:14,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5
2023-12-22 01:56:31,182 INFO [train.py:886] (0/4) Epoch 11, batch 3850, loss[loss=0.01644, audio_tagging_loss=0.01644, over 22306.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4950331.79 frames. ], batch size: 107, lr: 9.82e-03, grad_scale: 64.0
2023-12-22 01:56:44,873 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 2.640e+01 2.758e+01 2.966e+01 3.475e+01, threshold=5.515e+01, percent-clipped=0.0
2023-12-22 01:57:00,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=12.0
2023-12-22 01:57:12,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=343666.6666666667, ans=0.0
2023-12-22 01:57:23,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.82 vs. limit=22.5
2023-12-22 01:57:23,759 INFO [train.py:886] (0/4) Epoch 11, batch 3900, loss[loss=0.01649, audio_tagging_loss=0.01649, over 24750.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4949334.62 frames. ], batch size: 99, lr: 9.81e-03, grad_scale: 64.0
2023-12-22 01:57:28,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0
2023-12-22 01:57:31,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.63 vs. limit=22.5
2023-12-22 01:57:31,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=343733.3333333333, ans=0.09899494936611666
2023-12-22 01:57:34,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0
2023-12-22 01:57:56,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=343933.3333333333, ans=0.0
2023-12-22 01:58:09,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=344000.0, ans=0.1
2023-12-22 01:58:09,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=344000.0, ans=0.125
2023-12-22 01:58:15,467 INFO [train.py:886] (0/4) Epoch 11, batch 3950, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4953921.67 frames. ], batch size: 100, lr: 9.81e-03, grad_scale: 64.0
2023-12-22 01:58:22,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.45 vs. limit=10.0
2023-12-22 01:58:25,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=344133.3333333333, ans=0.0
2023-12-22 01:58:29,726 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.678e+01 2.766e+01 2.913e+01 3.341e+01, threshold=5.531e+01, percent-clipped=0.0
2023-12-22 01:58:48,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=344266.6666666667, ans=0.0
2023-12-22 01:58:54,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=344266.6666666667, ans=0.1
2023-12-22 01:58:58,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=344333.3333333333, ans=0.125
2023-12-22 01:59:08,117 INFO [train.py:886] (0/4) Epoch 11, batch 4000, loss[loss=0.0159, audio_tagging_loss=0.0159, over 24750.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4955875.96 frames. ], batch size: 99, lr: 9.80e-03, grad_scale: 64.0
2023-12-22 01:59:14,750 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.122e-02
2023-12-22 01:59:26,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=344466.6666666667, ans=0.0
2023-12-22 01:59:38,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=344600.0, ans=0.125
2023-12-22 02:00:00,085 INFO [train.py:886] (0/4) Epoch 11, batch 4050, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24750.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4957072.68 frames. ], batch size: 99, lr: 9.80e-03, grad_scale: 64.0
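In the train.py:886 lines, loss[...] is the current batch while tot_loss[...] is a frame-weighted running average, which is why it moves slowly and hovers near a ~5M-frame window here (after an epoch boundary the window restarts and the frame count grows again, visible at the start of epoch 12 further down). A sketch of one way to produce such numbers, with a hypothetical decay constant:

    class RunningLoss:
        """Sketch: exponentially-decayed, frame-weighted loss average.
        With 25000-frame batches and decay=0.995 the steady-state window is
        25000 / (1 - 0.995) = 5e6 frames, close to the ~4.95e6 in the log."""
        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0
        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames, self.frames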
2023-12-22 02:00:13,199 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.657e+01 2.838e+01 3.013e+01 3.411e+01, threshold=5.677e+01, percent-clipped=0.0
2023-12-22 02:00:17,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=344800.0, ans=0.125
2023-12-22 02:00:22,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=344866.6666666667, ans=0.125
2023-12-22 02:00:22,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344866.6666666667, ans=0.1
2023-12-22 02:00:51,604 INFO [train.py:886] (0/4) Epoch 11, batch 4100, loss[loss=0.01587, audio_tagging_loss=0.01587, over 24750.00 frames. ], tot_loss[loss=0.01513, audio_tagging_loss=0.01513, over 4952315.04 frames. ], batch size: 99, lr: 9.79e-03, grad_scale: 64.0
2023-12-22 02:00:53,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=345066.6666666667, ans=12.0
2023-12-22 02:01:20,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=345200.0, ans=0.1
2023-12-22 02:01:28,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=345266.6666666667, ans=0.125
2023-12-22 02:01:33,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=345333.3333333333, ans=0.035
2023-12-22 02:01:43,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345400.0, ans=0.1
2023-12-22 02:01:44,237 INFO [train.py:886] (0/4) Epoch 11, batch 4150, loss[loss=0.01306, audio_tagging_loss=0.01306, over 24750.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 4950864.12 frames. ], batch size: 99, lr: 9.79e-03, grad_scale: 64.0
2023-12-22 02:01:48,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=345400.0, ans=0.04949747468305833
2023-12-22 02:01:53,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=345466.6666666667, ans=0.025
2023-12-22 02:01:55,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=345466.6666666667, ans=0.125
2023-12-22 02:01:57,136 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.652e+01 2.805e+01 3.053e+01 3.563e+01, threshold=5.610e+01, percent-clipped=0.0
2023-12-22 02:02:04,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=345533.3333333333, ans=0.0
2023-12-22 02:02:20,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=345600.0, ans=0.125
2023-12-22 02:02:35,109 INFO [train.py:886] (0/4) Epoch 11, batch 4200, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4949878.60 frames. ], batch size: 100, lr: 9.79e-03, grad_scale: 64.0
2023-12-22 02:02:39,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=345733.3333333333, ans=0.125
2023-12-22 02:03:11,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=345933.3333333333, ans=0.125
2023-12-22 02:03:27,976 INFO [train.py:886] (0/4) Epoch 11, batch 4250, loss[loss=0.01528, audio_tagging_loss=0.01528, over 25000.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4954934.55 frames. ], batch size: 100, lr: 9.78e-03, grad_scale: 64.0
2023-12-22 02:03:35,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=346066.6666666667, ans=0.1
2023-12-22 02:03:40,220 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 2.635e+01 2.806e+01 3.008e+01 3.451e+01, threshold=5.611e+01, percent-clipped=0.0
2023-12-22 02:03:45,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=346133.3333333333, ans=0.1
2023-12-22 02:03:48,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=346200.0, ans=0.0
2023-12-22 02:04:10,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.59 vs. limit=22.5
2023-12-22 02:04:15,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=346333.3333333333, ans=0.125
2023-12-22 02:04:17,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=346400.0, ans=0.2
2023-12-22 02:04:18,632 INFO [train.py:886] (0/4) Epoch 11, batch 4300, loss[loss=0.01612, audio_tagging_loss=0.01612, over 25000.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4961207.03 frames. ], batch size: 100, lr: 9.78e-03, grad_scale: 64.0
2023-12-22 02:04:21,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0
2023-12-22 02:04:22,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=346400.0, ans=0.125
2023-12-22 02:04:45,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=346533.3333333333, ans=0.5
2023-12-22 02:04:47,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346533.3333333333, ans=0.1
2023-12-22 02:04:59,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0
2023-12-22 02:04:59,706 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-52000.pt
2023-12-22 02:05:13,317 INFO [train.py:886] (0/4) Epoch 11, batch 4350, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4962863.11 frames. ], batch size: 100, lr: 9.77e-03, grad_scale: 64.0
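The checkpoint.py:75 line above is a periodic mid-epoch save: with save_every_n=4000 and a global batch counter, checkpoint-52000.pt lands here, and keep_last_k=30 old checkpoints are retained. An illustrative sketch with hypothetical helper names (not the actual icefall checkpoint API):

    from pathlib import Path
    import torch

    def maybe_save_checkpoint(model, optimizer, exp_dir: Path,
                              batch_idx_train: int, save_every_n: int = 4000,
                              keep_last_k: int = 30):
        """Sketch of the periodic save logged as checkpoint-52000.pt."""
        if batch_idx_train % save_every_n != 0:
            return
        path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "batch_idx_train": batch_idx_train}, path)
        old = sorted(exp_dir.glob("checkpoint-*.pt"),
                     key=lambda p: int(p.stem.split("-")[1]))
        for p in old[:-keep_last_k]:   # prune, keeping the newest keep_last_k
            p.unlink()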
2023-12-22 02:05:25,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=346800.0, ans=0.125
2023-12-22 02:05:26,215 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.594e+01 2.762e+01 2.910e+01 3.539e+01, threshold=5.524e+01, percent-clipped=0.0
2023-12-22 02:05:58,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347000.0, ans=0.1
2023-12-22 02:06:00,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=12.0
2023-12-22 02:06:04,906 INFO [train.py:886] (0/4) Epoch 11, batch 4400, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24036.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4955309.53 frames. ], batch size: 100, lr: 9.77e-03, grad_scale: 64.0
2023-12-22 02:06:10,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347066.6666666667, ans=0.1
2023-12-22 02:06:14,972 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.561e-01
2023-12-22 02:06:25,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=347200.0, ans=0.125
2023-12-22 02:06:35,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=347266.6666666667, ans=0.125
2023-12-22 02:06:42,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=347266.6666666667, ans=0.125
2023-12-22 02:06:51,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.76 vs. limit=22.5
2023-12-22 02:06:57,386 INFO [train.py:886] (0/4) Epoch 11, batch 4450, loss[loss=0.01548, audio_tagging_loss=0.01548, over 25000.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4951458.58 frames. ], batch size: 100, lr: 9.76e-03, grad_scale: 64.0
2023-12-22 02:07:02,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=12.0
2023-12-22 02:07:10,412 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+01 2.637e+01 2.813e+01 2.949e+01 3.771e+01, threshold=5.626e+01, percent-clipped=0.0
2023-12-22 02:07:13,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=347466.6666666667, ans=0.0
2023-12-22 02:07:15,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0
2023-12-22 02:07:19,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0
2023-12-22 02:07:26,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=347533.3333333333, ans=0.0
2023-12-22 02:07:30,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=12.0
2023-12-22 02:07:49,030 INFO [train.py:886] (0/4) Epoch 11, batch 4500, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4950498.28 frames. ], batch size: 100, lr: 9.76e-03, grad_scale: 64.0
2023-12-22 02:08:02,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347800.0, ans=0.1
2023-12-22 02:08:15,865 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0
2023-12-22 02:08:41,327 INFO [train.py:886] (0/4) Epoch 11, batch 4550, loss[loss=0.01676, audio_tagging_loss=0.01676, over 25000.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4952260.74 frames. ], batch size: 100, lr: 9.75e-03, grad_scale: 64.0
2023-12-22 02:08:49,844 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 02:08:55,124 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.595e+01 2.747e+01 2.922e+01 3.602e+01, threshold=5.493e+01, percent-clipped=0.0
2023-12-22 02:09:08,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=348200.0, ans=0.125
2023-12-22 02:09:10,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.64 vs. limit=22.5
2023-12-22 02:09:33,445 INFO [train.py:886] (0/4) Epoch 11, batch 4600, loss[loss=0.01468, audio_tagging_loss=0.01468, over 25000.00 frames. ], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4951082.27 frames. ], batch size: 100, lr: 9.75e-03, grad_scale: 64.0
2023-12-22 02:10:09,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=348600.0, ans=0.0
2023-12-22 02:10:14,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=348666.6666666667, ans=0.0
2023-12-22 02:10:21,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=348666.6666666667, ans=0.0
2023-12-22 02:10:25,540 INFO [train.py:886] (0/4) Epoch 11, batch 4650, loss[loss=0.01597, audio_tagging_loss=0.01597, over 25000.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4951140.04 frames. ], batch size: 100, lr: 9.74e-03, grad_scale: 64.0
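Names ending in attention_skip_rate, conv_skip_rate, ff2_skip_rate, or bypass.skip_rate are scheduled probabilities for stochastic-depth-style skipping: with the given probability a submodule's contribution is dropped for the batch, and by this stage of training most of them have annealed to 0.0. A hedged sketch of how such a rate could be applied (not the actual zipformer code):

    import torch

    def apply_skip(module_out: torch.Tensor, residual: torch.Tensor,
                   skip_rate: float, training: bool) -> torch.Tensor:
        """Sketch: with probability skip_rate, drop the submodule output
        and keep only the residual path; identity behavior at eval time."""
        if training and torch.rand(()) < skip_rate:
            return residual
        return residual + module_out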
2023-12-22 02:10:34,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=348733.3333333333, ans=0.125
2023-12-22 02:10:35,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=348800.0, ans=0.05
2023-12-22 02:10:36,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=348800.0, ans=0.125
2023-12-22 02:10:39,504 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.626e+01 2.815e+01 2.915e+01 3.579e+01, threshold=5.630e+01, percent-clipped=0.0
2023-12-22 02:10:48,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=348866.6666666667, ans=0.125
2023-12-22 02:11:07,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=349000.0, ans=0.125
2023-12-22 02:11:17,318 INFO [train.py:886] (0/4) Epoch 11, batch 4700, loss[loss=0.0157, audio_tagging_loss=0.0157, over 24750.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4951152.15 frames. ], batch size: 99, lr: 9.74e-03, grad_scale: 64.0
2023-12-22 02:11:27,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5
2023-12-22 02:11:30,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0
2023-12-22 02:11:36,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=349200.0, ans=0.0
2023-12-22 02:11:45,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=349266.6666666667, ans=0.125
2023-12-22 02:11:45,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0
2023-12-22 02:11:46,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=349266.6666666667, ans=0.0
2023-12-22 02:11:51,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.99 vs. limit=15.0
2023-12-22 02:12:04,806 INFO [train.py:886] (0/4) Epoch 11, batch 4750, loss[loss=0.0164, audio_tagging_loss=0.0164, over 24750.00 frames. ], tot_loss[loss=0.01528, audio_tagging_loss=0.01528, over 4948334.43 frames. ], batch size: 99, lr: 9.73e-03, grad_scale: 64.0
2023-12-22 02:12:17,671 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+01 2.685e+01 2.810e+01 2.973e+01 3.471e+01, threshold=5.619e+01, percent-clipped=0.0
2023-12-22 02:12:20,271 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-11.pt
2023-12-22 02:12:40,994 INFO [train.py:886] (0/4) Epoch 12, batch 0, loss[loss=0.03163, audio_tagging_loss=0.03163, over 24033.00 frames. ], tot_loss[loss=0.03163, audio_tagging_loss=0.03163, over 24033.00 frames. ], batch size: 100, lr: 9.32e-03, grad_scale: 64.0
2023-12-22 02:12:40,995 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 02:12:54,772 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([0.9826, 2.0839, 2.6829, 2.8098], device='cuda:0')
2023-12-22 02:13:02,308 INFO [train.py:917] (0/4) Epoch 12, validation: loss=0.03393, audio_tagging_loss=0.03393, over 3737520.00 frames.
2023-12-22 02:13:02,308 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-22 02:13:14,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=349573.3333333333, ans=0.0
2023-12-22 02:13:24,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=349640.0, ans=0.09899494936611666
2023-12-22 02:13:26,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=349640.0, ans=0.125
2023-12-22 02:13:33,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.38 vs. limit=15.0
2023-12-22 02:13:39,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=349706.6666666667, ans=0.2
2023-12-22 02:13:53,167 INFO [train.py:886] (0/4) Epoch 12, batch 50, loss[loss=0.02233, audio_tagging_loss=0.02233, over 25000.00 frames. ], tot_loss[loss=0.02354, audio_tagging_loss=0.02354, over 1119180.39 frames. ], batch size: 100, lr: 9.32e-03, grad_scale: 64.0
2023-12-22 02:13:57,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=349840.0, ans=0.1
2023-12-22 02:13:59,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=349840.0, ans=0.125
2023-12-22 02:14:05,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=349906.6666666667, ans=0.125
2023-12-22 02:14:08,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.85 vs. limit=10.0
2023-12-22 02:14:23,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=350040.0, ans=0.1
2023-12-22 02:14:24,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=350040.0, ans=0.0
2023-12-22 02:14:40,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=350106.6666666667, ans=0.125
2023-12-22 02:14:42,744 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.586e+01 3.132e+01 3.484e+01 4.021e+01 8.947e+01, threshold=6.968e+01, percent-clipped=8.0
2023-12-22 02:14:45,340 INFO [train.py:886] (0/4) Epoch 12, batch 100, loss[loss=0.0187, audio_tagging_loss=0.0187, over 25000.00 frames. ], tot_loss[loss=0.02047, audio_tagging_loss=0.02047, over 1972341.33 frames. ], batch size: 100, lr: 9.32e-03, grad_scale: 64.0
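The drop in lr from 9.73e-03 at the end of epoch 11 to 9.32e-03 at the start of epoch 12 is the epoch term of the Eden schedule advancing (base_lr=0.045, lr_batches=7500, lr_epochs=3.5 per the run configuration). To the best of my understanding the schedule is, up to implementation details:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Sketch of the Eden learning-rate schedule used in zipformer
        recipes; with base_lr=0.045 it reproduces the logged lr to ~1%."""
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # e.g. eden_lr(0.045, batch=52400, epoch=11) is about 9.3e-03,
    # matching the "lr: 9.32e-03" logged at the start of epoch 12.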
2023-12-22 02:14:47,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=350173.3333333333, ans=0.5
2023-12-22 02:15:28,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0
2023-12-22 02:15:28,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=350440.0, ans=0.1
2023-12-22 02:15:36,310 INFO [train.py:886] (0/4) Epoch 12, batch 150, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24064.00 frames. ], tot_loss[loss=0.01855, audio_tagging_loss=0.01855, over 2638360.13 frames. ], batch size: 100, lr: 9.31e-03, grad_scale: 64.0
2023-12-22 02:16:01,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0
2023-12-22 02:16:10,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=350706.6666666667, ans=0.125
2023-12-22 02:16:24,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=15.0
2023-12-22 02:16:27,248 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 2.700e+01 2.881e+01 3.001e+01 3.518e+01, threshold=5.761e+01, percent-clipped=0.0
2023-12-22 02:16:29,888 INFO [train.py:886] (0/4) Epoch 12, batch 200, loss[loss=0.01436, audio_tagging_loss=0.01436, over 24005.00 frames. ], tot_loss[loss=0.01754, audio_tagging_loss=0.01754, over 3151444.14 frames. ], batch size: 100, lr: 9.31e-03, grad_scale: 64.0
2023-12-22 02:16:36,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=350840.0, ans=0.125
2023-12-22 02:16:51,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0
2023-12-22 02:17:20,967 INFO [train.py:886] (0/4) Epoch 12, batch 250, loss[loss=0.01636, audio_tagging_loss=0.01636, over 25000.00 frames. ], tot_loss[loss=0.01681, audio_tagging_loss=0.01681, over 3551790.84 frames. ], batch size: 100, lr: 9.30e-03, grad_scale: 64.0
2023-12-22 02:17:23,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=351173.3333333333, ans=0.0
2023-12-22 02:17:52,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=351373.3333333333, ans=0.0
2023-12-22 02:17:52,167 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.019e-01
2023-12-22 02:17:59,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=351373.3333333333, ans=0.125
2023-12-22 02:18:00,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=351373.3333333333, ans=0.125
2023-12-22 02:18:10,375 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.670e+01 2.789e+01 2.930e+01 3.431e+01, threshold=5.578e+01, percent-clipped=0.0
2023-12-22 02:18:10,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=351440.0, ans=0.0
2023-12-22 02:18:12,281 INFO [train.py:886] (0/4) Epoch 12, batch 300, loss[loss=0.01683, audio_tagging_loss=0.01683, over 25000.00 frames. ], tot_loss[loss=0.01633, audio_tagging_loss=0.01633, over 3860206.55 frames. ], batch size: 100, lr: 9.30e-03, grad_scale: 64.0
2023-12-22 02:18:14,370 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 02:18:22,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=351573.3333333333, ans=0.125
2023-12-22 02:18:48,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=351706.6666666667, ans=0.125
2023-12-22 02:18:55,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=351773.3333333333, ans=0.125
2023-12-22 02:19:03,709 INFO [train.py:886] (0/4) Epoch 12, batch 350, loss[loss=0.01362, audio_tagging_loss=0.01362, over 24750.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4102826.30 frames. ], batch size: 99, lr: 9.29e-03, grad_scale: 64.0
2023-12-22 02:19:40,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=15.0
2023-12-22 02:19:43,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.37 vs. limit=10.0
2023-12-22 02:19:52,869 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.603e+01 2.805e+01 2.915e+01 3.693e+01, threshold=5.611e+01, percent-clipped=0.0
2023-12-22 02:19:53,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=352106.6666666667, ans=0.2
2023-12-22 02:19:55,522 INFO [train.py:886] (0/4) Epoch 12, batch 400, loss[loss=0.01459, audio_tagging_loss=0.01459, over 24750.00 frames. ], tot_loss[loss=0.01561, audio_tagging_loss=0.01561, over 4289898.47 frames. ], batch size: 99, lr: 9.29e-03, grad_scale: 64.0
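The grad_scale printed in every train.py:886 line is the fp16 loss scale (use_fp16 is enabled for this run): PyTorch's GradScaler doubles it after a stretch of overflow-free steps (the 64.0 to 128.0 flip near the top of this section) and halves it when gradients overflow. A sketch of the standard usage, with a hypothetical model/loader interface:

    import torch

    def train_amp(model, loader, optimizer, device="cuda"):
        """Sketch: fp16 training loop whose scaler.get_scale() is the
        value logged as 'grad_scale'."""
        scaler = torch.cuda.amp.GradScaler(init_scale=2.0 ** 6)  # 64.0 as logged
        for features, labels in loader:
            optimizer.zero_grad()
            with torch.cuda.amp.autocast():
                loss = model(features.to(device), labels.to(device))  # assumed
            scaler.scale(loss).backward()
            scaler.step(optimizer)   # skipped internally on inf/nan gradients
            scaler.update()          # grows or shrinks the scale
        return scaler.get_scale()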
], batch size: 99, lr: 9.29e-03, grad_scale: 64.0 2023-12-22 02:20:38,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=352440.0, ans=0.0 2023-12-22 02:20:39,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352440.0, ans=0.1 2023-12-22 02:20:47,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=352506.6666666667, ans=0.125 2023-12-22 02:20:48,156 INFO [train.py:886] (0/4) Epoch 12, batch 450, loss[loss=0.01643, audio_tagging_loss=0.01643, over 22269.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4432595.21 frames. ], batch size: 107, lr: 9.29e-03, grad_scale: 64.0 2023-12-22 02:21:02,519 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:21:04,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=15.0 2023-12-22 02:21:14,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2023-12-22 02:21:27,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2023-12-22 02:21:37,260 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 2.603e+01 2.721e+01 2.857e+01 3.643e+01, threshold=5.441e+01, percent-clipped=0.0 2023-12-22 02:21:39,860 INFO [train.py:886] (0/4) Epoch 12, batch 500, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4549128.54 frames. ], batch size: 100, lr: 9.28e-03, grad_scale: 64.0 2023-12-22 02:21:51,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=352906.6666666667, ans=0.0 2023-12-22 02:21:57,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352906.6666666667, ans=0.1 2023-12-22 02:22:01,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2023-12-22 02:22:06,848 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.068e-02 2023-12-22 02:22:31,414 INFO [train.py:886] (0/4) Epoch 12, batch 550, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4643565.58 frames. 
], batch size: 100, lr: 9.28e-03, grad_scale: 64.0 2023-12-22 02:22:37,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=353173.3333333333, ans=0.125 2023-12-22 02:22:44,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=353240.0, ans=0.125 2023-12-22 02:22:52,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=353306.6666666667, ans=0.2 2023-12-22 02:22:58,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=353306.6666666667, ans=0.125 2023-12-22 02:22:58,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=353306.6666666667, ans=0.0 2023-12-22 02:23:10,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.70 vs. limit=22.5 2023-12-22 02:23:21,416 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.657e+01 2.754e+01 2.932e+01 3.860e+01, threshold=5.508e+01, percent-clipped=0.0 2023-12-22 02:23:23,350 INFO [train.py:886] (0/4) Epoch 12, batch 600, loss[loss=0.01435, audio_tagging_loss=0.01435, over 24750.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4711190.84 frames. ], batch size: 99, lr: 9.27e-03, grad_scale: 64.0 2023-12-22 02:23:39,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=353573.3333333333, ans=0.1 2023-12-22 02:23:41,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=353573.3333333333, ans=0.125 2023-12-22 02:24:03,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=353706.6666666667, ans=0.0 2023-12-22 02:24:15,637 INFO [train.py:886] (0/4) Epoch 12, batch 650, loss[loss=0.01785, audio_tagging_loss=0.01785, over 24750.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4759544.83 frames. ], batch size: 99, lr: 9.27e-03, grad_scale: 64.0 2023-12-22 02:24:25,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=353906.6666666667, ans=0.125 2023-12-22 02:24:29,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=353906.6666666667, ans=0.2 2023-12-22 02:24:36,352 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.123e-02 2023-12-22 02:24:38,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=353973.3333333333, ans=0.125 2023-12-22 02:25:05,088 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.660e+01 2.826e+01 2.982e+01 3.638e+01, threshold=5.651e+01, percent-clipped=0.0 2023-12-22 02:25:06,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=354173.3333333333, ans=0.0 2023-12-22 02:25:07,015 INFO [train.py:886] (0/4) Epoch 12, batch 700, loss[loss=0.01452, audio_tagging_loss=0.01452, over 23992.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4793838.93 frames. 
], batch size: 100, lr: 9.26e-03, grad_scale: 64.0 2023-12-22 02:25:11,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=354173.3333333333, ans=0.125 2023-12-22 02:25:12,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=354173.3333333333, ans=0.125 2023-12-22 02:25:19,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=354240.0, ans=0.0 2023-12-22 02:25:23,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=354240.0, ans=0.0 2023-12-22 02:25:33,583 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:25:41,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2023-12-22 02:25:44,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354373.3333333333, ans=0.1 2023-12-22 02:25:59,238 INFO [train.py:886] (0/4) Epoch 12, batch 750, loss[loss=0.01299, audio_tagging_loss=0.01299, over 25000.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4832754.95 frames. ], batch size: 100, lr: 9.26e-03, grad_scale: 64.0 2023-12-22 02:26:08,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=354573.3333333333, ans=0.125 2023-12-22 02:26:10,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=354573.3333333333, ans=0.125 2023-12-22 02:26:13,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=354573.3333333333, ans=15.0 2023-12-22 02:26:16,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=354573.3333333333, ans=0.125 2023-12-22 02:26:19,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2023-12-22 02:26:25,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=354640.0, ans=0.1 2023-12-22 02:26:29,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=354706.6666666667, ans=0.0 2023-12-22 02:26:43,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354773.3333333333, ans=0.1 2023-12-22 02:26:47,761 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+01 2.606e+01 2.795e+01 2.916e+01 3.346e+01, threshold=5.591e+01, percent-clipped=0.0 2023-12-22 02:26:48,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=354773.3333333333, ans=0.125 2023-12-22 02:26:50,422 INFO [train.py:886] (0/4) Epoch 12, batch 800, loss[loss=0.01634, audio_tagging_loss=0.01634, over 24750.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4861260.22 frames. 
], batch size: 99, lr: 9.26e-03, grad_scale: 64.0 2023-12-22 02:26:52,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.20 vs. limit=6.0 2023-12-22 02:26:54,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=354840.0, ans=0.2 2023-12-22 02:27:01,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2023-12-22 02:27:18,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=354973.3333333333, ans=0.0 2023-12-22 02:27:18,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2023-12-22 02:27:32,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=355106.6666666667, ans=0.2 2023-12-22 02:27:42,024 INFO [train.py:886] (0/4) Epoch 12, batch 850, loss[loss=0.01643, audio_tagging_loss=0.01643, over 25000.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4884209.67 frames. ], batch size: 100, lr: 9.25e-03, grad_scale: 64.0 2023-12-22 02:28:08,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=355306.6666666667, ans=0.125 2023-12-22 02:28:08,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=355306.6666666667, ans=0.0 2023-12-22 02:28:32,634 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.693e+01 2.795e+01 2.929e+01 3.864e+01, threshold=5.590e+01, percent-clipped=0.0 2023-12-22 02:28:34,550 INFO [train.py:886] (0/4) Epoch 12, batch 900, loss[loss=0.01285, audio_tagging_loss=0.01285, over 20861.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4900239.09 frames. ], batch size: 107, lr: 9.25e-03, grad_scale: 64.0 2023-12-22 02:28:35,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=355506.6666666667, ans=0.2 2023-12-22 02:28:45,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=355573.3333333333, ans=0.125 2023-12-22 02:28:48,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=355573.3333333333, ans=0.035 2023-12-22 02:28:56,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=355640.0, ans=0.0 2023-12-22 02:29:23,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=355773.3333333333, ans=0.0 2023-12-22 02:29:26,314 INFO [train.py:886] (0/4) Epoch 12, batch 950, loss[loss=0.01525, audio_tagging_loss=0.01525, over 24750.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4906361.21 frames. ], batch size: 99, lr: 9.24e-03, grad_scale: 64.0 2023-12-22 02:29:35,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=17.97 vs. 
limit=15.0 2023-12-22 02:29:43,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. limit=6.0 2023-12-22 02:29:53,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355973.3333333333, ans=0.1 2023-12-22 02:29:55,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355973.3333333333, ans=0.1 2023-12-22 02:29:57,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=356040.0, ans=0.09899494936611666 2023-12-22 02:30:16,916 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.679e+01 2.792e+01 2.958e+01 3.407e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-22 02:30:18,828 INFO [train.py:886] (0/4) Epoch 12, batch 1000, loss[loss=0.0157, audio_tagging_loss=0.0157, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4911046.41 frames. ], batch size: 100, lr: 9.24e-03, grad_scale: 64.0 2023-12-22 02:30:19,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=356173.3333333333, ans=0.125 2023-12-22 02:30:26,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=356173.3333333333, ans=0.1 2023-12-22 02:30:37,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=356240.0, ans=0.0 2023-12-22 02:30:52,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=356373.3333333333, ans=0.125 2023-12-22 02:30:52,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=356373.3333333333, ans=0.125 2023-12-22 02:31:02,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=356440.0, ans=0.0 2023-12-22 02:31:02,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.91 vs. limit=6.0 2023-12-22 02:31:04,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=356440.0, ans=0.125 2023-12-22 02:31:10,658 INFO [train.py:886] (0/4) Epoch 12, batch 1050, loss[loss=0.01409, audio_tagging_loss=0.01409, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4924675.81 frames. ], batch size: 100, lr: 9.23e-03, grad_scale: 64.0 2023-12-22 02:31:28,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.83 vs. 
limit=15.0 2023-12-22 02:31:35,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=356640.0, ans=0.0 2023-12-22 02:31:55,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=356773.3333333333, ans=0.125 2023-12-22 02:31:56,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=356773.3333333333, ans=0.125 2023-12-22 02:32:00,445 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 2.676e+01 2.799e+01 2.940e+01 3.260e+01, threshold=5.599e+01, percent-clipped=0.0 2023-12-22 02:32:02,364 INFO [train.py:886] (0/4) Epoch 12, batch 1100, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4936150.69 frames. ], batch size: 100, lr: 9.23e-03, grad_scale: 64.0 2023-12-22 02:32:02,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=356840.0, ans=0.125 2023-12-22 02:32:18,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=356906.6666666667, ans=0.2 2023-12-22 02:32:27,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=356973.3333333333, ans=0.125 2023-12-22 02:32:30,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5 2023-12-22 02:32:34,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=357040.0, ans=0.125 2023-12-22 02:32:44,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=357106.6666666667, ans=0.125 2023-12-22 02:32:52,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0 2023-12-22 02:32:54,182 INFO [train.py:886] (0/4) Epoch 12, batch 1150, loss[loss=0.01282, audio_tagging_loss=0.01282, over 25000.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4948202.54 frames. ], batch size: 100, lr: 9.23e-03, grad_scale: 64.0 2023-12-22 02:32:56,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=357173.3333333333, ans=0.125 2023-12-22 02:32:59,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=357173.3333333333, ans=0.0 2023-12-22 02:33:02,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.19 vs. 
limit=22.5 2023-12-22 02:33:02,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=357173.3333333333, ans=0.2 2023-12-22 02:33:03,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=357240.0, ans=0.0 2023-12-22 02:33:11,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=357240.0, ans=22.5 2023-12-22 02:33:12,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=357240.0, ans=0.125 2023-12-22 02:33:18,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=357306.6666666667, ans=0.2 2023-12-22 02:33:27,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=357373.3333333333, ans=10.0 2023-12-22 02:33:30,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=357373.3333333333, ans=0.125 2023-12-22 02:33:40,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=357440.0, ans=0.0 2023-12-22 02:33:43,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357440.0, ans=0.1 2023-12-22 02:33:44,375 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.648e+01 2.751e+01 2.936e+01 3.446e+01, threshold=5.502e+01, percent-clipped=0.0 2023-12-22 02:33:45,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=357506.6666666667, ans=0.0 2023-12-22 02:33:46,302 INFO [train.py:886] (0/4) Epoch 12, batch 1200, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4951797.49 frames. ], batch size: 100, lr: 9.22e-03, grad_scale: 64.0 2023-12-22 02:33:47,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=357506.6666666667, ans=0.1 2023-12-22 02:33:47,613 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2023-12-22 02:33:56,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0 2023-12-22 02:34:13,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=357640.0, ans=0.125 2023-12-22 02:34:24,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=357706.6666666667, ans=0.0 2023-12-22 02:34:38,777 INFO [train.py:886] (0/4) Epoch 12, batch 1250, loss[loss=0.01442, audio_tagging_loss=0.01442, over 24750.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4948230.73 frames. 
], batch size: 99, lr: 9.22e-03, grad_scale: 64.0 2023-12-22 02:34:43,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=357840.0, ans=0.125 2023-12-22 02:34:52,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=357906.6666666667, ans=0.0 2023-12-22 02:35:28,217 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 2.720e+01 2.848e+01 2.999e+01 3.607e+01, threshold=5.697e+01, percent-clipped=0.0 2023-12-22 02:35:29,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.52 vs. limit=10.0 2023-12-22 02:35:30,132 INFO [train.py:886] (0/4) Epoch 12, batch 1300, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4944089.97 frames. ], batch size: 99, lr: 9.21e-03, grad_scale: 64.0 2023-12-22 02:35:30,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=358173.3333333333, ans=0.125 2023-12-22 02:36:04,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2023-12-22 02:36:14,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=358440.0, ans=0.125 2023-12-22 02:36:19,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358440.0, ans=0.1 2023-12-22 02:36:22,384 INFO [train.py:886] (0/4) Epoch 12, batch 1350, loss[loss=0.01529, audio_tagging_loss=0.01529, over 24750.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4944504.95 frames. ], batch size: 99, lr: 9.21e-03, grad_scale: 64.0 2023-12-22 02:36:50,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=358640.0, ans=0.125 2023-12-22 02:37:12,419 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.686e+01 2.846e+01 3.039e+01 3.537e+01, threshold=5.691e+01, percent-clipped=0.0 2023-12-22 02:37:12,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=358773.3333333333, ans=0.05 2023-12-22 02:37:14,346 INFO [train.py:886] (0/4) Epoch 12, batch 1400, loss[loss=0.01635, audio_tagging_loss=0.01635, over 24750.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4945192.95 frames. 
], batch size: 99, lr: 9.20e-03, grad_scale: 64.0 2023-12-22 02:37:16,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=358840.0, ans=0.0 2023-12-22 02:37:18,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=358840.0, ans=0.0 2023-12-22 02:37:21,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=358840.0, ans=0.04949747468305833 2023-12-22 02:37:21,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=358840.0, ans=0.035 2023-12-22 02:37:26,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.91 vs. limit=15.0 2023-12-22 02:37:43,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.75 vs. limit=15.0 2023-12-22 02:37:48,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=359040.0, ans=0.125 2023-12-22 02:37:55,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=359106.6666666667, ans=0.015 2023-12-22 02:38:00,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=359106.6666666667, ans=0.125 2023-12-22 02:38:04,765 INFO [train.py:886] (0/4) Epoch 12, batch 1450, loss[loss=0.0147, audio_tagging_loss=0.0147, over 24750.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4952435.02 frames. ], batch size: 99, lr: 9.20e-03, grad_scale: 64.0 2023-12-22 02:38:20,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.66 vs. limit=10.0 2023-12-22 02:38:27,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=359306.6666666667, ans=0.04949747468305833 2023-12-22 02:38:54,578 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.648e+01 2.789e+01 2.949e+01 3.520e+01, threshold=5.579e+01, percent-clipped=0.0 2023-12-22 02:38:56,515 INFO [train.py:886] (0/4) Epoch 12, batch 1500, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4953542.22 frames. ], batch size: 100, lr: 9.20e-03, grad_scale: 64.0 2023-12-22 02:39:00,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.29 vs. 
limit=15.0 2023-12-22 02:39:19,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=359640.0, ans=0.125 2023-12-22 02:39:21,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=359640.0, ans=0.125 2023-12-22 02:39:26,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=359640.0, ans=0.125 2023-12-22 02:39:39,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=359773.3333333333, ans=0.125 2023-12-22 02:39:43,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=359773.3333333333, ans=0.0 2023-12-22 02:39:50,011 INFO [train.py:886] (0/4) Epoch 12, batch 1550, loss[loss=0.01624, audio_tagging_loss=0.01624, over 24750.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4952988.21 frames. ], batch size: 99, lr: 9.19e-03, grad_scale: 64.0 2023-12-22 02:39:52,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=359840.0, ans=0.2 2023-12-22 02:39:53,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=359840.0, ans=0.125 2023-12-22 02:39:57,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=359840.0, ans=0.125 2023-12-22 02:40:05,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=359906.6666666667, ans=0.125 2023-12-22 02:40:39,644 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.713e+01 2.839e+01 3.034e+01 3.584e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-22 02:40:40,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. limit=6.0 2023-12-22 02:40:41,600 INFO [train.py:886] (0/4) Epoch 12, batch 1600, loss[loss=0.01547, audio_tagging_loss=0.01547, over 24750.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4949020.06 frames. 
], batch size: 99, lr: 9.19e-03, grad_scale: 64.0 2023-12-22 02:40:48,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=360173.3333333333, ans=0.2 2023-12-22 02:40:49,987 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:40:53,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=360240.0, ans=0.125 2023-12-22 02:40:55,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=360240.0, ans=0.125 2023-12-22 02:40:59,493 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.606e-01 2023-12-22 02:41:10,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=360306.6666666667, ans=0.0 2023-12-22 02:41:11,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=360306.6666666667, ans=0.125 2023-12-22 02:41:32,822 INFO [train.py:886] (0/4) Epoch 12, batch 1650, loss[loss=0.01581, audio_tagging_loss=0.01581, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4941886.66 frames. ], batch size: 100, lr: 9.18e-03, grad_scale: 64.0 2023-12-22 02:41:44,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=360573.3333333333, ans=0.125 2023-12-22 02:41:47,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.13 vs. limit=6.0 2023-12-22 02:41:55,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=360640.0, ans=0.07 2023-12-22 02:42:15,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=360773.3333333333, ans=0.0 2023-12-22 02:42:15,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=360773.3333333333, ans=0.125 2023-12-22 02:42:17,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0 2023-12-22 02:42:20,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360773.3333333333, ans=0.1 2023-12-22 02:42:22,670 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.644e+01 2.817e+01 2.973e+01 3.664e+01, threshold=5.633e+01, percent-clipped=0.0 2023-12-22 02:42:25,262 INFO [train.py:886] (0/4) Epoch 12, batch 1700, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01478, audio_tagging_loss=0.01478, over 4948689.73 frames. ], batch size: 99, lr: 9.18e-03, grad_scale: 64.0 2023-12-22 02:42:31,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=360840.0, ans=0.025 2023-12-22 02:42:31,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.15 vs. 
limit=22.5 2023-12-22 02:42:36,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.37 vs. limit=12.0 2023-12-22 02:42:40,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=360906.6666666667, ans=0.04949747468305833 2023-12-22 02:42:45,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.60 vs. limit=22.5 2023-12-22 02:43:06,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=361106.6666666667, ans=0.125 2023-12-22 02:43:07,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=361106.6666666667, ans=0.125 2023-12-22 02:43:08,216 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:43:16,379 INFO [train.py:886] (0/4) Epoch 12, batch 1750, loss[loss=0.01755, audio_tagging_loss=0.01755, over 25000.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4948390.77 frames. ], batch size: 100, lr: 9.18e-03, grad_scale: 64.0 2023-12-22 02:43:23,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=361173.3333333333, ans=0.125 2023-12-22 02:43:25,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=361240.0, ans=0.125 2023-12-22 02:43:41,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2023-12-22 02:43:41,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=361306.6666666667, ans=0.2 2023-12-22 02:43:48,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=361373.3333333333, ans=0.125 2023-12-22 02:43:55,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=361373.3333333333, ans=0.125 2023-12-22 02:44:04,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=361440.0, ans=0.125 2023-12-22 02:44:07,342 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.350e+01 2.657e+01 2.802e+01 2.979e+01 3.545e+01, threshold=5.603e+01, percent-clipped=0.0 2023-12-22 02:44:09,256 INFO [train.py:886] (0/4) Epoch 12, batch 1800, loss[loss=0.01515, audio_tagging_loss=0.01515, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4953166.35 frames. ], batch size: 100, lr: 9.17e-03, grad_scale: 64.0 2023-12-22 02:44:10,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=361506.6666666667, ans=0.1 2023-12-22 02:44:40,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361706.6666666667, ans=0.1 2023-12-22 02:44:45,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.38 vs. 
limit=10.0 2023-12-22 02:44:49,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=361773.3333333333, ans=0.0 2023-12-22 02:45:00,421 INFO [train.py:886] (0/4) Epoch 12, batch 1850, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4946126.56 frames. ], batch size: 100, lr: 9.17e-03, grad_scale: 64.0 2023-12-22 02:45:20,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=361973.3333333333, ans=0.0 2023-12-22 02:45:29,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=361973.3333333333, ans=0.5 2023-12-22 02:45:42,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=362106.6666666667, ans=0.125 2023-12-22 02:45:42,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.31 vs. limit=10.0 2023-12-22 02:45:44,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=362106.6666666667, ans=0.125 2023-12-22 02:45:47,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0 2023-12-22 02:45:50,659 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.408e+01 2.739e+01 2.901e+01 3.056e+01 4.118e+01, threshold=5.802e+01, percent-clipped=0.0 2023-12-22 02:45:53,268 INFO [train.py:886] (0/4) Epoch 12, batch 1900, loss[loss=0.01503, audio_tagging_loss=0.01503, over 24750.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 4942856.82 frames. ], batch size: 99, lr: 9.16e-03, grad_scale: 64.0 2023-12-22 02:45:55,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=362173.3333333333, ans=0.2 2023-12-22 02:46:06,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=362240.0, ans=0.125 2023-12-22 02:46:07,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=362240.0, ans=0.2 2023-12-22 02:46:18,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.21 vs. limit=15.0 2023-12-22 02:46:38,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=15.0 2023-12-22 02:46:45,596 INFO [train.py:886] (0/4) Epoch 12, batch 1950, loss[loss=0.01479, audio_tagging_loss=0.01479, over 24750.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4943792.26 frames. ], batch size: 99, lr: 9.16e-03, grad_scale: 64.0 2023-12-22 02:46:53,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=362506.6666666667, ans=0.125 2023-12-22 02:47:26,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. 
limit=6.0 2023-12-22 02:47:33,919 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.661e+01 2.847e+01 3.018e+01 3.786e+01, threshold=5.694e+01, percent-clipped=0.0 2023-12-22 02:47:35,870 INFO [train.py:886] (0/4) Epoch 12, batch 2000, loss[loss=0.01497, audio_tagging_loss=0.01497, over 24750.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4943730.11 frames. ], batch size: 99, lr: 9.16e-03, grad_scale: 128.0 2023-12-22 02:47:38,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=362840.0, ans=0.0 2023-12-22 02:47:56,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=362973.3333333333, ans=0.125 2023-12-22 02:47:58,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=362973.3333333333, ans=0.0 2023-12-22 02:48:02,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=15.0 2023-12-22 02:48:28,425 INFO [train.py:886] (0/4) Epoch 12, batch 2050, loss[loss=0.01504, audio_tagging_loss=0.01504, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4952304.21 frames. ], batch size: 100, lr: 9.15e-03, grad_scale: 64.0 2023-12-22 02:49:17,502 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.419e+01 2.662e+01 2.833e+01 2.958e+01 3.467e+01, threshold=5.665e+01, percent-clipped=0.0 2023-12-22 02:49:18,481 INFO [train.py:886] (0/4) Epoch 12, batch 2100, loss[loss=0.01644, audio_tagging_loss=0.01644, over 25000.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4955733.98 frames. ], batch size: 100, lr: 9.15e-03, grad_scale: 64.0 2023-12-22 02:49:18,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=363506.6666666667, ans=0.125 2023-12-22 02:50:00,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=363773.3333333333, ans=0.125 2023-12-22 02:50:04,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363773.3333333333, ans=0.1 2023-12-22 02:50:11,414 INFO [train.py:886] (0/4) Epoch 12, batch 2150, loss[loss=0.01668, audio_tagging_loss=0.01668, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4955953.49 frames. ], batch size: 100, lr: 9.14e-03, grad_scale: 64.0 2023-12-22 02:50:15,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=363840.0, ans=0.125 2023-12-22 02:50:22,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=363906.6666666667, ans=0.125 2023-12-22 02:50:26,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=363906.6666666667, ans=0.2 2023-12-22 02:50:48,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.60 vs. 
limit=15.0 2023-12-22 02:51:02,679 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 2.698e+01 2.881e+01 3.027e+01 3.417e+01, threshold=5.762e+01, percent-clipped=0.0 2023-12-22 02:51:03,648 INFO [train.py:886] (0/4) Epoch 12, batch 2200, loss[loss=0.01646, audio_tagging_loss=0.01646, over 24750.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4953247.73 frames. ], batch size: 99, lr: 9.14e-03, grad_scale: 64.0 2023-12-22 02:51:09,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=364173.3333333333, ans=0.5 2023-12-22 02:51:21,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=364240.0, ans=0.0 2023-12-22 02:51:21,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=364240.0, ans=0.125 2023-12-22 02:51:55,234 INFO [train.py:886] (0/4) Epoch 12, batch 2250, loss[loss=0.01411, audio_tagging_loss=0.01411, over 24750.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4950424.02 frames. ], batch size: 99, lr: 9.13e-03, grad_scale: 64.0 2023-12-22 02:52:04,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2023-12-22 02:52:12,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=364573.3333333333, ans=0.07 2023-12-22 02:52:31,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=364706.6666666667, ans=0.0 2023-12-22 02:52:34,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=364706.6666666667, ans=0.125 2023-12-22 02:52:35,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=364706.6666666667, ans=0.125 2023-12-22 02:52:37,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=364773.3333333333, ans=0.125 2023-12-22 02:52:43,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=364773.3333333333, ans=0.125 2023-12-22 02:52:46,425 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 2.658e+01 2.783e+01 2.953e+01 5.165e+01, threshold=5.566e+01, percent-clipped=0.0 2023-12-22 02:52:47,410 INFO [train.py:886] (0/4) Epoch 12, batch 2300, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4953834.56 frames. 
], batch size: 100, lr: 9.13e-03, grad_scale: 64.0 2023-12-22 02:52:52,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=364840.0, ans=10.0 2023-12-22 02:52:54,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=364840.0, ans=0.0 2023-12-22 02:53:01,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=364906.6666666667, ans=0.125 2023-12-22 02:53:13,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364973.3333333333, ans=0.1 2023-12-22 02:53:17,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364973.3333333333, ans=0.1 2023-12-22 02:53:39,768 INFO [train.py:886] (0/4) Epoch 12, batch 2350, loss[loss=0.01504, audio_tagging_loss=0.01504, over 25000.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4957897.00 frames. ], batch size: 100, lr: 9.13e-03, grad_scale: 64.0 2023-12-22 02:53:43,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=365173.3333333333, ans=0.02 2023-12-22 02:53:51,979 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.476e-02 2023-12-22 02:53:57,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=365240.0, ans=0.125 2023-12-22 02:54:05,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.46 vs. limit=22.5 2023-12-22 02:54:17,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=365373.3333333333, ans=0.2 2023-12-22 02:54:17,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=365373.3333333333, ans=0.0 2023-12-22 02:54:30,883 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 2.667e+01 2.832e+01 3.022e+01 3.621e+01, threshold=5.664e+01, percent-clipped=0.0 2023-12-22 02:54:31,875 INFO [train.py:886] (0/4) Epoch 12, batch 2400, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4961550.22 frames. ], batch size: 100, lr: 9.12e-03, grad_scale: 64.0 2023-12-22 02:54:34,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=365506.6666666667, ans=0.125 2023-12-22 02:54:39,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=365506.6666666667, ans=0.0 2023-12-22 02:55:08,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. 
limit=15.0
2023-12-22 02:55:10,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=365706.6666666667, ans=0.125
2023-12-22 02:55:19,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=365773.3333333333, ans=0.125
2023-12-22 02:55:23,513 INFO [train.py:886] (0/4) Epoch 12, batch 2450, loss[loss=0.0144, audio_tagging_loss=0.0144, over 25000.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4958352.00 frames. ], batch size: 100, lr: 9.12e-03, grad_scale: 64.0
2023-12-22 02:55:53,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.21 vs. limit=15.0
2023-12-22 02:56:11,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=366106.6666666667, ans=0.125
2023-12-22 02:56:14,272 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.716e+01 2.829e+01 2.975e+01 3.465e+01, threshold=5.658e+01, percent-clipped=0.0
2023-12-22 02:56:15,269 INFO [train.py:886] (0/4) Epoch 12, batch 2500, loss[loss=0.01723, audio_tagging_loss=0.01723, over 24750.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4953668.09 frames. ], batch size: 99, lr: 9.11e-03, grad_scale: 64.0
2023-12-22 02:56:20,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=15.0
2023-12-22 02:56:23,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=366173.3333333333, ans=0.0
2023-12-22 02:56:26,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=366240.0, ans=0.125
2023-12-22 02:56:26,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0
2023-12-22 02:56:30,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0
2023-12-22 02:56:37,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=366306.6666666667, ans=0.025
2023-12-22 02:56:46,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=366373.3333333333, ans=22.5
2023-12-22 02:56:47,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0
2023-12-22 02:57:06,187 INFO [train.py:886] (0/4) Epoch 12, batch 2550, loss[loss=0.01723, audio_tagging_loss=0.01723, over 24046.00 frames. ], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4949566.72 frames. ], batch size: 100, lr: 9.11e-03, grad_scale: 64.0
2023-12-22 02:57:26,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=366640.0, ans=0.2
2023-12-22 02:57:34,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=366640.0, ans=0.125
2023-12-22 02:57:37,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=366706.6666666667, ans=0.125
2023-12-22 02:57:38,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=366706.6666666667, ans=0.1
2023-12-22 02:57:56,934 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.648e+01 2.817e+01 3.039e+01 3.435e+01, threshold=5.634e+01, percent-clipped=0.0
2023-12-22 02:57:57,903 INFO [train.py:886] (0/4) Epoch 12, batch 2600, loss[loss=0.01539, audio_tagging_loss=0.01539, over 24750.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4951594.42 frames. ], batch size: 99, lr: 9.11e-03, grad_scale: 64.0
2023-12-22 02:57:59,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=15.0
2023-12-22 02:58:47,988 INFO [train.py:886] (0/4) Epoch 12, batch 2650, loss[loss=0.01382, audio_tagging_loss=0.01382, over 25000.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4953715.19 frames. ], batch size: 100, lr: 9.10e-03, grad_scale: 64.0
2023-12-22 02:58:49,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=367173.3333333333, ans=0.2
2023-12-22 02:58:58,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=367240.0, ans=0.125
2023-12-22 02:58:59,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=367240.0, ans=0.0
2023-12-22 02:59:04,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=367240.0, ans=0.125
2023-12-22 02:59:17,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=367306.6666666667, ans=0.125
2023-12-22 02:59:32,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5
2023-12-22 02:59:38,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=367440.0, ans=0.125
2023-12-22 02:59:38,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=367440.0, ans=0.0
2023-12-22 02:59:38,847 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.659e+01 2.800e+01 2.953e+01 3.305e+01, threshold=5.601e+01, percent-clipped=0.0
2023-12-22 02:59:39,829 INFO [train.py:886] (0/4) Epoch 12, batch 2700, loss[loss=0.01625, audio_tagging_loss=0.01625, over 25000.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4959627.31 frames. ], batch size: 100, lr: 9.10e-03, grad_scale: 64.0
2023-12-22 03:00:06,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=367640.0, ans=0.0
2023-12-22 03:00:07,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=367640.0, ans=0.125
2023-12-22 03:00:16,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=15.0
2023-12-22 03:00:17,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.74 vs. limit=10.0
2023-12-22 03:00:31,448 INFO [train.py:886] (0/4) Epoch 12, batch 2750, loss[loss=0.01796, audio_tagging_loss=0.01796, over 25000.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4961161.15 frames. ], batch size: 100, lr: 9.09e-03, grad_scale: 64.0
2023-12-22 03:00:44,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=367906.6666666667, ans=0.125
2023-12-22 03:00:45,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0
2023-12-22 03:00:53,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=367973.3333333333, ans=0.1
2023-12-22 03:00:53,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=367973.3333333333, ans=0.04949747468305833
2023-12-22 03:01:17,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=368106.6666666667, ans=0.2
2023-12-22 03:01:19,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=368106.6666666667, ans=0.05
2023-12-22 03:01:22,366 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.734e+01 2.863e+01 2.984e+01 3.983e+01, threshold=5.726e+01, percent-clipped=0.0
2023-12-22 03:01:22,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=368173.3333333333, ans=0.125
2023-12-22 03:01:22,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=368173.3333333333, ans=0.0
2023-12-22 03:01:22,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0
2023-12-22 03:01:23,371 INFO [train.py:886] (0/4) Epoch 12, batch 2800, loss[loss=0.01695, audio_tagging_loss=0.01695, over 24750.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4965236.94 frames. ], batch size: 99, lr: 9.09e-03, grad_scale: 64.0
2023-12-22 03:01:42,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=368240.0, ans=0.0
2023-12-22 03:02:16,490 INFO [train.py:886] (0/4) Epoch 12, batch 2850, loss[loss=0.01632, audio_tagging_loss=0.01632, over 24750.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4956273.13 frames. ], batch size: 99, lr: 9.09e-03, grad_scale: 64.0
2023-12-22 03:02:17,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0
2023-12-22 03:02:19,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=368506.6666666667, ans=0.125
2023-12-22 03:02:23,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=368506.6666666667, ans=0.05
2023-12-22 03:02:26,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=368573.3333333333, ans=0.0
2023-12-22 03:02:51,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5
2023-12-22 03:02:58,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=368773.3333333333, ans=0.125
2023-12-22 03:03:06,049 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 2.672e+01 2.791e+01 2.941e+01 3.403e+01, threshold=5.583e+01, percent-clipped=0.0
2023-12-22 03:03:07,710 INFO [train.py:886] (0/4) Epoch 12, batch 2900, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 4953496.95 frames. ], batch size: 99, lr: 9.08e-03, grad_scale: 64.0
2023-12-22 03:03:40,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=369040.0, ans=0.2
2023-12-22 03:03:45,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=369040.0, ans=0.2
2023-12-22 03:03:46,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=369040.0, ans=0.125
2023-12-22 03:03:59,023 INFO [train.py:886] (0/4) Epoch 12, batch 2950, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4958027.19 frames. ], batch size: 100, lr: 9.08e-03, grad_scale: 64.0
2023-12-22 03:04:24,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=369306.6666666667, ans=0.2
2023-12-22 03:04:42,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=369440.0, ans=0.125
2023-12-22 03:04:48,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0
2023-12-22 03:04:50,025 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.641e+01 2.799e+01 3.004e+01 3.517e+01, threshold=5.597e+01, percent-clipped=0.0
2023-12-22 03:04:51,017 INFO [train.py:886] (0/4) Epoch 12, batch 3000, loss[loss=0.01366, audio_tagging_loss=0.01366, over 25000.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4955290.40 frames. ], batch size: 100, lr: 9.07e-03, grad_scale: 64.0
2023-12-22 03:04:51,018 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 03:05:12,278 INFO [train.py:917] (0/4) Epoch 12, validation: loss=0.03429, audio_tagging_loss=0.03429, over 3737520.00 frames.
2023-12-22 03:05:12,279 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-22 03:05:35,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=369640.0, ans=0.025
2023-12-22 03:06:01,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=369773.3333333333, ans=0.0
2023-12-22 03:06:03,271 INFO [train.py:886] (0/4) Epoch 12, batch 3050, loss[loss=0.01352, audio_tagging_loss=0.01352, over 25000.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4951474.16 frames. ], batch size: 100, lr: 9.07e-03, grad_scale: 64.0
2023-12-22 03:06:10,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.59 vs. limit=15.0
2023-12-22 03:06:12,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.53 vs. limit=10.0
2023-12-22 03:06:23,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=369906.6666666667, ans=0.1
2023-12-22 03:06:29,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5
2023-12-22 03:06:33,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=370040.0, ans=0.125
2023-12-22 03:06:40,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=370040.0, ans=0.2
2023-12-22 03:06:40,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0
2023-12-22 03:06:53,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370106.6666666667, ans=0.1
2023-12-22 03:06:54,447 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.661e+01 2.811e+01 2.974e+01 3.661e+01, threshold=5.621e+01, percent-clipped=0.0
2023-12-22 03:06:55,400 INFO [train.py:886] (0/4) Epoch 12, batch 3100, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4954049.67 frames. ], batch size: 100, lr: 9.07e-03, grad_scale: 64.0
2023-12-22 03:07:02,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=370173.3333333333, ans=0.2
2023-12-22 03:07:31,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=370373.3333333333, ans=0.0
2023-12-22 03:07:31,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=370373.3333333333, ans=0.0
2023-12-22 03:07:35,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0
2023-12-22 03:07:35,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. limit=10.0
2023-12-22 03:07:38,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370440.0, ans=0.1
2023-12-22 03:07:42,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=370440.0, ans=0.125
2023-12-22 03:07:45,187 INFO [train.py:886] (0/4) Epoch 12, batch 3150, loss[loss=0.01212, audio_tagging_loss=0.01212, over 24188.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4945626.68 frames. ], batch size: 101, lr: 9.06e-03, grad_scale: 64.0
2023-12-22 03:08:00,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=370573.3333333333, ans=0.125
2023-12-22 03:08:12,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370640.0, ans=0.1
2023-12-22 03:08:20,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=370706.6666666667, ans=0.2
2023-12-22 03:08:21,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=22.5
2023-12-22 03:08:33,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=370773.3333333333, ans=0.125
2023-12-22 03:08:37,464 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.651e+01 2.845e+01 3.043e+01 3.695e+01, threshold=5.689e+01, percent-clipped=0.0
2023-12-22 03:08:38,465 INFO [train.py:886] (0/4) Epoch 12, batch 3200, loss[loss=0.01483, audio_tagging_loss=0.01483, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4940790.32 frames. ], batch size: 100, lr: 9.06e-03, grad_scale: 64.0
2023-12-22 03:08:40,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=370840.0, ans=0.0
2023-12-22 03:08:45,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.28 vs. limit=15.0
2023-12-22 03:08:50,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=370906.6666666667, ans=0.2
2023-12-22 03:09:01,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=370973.3333333333, ans=0.0
2023-12-22 03:09:19,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=371106.6666666667, ans=0.0
2023-12-22 03:09:22,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0
2023-12-22 03:09:23,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=22.5
2023-12-22 03:09:29,961 INFO [train.py:886] (0/4) Epoch 12, batch 3250, loss[loss=0.01487, audio_tagging_loss=0.01487, over 25000.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4945750.50 frames. ], batch size: 100, lr: 9.05e-03, grad_scale: 64.0
2023-12-22 03:09:51,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=371306.6666666667, ans=0.0
2023-12-22 03:09:54,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=371306.6666666667, ans=0.125
2023-12-22 03:09:55,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=371306.6666666667, ans=0.0
2023-12-22 03:09:55,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=371306.6666666667, ans=0.125
2023-12-22 03:10:00,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371373.3333333333, ans=0.1
2023-12-22 03:10:14,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371440.0, ans=0.1
2023-12-22 03:10:20,351 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.685e+01 2.819e+01 2.963e+01 3.522e+01, threshold=5.637e+01, percent-clipped=0.0
2023-12-22 03:10:21,389 INFO [train.py:886] (0/4) Epoch 12, batch 3300, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4949878.86 frames. ], batch size: 100, lr: 9.05e-03, grad_scale: 64.0
2023-12-22 03:10:36,283 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 03:10:41,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=371640.0, ans=0.2
2023-12-22 03:10:44,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=371640.0, ans=0.125
2023-12-22 03:10:59,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=371706.6666666667, ans=0.125
2023-12-22 03:11:01,418 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 03:11:03,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=371773.3333333333, ans=0.0
2023-12-22 03:11:03,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371773.3333333333, ans=0.1
2023-12-22 03:11:04,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=371773.3333333333, ans=0.0
2023-12-22 03:11:13,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=371840.0, ans=0.125
2023-12-22 03:11:13,996 INFO [train.py:886] (0/4) Epoch 12, batch 3350, loss[loss=0.01683, audio_tagging_loss=0.01683, over 24750.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4949383.53 frames. ], batch size: 99, lr: 9.05e-03, grad_scale: 64.0
2023-12-22 03:11:24,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.87 vs. limit=22.5
2023-12-22 03:11:25,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371906.6666666667, ans=0.1
2023-12-22 03:11:31,611 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 03:11:33,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=371973.3333333333, ans=0.125
2023-12-22 03:11:40,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=28.13 vs. limit=15.0
2023-12-22 03:11:52,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=372040.0, ans=0.125
2023-12-22 03:11:53,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=372040.0, ans=0.125
2023-12-22 03:11:53,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0
2023-12-22 03:11:55,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=372106.6666666667, ans=0.125
2023-12-22 03:12:03,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=372106.6666666667, ans=0.015
2023-12-22 03:12:04,924 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.659e+01 2.803e+01 3.006e+01 4.806e+01, threshold=5.606e+01, percent-clipped=0.0
2023-12-22 03:12:05,919 INFO [train.py:886] (0/4) Epoch 12, batch 3400, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4951539.91 frames. ], batch size: 100, lr: 9.04e-03, grad_scale: 64.0
2023-12-22 03:12:21,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372240.0, ans=0.1
2023-12-22 03:12:55,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=372440.0, ans=0.0
2023-12-22 03:12:58,609 INFO [train.py:886] (0/4) Epoch 12, batch 3450, loss[loss=0.01487, audio_tagging_loss=0.01487, over 24750.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4951130.94 frames. ], batch size: 99, lr: 9.04e-03, grad_scale: 64.0
2023-12-22 03:12:58,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372506.6666666667, ans=0.1
2023-12-22 03:13:05,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372506.6666666667, ans=0.1
2023-12-22 03:13:18,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=372640.0, ans=0.0
2023-12-22 03:13:21,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372640.0, ans=0.1
2023-12-22 03:13:35,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372706.6666666667, ans=0.1
2023-12-22 03:13:42,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=372773.3333333333, ans=0.0
2023-12-22 03:13:49,404 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.514e+01 2.768e+01 2.902e+01 3.061e+01 3.695e+01, threshold=5.804e+01, percent-clipped=0.0
2023-12-22 03:13:51,112 INFO [train.py:886] (0/4) Epoch 12, batch 3500, loss[loss=0.01663, audio_tagging_loss=0.01663, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4949514.99 frames. ], batch size: 100, lr: 9.03e-03, grad_scale: 64.0
2023-12-22 03:13:51,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=372840.0, ans=0.125
2023-12-22 03:14:02,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=372906.6666666667, ans=0.125
2023-12-22 03:14:09,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=372906.6666666667, ans=15.0
2023-12-22 03:14:10,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=372973.3333333333, ans=0.2
2023-12-22 03:14:11,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=372973.3333333333, ans=0.125
2023-12-22 03:14:15,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.40 vs. limit=6.0
2023-12-22 03:14:28,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.99 vs. limit=12.0
2023-12-22 03:14:41,967 INFO [train.py:886] (0/4) Epoch 12, batch 3550, loss[loss=0.01564, audio_tagging_loss=0.01564, over 25000.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4953034.43 frames. ], batch size: 100, lr: 9.03e-03, grad_scale: 64.0
2023-12-22 03:14:45,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0
2023-12-22 03:14:51,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0
2023-12-22 03:15:05,980 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-56000.pt
2023-12-22 03:15:13,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=373306.6666666667, ans=0.125
2023-12-22 03:15:30,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=373440.0, ans=0.035
2023-12-22 03:15:35,218 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.600e+01 2.744e+01 2.864e+01 3.557e+01, threshold=5.489e+01, percent-clipped=0.0
2023-12-22 03:15:36,194 INFO [train.py:886] (0/4) Epoch 12, batch 3600, loss[loss=0.0144, audio_tagging_loss=0.0144, over 24750.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4951016.70 frames. ], batch size: 99, lr: 9.03e-03, grad_scale: 64.0
2023-12-22 03:15:38,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0
2023-12-22 03:15:48,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=373573.3333333333, ans=0.125
2023-12-22 03:15:59,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=373640.0, ans=0.1
2023-12-22 03:16:21,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=373773.3333333333, ans=0.0
2023-12-22 03:16:27,987 INFO [train.py:886] (0/4) Epoch 12, batch 3650, loss[loss=0.0153, audio_tagging_loss=0.0153, over 25000.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4950486.67 frames. ], batch size: 100, lr: 9.02e-03, grad_scale: 64.0
2023-12-22 03:16:37,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=373840.0, ans=0.125
2023-12-22 03:16:38,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=373906.6666666667, ans=0.0
2023-12-22 03:16:43,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=15.0
2023-12-22 03:16:59,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.37 vs. limit=15.0
2023-12-22 03:17:00,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=374040.0, ans=0.0
2023-12-22 03:17:12,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=374106.6666666667, ans=0.0
2023-12-22 03:17:16,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0
2023-12-22 03:17:18,603 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 2.722e+01 2.831e+01 2.983e+01 3.559e+01, threshold=5.662e+01, percent-clipped=0.0
2023-12-22 03:17:19,590 INFO [train.py:886] (0/4) Epoch 12, batch 3700, loss[loss=0.01578, audio_tagging_loss=0.01578, over 25000.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4953465.95 frames. ], batch size: 100, lr: 9.02e-03, grad_scale: 64.0
2023-12-22 03:17:19,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.71 vs. limit=10.0
2023-12-22 03:17:23,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.81 vs. limit=22.5
2023-12-22 03:17:24,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=374173.3333333333, ans=0.0
2023-12-22 03:17:25,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=374173.3333333333, ans=0.125
2023-12-22 03:17:36,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0
2023-12-22 03:18:09,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=374440.0, ans=0.05
2023-12-22 03:18:12,524 INFO [train.py:886] (0/4) Epoch 12, batch 3750, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4956428.38 frames. ], batch size: 100, lr: 9.01e-03, grad_scale: 64.0
2023-12-22 03:18:28,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.56 vs. limit=10.0
2023-12-22 03:18:34,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0
2023-12-22 03:18:48,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0
2023-12-22 03:18:56,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=374773.3333333333, ans=0.2
2023-12-22 03:19:02,674 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.712e+01 2.843e+01 3.020e+01 3.497e+01, threshold=5.685e+01, percent-clipped=0.0
2023-12-22 03:19:02,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374840.0, ans=0.1
2023-12-22 03:19:03,675 INFO [train.py:886] (0/4) Epoch 12, batch 3800, loss[loss=0.01806, audio_tagging_loss=0.01806, over 24750.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4954676.64 frames. ], batch size: 99, lr: 9.01e-03, grad_scale: 64.0
2023-12-22 03:19:03,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=374840.0, ans=0.125
2023-12-22 03:19:22,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=374906.6666666667, ans=0.0
2023-12-22 03:19:39,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375040.0, ans=0.1
2023-12-22 03:19:39,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=375040.0, ans=0.125
2023-12-22 03:19:40,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=375040.0, ans=0.125
2023-12-22 03:19:41,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0
2023-12-22 03:19:55,527 INFO [train.py:886] (0/4) Epoch 12, batch 3850, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24750.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4952705.05 frames. ], batch size: 99, lr: 9.01e-03, grad_scale: 64.0
2023-12-22 03:19:55,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375173.3333333333, ans=0.1
2023-12-22 03:20:05,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=375240.0, ans=0.0
2023-12-22 03:20:15,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=375306.6666666667, ans=0.0
2023-12-22 03:20:38,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=375440.0, ans=0.125
2023-12-22 03:20:46,915 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+01 2.664e+01 2.840e+01 2.981e+01 3.799e+01, threshold=5.681e+01, percent-clipped=0.0
2023-12-22 03:20:47,897 INFO [train.py:886] (0/4) Epoch 12, batch 3900, loss[loss=0.01234, audio_tagging_loss=0.01234, over 25000.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4952225.98 frames. ], batch size: 100, lr: 9.00e-03, grad_scale: 64.0
2023-12-22 03:20:52,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=375506.6666666667, ans=0.125
2023-12-22 03:21:01,248 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 03:21:03,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0
2023-12-22 03:21:05,026 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 03:21:11,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=375640.0, ans=0.125
2023-12-22 03:21:26,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=375706.6666666667, ans=0.1
2023-12-22 03:21:29,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=375773.3333333333, ans=0.125
2023-12-22 03:21:39,170 INFO [train.py:886] (0/4) Epoch 12, batch 3950, loss[loss=0.01258, audio_tagging_loss=0.01258, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4954592.46 frames. ], batch size: 100, lr: 9.00e-03, grad_scale: 64.0
2023-12-22 03:21:49,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=375906.6666666667, ans=0.0
2023-12-22 03:21:51,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. limit=6.0
2023-12-22 03:21:59,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=375906.6666666667, ans=0.2
2023-12-22 03:22:12,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=376040.0, ans=0.125
2023-12-22 03:22:13,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=376040.0, ans=0.125
2023-12-22 03:22:30,717 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.669e+01 2.811e+01 2.989e+01 3.723e+01, threshold=5.621e+01, percent-clipped=0.0
2023-12-22 03:22:31,694 INFO [train.py:886] (0/4) Epoch 12, batch 4000, loss[loss=0.01605, audio_tagging_loss=0.01605, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4956075.85 frames. ], batch size: 100, lr: 8.99e-03, grad_scale: 64.0
2023-12-22 03:22:36,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=376173.3333333333, ans=0.125
2023-12-22 03:22:39,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=376173.3333333333, ans=0.125
2023-12-22 03:22:41,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=376240.0, ans=0.09899494936611666
2023-12-22 03:22:55,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=376306.6666666667, ans=0.125
2023-12-22 03:22:57,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0
2023-12-22 03:22:58,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=376306.6666666667, ans=0.125
2023-12-22 03:23:03,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=376373.3333333333, ans=0.0
2023-12-22 03:23:09,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=376373.3333333333, ans=0.125
2023-12-22 03:23:22,935 INFO [train.py:886] (0/4) Epoch 12, batch 4050, loss[loss=0.01447, audio_tagging_loss=0.01447, over 24750.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4957519.32 frames. ], batch size: 99, lr: 8.99e-03, grad_scale: 128.0
2023-12-22 03:23:26,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=376506.6666666667, ans=0.125
2023-12-22 03:23:35,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=376573.3333333333, ans=0.2
2023-12-22 03:23:42,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=376640.0, ans=0.1
2023-12-22 03:23:47,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=376640.0, ans=0.1
2023-12-22 03:24:09,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=376773.3333333333, ans=0.125
2023-12-22 03:24:11,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=376773.3333333333, ans=0.125
2023-12-22 03:24:12,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=376773.3333333333, ans=0.125
2023-12-22 03:24:14,135 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.760e+01 2.894e+01 3.048e+01 3.546e+01, threshold=5.789e+01, percent-clipped=0.0
2023-12-22 03:24:14,159 INFO [train.py:886] (0/4) Epoch 12, batch 4100, loss[loss=0.01636, audio_tagging_loss=0.01636, over 24750.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4954519.06 frames. ], batch size: 99, lr: 8.99e-03, grad_scale: 64.0
2023-12-22 03:24:23,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=376906.6666666667, ans=0.125
2023-12-22 03:24:25,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=376906.6666666667, ans=0.125
2023-12-22 03:24:49,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=377040.0, ans=0.125
2023-12-22 03:24:54,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=377106.6666666667, ans=0.125
2023-12-22 03:25:07,062 INFO [train.py:886] (0/4) Epoch 12, batch 4150, loss[loss=0.01665, audio_tagging_loss=0.01665, over 23995.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4945857.98 frames. ], batch size: 100, lr: 8.98e-03, grad_scale: 64.0
2023-12-22 03:25:08,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=377173.3333333333, ans=0.0
2023-12-22 03:25:20,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=377240.0, ans=0.125
2023-12-22 03:25:21,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377240.0, ans=0.1
2023-12-22 03:25:42,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.60 vs. limit=22.5
2023-12-22 03:25:58,381 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 2.599e+01 2.785e+01 2.978e+01 3.505e+01, threshold=5.570e+01, percent-clipped=0.0
2023-12-22 03:25:58,405 INFO [train.py:886] (0/4) Epoch 12, batch 4200, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4949252.54 frames. ], batch size: 99, lr: 8.98e-03, grad_scale: 64.0
2023-12-22 03:26:01,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=377506.6666666667, ans=0.0
2023-12-22 03:26:01,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=377506.6666666667, ans=0.125
2023-12-22 03:26:02,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=377506.6666666667, ans=0.125
2023-12-22 03:26:10,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=377573.3333333333, ans=0.0
2023-12-22 03:26:19,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0
2023-12-22 03:26:37,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.86 vs. limit=10.0
2023-12-22 03:26:46,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=377773.3333333333, ans=0.0
2023-12-22 03:26:47,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=377773.3333333333, ans=0.0
2023-12-22 03:26:50,236 INFO [train.py:886] (0/4) Epoch 12, batch 4250, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4954799.97 frames. ], batch size: 100, lr: 8.97e-03, grad_scale: 64.0
2023-12-22 03:26:57,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=12.0
2023-12-22 03:27:10,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=377973.3333333333, ans=0.125
2023-12-22 03:27:28,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=12.0
2023-12-22 03:27:41,791 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.354e+01 2.668e+01 2.819e+01 2.959e+01 3.495e+01, threshold=5.638e+01, percent-clipped=0.0
2023-12-22 03:27:41,821 INFO [train.py:886] (0/4) Epoch 12, batch 4300, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4959380.85 frames. ], batch size: 100, lr: 8.97e-03, grad_scale: 64.0
2023-12-22 03:27:44,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=378173.3333333333, ans=0.125
2023-12-22 03:28:08,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.01 vs. limit=12.0
2023-12-22 03:28:26,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=378440.0, ans=0.0
2023-12-22 03:28:32,935 INFO [train.py:886] (0/4) Epoch 12, batch 4350, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4959127.17 frames. ], batch size: 99, lr: 8.97e-03, grad_scale: 64.0
2023-12-22 03:28:37,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378506.6666666667, ans=0.1
2023-12-22 03:28:42,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=378573.3333333333, ans=0.05
2023-12-22 03:28:42,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=378573.3333333333, ans=0.125
2023-12-22 03:28:51,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.44 vs. limit=22.5
2023-12-22 03:29:03,753 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=9.549e-02
2023-12-22 03:29:04,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=378706.6666666667, ans=0.125
2023-12-22 03:29:06,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=378706.6666666667, ans=0.125
2023-12-22 03:29:12,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=378706.6666666667, ans=0.125
2023-12-22 03:29:25,511 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.769e+01 2.901e+01 3.056e+01 3.737e+01, threshold=5.802e+01, percent-clipped=0.0
2023-12-22 03:29:25,535 INFO [train.py:886] (0/4) Epoch 12, batch 4400, loss[loss=0.01176, audio_tagging_loss=0.01176, over 23972.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4954772.01 frames. ], batch size: 100, lr: 8.96e-03, grad_scale: 64.0
2023-12-22 03:29:26,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=378840.0, ans=0.125
2023-12-22 03:29:29,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.61 vs. limit=15.0
2023-12-22 03:29:35,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=378906.6666666667, ans=0.2
2023-12-22 03:30:02,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=379040.0, ans=0.125
2023-12-22 03:30:17,511 INFO [train.py:886] (0/4) Epoch 12, batch 4450, loss[loss=0.01414, audio_tagging_loss=0.01414, over 25000.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4947854.99 frames. ], batch size: 100, lr: 8.96e-03, grad_scale: 64.0
2023-12-22 03:30:28,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=379240.0, ans=15.0
2023-12-22 03:30:47,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0
2023-12-22 03:30:50,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0
2023-12-22 03:30:56,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=379373.3333333333, ans=0.125
2023-12-22 03:30:58,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=379440.0, ans=0.125
2023-12-22 03:31:00,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=379440.0, ans=0.125
2023-12-22 03:31:09,035 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.635e+01 2.785e+01 2.931e+01 3.625e+01, threshold=5.571e+01, percent-clipped=0.0
2023-12-22 03:31:09,059 INFO [train.py:886] (0/4) Epoch 12, batch 4500, loss[loss=0.01265, audio_tagging_loss=0.01265, over 24750.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4949180.81 frames. ], batch size: 99, lr: 8.96e-03, grad_scale: 64.0
2023-12-22 03:31:42,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=379706.6666666667, ans=0.125
2023-12-22 03:32:00,482 INFO [train.py:886] (0/4) Epoch 12, batch 4550, loss[loss=0.01478, audio_tagging_loss=0.01478, over 24750.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4947999.27 frames. ], batch size: 99, lr: 8.95e-03, grad_scale: 64.0
2023-12-22 03:32:00,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=379840.0, ans=0.0
2023-12-22 03:32:02,600 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 03:32:13,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.20 vs. limit=22.5
2023-12-22 03:32:20,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=379973.3333333333, ans=0.0
2023-12-22 03:32:23,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=379973.3333333333, ans=0.1
2023-12-22 03:32:36,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.61 vs. limit=15.0
2023-12-22 03:32:37,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0
2023-12-22 03:32:44,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=12.0
2023-12-22 03:32:51,249 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.640e+01 2.744e+01 2.893e+01 3.340e+01, threshold=5.488e+01, percent-clipped=0.0
2023-12-22 03:32:51,285 INFO [train.py:886] (0/4) Epoch 12, batch 4600, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4942088.62 frames. ], batch size: 100, lr: 8.95e-03, grad_scale: 64.0
2023-12-22 03:32:51,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=380173.3333333333, ans=0.125
2023-12-22 03:32:58,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=380173.3333333333, ans=0.0
2023-12-22 03:32:58,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=380173.3333333333, ans=0.07
2023-12-22 03:33:00,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0
2023-12-22 03:33:06,231 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.383e-01
2023-12-22 03:33:16,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=380306.6666666667, ans=0.0
2023-12-22 03:33:42,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0
2023-12-22 03:33:42,664 INFO [train.py:886] (0/4) Epoch 12, batch 4650, loss[loss=0.01506, audio_tagging_loss=0.01506, over 25000.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4939999.06 frames. ], batch size: 100, lr: 8.94e-03, grad_scale: 64.0
2023-12-22 03:33:44,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=380506.6666666667, ans=0.125
2023-12-22 03:33:56,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=380573.3333333333, ans=0.125
2023-12-22 03:34:32,595 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.367e+01 2.686e+01 2.822e+01 3.003e+01 3.524e+01, threshold=5.643e+01, percent-clipped=0.0
2023-12-22 03:34:32,619 INFO [train.py:886] (0/4) Epoch 12, batch 4700, loss[loss=0.01808, audio_tagging_loss=0.01808, over 24750.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4937885.33 frames. ], batch size: 99, lr: 8.94e-03, grad_scale: 64.0
2023-12-22 03:34:34,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=380840.0, ans=0.125
2023-12-22 03:34:38,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=380840.0, ans=0.125
2023-12-22 03:34:40,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=380840.0, ans=0.125
2023-12-22 03:34:41,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0
2023-12-22 03:34:51,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0
2023-12-22 03:34:52,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=380973.3333333333, ans=0.0
2023-12-22 03:35:08,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=15.0
2023-12-22 03:35:18,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=381106.6666666667, ans=0.0
2023-12-22 03:35:20,076 INFO [train.py:886] (0/4) Epoch 12, batch 4750, loss[loss=0.01256, audio_tagging_loss=0.01256, over 24750.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4934448.53 frames. ], batch size: 99, lr: 8.94e-03, grad_scale: 64.0
2023-12-22 03:35:26,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=381173.3333333333, ans=0.125
2023-12-22 03:35:35,520 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-12.pt
2023-12-22 03:35:57,808 INFO [train.py:886] (0/4) Epoch 13, batch 0, loss[loss=0.0318, audio_tagging_loss=0.0318, over 24007.00 frames. ], tot_loss[loss=0.0318, audio_tagging_loss=0.0318, over 24007.00 frames. ], batch size: 100, lr: 8.59e-03, grad_scale: 32.0
2023-12-22 03:35:57,809 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 03:36:18,413 INFO [train.py:917] (0/4) Epoch 13, validation: loss=0.03383, audio_tagging_loss=0.03383, over 3737520.00 frames.
2023-12-22 03:36:18,414 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-22 03:36:18,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=381280.0, ans=0.05
2023-12-22 03:36:22,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=381280.0, ans=0.125
2023-12-22 03:36:27,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0
2023-12-22 03:36:46,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=381413.3333333333, ans=0.125
2023-12-22 03:36:54,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=381480.0, ans=0.125
2023-12-22 03:36:54,912 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.369e+01 2.829e+01 3.058e+01 3.866e+01 8.495e+01, threshold=6.115e+01, percent-clipped=6.0
2023-12-22 03:37:10,147 INFO [train.py:886] (0/4) Epoch 13, batch 50, loss[loss=0.01818, audio_tagging_loss=0.01818, over 25000.00 frames. ], tot_loss[loss=0.02331, audio_tagging_loss=0.02331, over 1113502.92 frames. ], batch size: 100, lr: 8.58e-03, grad_scale: 32.0
2023-12-22 03:37:10,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.21 vs. limit=22.5
2023-12-22 03:37:14,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5
2023-12-22 03:37:38,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=381746.6666666667, ans=0.125
2023-12-22 03:37:47,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=381813.3333333333, ans=0.125
2023-12-22 03:37:58,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=381880.0, ans=0.0
2023-12-22 03:38:02,621 INFO [train.py:886] (0/4) Epoch 13, batch 100, loss[loss=0.01451, audio_tagging_loss=0.01451, over 25000.00 frames. ], tot_loss[loss=0.02015, audio_tagging_loss=0.02015, over 1964339.03 frames. ], batch size: 100, lr: 8.58e-03, grad_scale: 32.0
2023-12-22 03:38:04,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=381946.6666666667, ans=0.2
2023-12-22 03:38:11,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=381946.6666666667, ans=0.95
2023-12-22 03:38:14,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=382013.3333333333, ans=0.2
2023-12-22 03:38:15,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=382013.3333333333, ans=0.125
2023-12-22 03:38:30,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=382080.0, ans=0.125
2023-12-22 03:38:39,108 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.608e+01 2.907e+01 3.118e+01 3.285e+01 3.851e+01, threshold=6.236e+01, percent-clipped=0.0
2023-12-22 03:38:39,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0
2023-12-22 03:38:47,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=382213.3333333333, ans=0.125
2023-12-22 03:38:48,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=382213.3333333333, ans=0.125
2023-12-22 03:38:48,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=382213.3333333333, ans=0.1
2023-12-22 03:38:52,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=382213.3333333333, ans=0.0
2023-12-22 03:38:54,830 INFO [train.py:886] (0/4) Epoch 13, batch 150, loss[loss=0.01692, audio_tagging_loss=0.01692, over 25000.00 frames. ], tot_loss[loss=0.01833, audio_tagging_loss=0.01833, over 2630278.71 frames. ], batch size: 100, lr: 8.58e-03, grad_scale: 32.0
2023-12-22 03:39:01,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=382280.0, ans=0.1
2023-12-22 03:39:02,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=382280.0, ans=0.125
2023-12-22 03:39:08,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=382346.6666666667, ans=0.1
2023-12-22 03:39:14,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=382413.3333333333, ans=0.125
2023-12-22 03:39:33,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=382480.0, ans=0.0
2023-12-22 03:39:33,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=382480.0, ans=0.125
2023-12-22 03:39:38,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0
2023-12-22 03:39:42,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=382546.6666666667, ans=0.0
2023-12-22 03:39:46,373 INFO [train.py:886] (0/4) Epoch 13, batch 200, loss[loss=0.01975, audio_tagging_loss=0.01975, over 24750.00 frames. ], tot_loss[loss=0.0172, audio_tagging_loss=0.0172, over 3143306.99 frames. ], batch size: 99, lr: 8.57e-03, grad_scale: 32.0
2023-12-22 03:40:02,556 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.095e-01
2023-12-22 03:40:10,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=382746.6666666667, ans=0.125
2023-12-22 03:40:14,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=382746.6666666667, ans=0.05
2023-12-22 03:40:21,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=382813.3333333333, ans=0.5
2023-12-22 03:40:22,761 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.728e+01 2.862e+01 2.980e+01 3.546e+01, threshold=5.723e+01, percent-clipped=0.0
2023-12-22 03:40:34,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.86 vs. limit=22.5
2023-12-22 03:40:38,315 INFO [train.py:886] (0/4) Epoch 13, batch 250, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24083.00 frames. ], tot_loss[loss=0.01648, audio_tagging_loss=0.01648, over 3542358.99 frames. ], batch size: 100, lr: 8.57e-03, grad_scale: 32.0
2023-12-22 03:40:57,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=383080.0, ans=0.2
2023-12-22 03:41:04,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=383080.0, ans=0.5
2023-12-22 03:41:07,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=383080.0, ans=0.125
2023-12-22 03:41:10,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=383146.6666666667, ans=0.0
2023-12-22 03:41:15,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.11 vs. limit=15.0
2023-12-22 03:41:16,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0
2023-12-22 03:41:17,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=383146.6666666667, ans=0.125
2023-12-22 03:41:21,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.43 vs. limit=22.5
2023-12-22 03:41:27,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=383213.3333333333, ans=0.125
2023-12-22 03:41:29,770 INFO [train.py:886] (0/4) Epoch 13, batch 300, loss[loss=0.01552, audio_tagging_loss=0.01552, over 24750.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 3852756.44 frames.
], batch size: 99, lr: 8.56e-03, grad_scale: 32.0 2023-12-22 03:41:30,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=383280.0, ans=0.125 2023-12-22 03:41:37,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=383280.0, ans=0.1 2023-12-22 03:41:37,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=383280.0, ans=0.125 2023-12-22 03:41:40,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2023-12-22 03:41:53,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=383413.3333333333, ans=0.125 2023-12-22 03:42:06,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.47 vs. limit=10.0 2023-12-22 03:42:06,351 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 2.665e+01 2.854e+01 3.043e+01 3.614e+01, threshold=5.708e+01, percent-clipped=0.0 2023-12-22 03:42:11,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2023-12-22 03:42:16,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.20 vs. limit=22.5 2023-12-22 03:42:22,027 INFO [train.py:886] (0/4) Epoch 13, batch 350, loss[loss=0.01715, audio_tagging_loss=0.01715, over 24750.00 frames. ], tot_loss[loss=0.01583, audio_tagging_loss=0.01583, over 4088019.18 frames. ], batch size: 99, lr: 8.56e-03, grad_scale: 32.0 2023-12-22 03:42:34,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=383680.0, ans=0.125 2023-12-22 03:42:36,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=383680.0, ans=0.125 2023-12-22 03:42:40,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=383680.0, ans=0.2 2023-12-22 03:42:40,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.29 vs. limit=22.5 2023-12-22 03:42:45,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2023-12-22 03:42:52,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=383813.3333333333, ans=15.0 2023-12-22 03:43:14,604 INFO [train.py:886] (0/4) Epoch 13, batch 400, loss[loss=0.01708, audio_tagging_loss=0.01708, over 22364.00 frames. ], tot_loss[loss=0.01547, audio_tagging_loss=0.01547, over 4278539.35 frames. 
], batch size: 107, lr: 8.56e-03, grad_scale: 32.0 2023-12-22 03:43:17,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=383946.6666666667, ans=0.125 2023-12-22 03:43:18,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.76 vs. limit=22.5 2023-12-22 03:43:24,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=384013.3333333333, ans=0.0 2023-12-22 03:43:33,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. limit=10.0 2023-12-22 03:43:40,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.37 vs. limit=22.5 2023-12-22 03:43:44,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=384080.0, ans=0.0 2023-12-22 03:43:51,102 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.676e+01 2.770e+01 2.913e+01 3.430e+01, threshold=5.541e+01, percent-clipped=0.0 2023-12-22 03:44:05,952 INFO [train.py:886] (0/4) Epoch 13, batch 450, loss[loss=0.01686, audio_tagging_loss=0.01686, over 25000.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4422734.67 frames. ], batch size: 100, lr: 8.55e-03, grad_scale: 32.0 2023-12-22 03:44:15,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=384280.0, ans=0.0 2023-12-22 03:44:20,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=384346.6666666667, ans=0.0 2023-12-22 03:44:21,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=384346.6666666667, ans=0.125 2023-12-22 03:44:28,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=384413.3333333333, ans=0.0 2023-12-22 03:44:29,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0 2023-12-22 03:44:50,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=384546.6666666667, ans=0.125 2023-12-22 03:44:52,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=384546.6666666667, ans=0.2 2023-12-22 03:44:58,220 INFO [train.py:886] (0/4) Epoch 13, batch 500, loss[loss=0.01399, audio_tagging_loss=0.01399, over 22299.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4541450.20 frames. ], batch size: 107, lr: 8.55e-03, grad_scale: 32.0 2023-12-22 03:45:07,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. 
limit=22.5 2023-12-22 03:45:30,934 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.419e-03 2023-12-22 03:45:33,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=384813.3333333333, ans=0.125 2023-12-22 03:45:34,429 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.645e+01 2.803e+01 2.992e+01 3.772e+01, threshold=5.607e+01, percent-clipped=0.0 2023-12-22 03:45:34,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=384813.3333333333, ans=0.0 2023-12-22 03:45:41,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=384880.0, ans=0.0 2023-12-22 03:45:50,086 INFO [train.py:886] (0/4) Epoch 13, batch 550, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4637215.07 frames. ], batch size: 100, lr: 8.55e-03, grad_scale: 32.0 2023-12-22 03:46:01,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.55 vs. limit=10.0 2023-12-22 03:46:06,110 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=15.0 2023-12-22 03:46:14,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=385080.0, ans=0.0 2023-12-22 03:46:14,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=12.0 2023-12-22 03:46:17,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=385080.0, ans=0.125 2023-12-22 03:46:17,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0 2023-12-22 03:46:41,737 INFO [train.py:886] (0/4) Epoch 13, batch 600, loss[loss=0.01674, audio_tagging_loss=0.01674, over 24750.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4702789.70 frames. ], batch size: 99, lr: 8.54e-03, grad_scale: 32.0 2023-12-22 03:46:43,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=385280.0, ans=0.0 2023-12-22 03:47:17,928 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.658e+01 2.809e+01 2.976e+01 3.464e+01, threshold=5.617e+01, percent-clipped=0.0 2023-12-22 03:47:19,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=385480.0, ans=0.125 2023-12-22 03:47:29,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=385546.6666666667, ans=0.0 2023-12-22 03:47:32,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=385613.3333333333, ans=0.125 2023-12-22 03:47:33,623 INFO [train.py:886] (0/4) Epoch 13, batch 650, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24750.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4752048.76 frames. 
], batch size: 99, lr: 8.54e-03, grad_scale: 32.0 2023-12-22 03:47:38,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=385613.3333333333, ans=0.05 2023-12-22 03:47:38,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=15.0 2023-12-22 03:47:46,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=385680.0, ans=0.125 2023-12-22 03:47:46,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=385680.0, ans=0.125 2023-12-22 03:47:48,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=385680.0, ans=0.5 2023-12-22 03:47:53,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=385746.6666666667, ans=0.125 2023-12-22 03:48:01,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=385746.6666666667, ans=0.2 2023-12-22 03:48:10,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=385813.3333333333, ans=0.02 2023-12-22 03:48:14,866 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.78 vs. limit=15.0 2023-12-22 03:48:16,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=385880.0, ans=0.125 2023-12-22 03:48:24,009 INFO [train.py:886] (0/4) Epoch 13, batch 700, loss[loss=0.01677, audio_tagging_loss=0.01677, over 21928.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4791765.88 frames. ], batch size: 107, lr: 8.53e-03, grad_scale: 32.0 2023-12-22 03:48:24,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2023-12-22 03:48:28,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2023-12-22 03:48:35,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.12 vs. limit=22.5 2023-12-22 03:48:40,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386013.3333333333, ans=0.1 2023-12-22 03:48:40,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0 2023-12-22 03:48:42,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=386013.3333333333, ans=0.125 2023-12-22 03:48:47,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=386080.0, ans=0.125 2023-12-22 03:48:50,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.29 vs. 
limit=15.0 2023-12-22 03:49:01,126 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.667e+01 2.765e+01 2.953e+01 3.565e+01, threshold=5.530e+01, percent-clipped=0.0 2023-12-22 03:49:02,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=386146.6666666667, ans=0.0 2023-12-22 03:49:04,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386146.6666666667, ans=0.1 2023-12-22 03:49:08,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=386213.3333333333, ans=0.0 2023-12-22 03:49:17,654 INFO [train.py:886] (0/4) Epoch 13, batch 750, loss[loss=0.01586, audio_tagging_loss=0.01586, over 25000.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4829942.49 frames. ], batch size: 100, lr: 8.53e-03, grad_scale: 32.0 2023-12-22 03:49:35,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=386346.6666666667, ans=0.125 2023-12-22 03:49:45,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.80 vs. limit=22.5 2023-12-22 03:49:51,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=386480.0, ans=0.125 2023-12-22 03:49:57,679 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:50:02,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=386546.6666666667, ans=0.0 2023-12-22 03:50:08,444 INFO [train.py:886] (0/4) Epoch 13, batch 800, loss[loss=0.01348, audio_tagging_loss=0.01348, over 25000.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4857653.50 frames. ], batch size: 100, lr: 8.53e-03, grad_scale: 32.0 2023-12-22 03:50:19,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2023-12-22 03:50:43,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386813.3333333333, ans=0.1 2023-12-22 03:50:45,490 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.686e+01 2.811e+01 2.962e+01 3.601e+01, threshold=5.623e+01, percent-clipped=0.0 2023-12-22 03:50:49,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=386813.3333333333, ans=0.09899494936611666 2023-12-22 03:50:57,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=386880.0, ans=0.125 2023-12-22 03:51:01,121 INFO [train.py:886] (0/4) Epoch 13, batch 850, loss[loss=0.01443, audio_tagging_loss=0.01443, over 25000.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4887239.97 frames. 
], batch size: 100, lr: 8.52e-03, grad_scale: 32.0 2023-12-22 03:51:02,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=386946.6666666667, ans=0.1 2023-12-22 03:51:05,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=386946.6666666667, ans=0.125 2023-12-22 03:51:23,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=387080.0, ans=0.125 2023-12-22 03:51:26,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=387080.0, ans=0.2 2023-12-22 03:51:30,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2023-12-22 03:51:38,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0 2023-12-22 03:51:41,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=387213.3333333333, ans=0.05 2023-12-22 03:51:52,670 INFO [train.py:886] (0/4) Epoch 13, batch 900, loss[loss=0.01585, audio_tagging_loss=0.01585, over 24953.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4898767.26 frames. ], batch size: 100, lr: 8.52e-03, grad_scale: 32.0 2023-12-22 03:52:03,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=387346.6666666667, ans=0.0 2023-12-22 03:52:19,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=387413.3333333333, ans=0.125 2023-12-22 03:52:27,727 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:52:29,459 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.739e+01 2.831e+01 3.018e+01 3.521e+01, threshold=5.662e+01, percent-clipped=0.0 2023-12-22 03:52:40,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=387546.6666666667, ans=0.0 2023-12-22 03:52:41,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=387546.6666666667, ans=0.1 2023-12-22 03:52:44,297 INFO [train.py:886] (0/4) Epoch 13, batch 950, loss[loss=0.01431, audio_tagging_loss=0.01431, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4905657.95 frames. ], batch size: 100, lr: 8.52e-03, grad_scale: 32.0 2023-12-22 03:52:54,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=387680.0, ans=0.125 2023-12-22 03:53:13,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.34 vs. limit=22.5 2023-12-22 03:53:30,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=387880.0, ans=0.0 2023-12-22 03:53:36,360 INFO [train.py:886] (0/4) Epoch 13, batch 1000, loss[loss=0.01301, audio_tagging_loss=0.01301, over 24750.00 frames. 
], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4914011.22 frames. ], batch size: 99, lr: 8.51e-03, grad_scale: 32.0 2023-12-22 03:53:45,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=388013.3333333333, ans=0.125 2023-12-22 03:53:46,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=388013.3333333333, ans=0.125 2023-12-22 03:53:48,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388013.3333333333, ans=0.1 2023-12-22 03:53:57,443 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:54:07,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=388146.6666666667, ans=0.2 2023-12-22 03:54:12,660 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.361e+01 2.670e+01 2.865e+01 3.020e+01 3.813e+01, threshold=5.731e+01, percent-clipped=0.0 2023-12-22 03:54:26,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0 2023-12-22 03:54:28,331 INFO [train.py:886] (0/4) Epoch 13, batch 1050, loss[loss=0.01588, audio_tagging_loss=0.01588, over 24750.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4919788.41 frames. ], batch size: 99, lr: 8.51e-03, grad_scale: 32.0 2023-12-22 03:54:32,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=388280.0, ans=0.125 2023-12-22 03:54:48,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=388413.3333333333, ans=0.125 2023-12-22 03:54:56,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=15.0 2023-12-22 03:55:01,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=388480.0, ans=0.125 2023-12-22 03:55:02,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-12-22 03:55:20,193 INFO [train.py:886] (0/4) Epoch 13, batch 1100, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4928127.32 frames. ], batch size: 99, lr: 8.51e-03, grad_scale: 32.0 2023-12-22 03:55:46,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=388746.6666666667, ans=0.0 2023-12-22 03:55:53,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2023-12-22 03:55:56,136 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+01 2.628e+01 2.785e+01 2.893e+01 3.439e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-22 03:56:11,806 INFO [train.py:886] (0/4) Epoch 13, batch 1150, loss[loss=0.01496, audio_tagging_loss=0.01496, over 25000.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4934008.64 frames. 
], batch size: 100, lr: 8.50e-03, grad_scale: 32.0 2023-12-22 03:56:12,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=388946.6666666667, ans=0.1 2023-12-22 03:56:16,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=388946.6666666667, ans=0.125 2023-12-22 03:57:01,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=389213.3333333333, ans=0.125 2023-12-22 03:57:04,150 INFO [train.py:886] (0/4) Epoch 13, batch 1200, loss[loss=0.01563, audio_tagging_loss=0.01563, over 25000.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4933420.37 frames. ], batch size: 100, lr: 8.50e-03, grad_scale: 32.0 2023-12-22 03:57:10,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=389280.0, ans=0.1 2023-12-22 03:57:11,833 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:57:30,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=389413.3333333333, ans=0.0 2023-12-22 03:57:37,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389480.0, ans=0.1 2023-12-22 03:57:41,007 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.433e+01 2.655e+01 2.812e+01 2.956e+01 4.274e+01, threshold=5.623e+01, percent-clipped=0.0 2023-12-22 03:57:41,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.96 vs. limit=10.0 2023-12-22 03:57:45,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=389546.6666666667, ans=0.09899494936611666 2023-12-22 03:57:46,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2023-12-22 03:57:47,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389546.6666666667, ans=0.1 2023-12-22 03:57:50,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.41 vs. limit=22.5 2023-12-22 03:57:55,912 INFO [train.py:886] (0/4) Epoch 13, batch 1250, loss[loss=0.01679, audio_tagging_loss=0.01679, over 22007.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4928891.71 frames. ], batch size: 107, lr: 8.50e-03, grad_scale: 32.0 2023-12-22 03:57:56,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=389613.3333333333, ans=0.125 2023-12-22 03:58:08,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389680.0, ans=0.1 2023-12-22 03:58:44,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=389880.0, ans=0.125 2023-12-22 03:58:44,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.84 vs. 
limit=15.0 2023-12-22 03:58:47,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.96 vs. limit=6.0 2023-12-22 03:58:48,333 INFO [train.py:886] (0/4) Epoch 13, batch 1300, loss[loss=0.0155, audio_tagging_loss=0.0155, over 21747.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4926690.46 frames. ], batch size: 107, lr: 8.49e-03, grad_scale: 32.0 2023-12-22 03:58:56,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=389946.6666666667, ans=0.125 2023-12-22 03:58:56,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2023-12-22 03:58:59,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.71 vs. limit=22.5 2023-12-22 03:59:00,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=390013.3333333333, ans=0.125 2023-12-22 03:59:01,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=390013.3333333333, ans=0.5 2023-12-22 03:59:21,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=390146.6666666667, ans=0.2 2023-12-22 03:59:24,295 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.761e+01 2.854e+01 2.985e+01 3.406e+01, threshold=5.709e+01, percent-clipped=0.0 2023-12-22 03:59:32,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=390213.3333333333, ans=0.05 2023-12-22 03:59:39,918 INFO [train.py:886] (0/4) Epoch 13, batch 1350, loss[loss=0.01479, audio_tagging_loss=0.01479, over 24750.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 4925585.84 frames. ], batch size: 99, lr: 8.49e-03, grad_scale: 32.0 2023-12-22 03:59:42,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=390280.0, ans=10.0 2023-12-22 04:00:13,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=390480.0, ans=0.125 2023-12-22 04:00:15,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=390480.0, ans=0.0 2023-12-22 04:00:29,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0 2023-12-22 04:00:32,527 INFO [train.py:886] (0/4) Epoch 13, batch 1400, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4938276.20 frames. ], batch size: 100, lr: 8.48e-03, grad_scale: 32.0 2023-12-22 04:01:08,892 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.605e+01 2.750e+01 2.989e+01 3.413e+01, threshold=5.499e+01, percent-clipped=0.0 2023-12-22 04:01:10,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.38 vs. 
limit=15.0 2023-12-22 04:01:12,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=390880.0, ans=0.125 2023-12-22 04:01:20,099 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:01:24,372 INFO [train.py:886] (0/4) Epoch 13, batch 1450, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4940497.30 frames. ], batch size: 100, lr: 8.48e-03, grad_scale: 32.0 2023-12-22 04:01:59,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2023-12-22 04:02:07,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=391213.3333333333, ans=0.125 2023-12-22 04:02:15,789 INFO [train.py:886] (0/4) Epoch 13, batch 1500, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4935745.18 frames. ], batch size: 100, lr: 8.48e-03, grad_scale: 32.0 2023-12-22 04:02:37,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391413.3333333333, ans=0.1 2023-12-22 04:02:51,066 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.423e+01 2.726e+01 2.853e+01 3.010e+01 3.854e+01, threshold=5.706e+01, percent-clipped=0.0 2023-12-22 04:02:57,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5 2023-12-22 04:03:06,650 INFO [train.py:886] (0/4) Epoch 13, batch 1550, loss[loss=0.01492, audio_tagging_loss=0.01492, over 24750.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4940191.24 frames. ], batch size: 99, lr: 8.47e-03, grad_scale: 32.0 2023-12-22 04:03:57,485 INFO [train.py:886] (0/4) Epoch 13, batch 1600, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 4938147.50 frames. ], batch size: 99, lr: 8.47e-03, grad_scale: 32.0 2023-12-22 04:04:12,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=392013.3333333333, ans=0.0 2023-12-22 04:04:24,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=392080.0, ans=0.125 2023-12-22 04:04:34,942 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.766e+01 2.914e+01 3.092e+01 3.457e+01, threshold=5.827e+01, percent-clipped=0.0 2023-12-22 04:04:36,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=392146.6666666667, ans=0.125 2023-12-22 04:04:50,742 INFO [train.py:886] (0/4) Epoch 13, batch 1650, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4940134.71 frames. ], batch size: 99, lr: 8.47e-03, grad_scale: 32.0 2023-12-22 04:04:51,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. 
limit=6.0 2023-12-22 04:04:59,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=392346.6666666667, ans=0.125 2023-12-22 04:04:59,421 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.998e-01 2023-12-22 04:05:07,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=392346.6666666667, ans=0.0 2023-12-22 04:05:27,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.84 vs. limit=22.5 2023-12-22 04:05:37,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=392546.6666666667, ans=0.125 2023-12-22 04:05:40,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=22.5 2023-12-22 04:05:42,335 INFO [train.py:886] (0/4) Epoch 13, batch 1700, loss[loss=0.01414, audio_tagging_loss=0.01414, over 25000.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4941115.95 frames. ], batch size: 100, lr: 8.46e-03, grad_scale: 32.0 2023-12-22 04:05:48,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=392613.3333333333, ans=0.2 2023-12-22 04:06:18,232 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.384e+01 2.671e+01 2.847e+01 3.058e+01 3.677e+01, threshold=5.694e+01, percent-clipped=0.0 2023-12-22 04:06:33,334 INFO [train.py:886] (0/4) Epoch 13, batch 1750, loss[loss=0.01655, audio_tagging_loss=0.01655, over 25000.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4946667.69 frames. ], batch size: 100, lr: 8.46e-03, grad_scale: 32.0 2023-12-22 04:06:39,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=392946.6666666667, ans=0.125 2023-12-22 04:06:42,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5 2023-12-22 04:06:58,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393080.0, ans=0.1 2023-12-22 04:07:25,436 INFO [train.py:886] (0/4) Epoch 13, batch 1800, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4952548.79 frames. 
], batch size: 100, lr: 8.46e-03, grad_scale: 32.0 2023-12-22 04:07:40,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=393346.6666666667, ans=0.125 2023-12-22 04:07:40,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=393346.6666666667, ans=0.2 2023-12-22 04:07:47,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=393413.3333333333, ans=0.0 2023-12-22 04:07:48,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=393413.3333333333, ans=0.0 2023-12-22 04:07:57,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=393480.0, ans=0.09899494936611666 2023-12-22 04:07:58,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=393480.0, ans=0.125 2023-12-22 04:08:02,356 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 2.697e+01 2.832e+01 3.014e+01 3.728e+01, threshold=5.665e+01, percent-clipped=0.0 2023-12-22 04:08:17,471 INFO [train.py:886] (0/4) Epoch 13, batch 1850, loss[loss=0.01563, audio_tagging_loss=0.01563, over 24750.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4956392.63 frames. ], batch size: 99, lr: 8.45e-03, grad_scale: 32.0 2023-12-22 04:08:34,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=393680.0, ans=0.0 2023-12-22 04:08:36,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=393680.0, ans=0.125 2023-12-22 04:08:40,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393746.6666666667, ans=0.1 2023-12-22 04:08:41,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0 2023-12-22 04:08:51,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=393813.3333333333, ans=0.1 2023-12-22 04:09:09,391 INFO [train.py:886] (0/4) Epoch 13, batch 1900, loss[loss=0.01424, audio_tagging_loss=0.01424, over 24750.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4951978.53 frames. ], batch size: 99, lr: 8.45e-03, grad_scale: 32.0 2023-12-22 04:09:11,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=393946.6666666667, ans=0.0 2023-12-22 04:09:19,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394013.3333333333, ans=0.1 2023-12-22 04:09:19,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.74 vs. 
limit=15.0 2023-12-22 04:09:23,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=394013.3333333333, ans=0.0 2023-12-22 04:09:37,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=394080.0, ans=0.0 2023-12-22 04:09:39,777 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.360e-02 2023-12-22 04:09:45,142 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.423e+01 2.721e+01 2.913e+01 3.026e+01 3.443e+01, threshold=5.825e+01, percent-clipped=0.0 2023-12-22 04:09:51,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=394213.3333333333, ans=0.0 2023-12-22 04:09:52,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=394213.3333333333, ans=0.1 2023-12-22 04:09:54,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=394213.3333333333, ans=15.0 2023-12-22 04:10:00,994 INFO [train.py:886] (0/4) Epoch 13, batch 1950, loss[loss=0.01398, audio_tagging_loss=0.01398, over 25000.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4951195.04 frames. ], batch size: 100, lr: 8.45e-03, grad_scale: 32.0 2023-12-22 04:10:10,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.48 vs. limit=22.5 2023-12-22 04:10:18,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=394346.6666666667, ans=0.0 2023-12-22 04:10:21,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=394413.3333333333, ans=0.0 2023-12-22 04:10:36,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=394480.0, ans=0.125 2023-12-22 04:10:44,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=394546.6666666667, ans=0.07 2023-12-22 04:10:51,751 INFO [train.py:886] (0/4) Epoch 13, batch 2000, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4951312.97 frames. ], batch size: 100, lr: 8.44e-03, grad_scale: 64.0 2023-12-22 04:10:52,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=12.0 2023-12-22 04:10:56,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=15.0 2023-12-22 04:10:58,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=394613.3333333333, ans=0.125 2023-12-22 04:11:15,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.73 vs. 
limit=10.0 2023-12-22 04:11:17,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=394746.6666666667, ans=0.125 2023-12-22 04:11:23,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394813.3333333333, ans=0.1 2023-12-22 04:11:28,347 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.715e+01 2.846e+01 3.050e+01 3.536e+01, threshold=5.692e+01, percent-clipped=0.0 2023-12-22 04:11:39,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.80 vs. limit=15.0 2023-12-22 04:11:41,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=394880.0, ans=0.125 2023-12-22 04:11:44,776 INFO [train.py:886] (0/4) Epoch 13, batch 2050, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4951229.14 frames. ], batch size: 100, lr: 8.44e-03, grad_scale: 64.0 2023-12-22 04:12:00,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=395013.3333333333, ans=0.125 2023-12-22 04:12:00,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=395013.3333333333, ans=0.0 2023-12-22 04:12:13,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2023-12-22 04:12:15,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-22 04:12:31,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=395213.3333333333, ans=0.0 2023-12-22 04:12:33,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=395213.3333333333, ans=0.0 2023-12-22 04:12:34,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=395213.3333333333, ans=0.125 2023-12-22 04:12:35,685 INFO [train.py:886] (0/4) Epoch 13, batch 2100, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4956088.94 frames. 
], batch size: 100, lr: 8.44e-03, grad_scale: 64.0 2023-12-22 04:12:44,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=395280.0, ans=0.1 2023-12-22 04:12:55,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=395346.6666666667, ans=0.125 2023-12-22 04:13:12,414 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.725e+01 2.922e+01 3.047e+01 3.389e+01, threshold=5.843e+01, percent-clipped=0.0 2023-12-22 04:13:21,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=395546.6666666667, ans=0.125 2023-12-22 04:13:27,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=395613.3333333333, ans=0.125 2023-12-22 04:13:27,948 INFO [train.py:886] (0/4) Epoch 13, batch 2150, loss[loss=0.01818, audio_tagging_loss=0.01818, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4953971.21 frames. ], batch size: 100, lr: 8.43e-03, grad_scale: 64.0 2023-12-22 04:13:29,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=395613.3333333333, ans=0.2 2023-12-22 04:13:42,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=395680.0, ans=0.1 2023-12-22 04:13:56,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=395746.6666666667, ans=0.125 2023-12-22 04:13:57,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2023-12-22 04:14:04,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.18 vs. limit=10.0 2023-12-22 04:14:10,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=395880.0, ans=0.0 2023-12-22 04:14:10,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=12.0 2023-12-22 04:14:19,334 INFO [train.py:886] (0/4) Epoch 13, batch 2200, loss[loss=0.01596, audio_tagging_loss=0.01596, over 24750.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4945127.00 frames. 
], batch size: 99, lr: 8.43e-03, grad_scale: 64.0 2023-12-22 04:14:40,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=396080.0, ans=0.125 2023-12-22 04:14:56,169 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 2.696e+01 2.863e+01 2.990e+01 3.495e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-22 04:14:58,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=396146.6666666667, ans=0.0 2023-12-22 04:15:03,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=396213.3333333333, ans=0.125 2023-12-22 04:15:11,033 INFO [train.py:886] (0/4) Epoch 13, batch 2250, loss[loss=0.01312, audio_tagging_loss=0.01312, over 25000.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4942332.93 frames. ], batch size: 100, lr: 8.42e-03, grad_scale: 64.0 2023-12-22 04:15:13,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=396280.0, ans=0.0 2023-12-22 04:15:38,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=396413.3333333333, ans=0.125 2023-12-22 04:15:40,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=396413.3333333333, ans=0.0 2023-12-22 04:15:41,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=396480.0, ans=0.2 2023-12-22 04:15:58,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2023-12-22 04:16:03,310 INFO [train.py:886] (0/4) Epoch 13, batch 2300, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4938630.81 frames. ], batch size: 100, lr: 8.42e-03, grad_scale: 64.0 2023-12-22 04:16:04,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=396613.3333333333, ans=0.0 2023-12-22 04:16:15,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2023-12-22 04:16:27,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2023-12-22 04:16:39,345 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.666e+01 2.800e+01 2.964e+01 3.386e+01, threshold=5.601e+01, percent-clipped=0.0 2023-12-22 04:16:41,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.54 vs. 
limit=15.0 2023-12-22 04:16:42,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=396813.3333333333, ans=0.125 2023-12-22 04:16:45,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=396880.0, ans=0.07 2023-12-22 04:16:53,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=396880.0, ans=0.125 2023-12-22 04:16:55,664 INFO [train.py:886] (0/4) Epoch 13, batch 2350, loss[loss=0.01424, audio_tagging_loss=0.01424, over 25000.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4938070.65 frames. ], batch size: 100, lr: 8.42e-03, grad_scale: 64.0 2023-12-22 04:17:32,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=397146.6666666667, ans=0.125 2023-12-22 04:17:41,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0 2023-12-22 04:17:46,556 INFO [train.py:886] (0/4) Epoch 13, batch 2400, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4946222.97 frames. ], batch size: 100, lr: 8.41e-03, grad_scale: 64.0 2023-12-22 04:18:08,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=397413.3333333333, ans=0.0 2023-12-22 04:18:22,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=12.0 2023-12-22 04:18:23,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.42 vs. limit=15.0 2023-12-22 04:18:23,442 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.690e+01 2.809e+01 2.996e+01 3.440e+01, threshold=5.617e+01, percent-clipped=0.0 2023-12-22 04:18:34,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=397546.6666666667, ans=0.125 2023-12-22 04:18:37,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=397546.6666666667, ans=0.125 2023-12-22 04:18:39,163 INFO [train.py:886] (0/4) Epoch 13, batch 2450, loss[loss=0.0131, audio_tagging_loss=0.0131, over 25000.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4946682.34 frames. ], batch size: 100, lr: 8.41e-03, grad_scale: 64.0 2023-12-22 04:18:53,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=397680.0, ans=0.125 2023-12-22 04:18:59,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=397746.6666666667, ans=0.2 2023-12-22 04:19:14,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=397813.3333333333, ans=0.2 2023-12-22 04:19:25,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=397880.0, ans=0.125 2023-12-22 04:19:30,696 INFO [train.py:886] (0/4) Epoch 13, batch 2500, loss[loss=0.01511, audio_tagging_loss=0.01511, over 24750.00 frames. 
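The scaling.py:213 ScheduledFloat lines trace hyper-parameters (dropout_p, skip rates, balancer probs and limits) that are functions of batch_count rather than constants; by batch_count around 4e5 most have settled at their final values (0.125, 0.1, 0.0 and so on) because the schedule's last breakpoint is long past. Below is a sketch of the idea, assuming plain piecewise-linear interpolation between (batch_count, value) breakpoints with the endpoints held flat outside the schedule; the real class in scaling.py carries extra machinery.

class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count, e.g.
    ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.02)) decays 0.5 -> 0.02."""
    def __init__(self, *points):
        self.points = sorted(points)  # [(batch_count, value), ...]

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.02))
print(skip_rate.value(396813.33))  # far past the last breakpoint -> 0.02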
], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4945217.76 frames. ], batch size: 99, lr: 8.41e-03, grad_scale: 64.0 2023-12-22 04:19:30,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=397946.6666666667, ans=0.125 2023-12-22 04:19:33,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2023-12-22 04:19:51,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=398080.0, ans=0.04949747468305833 2023-12-22 04:19:54,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-12-22 04:19:54,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=398080.0, ans=0.125 2023-12-22 04:20:04,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.89 vs. limit=6.0 2023-12-22 04:20:05,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=398146.6666666667, ans=0.125 2023-12-22 04:20:07,963 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.462e+01 2.733e+01 2.874e+01 2.990e+01 3.625e+01, threshold=5.747e+01, percent-clipped=0.0 2023-12-22 04:20:09,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=398146.6666666667, ans=0.125 2023-12-22 04:20:11,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=398146.6666666667, ans=0.035 2023-12-22 04:20:15,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=398213.3333333333, ans=0.125 2023-12-22 04:20:22,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=398280.0, ans=0.125 2023-12-22 04:20:23,080 INFO [train.py:886] (0/4) Epoch 13, batch 2550, loss[loss=0.01508, audio_tagging_loss=0.01508, over 24750.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4941314.73 frames. ], batch size: 99, lr: 8.40e-03, grad_scale: 64.0 2023-12-22 04:20:33,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2023-12-22 04:20:44,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=398413.3333333333, ans=0.0 2023-12-22 04:20:49,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=398413.3333333333, ans=0.125 2023-12-22 04:20:52,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.07 vs. limit=6.0 2023-12-22 04:20:55,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.05 vs. 
limit=10.0 2023-12-22 04:20:58,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=398480.0, ans=0.125 2023-12-22 04:21:09,637 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.554e-03 2023-12-22 04:21:10,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=398546.6666666667, ans=0.2 2023-12-22 04:21:10,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=398546.6666666667, ans=0.0 2023-12-22 04:21:15,685 INFO [train.py:886] (0/4) Epoch 13, batch 2600, loss[loss=0.01358, audio_tagging_loss=0.01358, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4942585.74 frames. ], batch size: 100, lr: 8.40e-03, grad_scale: 64.0 2023-12-22 04:21:18,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=398613.3333333333, ans=0.125 2023-12-22 04:21:42,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=398746.6666666667, ans=0.0 2023-12-22 04:21:51,208 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.481e+01 2.732e+01 2.842e+01 3.020e+01 3.869e+01, threshold=5.684e+01, percent-clipped=0.0 2023-12-22 04:21:55,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=398880.0, ans=0.0 2023-12-22 04:22:06,039 INFO [train.py:886] (0/4) Epoch 13, batch 2650, loss[loss=0.01732, audio_tagging_loss=0.01732, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4946080.46 frames. ], batch size: 100, lr: 8.40e-03, grad_scale: 64.0 2023-12-22 04:22:09,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=398946.6666666667, ans=0.0 2023-12-22 04:22:15,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398946.6666666667, ans=0.1 2023-12-22 04:22:34,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=399080.0, ans=0.125 2023-12-22 04:22:48,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=399213.3333333333, ans=0.125 2023-12-22 04:22:51,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=399213.3333333333, ans=0.125 2023-12-22 04:22:54,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=399213.3333333333, ans=0.2 2023-12-22 04:22:54,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=399213.3333333333, ans=0.125 2023-12-22 04:22:58,322 INFO [train.py:886] (0/4) Epoch 13, batch 2700, loss[loss=0.01465, audio_tagging_loss=0.01465, over 25000.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4950695.24 frames. ], batch size: 100, lr: 8.39e-03, grad_scale: 64.0 2023-12-22 04:23:10,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. 
limit=15.0 2023-12-22 04:23:11,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=399346.6666666667, ans=0.0 2023-12-22 04:23:14,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=399346.6666666667, ans=0.0 2023-12-22 04:23:32,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.24 vs. limit=15.0 2023-12-22 04:23:33,773 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.672e+01 2.778e+01 2.950e+01 3.329e+01, threshold=5.555e+01, percent-clipped=0.0 2023-12-22 04:23:38,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=399546.6666666667, ans=0.125 2023-12-22 04:23:45,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=399546.6666666667, ans=0.015 2023-12-22 04:23:48,670 INFO [train.py:886] (0/4) Epoch 13, batch 2750, loss[loss=0.01278, audio_tagging_loss=0.01278, over 25000.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4950749.73 frames. ], batch size: 100, lr: 8.39e-03, grad_scale: 64.0 2023-12-22 04:23:58,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.42 vs. limit=22.5 2023-12-22 04:24:40,132 INFO [train.py:886] (0/4) Epoch 13, batch 2800, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4953059.22 frames. ], batch size: 100, lr: 8.39e-03, grad_scale: 64.0 2023-12-22 04:24:43,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=399946.6666666667, ans=0.125 2023-12-22 04:24:44,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399946.6666666667, ans=0.1 2023-12-22 04:24:45,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=399946.6666666667, ans=0.0 2023-12-22 04:24:47,122 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-60000.pt 2023-12-22 04:25:10,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.04 vs. 
limit=12.0 2023-12-22 04:25:13,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=400146.6666666667, ans=0.125 2023-12-22 04:25:14,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=400146.6666666667, ans=0.09899494936611666 2023-12-22 04:25:18,177 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+01 2.709e+01 2.865e+01 2.998e+01 3.603e+01, threshold=5.731e+01, percent-clipped=0.0 2023-12-22 04:25:19,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=400146.6666666667, ans=0.125 2023-12-22 04:25:21,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=400146.6666666667, ans=0.125 2023-12-22 04:25:34,326 INFO [train.py:886] (0/4) Epoch 13, batch 2850, loss[loss=0.01373, audio_tagging_loss=0.01373, over 24750.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4948758.80 frames. ], batch size: 99, lr: 8.38e-03, grad_scale: 64.0 2023-12-22 04:25:39,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=400280.0, ans=0.2 2023-12-22 04:25:40,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=400280.0, ans=0.0 2023-12-22 04:25:41,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=400280.0, ans=0.09899494936611666 2023-12-22 04:25:59,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=400413.3333333333, ans=0.125 2023-12-22 04:26:08,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.03 vs. limit=22.5 2023-12-22 04:26:16,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.38 vs. limit=15.0 2023-12-22 04:26:16,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=400546.6666666667, ans=0.0 2023-12-22 04:26:19,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=400546.6666666667, ans=0.07 2023-12-22 04:26:25,172 INFO [train.py:886] (0/4) Epoch 13, batch 2900, loss[loss=0.01431, audio_tagging_loss=0.01431, over 24750.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4944397.19 frames. 
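The checkpoint.py:75 record a little above shows the periodic batch-level snapshot: at batch_idx_train 60000 the run writes checkpoint-60000.pt into the experiment directory, independent of the per-epoch epoch-N.pt files written at epoch boundaries. A hedged sketch of such a helper; the signature and the save_every_n argument are placeholders, not the icefall API.

import torch

def maybe_save_checkpoint(model, optimizer, scheduler, batch_idx_train,
                          save_every_n, exp_dir):
    # Write checkpoint-<batch>.pt every save_every_n training batches.
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    state = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),
        "batch_idx_train": batch_idx_train,
    }
    torch.save(state, f"{exp_dir}/checkpoint-{batch_idx_train}.pt")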
], batch size: 99, lr: 8.38e-03, grad_scale: 64.0 2023-12-22 04:26:42,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=400680.0, ans=0.0 2023-12-22 04:26:43,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400680.0, ans=0.1 2023-12-22 04:26:58,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=400813.3333333333, ans=0.05 2023-12-22 04:26:58,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=400813.3333333333, ans=0.2 2023-12-22 04:27:01,896 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+01 2.689e+01 2.815e+01 2.998e+01 3.858e+01, threshold=5.630e+01, percent-clipped=0.0 2023-12-22 04:27:16,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400946.6666666667, ans=0.1 2023-12-22 04:27:17,537 INFO [train.py:886] (0/4) Epoch 13, batch 2950, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4944372.43 frames. ], batch size: 100, lr: 8.38e-03, grad_scale: 64.0 2023-12-22 04:27:19,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=400946.6666666667, ans=0.125 2023-12-22 04:27:22,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2023-12-22 04:27:31,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=401013.3333333333, ans=0.0 2023-12-22 04:27:41,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=12.0 2023-12-22 04:27:43,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2023-12-22 04:27:53,097 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:28:00,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=401213.3333333333, ans=0.125 2023-12-22 04:28:05,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=401213.3333333333, ans=0.125 2023-12-22 04:28:06,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=401213.3333333333, ans=0.125 2023-12-22 04:28:07,810 INFO [train.py:886] (0/4) Epoch 13, batch 3000, loss[loss=0.01226, audio_tagging_loss=0.01226, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4950466.96 frames. 
], batch size: 100, lr: 8.37e-03, grad_scale: 64.0 2023-12-22 04:28:07,811 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 04:28:26,076 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2753, 3.8222, 3.7970, 3.3342], device='cuda:0') 2023-12-22 04:28:28,504 INFO [train.py:917] (0/4) Epoch 13, validation: loss=0.03396, audio_tagging_loss=0.03396, over 3737520.00 frames. 2023-12-22 04:28:28,505 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 04:28:28,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=401280.0, ans=0.0 2023-12-22 04:28:34,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=401280.0, ans=0.0 2023-12-22 04:28:35,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.56 vs. limit=22.5 2023-12-22 04:29:04,378 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.650e+01 2.784e+01 2.965e+01 3.758e+01, threshold=5.568e+01, percent-clipped=0.0 2023-12-22 04:29:18,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=401546.6666666667, ans=0.025 2023-12-22 04:29:20,130 INFO [train.py:886] (0/4) Epoch 13, batch 3050, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4954090.07 frames. ], batch size: 99, lr: 8.37e-03, grad_scale: 64.0 2023-12-22 04:29:22,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.92 vs. limit=22.5 2023-12-22 04:29:28,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=401680.0, ans=0.125 2023-12-22 04:29:32,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=401680.0, ans=0.125 2023-12-22 04:29:42,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=401746.6666666667, ans=0.0 2023-12-22 04:29:51,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=401813.3333333333, ans=0.125 2023-12-22 04:29:59,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=22.5 2023-12-22 04:30:10,219 INFO [train.py:886] (0/4) Epoch 13, batch 3100, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4952734.47 frames. ], batch size: 99, lr: 8.37e-03, grad_scale: 64.0 2023-12-22 04:30:12,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.48 vs. limit=10.0 2023-12-22 04:30:24,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. 
limit=10.0 2023-12-22 04:30:32,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=402080.0, ans=0.0 2023-12-22 04:30:46,089 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.729e+01 2.844e+01 2.954e+01 3.475e+01, threshold=5.688e+01, percent-clipped=0.0 2023-12-22 04:30:52,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=402213.3333333333, ans=0.04949747468305833 2023-12-22 04:31:01,812 INFO [train.py:886] (0/4) Epoch 13, batch 3150, loss[loss=0.01545, audio_tagging_loss=0.01545, over 25000.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4949430.38 frames. ], batch size: 100, lr: 8.36e-03, grad_scale: 64.0 2023-12-22 04:31:22,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=402413.3333333333, ans=0.125 2023-12-22 04:31:33,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=402480.0, ans=0.025 2023-12-22 04:31:36,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=402480.0, ans=0.125 2023-12-22 04:31:37,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=402480.0, ans=0.0 2023-12-22 04:31:41,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=402546.6666666667, ans=0.125 2023-12-22 04:31:50,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=402546.6666666667, ans=0.5 2023-12-22 04:31:52,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=402613.3333333333, ans=0.125 2023-12-22 04:31:52,767 INFO [train.py:886] (0/4) Epoch 13, batch 3200, loss[loss=0.01428, audio_tagging_loss=0.01428, over 24750.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4944916.69 frames. ], batch size: 99, lr: 8.36e-03, grad_scale: 64.0 2023-12-22 04:31:59,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=402613.3333333333, ans=0.07 2023-12-22 04:32:29,647 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.680e+01 2.806e+01 3.000e+01 3.606e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 04:32:35,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402880.0, ans=0.1 2023-12-22 04:32:37,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.84 vs. limit=10.0 2023-12-22 04:32:43,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=402880.0, ans=0.0 2023-12-22 04:32:45,292 INFO [train.py:886] (0/4) Epoch 13, batch 3250, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4942372.07 frames. 
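The train.py:909/917/918 records in the block above show the periodic validation pass: training pauses, the dev set (about 3.7M frames) is swept without gradients, the aggregate loss (0.03396 here) is logged next to the peak CUDA memory, and the zipformer.py:1858 line dumps one layer's attention-weight entropies as a health check. Roughly, with model_forward standing in for the real batch-scoring code and plain floats in place of the recipe's metrics tracker:

import torch

def compute_validation_loss(model, valid_dl, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = model_forward(model, batch, device)  # assumed helper
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    print(f"validation: loss={tot_loss / tot_frames:.4g}, over {tot_frames:.2f} frames")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated() // 2**20}MB")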
], batch size: 100, lr: 8.36e-03, grad_scale: 64.0 2023-12-22 04:32:45,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=402946.6666666667, ans=0.0 2023-12-22 04:32:49,256 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:32:50,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.39 vs. limit=22.5 2023-12-22 04:33:02,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=403013.3333333333, ans=0.2 2023-12-22 04:33:06,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=403080.0, ans=0.125 2023-12-22 04:33:14,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0 2023-12-22 04:33:37,338 INFO [train.py:886] (0/4) Epoch 13, batch 3300, loss[loss=0.01459, audio_tagging_loss=0.01459, over 25000.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4950195.16 frames. ], batch size: 100, lr: 8.35e-03, grad_scale: 64.0 2023-12-22 04:33:44,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=22.5 2023-12-22 04:33:46,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=403346.6666666667, ans=0.2 2023-12-22 04:33:52,270 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.674e-03 2023-12-22 04:34:13,614 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+01 2.672e+01 2.806e+01 3.015e+01 3.427e+01, threshold=5.612e+01, percent-clipped=0.0 2023-12-22 04:34:18,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=403546.6666666667, ans=0.125 2023-12-22 04:34:28,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=403613.3333333333, ans=0.0 2023-12-22 04:34:29,308 INFO [train.py:886] (0/4) Epoch 13, batch 3350, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4954432.05 frames. ], batch size: 100, lr: 8.35e-03, grad_scale: 64.0 2023-12-22 04:34:32,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=403613.3333333333, ans=0.125 2023-12-22 04:34:32,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=403613.3333333333, ans=0.0 2023-12-22 04:34:56,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.09 vs. limit=22.5 2023-12-22 04:34:59,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.81 vs. 
limit=22.5 2023-12-22 04:35:03,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=403813.3333333333, ans=0.2 2023-12-22 04:35:17,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403880.0, ans=0.1 2023-12-22 04:35:20,742 INFO [train.py:886] (0/4) Epoch 13, batch 3400, loss[loss=0.01559, audio_tagging_loss=0.01559, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4957287.41 frames. ], batch size: 100, lr: 8.35e-03, grad_scale: 64.0 2023-12-22 04:35:22,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403946.6666666667, ans=0.1 2023-12-22 04:35:22,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=403946.6666666667, ans=0.0 2023-12-22 04:35:33,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=404013.3333333333, ans=0.125 2023-12-22 04:35:56,796 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.740e+01 2.906e+01 3.028e+01 3.470e+01, threshold=5.811e+01, percent-clipped=0.0 2023-12-22 04:35:58,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=404146.6666666667, ans=0.95 2023-12-22 04:36:01,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.71 vs. limit=15.0 2023-12-22 04:36:02,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2023-12-22 04:36:04,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=404213.3333333333, ans=0.125 2023-12-22 04:36:07,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=404213.3333333333, ans=0.0 2023-12-22 04:36:08,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404213.3333333333, ans=0.1 2023-12-22 04:36:11,664 INFO [train.py:886] (0/4) Epoch 13, batch 3450, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4953350.94 frames. ], batch size: 99, lr: 8.34e-03, grad_scale: 64.0 2023-12-22 04:36:34,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=404413.3333333333, ans=0.125 2023-12-22 04:36:37,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-12-22 04:37:03,787 INFO [train.py:886] (0/4) Epoch 13, batch 3500, loss[loss=0.01424, audio_tagging_loss=0.01424, over 25000.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4938454.82 frames. 
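The scaling.py:1022 Whitening lines compare each Whiten module's covariance metric against its limit: the metric is about 1.0 when the channel covariance of the activations is a multiple of the identity and grows as energy concentrates in fewer directions, and readings just over the limit (metric=15.71 vs. limit=15.0 a few records up) are where the module starts nudging activations back toward whiteness via a gradient penalty. Below is a single-group sketch of one plausible such metric; the exact formula lives in scaling.py and this is only an approximation of it.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (frames, channels). Returns ~1.0 for "white" activations
    # (channel covariance proportional to the identity), larger otherwise.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    num_channels = cov.shape[0]
    return num_channels * (cov * cov).mean() / (cov.diag().mean() ** 2)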
], batch size: 100, lr: 8.34e-03, grad_scale: 64.0 2023-12-22 04:37:05,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=404613.3333333333, ans=0.07 2023-12-22 04:37:19,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-12-22 04:37:31,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=404746.6666666667, ans=0.125 2023-12-22 04:37:37,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2023-12-22 04:37:39,806 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.687e+01 2.886e+01 3.053e+01 3.392e+01, threshold=5.771e+01, percent-clipped=0.0 2023-12-22 04:37:52,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2023-12-22 04:37:55,456 INFO [train.py:886] (0/4) Epoch 13, batch 3550, loss[loss=0.01517, audio_tagging_loss=0.01517, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4937579.85 frames. ], batch size: 100, lr: 8.34e-03, grad_scale: 64.0 2023-12-22 04:38:07,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=405013.3333333333, ans=0.0 2023-12-22 04:38:10,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2023-12-22 04:38:12,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2023-12-22 04:38:15,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=405080.0, ans=0.125 2023-12-22 04:38:17,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=405080.0, ans=0.0 2023-12-22 04:38:21,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=405080.0, ans=0.125 2023-12-22 04:38:31,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.51 vs. limit=22.5 2023-12-22 04:38:47,564 INFO [train.py:886] (0/4) Epoch 13, batch 3600, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4936538.76 frames. ], batch size: 100, lr: 8.33e-03, grad_scale: 64.0 2023-12-22 04:38:59,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.54 vs. 
limit=15.0 2023-12-22 04:39:22,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=405480.0, ans=0.0 2023-12-22 04:39:23,614 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 2.719e+01 2.868e+01 3.005e+01 3.537e+01, threshold=5.736e+01, percent-clipped=0.0 2023-12-22 04:39:32,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=405546.6666666667, ans=10.0 2023-12-22 04:39:38,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=405613.3333333333, ans=0.2 2023-12-22 04:39:39,981 INFO [train.py:886] (0/4) Epoch 13, batch 3650, loss[loss=0.01507, audio_tagging_loss=0.01507, over 24750.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4941747.01 frames. ], batch size: 99, lr: 8.33e-03, grad_scale: 64.0 2023-12-22 04:39:46,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=405613.3333333333, ans=0.0 2023-12-22 04:39:50,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2023-12-22 04:39:53,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.62 vs. limit=10.0 2023-12-22 04:39:57,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=405680.0, ans=0.125 2023-12-22 04:40:06,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2023-12-22 04:40:08,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=405746.6666666667, ans=0.0 2023-12-22 04:40:24,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=405880.0, ans=0.125 2023-12-22 04:40:30,630 INFO [train.py:886] (0/4) Epoch 13, batch 3700, loss[loss=0.01534, audio_tagging_loss=0.01534, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4950029.26 frames. ], batch size: 100, lr: 8.33e-03, grad_scale: 64.0 2023-12-22 04:40:48,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=406013.3333333333, ans=0.0 2023-12-22 04:40:50,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=406013.3333333333, ans=0.2 2023-12-22 04:41:06,773 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+01 2.738e+01 2.839e+01 2.971e+01 4.016e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-22 04:41:22,586 INFO [train.py:886] (0/4) Epoch 13, batch 3750, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24750.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4950905.61 frames. ], batch size: 99, lr: 8.32e-03, grad_scale: 64.0 2023-12-22 04:41:23,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.75 vs. 
limit=22.5 2023-12-22 04:42:04,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=406546.6666666667, ans=0.125 2023-12-22 04:42:12,666 INFO [train.py:886] (0/4) Epoch 13, batch 3800, loss[loss=0.01671, audio_tagging_loss=0.01671, over 24750.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4944468.84 frames. ], batch size: 99, lr: 8.32e-03, grad_scale: 64.0 2023-12-22 04:42:30,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2023-12-22 04:42:49,592 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.729e+01 2.855e+01 2.947e+01 3.530e+01, threshold=5.711e+01, percent-clipped=0.0 2023-12-22 04:42:52,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=406813.3333333333, ans=0.125 2023-12-22 04:42:56,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=406880.0, ans=0.2 2023-12-22 04:42:59,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.89 vs. limit=10.0 2023-12-22 04:43:05,320 INFO [train.py:886] (0/4) Epoch 13, batch 3850, loss[loss=0.01149, audio_tagging_loss=0.01149, over 25000.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4939493.21 frames. ], batch size: 100, lr: 8.32e-03, grad_scale: 64.0 2023-12-22 04:43:15,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=407013.3333333333, ans=0.125 2023-12-22 04:43:16,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=407013.3333333333, ans=0.0 2023-12-22 04:43:37,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0 2023-12-22 04:43:57,671 INFO [train.py:886] (0/4) Epoch 13, batch 3900, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4941839.60 frames. 
], batch size: 100, lr: 8.31e-03, grad_scale: 64.0 2023-12-22 04:43:57,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=407280.0, ans=0.05 2023-12-22 04:43:59,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=407280.0, ans=0.2 2023-12-22 04:44:01,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=407280.0, ans=0.125 2023-12-22 04:44:07,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=407346.6666666667, ans=0.125 2023-12-22 04:44:13,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=407346.6666666667, ans=0.125 2023-12-22 04:44:29,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=407480.0, ans=0.0 2023-12-22 04:44:31,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=407480.0, ans=0.025 2023-12-22 04:44:33,659 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+01 2.742e+01 2.835e+01 2.926e+01 3.497e+01, threshold=5.671e+01, percent-clipped=0.0 2023-12-22 04:44:48,705 INFO [train.py:886] (0/4) Epoch 13, batch 3950, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4942979.79 frames. ], batch size: 100, lr: 8.31e-03, grad_scale: 64.0 2023-12-22 04:45:03,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=15.0 2023-12-22 04:45:04,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407680.0, ans=0.1 2023-12-22 04:45:06,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2023-12-22 04:45:07,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2023-12-22 04:45:15,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=407746.6666666667, ans=0.125 2023-12-22 04:45:18,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=407746.6666666667, ans=0.05 2023-12-22 04:45:19,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=407813.3333333333, ans=0.125 2023-12-22 04:45:41,175 INFO [train.py:886] (0/4) Epoch 13, batch 4000, loss[loss=0.01589, audio_tagging_loss=0.01589, over 25000.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4948530.59 frames. 
], batch size: 100, lr: 8.31e-03, grad_scale: 128.0 2023-12-22 04:45:41,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=407946.6666666667, ans=0.125 2023-12-22 04:45:41,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. limit=10.0 2023-12-22 04:45:50,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.05 vs. limit=22.5 2023-12-22 04:45:51,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=15.0 2023-12-22 04:45:56,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=408013.3333333333, ans=0.07 2023-12-22 04:45:58,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=408013.3333333333, ans=0.125 2023-12-22 04:45:58,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=408013.3333333333, ans=0.125 2023-12-22 04:46:00,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0 2023-12-22 04:46:07,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=408080.0, ans=0.5 2023-12-22 04:46:08,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=408080.0, ans=0.0 2023-12-22 04:46:10,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408080.0, ans=0.1 2023-12-22 04:46:18,848 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.329e+01 2.770e+01 2.894e+01 3.015e+01 3.399e+01, threshold=5.788e+01, percent-clipped=0.0 2023-12-22 04:46:26,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-12-22 04:46:32,168 INFO [train.py:886] (0/4) Epoch 13, batch 4050, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4950922.65 frames. ], batch size: 99, lr: 8.30e-03, grad_scale: 64.0 2023-12-22 04:46:59,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=408413.3333333333, ans=0.125 2023-12-22 04:47:14,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=408546.6666666667, ans=0.125 2023-12-22 04:47:15,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=408546.6666666667, ans=0.0 2023-12-22 04:47:16,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=408546.6666666667, ans=0.0 2023-12-22 04:47:24,280 INFO [train.py:886] (0/4) Epoch 13, batch 4100, loss[loss=0.01333, audio_tagging_loss=0.01333, over 24750.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4942525.04 frames. 
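The grad_scale field in the loss lines comes from fp16 training: a gradient scaler multiplies the loss before backward, grows the scale at intervals while gradients stay finite, and halves it on overflow, which is exactly the 64 -> 128 -> 64 excursion visible between batches 3950, 4000 and 4050 above. A generic torch.cuda.amp sketch of the mechanism (the recipe routes this through its own loop, and model(batch) returning the loss is an assumption):

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=64.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000
)

def fp16_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)                    # assumed: forward returns the loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)                     # skipped if grads are non-finite
    scaler.update()                            # grows or backs off the scale
    return loss.detach(), scaler.get_scale()   # the grad_scale shown in the log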
], batch size: 99, lr: 8.30e-03, grad_scale: 64.0 2023-12-22 04:47:25,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=408613.3333333333, ans=0.1 2023-12-22 04:47:26,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2023-12-22 04:47:28,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=408613.3333333333, ans=0.125 2023-12-22 04:47:38,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=408680.0, ans=0.0 2023-12-22 04:47:53,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=408813.3333333333, ans=0.125 2023-12-22 04:47:58,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=408813.3333333333, ans=0.125 2023-12-22 04:48:00,114 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.473e+01 2.711e+01 2.874e+01 3.032e+01 3.484e+01, threshold=5.749e+01, percent-clipped=0.0 2023-12-22 04:48:10,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=408880.0, ans=0.125 2023-12-22 04:48:10,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=408880.0, ans=0.125 2023-12-22 04:48:13,987 INFO [train.py:886] (0/4) Epoch 13, batch 4150, loss[loss=0.0142, audio_tagging_loss=0.0142, over 24750.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4941196.84 frames. ], batch size: 99, lr: 8.30e-03, grad_scale: 64.0 2023-12-22 04:48:20,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=408946.6666666667, ans=0.125 2023-12-22 04:48:45,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=409146.6666666667, ans=0.0 2023-12-22 04:48:49,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.39 vs. limit=10.0 2023-12-22 04:48:50,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=409146.6666666667, ans=0.125 2023-12-22 04:49:03,494 INFO [train.py:886] (0/4) Epoch 13, batch 4200, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4942879.99 frames. ], batch size: 100, lr: 8.29e-03, grad_scale: 64.0 2023-12-22 04:49:08,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=409280.0, ans=0.0 2023-12-22 04:49:11,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.62 vs. 
limit=22.5 2023-12-22 04:49:26,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=409413.3333333333, ans=0.2 2023-12-22 04:49:39,681 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.633e+01 2.827e+01 2.966e+01 3.504e+01, threshold=5.654e+01, percent-clipped=0.0 2023-12-22 04:49:55,216 INFO [train.py:886] (0/4) Epoch 13, batch 4250, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4938928.56 frames. ], batch size: 100, lr: 8.29e-03, grad_scale: 64.0 2023-12-22 04:50:35,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.43 vs. limit=22.5 2023-12-22 04:50:45,476 INFO [train.py:886] (0/4) Epoch 13, batch 4300, loss[loss=0.01425, audio_tagging_loss=0.01425, over 24750.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4941894.87 frames. ], batch size: 99, lr: 8.29e-03, grad_scale: 64.0 2023-12-22 04:50:56,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410013.3333333333, ans=0.1 2023-12-22 04:51:01,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=410013.3333333333, ans=0.125 2023-12-22 04:51:09,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=410080.0, ans=0.2 2023-12-22 04:51:10,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2023-12-22 04:51:21,866 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 2.664e+01 2.847e+01 3.002e+01 3.748e+01, threshold=5.694e+01, percent-clipped=0.0 2023-12-22 04:51:36,648 INFO [train.py:886] (0/4) Epoch 13, batch 4350, loss[loss=0.01497, audio_tagging_loss=0.01497, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4942558.28 frames. ], batch size: 100, lr: 8.28e-03, grad_scale: 64.0 2023-12-22 04:51:36,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=410280.0, ans=0.0 2023-12-22 04:51:50,540 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:51:52,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0 2023-12-22 04:52:15,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=410480.0, ans=0.0 2023-12-22 04:52:15,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=410480.0, ans=22.5 2023-12-22 04:52:22,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=410546.6666666667, ans=0.125 2023-12-22 04:52:27,248 INFO [train.py:886] (0/4) Epoch 13, batch 4400, loss[loss=0.01822, audio_tagging_loss=0.01822, over 24750.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4943400.74 frames. 
], batch size: 99, lr: 8.28e-03, grad_scale: 64.0 2023-12-22 04:52:50,291 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:53:05,343 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.774e+01 2.948e+01 3.077e+01 3.816e+01, threshold=5.896e+01, percent-clipped=0.0 2023-12-22 04:53:15,735 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.026e-01 2023-12-22 04:53:19,295 INFO [train.py:886] (0/4) Epoch 13, batch 4450, loss[loss=0.01459, audio_tagging_loss=0.01459, over 24043.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4941382.00 frames. ], batch size: 100, lr: 8.28e-03, grad_scale: 64.0 2023-12-22 04:53:25,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=410946.6666666667, ans=0.125 2023-12-22 04:53:34,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=411013.3333333333, ans=0.125 2023-12-22 04:53:41,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411080.0, ans=0.1 2023-12-22 04:54:10,078 INFO [train.py:886] (0/4) Epoch 13, batch 4500, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4936295.95 frames. ], batch size: 99, lr: 8.27e-03, grad_scale: 64.0 2023-12-22 04:54:23,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=411346.6666666667, ans=0.1 2023-12-22 04:54:37,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411413.3333333333, ans=0.1 2023-12-22 04:54:40,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-12-22 04:54:47,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.00 vs. limit=15.0 2023-12-22 04:54:47,744 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.626e+01 2.809e+01 2.913e+01 3.485e+01, threshold=5.618e+01, percent-clipped=0.0 2023-12-22 04:54:48,945 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.427e-01 2023-12-22 04:55:02,377 INFO [train.py:886] (0/4) Epoch 13, batch 4550, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 4937866.32 frames. ], batch size: 100, lr: 8.27e-03, grad_scale: 64.0 2023-12-22 04:55:06,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=411613.3333333333, ans=0.0 2023-12-22 04:55:09,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. 
limit=15.0 2023-12-22 04:55:32,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=411813.3333333333, ans=0.0 2023-12-22 04:55:38,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411813.3333333333, ans=0.1 2023-12-22 04:55:43,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=411880.0, ans=0.2 2023-12-22 04:55:44,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=411880.0, ans=0.2 2023-12-22 04:55:53,108 INFO [train.py:886] (0/4) Epoch 13, batch 4600, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4944140.99 frames. ], batch size: 100, lr: 8.27e-03, grad_scale: 64.0 2023-12-22 04:56:21,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2023-12-22 04:56:27,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=412146.6666666667, ans=0.0 2023-12-22 04:56:30,631 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.702e+01 2.812e+01 2.973e+01 3.804e+01, threshold=5.623e+01, percent-clipped=0.0 2023-12-22 04:56:45,141 INFO [train.py:886] (0/4) Epoch 13, batch 4650, loss[loss=0.01473, audio_tagging_loss=0.01473, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4944189.75 frames. ], batch size: 100, lr: 8.26e-03, grad_scale: 64.0 2023-12-22 04:56:54,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=412346.6666666667, ans=0.125 2023-12-22 04:56:54,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=412346.6666666667, ans=0.125 2023-12-22 04:56:57,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=412346.6666666667, ans=0.125 2023-12-22 04:56:58,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=412346.6666666667, ans=0.125 2023-12-22 04:57:23,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=412480.0, ans=0.125 2023-12-22 04:57:28,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=412546.6666666667, ans=0.125 2023-12-22 04:57:29,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412546.6666666667, ans=0.1 2023-12-22 04:57:31,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=412546.6666666667, ans=0.0 2023-12-22 04:57:35,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=412613.3333333333, ans=0.0 2023-12-22 04:57:35,743 INFO [train.py:886] (0/4) Epoch 13, batch 4700, loss[loss=0.01473, audio_tagging_loss=0.01473, over 24750.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4946115.33 frames. 
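Note: the ScheduledFloat entries above record hyperparameters (dropout_p, skip rates, balancer probs, min/max limits) that scaling.py anneals as a function of batch_count rather than holding fixed. As a rough illustration of how such a schedule can be evaluated, here is a minimal sketch assuming simple piecewise-linear interpolation between breakpoints; the breakpoint values are made up for the example and are not taken from scaling.py.

# Minimal sketch of a piecewise-linear float schedule keyed on batch count.
# The breakpoints below are illustrative only, not scaling.py's values.
import bisect

class PiecewiseLinearSchedule:
    def __init__(self, *points):
        # points: (batch_count, value) pairs sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count):
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count) - 1
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout rate decaying from 0.3 to 0.1 over the first 20k counts:
dropout_schedule = PiecewiseLinearSchedule((0, 0.3), (20000, 0.1))
print(dropout_schedule.value(411813.33))  # -> 0.1, the flat tail

By batch_count around 410k most such schedules would be on their flat tails, which is consistent with the same values (ans=0.1, ans=0.125, ...) repeating from batch to batch here.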
], batch size: 99, lr: 8.26e-03, grad_scale: 64.0 2023-12-22 04:58:04,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2023-12-22 04:58:10,162 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 2.759e+01 2.926e+01 3.098e+01 3.768e+01, threshold=5.852e+01, percent-clipped=0.0 2023-12-22 04:58:12,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=412813.3333333333, ans=0.125 2023-12-22 04:58:20,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=12.0 2023-12-22 04:58:23,361 INFO [train.py:886] (0/4) Epoch 13, batch 4750, loss[loss=0.01648, audio_tagging_loss=0.01648, over 24750.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4942683.26 frames. ], batch size: 99, lr: 8.26e-03, grad_scale: 64.0 2023-12-22 04:58:28,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=412946.6666666667, ans=0.125 2023-12-22 04:58:38,532 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-13.pt 2023-12-22 04:59:00,168 INFO [train.py:886] (0/4) Epoch 14, batch 0, loss[loss=0.03328, audio_tagging_loss=0.03328, over 23982.00 frames. ], tot_loss[loss=0.03328, audio_tagging_loss=0.03328, over 23982.00 frames. ], batch size: 100, lr: 7.95e-03, grad_scale: 64.0 2023-12-22 04:59:00,169 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 04:59:20,851 INFO [train.py:917] (0/4) Epoch 14, validation: loss=0.0333, audio_tagging_loss=0.0333, over 3737520.00 frames. 2023-12-22 04:59:20,852 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 04:59:24,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2023-12-22 04:59:26,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.96 vs. limit=6.0 2023-12-22 04:59:28,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=413053.3333333333, ans=0.125 2023-12-22 05:00:12,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.98 vs. limit=10.0 2023-12-22 05:00:13,994 INFO [train.py:886] (0/4) Epoch 14, batch 50, loss[loss=0.02114, audio_tagging_loss=0.02114, over 25000.00 frames. ], tot_loss[loss=0.0231, audio_tagging_loss=0.0231, over 1111289.49 frames. ], batch size: 100, lr: 7.95e-03, grad_scale: 64.0 2023-12-22 05:00:21,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. 
limit=15.0 2023-12-22 05:00:23,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=413453.3333333333, ans=0.0 2023-12-22 05:00:25,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=413453.3333333333, ans=0.125 2023-12-22 05:00:28,110 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:00:33,567 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.460e+01 2.957e+01 3.426e+01 4.066e+01 1.021e+02, threshold=6.852e+01, percent-clipped=7.0 2023-12-22 05:00:52,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.12 vs. limit=12.0 2023-12-22 05:01:04,576 INFO [train.py:886] (0/4) Epoch 14, batch 100, loss[loss=0.01873, audio_tagging_loss=0.01873, over 25000.00 frames. ], tot_loss[loss=0.01969, audio_tagging_loss=0.01969, over 1971586.58 frames. ], batch size: 100, lr: 7.95e-03, grad_scale: 64.0 2023-12-22 05:01:22,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=413786.6666666667, ans=0.125 2023-12-22 05:01:34,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=413920.0, ans=0.2 2023-12-22 05:01:46,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=413986.6666666667, ans=0.0 2023-12-22 05:01:53,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=413986.6666666667, ans=0.125 2023-12-22 05:01:56,701 INFO [train.py:886] (0/4) Epoch 14, batch 150, loss[loss=0.01171, audio_tagging_loss=0.01171, over 24750.00 frames. ], tot_loss[loss=0.01788, audio_tagging_loss=0.01788, over 2638678.69 frames. ], batch size: 99, lr: 7.94e-03, grad_scale: 64.0 2023-12-22 05:01:56,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=414053.3333333333, ans=0.0 2023-12-22 05:02:01,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=414053.3333333333, ans=0.125 2023-12-22 05:02:01,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=414053.3333333333, ans=0.0 2023-12-22 05:02:12,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=414120.0, ans=0.125 2023-12-22 05:02:16,265 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+01 2.861e+01 3.025e+01 3.235e+01 3.410e+01, threshold=6.050e+01, percent-clipped=0.0 2023-12-22 05:02:20,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0 2023-12-22 05:02:32,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. 
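Note: the optim.py:484 warnings summarize the recent distribution of gradient norms (five values: min, lower quartile, median, upper quartile, max), the clipping threshold in force, and the share of recent batches that were clipped. In the surrounding entries the threshold sits at clipping_scale times the median (e.g. 6.050e+01 = 2.0 x 3.025e+01 just above), and percent-clipped jumps to 7.0 right after the start of epoch 14, when norms spike. Below is a minimal sketch of that bookkeeping under exactly that assumed rule; it is not a transcription of the ScaledAdam optimizer code.

# Sketch: track recent grad norms, report quartiles, and clip to a
# threshold of clipping_scale * median of the recent window (an assumed
# rule that matches the numbers logged above, not the actual optim.py).
import torch

class GradNormClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []          # recent global grad norms
        self.num_clipped = 0
        self.num_seen = 0

    def clip_(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        self.num_seen += 1
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2x the median
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)  # scale grads down in place
        return ("Clipping_scale=%s, grad-norm quartiles %s, threshold=%.3e, "
                "percent-clipped=%.1f" % (
                    self.clipping_scale,
                    " ".join("%.3e" % v for v in q.tolist()),
                    threshold,
                    100.0 * self.num_clipped / self.num_seen))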
limit=15.0 2023-12-22 05:02:37,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=414320.0, ans=0.025 2023-12-22 05:02:38,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=414320.0, ans=0.2 2023-12-22 05:02:41,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=414320.0, ans=0.125 2023-12-22 05:02:43,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=414320.0, ans=0.015 2023-12-22 05:02:45,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.71 vs. limit=15.0 2023-12-22 05:02:47,409 INFO [train.py:886] (0/4) Epoch 14, batch 200, loss[loss=0.01442, audio_tagging_loss=0.01442, over 25000.00 frames. ], tot_loss[loss=0.01683, audio_tagging_loss=0.01683, over 3154470.28 frames. ], batch size: 100, lr: 7.94e-03, grad_scale: 64.0 2023-12-22 05:02:59,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=414453.3333333333, ans=0.0 2023-12-22 05:03:02,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414453.3333333333, ans=0.1 2023-12-22 05:03:02,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=414453.3333333333, ans=0.125 2023-12-22 05:03:10,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.62 vs. limit=12.0 2023-12-22 05:03:19,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2023-12-22 05:03:27,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=414586.6666666667, ans=0.125 2023-12-22 05:03:33,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=414653.3333333333, ans=0.0 2023-12-22 05:03:37,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=414653.3333333333, ans=0.0 2023-12-22 05:03:39,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2023-12-22 05:03:40,531 INFO [train.py:886] (0/4) Epoch 14, batch 250, loss[loss=0.01515, audio_tagging_loss=0.01515, over 25000.00 frames. ], tot_loss[loss=0.01613, audio_tagging_loss=0.01613, over 3551650.52 frames. ], batch size: 100, lr: 7.94e-03, grad_scale: 64.0 2023-12-22 05:04:00,261 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+01 2.722e+01 2.891e+01 3.020e+01 3.428e+01, threshold=5.782e+01, percent-clipped=0.0 2023-12-22 05:04:31,608 INFO [train.py:886] (0/4) Epoch 14, batch 300, loss[loss=0.01572, audio_tagging_loss=0.01572, over 22179.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 3854389.66 frames. 
], batch size: 107, lr: 7.93e-03, grad_scale: 64.0 2023-12-22 05:04:32,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=415053.3333333333, ans=0.125 2023-12-22 05:04:39,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=415053.3333333333, ans=0.0 2023-12-22 05:04:40,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=415053.3333333333, ans=0.5 2023-12-22 05:05:13,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=415320.0, ans=0.04949747468305833 2023-12-22 05:05:18,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=415320.0, ans=0.125 2023-12-22 05:05:23,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=12.0 2023-12-22 05:05:23,728 INFO [train.py:886] (0/4) Epoch 14, batch 350, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4085848.50 frames. ], batch size: 99, lr: 7.93e-03, grad_scale: 64.0 2023-12-22 05:05:44,921 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.760e+01 2.874e+01 3.045e+01 3.726e+01, threshold=5.747e+01, percent-clipped=0.0 2023-12-22 05:05:58,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=12.0 2023-12-22 05:05:59,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=415586.6666666667, ans=0.125 2023-12-22 05:06:01,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=415586.6666666667, ans=0.125 2023-12-22 05:06:05,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=415653.3333333333, ans=0.1 2023-12-22 05:06:15,501 INFO [train.py:886] (0/4) Epoch 14, batch 400, loss[loss=0.01439, audio_tagging_loss=0.01439, over 24750.00 frames. ], tot_loss[loss=0.01528, audio_tagging_loss=0.01528, over 4280911.80 frames. ], batch size: 99, lr: 7.93e-03, grad_scale: 64.0 2023-12-22 05:06:17,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=15.0 2023-12-22 05:06:30,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=415786.6666666667, ans=0.0 2023-12-22 05:06:41,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=415853.3333333333, ans=0.0 2023-12-22 05:06:49,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=415920.0, ans=0.125 2023-12-22 05:06:53,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=415920.0, ans=0.07 2023-12-22 05:07:07,890 INFO [train.py:886] (0/4) Epoch 14, batch 450, loss[loss=0.01603, audio_tagging_loss=0.01603, over 24929.00 frames. 
], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4423386.39 frames. ], batch size: 100, lr: 7.93e-03, grad_scale: 64.0 2023-12-22 05:07:14,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=416053.3333333333, ans=0.125 2023-12-22 05:07:19,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=416120.0, ans=0.125 2023-12-22 05:07:28,354 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.661e+01 2.801e+01 2.949e+01 3.337e+01, threshold=5.602e+01, percent-clipped=0.0 2023-12-22 05:07:47,007 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:07:55,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=12.0 2023-12-22 05:07:58,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=12.0 2023-12-22 05:07:59,877 INFO [train.py:886] (0/4) Epoch 14, batch 500, loss[loss=0.0157, audio_tagging_loss=0.0157, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4543040.09 frames. ], batch size: 100, lr: 7.92e-03, grad_scale: 64.0 2023-12-22 05:08:03,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=416386.6666666667, ans=0.5 2023-12-22 05:08:05,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.52 vs. limit=22.5 2023-12-22 05:08:09,798 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:08:13,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=416453.3333333333, ans=0.125 2023-12-22 05:08:27,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.62 vs. limit=15.0 2023-12-22 05:08:51,489 INFO [train.py:886] (0/4) Epoch 14, batch 550, loss[loss=0.01603, audio_tagging_loss=0.01603, over 25000.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4635835.75 frames. ], batch size: 100, lr: 7.92e-03, grad_scale: 64.0 2023-12-22 05:08:56,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=416720.0, ans=0.1 2023-12-22 05:09:11,913 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.686e+01 2.777e+01 2.975e+01 3.433e+01, threshold=5.554e+01, percent-clipped=0.0 2023-12-22 05:09:18,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=416853.3333333333, ans=0.125 2023-12-22 05:09:18,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=416853.3333333333, ans=0.0 2023-12-22 05:09:18,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=416853.3333333333, ans=0.2 2023-12-22 05:09:43,193 INFO [train.py:886] (0/4) Epoch 14, batch 600, loss[loss=0.01428, audio_tagging_loss=0.01428, over 24750.00 frames. 
], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4704895.26 frames. ], batch size: 99, lr: 7.92e-03, grad_scale: 64.0 2023-12-22 05:09:59,332 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=12.0 2023-12-22 05:10:05,387 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.640e-03 2023-12-22 05:10:14,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=417253.3333333333, ans=10.0 2023-12-22 05:10:22,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2023-12-22 05:10:23,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=417320.0, ans=0.125 2023-12-22 05:10:28,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=417320.0, ans=0.0 2023-12-22 05:10:34,850 INFO [train.py:886] (0/4) Epoch 14, batch 650, loss[loss=0.01339, audio_tagging_loss=0.01339, over 22202.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4750947.10 frames. ], batch size: 107, lr: 7.91e-03, grad_scale: 64.0 2023-12-22 05:10:37,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-12-22 05:10:42,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=417386.6666666667, ans=0.0 2023-12-22 05:10:56,104 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.469e+01 2.731e+01 2.888e+01 3.018e+01 3.671e+01, threshold=5.777e+01, percent-clipped=0.0 2023-12-22 05:11:23,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=417653.3333333333, ans=0.125 2023-12-22 05:11:27,245 INFO [train.py:886] (0/4) Epoch 14, batch 700, loss[loss=0.01502, audio_tagging_loss=0.01502, over 24750.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4790273.01 frames. 
], batch size: 99, lr: 7.91e-03, grad_scale: 64.0 2023-12-22 05:11:32,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=417720.0, ans=0.125 2023-12-22 05:11:35,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=417720.0, ans=0.2 2023-12-22 05:11:41,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417786.6666666667, ans=0.1 2023-12-22 05:11:41,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=417786.6666666667, ans=0.2 2023-12-22 05:11:46,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=417786.6666666667, ans=0.125 2023-12-22 05:12:01,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=417920.0, ans=0.1 2023-12-22 05:12:17,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=417986.6666666667, ans=0.125 2023-12-22 05:12:18,770 INFO [train.py:886] (0/4) Epoch 14, batch 750, loss[loss=0.0138, audio_tagging_loss=0.0138, over 25000.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4829059.21 frames. ], batch size: 100, lr: 7.91e-03, grad_scale: 64.0 2023-12-22 05:12:20,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.37 vs. limit=22.5 2023-12-22 05:12:24,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=418053.3333333333, ans=0.0 2023-12-22 05:12:29,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2023-12-22 05:12:38,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=418120.0, ans=0.0 2023-12-22 05:12:39,814 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.423e+01 2.688e+01 2.814e+01 2.981e+01 3.514e+01, threshold=5.628e+01, percent-clipped=0.0 2023-12-22 05:12:46,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=418186.6666666667, ans=0.0 2023-12-22 05:12:47,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=418186.6666666667, ans=0.0 2023-12-22 05:12:53,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=418253.3333333333, ans=0.0 2023-12-22 05:13:11,008 INFO [train.py:886] (0/4) Epoch 14, batch 800, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4857005.73 frames. ], batch size: 100, lr: 7.90e-03, grad_scale: 64.0 2023-12-22 05:14:02,730 INFO [train.py:886] (0/4) Epoch 14, batch 850, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4877001.77 frames. 
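Note: in the train.py:886 lines, loss[...] is the current batch while tot_loss[...] is a frame-weighted aggregate over the epoch so far, which is why its frame count climbs toward roughly 4.9M frames while the value itself moves slowly. Below is a minimal sketch of frame-weighted loss tracking, assuming a plain cumulative average; the training script's own tracker may additionally smooth or periodically reset.

# Sketch: frame-weighted running average in the spirit of tot_loss[...].
class LossTracker:
    def __init__(self):
        self.loss_sum = 0.0   # sum of loss * frames over batches seen
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        # batch_loss is the mean loss over this batch's frames
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames

    def __str__(self):
        return "tot_loss[loss=%.4g, over %.2f frames.]" % (
            self.loss_sum / self.frames, self.frames)

t = LossTracker()
t.update(0.01363, 25000.0)   # batch 800 above
t.update(0.01335, 24750.0)   # batch 850 above
print(t)  # tot_loss[loss=0.01349, over 49750.00 frames.]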
], batch size: 99, lr: 7.90e-03, grad_scale: 64.0 2023-12-22 05:14:23,687 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.411e+01 2.718e+01 2.822e+01 2.961e+01 3.534e+01, threshold=5.644e+01, percent-clipped=0.0 2023-12-22 05:14:27,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418853.3333333333, ans=0.1 2023-12-22 05:14:34,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=12.0 2023-12-22 05:14:43,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0 2023-12-22 05:14:44,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=418986.6666666667, ans=0.2 2023-12-22 05:14:54,311 INFO [train.py:886] (0/4) Epoch 14, batch 900, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4895744.79 frames. ], batch size: 100, lr: 7.90e-03, grad_scale: 64.0 2023-12-22 05:14:59,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2023-12-22 05:14:59,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=419053.3333333333, ans=15.0 2023-12-22 05:15:15,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=419186.6666666667, ans=0.125 2023-12-22 05:15:46,878 INFO [train.py:886] (0/4) Epoch 14, batch 950, loss[loss=0.01649, audio_tagging_loss=0.01649, over 24750.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4905688.34 frames. 
], batch size: 99, lr: 7.89e-03, grad_scale: 64.0 2023-12-22 05:15:55,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=419386.6666666667, ans=22.5 2023-12-22 05:16:02,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=419453.3333333333, ans=0.0 2023-12-22 05:16:03,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=419453.3333333333, ans=0.2 2023-12-22 05:16:04,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=419453.3333333333, ans=0.0 2023-12-22 05:16:06,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=419520.0, ans=0.125 2023-12-22 05:16:07,289 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.477e+01 2.744e+01 2.862e+01 3.032e+01 3.515e+01, threshold=5.724e+01, percent-clipped=0.0 2023-12-22 05:16:18,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=419586.6666666667, ans=0.125 2023-12-22 05:16:21,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=419586.6666666667, ans=0.0 2023-12-22 05:16:25,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=419586.6666666667, ans=0.1 2023-12-22 05:16:35,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.70 vs. limit=10.0 2023-12-22 05:16:36,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419653.3333333333, ans=0.1 2023-12-22 05:16:36,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=419653.3333333333, ans=0.125 2023-12-22 05:16:38,372 INFO [train.py:886] (0/4) Epoch 14, batch 1000, loss[loss=0.01539, audio_tagging_loss=0.01539, over 24750.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4909010.11 frames. ], batch size: 99, lr: 7.89e-03, grad_scale: 64.0 2023-12-22 05:16:42,613 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.67 vs. 
limit=15.0 2023-12-22 05:16:44,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=419720.0, ans=0.125 2023-12-22 05:16:44,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=419720.0, ans=0.2 2023-12-22 05:17:11,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=419920.0, ans=0.2 2023-12-22 05:17:13,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=419920.0, ans=0.0 2023-12-22 05:17:21,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=419986.6666666667, ans=0.125 2023-12-22 05:17:30,176 INFO [train.py:886] (0/4) Epoch 14, batch 1050, loss[loss=0.01394, audio_tagging_loss=0.01394, over 25000.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4920622.65 frames. ], batch size: 100, lr: 7.89e-03, grad_scale: 64.0 2023-12-22 05:17:39,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-12-22 05:17:46,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=420120.0, ans=0.0 2023-12-22 05:17:48,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=420120.0, ans=0.0 2023-12-22 05:17:48,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=420120.0, ans=0.0 2023-12-22 05:17:51,091 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.700e+01 2.859e+01 3.033e+01 3.573e+01, threshold=5.717e+01, percent-clipped=0.0 2023-12-22 05:18:10,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=420320.0, ans=0.2 2023-12-22 05:18:19,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=420320.0, ans=0.2 2023-12-22 05:18:21,439 INFO [train.py:886] (0/4) Epoch 14, batch 1100, loss[loss=0.01592, audio_tagging_loss=0.01592, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4925093.73 frames. ], batch size: 100, lr: 7.89e-03, grad_scale: 64.0 2023-12-22 05:18:26,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=420386.6666666667, ans=0.125 2023-12-22 05:18:35,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.19 vs. 
limit=12.0 2023-12-22 05:18:38,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=420453.3333333333, ans=0.0 2023-12-22 05:18:41,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=420520.0, ans=0.1 2023-12-22 05:19:05,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420653.3333333333, ans=0.1 2023-12-22 05:19:06,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=22.5 2023-12-22 05:19:09,979 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:19:13,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.34 vs. limit=12.0 2023-12-22 05:19:13,592 INFO [train.py:886] (0/4) Epoch 14, batch 1150, loss[loss=0.01504, audio_tagging_loss=0.01504, over 25000.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4927961.04 frames. ], batch size: 100, lr: 7.88e-03, grad_scale: 64.0 2023-12-22 05:19:14,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=420720.0, ans=0.0 2023-12-22 05:19:19,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=15.0 2023-12-22 05:19:28,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=420786.6666666667, ans=0.125 2023-12-22 05:19:30,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2023-12-22 05:19:34,585 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.729e+01 2.852e+01 2.990e+01 3.450e+01, threshold=5.704e+01, percent-clipped=0.0 2023-12-22 05:19:35,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420853.3333333333, ans=0.1 2023-12-22 05:19:39,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420853.3333333333, ans=0.1 2023-12-22 05:19:51,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=420920.0, ans=0.2 2023-12-22 05:19:56,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=420986.6666666667, ans=0.0 2023-12-22 05:20:03,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.26 vs. limit=22.5 2023-12-22 05:20:05,397 INFO [train.py:886] (0/4) Epoch 14, batch 1200, loss[loss=0.01408, audio_tagging_loss=0.01408, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4928310.72 frames. 
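Note: the Whitening lines compare a whiteness statistic of a module's activations against a limit, and the *.whitening_limit entries show that the limit itself is a scheduled value. One standard statistic with this behavior is E[lambda^2]/E[lambda]^2 over the eigenvalues lambda of the channel covariance: it equals 1 for perfectly white features, grows as variance concentrates in a few directions, and can be computed from the covariance without an eigendecomposition. The sketch below assumes that definition and ignores the num_groups channel grouping seen in the log; scaling.py's actual metric may differ in detail.

# Sketch: a whiteness metric for activations x of shape (frames, channels).
# Uses trace(C @ C) = sum(eig^2) and trace(C) = sum(eig), so no eigh needed.
import torch

def whitening_metric(x):
    x = x - x.mean(dim=0, keepdim=True)            # zero-mean per channel
    cov = (x.t() @ x) / x.shape[0]                 # channel covariance
    d = cov.shape[0]
    mean_sq_eig = (cov * cov).sum() / d            # E[eig^2]
    sq_mean_eig = (cov.diagonal().sum() / d) ** 2  # (E[eig])^2
    return (mean_sq_eig / sq_mean_eig).item()

x = torch.randn(1000, 384)                         # near-white features
print("metric=%.2f vs. limit=15.0" % whitening_metric(x))  # close to 1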
], batch size: 100, lr: 7.88e-03, grad_scale: 64.0 2023-12-22 05:20:08,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421053.3333333333, ans=0.1 2023-12-22 05:20:08,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=421053.3333333333, ans=0.1 2023-12-22 05:20:12,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.09 vs. limit=15.0 2023-12-22 05:20:17,383 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:20:26,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=421186.6666666667, ans=0.0 2023-12-22 05:20:57,215 INFO [train.py:886] (0/4) Epoch 14, batch 1250, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4929272.39 frames. ], batch size: 99, lr: 7.88e-03, grad_scale: 128.0 2023-12-22 05:21:11,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=421453.3333333333, ans=0.125 2023-12-22 05:21:20,039 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.730e+01 2.921e+01 3.084e+01 3.740e+01, threshold=5.843e+01, percent-clipped=0.0 2023-12-22 05:21:24,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-12-22 05:21:50,376 INFO [train.py:886] (0/4) Epoch 14, batch 1300, loss[loss=0.01509, audio_tagging_loss=0.01509, over 24750.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4927094.26 frames. 
], batch size: 99, lr: 7.87e-03, grad_scale: 64.0 2023-12-22 05:21:53,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=421720.0, ans=0.125 2023-12-22 05:22:13,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=421853.3333333333, ans=0.0 2023-12-22 05:22:18,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=421853.3333333333, ans=0.125 2023-12-22 05:22:21,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=421920.0, ans=0.125 2023-12-22 05:22:24,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=421920.0, ans=0.1 2023-12-22 05:22:29,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=421920.0, ans=0.125 2023-12-22 05:22:32,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=421986.6666666667, ans=0.5 2023-12-22 05:22:33,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=421986.6666666667, ans=0.125 2023-12-22 05:22:37,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=421986.6666666667, ans=0.125 2023-12-22 05:22:41,360 INFO [train.py:886] (0/4) Epoch 14, batch 1350, loss[loss=0.01408, audio_tagging_loss=0.01408, over 25000.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4933073.52 frames. ], batch size: 100, lr: 7.87e-03, grad_scale: 64.0 2023-12-22 05:22:43,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.83 vs. 
limit=15.0 2023-12-22 05:22:49,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=422053.3333333333, ans=0.125 2023-12-22 05:22:50,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=422053.3333333333, ans=0.125 2023-12-22 05:22:59,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=422120.0, ans=0.125 2023-12-22 05:23:03,254 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.431e+01 2.742e+01 2.848e+01 3.000e+01 3.738e+01, threshold=5.695e+01, percent-clipped=0.0 2023-12-22 05:23:08,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422186.6666666667, ans=0.1 2023-12-22 05:23:14,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=422253.3333333333, ans=0.125 2023-12-22 05:23:26,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=422320.0, ans=0.125 2023-12-22 05:23:26,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=422320.0, ans=15.0 2023-12-22 05:23:32,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=422386.6666666667, ans=0.1 2023-12-22 05:23:33,437 INFO [train.py:886] (0/4) Epoch 14, batch 1400, loss[loss=0.01442, audio_tagging_loss=0.01442, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4931909.35 frames. ], batch size: 100, lr: 7.87e-03, grad_scale: 64.0 2023-12-22 05:23:34,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=422386.6666666667, ans=0.05 2023-12-22 05:23:51,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2023-12-22 05:23:57,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422520.0, ans=0.1 2023-12-22 05:24:12,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=422586.6666666667, ans=0.0 2023-12-22 05:24:20,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=422653.3333333333, ans=0.0 2023-12-22 05:24:25,646 INFO [train.py:886] (0/4) Epoch 14, batch 1450, loss[loss=0.01539, audio_tagging_loss=0.01539, over 25000.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4938513.81 frames. 
], batch size: 100, lr: 7.86e-03, grad_scale: 64.0 2023-12-22 05:24:44,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=422786.6666666667, ans=0.125 2023-12-22 05:24:47,335 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+01 2.660e+01 2.819e+01 2.961e+01 3.390e+01, threshold=5.637e+01, percent-clipped=0.0 2023-12-22 05:24:50,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-12-22 05:24:59,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=422920.0, ans=0.2 2023-12-22 05:25:00,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=422920.0, ans=0.125 2023-12-22 05:25:04,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=422920.0, ans=0.125 2023-12-22 05:25:16,640 INFO [train.py:886] (0/4) Epoch 14, batch 1500, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4947877.94 frames. ], batch size: 100, lr: 7.86e-03, grad_scale: 64.0 2023-12-22 05:25:47,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=423253.3333333333, ans=12.0 2023-12-22 05:26:01,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=423320.0, ans=0.0 2023-12-22 05:26:08,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=423320.0, ans=0.125 2023-12-22 05:26:09,762 INFO [train.py:886] (0/4) Epoch 14, batch 1550, loss[loss=0.01645, audio_tagging_loss=0.01645, over 24750.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4943293.04 frames. ], batch size: 99, lr: 7.86e-03, grad_scale: 64.0 2023-12-22 05:26:12,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=423386.6666666667, ans=0.125 2023-12-22 05:26:26,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=423453.3333333333, ans=0.0 2023-12-22 05:26:30,351 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.731e+01 2.887e+01 3.049e+01 4.641e+01, threshold=5.774e+01, percent-clipped=0.0 2023-12-22 05:26:36,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=423520.0, ans=0.0 2023-12-22 05:26:40,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=423586.6666666667, ans=0.125 2023-12-22 05:26:48,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=423586.6666666667, ans=0.125 2023-12-22 05:26:56,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=423653.3333333333, ans=0.0 2023-12-22 05:27:00,505 INFO [train.py:886] (0/4) Epoch 14, batch 1600, loss[loss=0.01122, audio_tagging_loss=0.01122, over 24750.00 frames. 
], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4939860.35 frames. ], batch size: 99, lr: 7.85e-03, grad_scale: 64.0 2023-12-22 05:27:04,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=423720.0, ans=0.125 2023-12-22 05:27:08,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=423720.0, ans=0.1 2023-12-22 05:27:25,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0 2023-12-22 05:27:40,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=423920.0, ans=0.125 2023-12-22 05:27:47,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=423986.6666666667, ans=0.1 2023-12-22 05:27:52,876 INFO [train.py:886] (0/4) Epoch 14, batch 1650, loss[loss=0.01401, audio_tagging_loss=0.01401, over 24750.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4940261.86 frames. ], batch size: 99, lr: 7.85e-03, grad_scale: 64.0 2023-12-22 05:28:14,934 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.709e+01 2.898e+01 3.040e+01 3.602e+01, threshold=5.796e+01, percent-clipped=0.0 2023-12-22 05:28:32,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-12-22 05:28:44,643 INFO [train.py:886] (0/4) Epoch 14, batch 1700, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4944804.57 frames. ], batch size: 100, lr: 7.85e-03, grad_scale: 64.0 2023-12-22 05:28:55,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=424453.3333333333, ans=0.125 2023-12-22 05:29:00,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=424453.3333333333, ans=0.0 2023-12-22 05:29:00,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=424453.3333333333, ans=0.2 2023-12-22 05:29:11,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424520.0, ans=0.1 2023-12-22 05:29:18,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=424586.6666666667, ans=0.125 2023-12-22 05:29:27,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2023-12-22 05:29:36,996 INFO [train.py:886] (0/4) Epoch 14, batch 1750, loss[loss=0.01615, audio_tagging_loss=0.01615, over 21665.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4940682.40 frames. 
], batch size: 107, lr: 7.85e-03, grad_scale: 64.0 2023-12-22 05:29:39,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=424720.0, ans=0.0 2023-12-22 05:29:57,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=424853.3333333333, ans=0.125 2023-12-22 05:29:59,038 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+01 2.657e+01 2.813e+01 2.952e+01 3.749e+01, threshold=5.627e+01, percent-clipped=0.0 2023-12-22 05:30:03,018 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:30:15,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424920.0, ans=0.1 2023-12-22 05:30:27,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=424986.6666666667, ans=0.2 2023-12-22 05:30:28,727 INFO [train.py:886] (0/4) Epoch 14, batch 1800, loss[loss=0.01376, audio_tagging_loss=0.01376, over 25000.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4947331.44 frames. ], batch size: 100, lr: 7.84e-03, grad_scale: 64.0 2023-12-22 05:30:29,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=425053.3333333333, ans=0.1 2023-12-22 05:30:35,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=425053.3333333333, ans=0.125 2023-12-22 05:30:39,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425120.0, ans=0.1 2023-12-22 05:30:56,729 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:31:09,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=425320.0, ans=0.2 2023-12-22 05:31:17,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.14 vs. limit=15.0 2023-12-22 05:31:18,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2023-12-22 05:31:20,750 INFO [train.py:886] (0/4) Epoch 14, batch 1850, loss[loss=0.01703, audio_tagging_loss=0.01703, over 24750.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4952182.56 frames. ], batch size: 99, lr: 7.84e-03, grad_scale: 64.0 2023-12-22 05:31:37,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.98 vs. 
limit=22.5 2023-12-22 05:31:38,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=425453.3333333333, ans=0.125 2023-12-22 05:31:42,478 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.475e+01 2.703e+01 2.874e+01 3.050e+01 3.710e+01, threshold=5.749e+01, percent-clipped=0.0 2023-12-22 05:32:02,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=425653.3333333333, ans=0.125 2023-12-22 05:32:12,711 INFO [train.py:886] (0/4) Epoch 14, batch 1900, loss[loss=0.01985, audio_tagging_loss=0.01985, over 24944.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4947143.42 frames. ], batch size: 100, lr: 7.84e-03, grad_scale: 64.0 2023-12-22 05:32:15,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=425720.0, ans=0.0 2023-12-22 05:32:23,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=425786.6666666667, ans=0.5 2023-12-22 05:32:24,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425786.6666666667, ans=0.1 2023-12-22 05:32:25,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=425786.6666666667, ans=0.0 2023-12-22 05:32:29,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=425786.6666666667, ans=0.2 2023-12-22 05:32:49,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.63 vs. limit=22.5 2023-12-22 05:32:56,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=425986.6666666667, ans=0.125 2023-12-22 05:33:04,802 INFO [train.py:886] (0/4) Epoch 14, batch 1950, loss[loss=0.01707, audio_tagging_loss=0.01707, over 24750.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4947864.94 frames. ], batch size: 99, lr: 7.83e-03, grad_scale: 64.0 2023-12-22 05:33:26,066 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.762e+01 2.921e+01 3.064e+01 3.650e+01, threshold=5.841e+01, percent-clipped=0.0 2023-12-22 05:33:46,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=12.0 2023-12-22 05:33:48,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=426320.0, ans=0.125 2023-12-22 05:33:56,150 INFO [train.py:886] (0/4) Epoch 14, batch 2000, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4949885.72 frames. 
], batch size: 100, lr: 7.83e-03, grad_scale: 64.0 2023-12-22 05:33:56,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=426386.6666666667, ans=0.125 2023-12-22 05:34:14,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=426453.3333333333, ans=0.0 2023-12-22 05:34:27,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=426586.6666666667, ans=15.0 2023-12-22 05:34:37,976 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-64000.pt 2023-12-22 05:34:45,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=426653.3333333333, ans=0.125 2023-12-22 05:34:49,232 INFO [train.py:886] (0/4) Epoch 14, batch 2050, loss[loss=0.01366, audio_tagging_loss=0.01366, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4947947.70 frames. ], batch size: 100, lr: 7.83e-03, grad_scale: 64.0 2023-12-22 05:35:11,157 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.714e+01 2.866e+01 3.005e+01 3.462e+01, threshold=5.732e+01, percent-clipped=0.0 2023-12-22 05:35:12,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=426853.3333333333, ans=0.0 2023-12-22 05:35:40,483 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:35:41,252 INFO [train.py:886] (0/4) Epoch 14, batch 2100, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4955265.30 frames. ], batch size: 100, lr: 7.82e-03, grad_scale: 64.0 2023-12-22 05:35:46,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2023-12-22 05:35:50,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=427120.0, ans=0.125 2023-12-22 05:36:08,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=427186.6666666667, ans=0.2 2023-12-22 05:36:14,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=15.0 2023-12-22 05:36:21,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=427320.0, ans=0.125 2023-12-22 05:36:32,216 INFO [train.py:886] (0/4) Epoch 14, batch 2150, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24001.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4958670.57 frames. 
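Note: two checkpoint triggers from checkpoint.py:75 are visible in this stretch of the log: epoch-13.pt at the epoch boundary and checkpoint-64000.pt mid-epoch, keyed to the global batch counter (a save-every-4000-batches rule would produce exactly the 64000 seen here). Below is a minimal sketch of that policy, with model, exp_dir, and save_every_n as illustrative placeholders rather than the icefall checkpoint API.

# Sketch: the two save triggers suggested by the log. `model`, `exp_dir`,
# and `save_every_n` are placeholders for illustration.
from pathlib import Path
import torch

def maybe_save(model, exp_dir: Path, epoch: int, batch_idx_train: int,
               save_every_n: int = 4000, end_of_epoch: bool = False):
    if end_of_epoch:
        torch.save(model.state_dict(), exp_dir / ("epoch-%d.pt" % epoch))
    elif batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
        torch.save(model.state_dict(),
                   exp_dir / ("checkpoint-%d.pt" % batch_idx_train))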
], batch size: 100, lr: 7.82e-03, grad_scale: 64.0 2023-12-22 05:36:47,762 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:36:54,877 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.418e+01 2.697e+01 2.884e+01 3.038e+01 3.535e+01, threshold=5.769e+01, percent-clipped=0.0 2023-12-22 05:37:01,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=427520.0, ans=0.125 2023-12-22 05:37:23,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=12.0 2023-12-22 05:37:25,200 INFO [train.py:886] (0/4) Epoch 14, batch 2200, loss[loss=0.01582, audio_tagging_loss=0.01582, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4955013.49 frames. ], batch size: 99, lr: 7.82e-03, grad_scale: 64.0 2023-12-22 05:37:40,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=427786.6666666667, ans=0.0 2023-12-22 05:37:54,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=427853.3333333333, ans=0.0 2023-12-22 05:38:01,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=427920.0, ans=0.125 2023-12-22 05:38:11,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=15.0 2023-12-22 05:38:17,234 INFO [train.py:886] (0/4) Epoch 14, batch 2250, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4953086.73 frames. ], batch size: 99, lr: 7.82e-03, grad_scale: 64.0 2023-12-22 05:38:20,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=428053.3333333333, ans=0.0 2023-12-22 05:38:25,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=428053.3333333333, ans=0.125 2023-12-22 05:38:36,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2023-12-22 05:38:37,734 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.498e+01 2.726e+01 2.835e+01 3.020e+01 3.346e+01, threshold=5.670e+01, percent-clipped=0.0 2023-12-22 05:38:43,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=428186.6666666667, ans=0.0 2023-12-22 05:38:58,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=428320.0, ans=0.125 2023-12-22 05:39:07,314 INFO [train.py:886] (0/4) Epoch 14, batch 2300, loss[loss=0.01426, audio_tagging_loss=0.01426, over 24750.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4957368.51 frames. 
], batch size: 99, lr: 7.81e-03, grad_scale: 64.0 2023-12-22 05:39:07,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=428386.6666666667, ans=0.125 2023-12-22 05:39:12,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2023-12-22 05:39:15,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-12-22 05:39:34,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=428520.0, ans=0.0 2023-12-22 05:39:37,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=428520.0, ans=0.125 2023-12-22 05:39:37,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0 2023-12-22 05:39:43,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=428586.6666666667, ans=0.125 2023-12-22 05:39:57,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428653.3333333333, ans=0.1 2023-12-22 05:40:00,327 INFO [train.py:886] (0/4) Epoch 14, batch 2350, loss[loss=0.01079, audio_tagging_loss=0.01079, over 25000.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4951777.20 frames. ], batch size: 100, lr: 7.81e-03, grad_scale: 64.0 2023-12-22 05:40:21,521 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+01 2.672e+01 2.834e+01 2.975e+01 3.666e+01, threshold=5.667e+01, percent-clipped=0.0 2023-12-22 05:40:30,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=15.0 2023-12-22 05:40:37,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=428920.0, ans=0.07 2023-12-22 05:40:37,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=428920.0, ans=0.125 2023-12-22 05:40:40,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.52 vs. limit=22.5 2023-12-22 05:40:51,857 INFO [train.py:886] (0/4) Epoch 14, batch 2400, loss[loss=0.01454, audio_tagging_loss=0.01454, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4956537.92 frames. 
], batch size: 100, lr: 7.81e-03, grad_scale: 64.0 2023-12-22 05:40:57,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=429053.3333333333, ans=0.1 2023-12-22 05:41:07,488 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:41:13,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=429186.6666666667, ans=0.1 2023-12-22 05:41:17,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=429186.6666666667, ans=0.125 2023-12-22 05:41:44,334 INFO [train.py:886] (0/4) Epoch 14, batch 2450, loss[loss=0.01292, audio_tagging_loss=0.01292, over 25000.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4961494.33 frames. ], batch size: 100, lr: 7.80e-03, grad_scale: 64.0 2023-12-22 05:41:50,222 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:42:05,684 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.732e+01 2.868e+01 2.998e+01 3.609e+01, threshold=5.736e+01, percent-clipped=0.0 2023-12-22 05:42:23,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=429586.6666666667, ans=0.0 2023-12-22 05:42:26,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=429653.3333333333, ans=0.125 2023-12-22 05:42:35,891 INFO [train.py:886] (0/4) Epoch 14, batch 2500, loss[loss=0.01522, audio_tagging_loss=0.01522, over 24750.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4956501.82 frames. ], batch size: 99, lr: 7.80e-03, grad_scale: 64.0 2023-12-22 05:43:08,613 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.35 vs. limit=22.5 2023-12-22 05:43:14,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.19 vs. limit=22.5 2023-12-22 05:43:17,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=429986.6666666667, ans=0.2 2023-12-22 05:43:26,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=430053.3333333333, ans=0.0 2023-12-22 05:43:27,576 INFO [train.py:886] (0/4) Epoch 14, batch 2550, loss[loss=0.01188, audio_tagging_loss=0.01188, over 22303.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4948496.95 frames. ], batch size: 107, lr: 7.80e-03, grad_scale: 64.0 2023-12-22 05:43:50,203 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.742e+01 2.892e+01 3.076e+01 3.372e+01, threshold=5.784e+01, percent-clipped=0.0 2023-12-22 05:44:16,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2023-12-22 05:44:20,780 INFO [train.py:886] (0/4) Epoch 14, batch 2600, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4949428.26 frames. 
], batch size: 100, lr: 7.79e-03, grad_scale: 64.0 2023-12-22 05:44:31,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=430453.3333333333, ans=0.2 2023-12-22 05:44:38,927 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:44:46,160 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:44:49,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=430520.0, ans=0.05 2023-12-22 05:44:52,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=430586.6666666667, ans=0.2 2023-12-22 05:45:01,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430653.3333333333, ans=0.1 2023-12-22 05:45:01,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=430653.3333333333, ans=0.125 2023-12-22 05:45:03,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430653.3333333333, ans=0.1 2023-12-22 05:45:11,163 INFO [train.py:886] (0/4) Epoch 14, batch 2650, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4950786.27 frames. ], batch size: 100, lr: 7.79e-03, grad_scale: 64.0 2023-12-22 05:45:33,100 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.737e+01 2.885e+01 3.025e+01 3.317e+01, threshold=5.770e+01, percent-clipped=0.0 2023-12-22 05:45:38,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=430853.3333333333, ans=0.125 2023-12-22 05:45:39,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=430853.3333333333, ans=0.2 2023-12-22 05:46:01,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=430986.6666666667, ans=0.125 2023-12-22 05:46:02,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=431053.3333333333, ans=0.0 2023-12-22 05:46:03,589 INFO [train.py:886] (0/4) Epoch 14, batch 2700, loss[loss=0.01461, audio_tagging_loss=0.01461, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4956627.54 frames. ], batch size: 100, lr: 7.79e-03, grad_scale: 64.0 2023-12-22 05:46:09,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431053.3333333333, ans=0.1 2023-12-22 05:46:16,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. 
limit=15.0 2023-12-22 05:46:16,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=431120.0, ans=0.0 2023-12-22 05:46:16,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=431120.0, ans=0.125 2023-12-22 05:46:22,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=431120.0, ans=0.0 2023-12-22 05:46:23,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=431186.6666666667, ans=0.0 2023-12-22 05:46:36,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.34 vs. limit=22.5 2023-12-22 05:46:54,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431386.6666666667, ans=0.1 2023-12-22 05:46:55,321 INFO [train.py:886] (0/4) Epoch 14, batch 2750, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4956422.78 frames. ], batch size: 100, lr: 7.79e-03, grad_scale: 64.0 2023-12-22 05:46:56,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=431386.6666666667, ans=0.125 2023-12-22 05:47:01,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=431386.6666666667, ans=0.125 2023-12-22 05:47:01,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=431386.6666666667, ans=0.125 2023-12-22 05:47:04,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431386.6666666667, ans=0.1 2023-12-22 05:47:17,160 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+01 2.705e+01 2.817e+01 2.978e+01 3.595e+01, threshold=5.635e+01, percent-clipped=0.0 2023-12-22 05:47:18,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=431520.0, ans=0.125 2023-12-22 05:47:20,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2023-12-22 05:47:24,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=431520.0, ans=0.2 2023-12-22 05:47:42,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431653.3333333333, ans=0.1 2023-12-22 05:47:46,630 INFO [train.py:886] (0/4) Epoch 14, batch 2800, loss[loss=0.01701, audio_tagging_loss=0.01701, over 24750.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4954822.10 frames. 
], batch size: 99, lr: 7.78e-03, grad_scale: 64.0 2023-12-22 05:47:53,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=431720.0, ans=0.2 2023-12-22 05:47:54,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431720.0, ans=0.1 2023-12-22 05:47:58,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0 2023-12-22 05:48:07,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=431853.3333333333, ans=0.2 2023-12-22 05:48:08,957 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:48:35,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=431986.6666666667, ans=0.125 2023-12-22 05:48:39,043 INFO [train.py:886] (0/4) Epoch 14, batch 2850, loss[loss=0.01481, audio_tagging_loss=0.01481, over 24750.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4952327.63 frames. ], batch size: 99, lr: 7.78e-03, grad_scale: 64.0 2023-12-22 05:49:00,968 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.725e+01 2.911e+01 3.000e+01 3.725e+01, threshold=5.822e+01, percent-clipped=0.0 2023-12-22 05:49:11,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=432253.3333333333, ans=0.2 2023-12-22 05:49:18,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.40 vs. limit=10.0 2023-12-22 05:49:31,044 INFO [train.py:886] (0/4) Epoch 14, batch 2900, loss[loss=0.0105, audio_tagging_loss=0.0105, over 24750.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4950658.77 frames. ], batch size: 99, lr: 7.78e-03, grad_scale: 64.0 2023-12-22 05:49:51,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=432520.0, ans=0.125 2023-12-22 05:50:03,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=432586.6666666667, ans=0.125 2023-12-22 05:50:14,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=432653.3333333333, ans=0.2 2023-12-22 05:50:22,850 INFO [train.py:886] (0/4) Epoch 14, batch 2950, loss[loss=0.01147, audio_tagging_loss=0.01147, over 24750.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4948897.52 frames. ], batch size: 99, lr: 7.77e-03, grad_scale: 64.0 2023-12-22 05:50:27,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=432720.0, ans=0.2 2023-12-22 05:50:37,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.38 vs. 
limit=15.0 2023-12-22 05:50:44,823 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.760e+01 2.879e+01 3.048e+01 3.785e+01, threshold=5.759e+01, percent-clipped=0.0 2023-12-22 05:50:49,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=432853.3333333333, ans=0.125 2023-12-22 05:50:56,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432920.0, ans=0.1 2023-12-22 05:50:56,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=432920.0, ans=0.125 2023-12-22 05:51:00,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5 2023-12-22 05:51:03,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=432986.6666666667, ans=0.05 2023-12-22 05:51:04,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432986.6666666667, ans=0.1 2023-12-22 05:51:04,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432986.6666666667, ans=0.1 2023-12-22 05:51:07,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.46 vs. limit=22.5 2023-12-22 05:51:14,397 INFO [train.py:886] (0/4) Epoch 14, batch 3000, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4954132.43 frames. ], batch size: 100, lr: 7.77e-03, grad_scale: 64.0 2023-12-22 05:51:14,399 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 05:51:35,574 INFO [train.py:917] (0/4) Epoch 14, validation: loss=0.03344, audio_tagging_loss=0.03344, over 3737520.00 frames. 2023-12-22 05:51:35,574 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 05:51:59,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=433186.6666666667, ans=0.1 2023-12-22 05:52:01,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=433186.6666666667, ans=0.035 2023-12-22 05:52:07,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=433253.3333333333, ans=0.5 2023-12-22 05:52:12,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=433253.3333333333, ans=0.2 2023-12-22 05:52:19,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=433320.0, ans=0.125 2023-12-22 05:52:20,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=12.0 2023-12-22 05:52:22,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.31 vs. 
limit=10.0 2023-12-22 05:52:26,660 INFO [train.py:886] (0/4) Epoch 14, batch 3050, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4953101.54 frames. ], batch size: 100, lr: 7.77e-03, grad_scale: 64.0 2023-12-22 05:52:41,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.07 vs. limit=15.0 2023-12-22 05:52:49,191 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.748e+01 2.853e+01 3.040e+01 4.569e+01, threshold=5.706e+01, percent-clipped=0.0 2023-12-22 05:53:16,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.91 vs. limit=22.5 2023-12-22 05:53:19,742 INFO [train.py:886] (0/4) Epoch 14, batch 3100, loss[loss=0.01552, audio_tagging_loss=0.01552, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4957110.94 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0 2023-12-22 05:53:22,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=433720.0, ans=0.0 2023-12-22 05:53:42,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=433853.3333333333, ans=0.0 2023-12-22 05:53:44,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=433853.3333333333, ans=0.2 2023-12-22 05:53:47,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=433853.3333333333, ans=0.2 2023-12-22 05:54:02,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433986.6666666667, ans=0.1 2023-12-22 05:54:09,522 INFO [train.py:886] (0/4) Epoch 14, batch 3150, loss[loss=0.01508, audio_tagging_loss=0.01508, over 24750.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4952659.10 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0 2023-12-22 05:54:11,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=434053.3333333333, ans=0.125 2023-12-22 05:54:19,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0 2023-12-22 05:54:30,687 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.796e+01 2.957e+01 3.108e+01 3.516e+01, threshold=5.914e+01, percent-clipped=0.0 2023-12-22 05:54:40,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=434253.3333333333, ans=0.035 2023-12-22 05:55:01,147 INFO [train.py:886] (0/4) Epoch 14, batch 3200, loss[loss=0.01639, audio_tagging_loss=0.01639, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4949904.28 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0 2023-12-22 05:55:07,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.38 vs. 
limit=15.0 2023-12-22 05:55:49,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=434653.3333333333, ans=0.125 2023-12-22 05:55:53,578 INFO [train.py:886] (0/4) Epoch 14, batch 3250, loss[loss=0.0139, audio_tagging_loss=0.0139, over 24750.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4949715.89 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0 2023-12-22 05:56:14,176 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.695e+01 2.854e+01 3.040e+01 5.272e+01, threshold=5.707e+01, percent-clipped=0.0 2023-12-22 05:56:35,163 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:56:39,816 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.146e-01 2023-12-22 05:56:44,358 INFO [train.py:886] (0/4) Epoch 14, batch 3300, loss[loss=0.01226, audio_tagging_loss=0.01226, over 25000.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4954265.35 frames. ], batch size: 100, lr: 7.75e-03, grad_scale: 64.0 2023-12-22 05:56:48,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=435053.3333333333, ans=0.1 2023-12-22 05:56:49,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=435053.3333333333, ans=0.125 2023-12-22 05:56:53,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=435053.3333333333, ans=0.125 2023-12-22 05:57:15,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=435253.3333333333, ans=0.125 2023-12-22 05:57:16,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435253.3333333333, ans=0.1 2023-12-22 05:57:17,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. limit=6.0 2023-12-22 05:57:27,337 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.591e-02 2023-12-22 05:57:34,701 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.528e-02 2023-12-22 05:57:37,370 INFO [train.py:886] (0/4) Epoch 14, batch 3350, loss[loss=0.01658, audio_tagging_loss=0.01658, over 24023.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4956457.85 frames. 
], batch size: 100, lr: 7.75e-03, grad_scale: 64.0 2023-12-22 05:57:45,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=435386.6666666667, ans=0.0 2023-12-22 05:57:48,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=435453.3333333333, ans=0.125 2023-12-22 05:57:57,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=435520.0, ans=0.125 2023-12-22 05:57:59,779 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 2.712e+01 2.830e+01 3.003e+01 3.619e+01, threshold=5.660e+01, percent-clipped=0.0 2023-12-22 05:58:02,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=435520.0, ans=0.2 2023-12-22 05:58:24,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=435653.3333333333, ans=0.125 2023-12-22 05:58:27,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435720.0, ans=0.1 2023-12-22 05:58:27,778 INFO [train.py:886] (0/4) Epoch 14, batch 3400, loss[loss=0.01438, audio_tagging_loss=0.01438, over 25000.00 frames. ], tot_loss[loss=0.01432, audio_tagging_loss=0.01432, over 4963630.74 frames. ], batch size: 100, lr: 7.75e-03, grad_scale: 64.0 2023-12-22 05:58:48,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=435853.3333333333, ans=0.125 2023-12-22 05:58:48,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=435853.3333333333, ans=0.0 2023-12-22 05:58:55,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5 2023-12-22 05:59:01,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=435920.0, ans=0.125 2023-12-22 05:59:04,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=435920.0, ans=0.0 2023-12-22 05:59:20,164 INFO [train.py:886] (0/4) Epoch 14, batch 3450, loss[loss=0.01696, audio_tagging_loss=0.01696, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4958949.56 frames. ], batch size: 99, lr: 7.74e-03, grad_scale: 64.0 2023-12-22 05:59:21,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=436053.3333333333, ans=0.125 2023-12-22 05:59:24,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.82 vs. limit=10.0 2023-12-22 05:59:26,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=436053.3333333333, ans=0.0 2023-12-22 05:59:30,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=436120.0, ans=0.125 2023-12-22 05:59:33,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. 
limit=15.0 2023-12-22 05:59:43,845 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+01 2.742e+01 2.883e+01 3.018e+01 3.834e+01, threshold=5.765e+01, percent-clipped=0.0 2023-12-22 05:59:44,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.87 vs. limit=22.5 2023-12-22 05:59:48,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=436186.6666666667, ans=0.0 2023-12-22 06:00:03,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=436320.0, ans=0.125 2023-12-22 06:00:09,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=436320.0, ans=0.2 2023-12-22 06:00:13,199 INFO [train.py:886] (0/4) Epoch 14, batch 3500, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4955675.25 frames. ], batch size: 100, lr: 7.74e-03, grad_scale: 32.0 2023-12-22 06:00:17,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436386.6666666667, ans=0.1 2023-12-22 06:00:20,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2023-12-22 06:00:34,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=436520.0, ans=0.0 2023-12-22 06:01:02,861 INFO [train.py:886] (0/4) Epoch 14, batch 3550, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4951600.28 frames. ], batch size: 99, lr: 7.74e-03, grad_scale: 32.0 2023-12-22 06:01:10,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=436720.0, ans=0.0 2023-12-22 06:01:13,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=436786.6666666667, ans=0.125 2023-12-22 06:01:26,842 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 2.689e+01 2.829e+01 3.028e+01 3.560e+01, threshold=5.658e+01, percent-clipped=0.0 2023-12-22 06:01:36,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=436920.0, ans=0.5 2023-12-22 06:01:43,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=12.0 2023-12-22 06:01:54,624 INFO [train.py:886] (0/4) Epoch 14, batch 3600, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4955880.47 frames. 
], batch size: 100, lr: 7.74e-03, grad_scale: 32.0 2023-12-22 06:02:14,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=437186.6666666667, ans=0.0 2023-12-22 06:02:23,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437186.6666666667, ans=0.1 2023-12-22 06:02:23,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2023-12-22 06:02:28,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=437253.3333333333, ans=0.0 2023-12-22 06:02:35,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=437320.0, ans=0.125 2023-12-22 06:02:37,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=437320.0, ans=0.2 2023-12-22 06:02:46,115 INFO [train.py:886] (0/4) Epoch 14, batch 3650, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4957763.94 frames. ], batch size: 100, lr: 7.73e-03, grad_scale: 32.0 2023-12-22 06:03:09,464 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.669e+01 2.809e+01 2.947e+01 3.516e+01, threshold=5.618e+01, percent-clipped=0.0 2023-12-22 06:03:11,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=437520.0, ans=0.125 2023-12-22 06:03:15,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=437520.0, ans=0.2 2023-12-22 06:03:15,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=437586.6666666667, ans=0.0 2023-12-22 06:03:19,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.07 vs. limit=22.5 2023-12-22 06:03:24,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=437586.6666666667, ans=0.2 2023-12-22 06:03:24,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=437586.6666666667, ans=0.0 2023-12-22 06:03:26,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=437586.6666666667, ans=6.0 2023-12-22 06:03:38,032 INFO [train.py:886] (0/4) Epoch 14, batch 3700, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4954029.25 frames. 
], batch size: 100, lr: 7.73e-03, grad_scale: 32.0 2023-12-22 06:03:43,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=437720.0, ans=0.0 2023-12-22 06:03:59,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=437853.3333333333, ans=0.2 2023-12-22 06:04:07,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=437853.3333333333, ans=0.125 2023-12-22 06:04:22,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=437986.6666666667, ans=0.0 2023-12-22 06:04:30,268 INFO [train.py:886] (0/4) Epoch 14, batch 3750, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4952243.51 frames. ], batch size: 99, lr: 7.73e-03, grad_scale: 32.0 2023-12-22 06:04:36,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=438053.3333333333, ans=0.2 2023-12-22 06:04:42,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=438120.0, ans=0.5 2023-12-22 06:04:50,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438186.6666666667, ans=0.1 2023-12-22 06:04:54,120 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 2.790e+01 2.895e+01 3.050e+01 3.553e+01, threshold=5.791e+01, percent-clipped=0.0 2023-12-22 06:05:16,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=438320.0, ans=0.125 2023-12-22 06:05:22,188 INFO [train.py:886] (0/4) Epoch 14, batch 3800, loss[loss=0.01504, audio_tagging_loss=0.01504, over 24750.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4945115.03 frames. ], batch size: 99, lr: 7.72e-03, grad_scale: 32.0 2023-12-22 06:05:32,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2023-12-22 06:05:45,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.07 vs. limit=10.0 2023-12-22 06:06:02,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.69 vs. limit=22.5 2023-12-22 06:06:14,375 INFO [train.py:886] (0/4) Epoch 14, batch 3850, loss[loss=0.01565, audio_tagging_loss=0.01565, over 25000.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4946368.80 frames. 
], batch size: 100, lr: 7.72e-03, grad_scale: 32.0 2023-12-22 06:06:17,522 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 06:06:38,159 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 2.739e+01 2.908e+01 3.099e+01 3.536e+01, threshold=5.817e+01, percent-clipped=0.0 2023-12-22 06:06:39,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=438853.3333333333, ans=0.0 2023-12-22 06:06:41,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=438853.3333333333, ans=0.125 2023-12-22 06:06:42,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=438853.3333333333, ans=0.125 2023-12-22 06:06:47,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=438920.0, ans=0.2 2023-12-22 06:06:59,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2023-12-22 06:07:01,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2023-12-22 06:07:06,019 INFO [train.py:886] (0/4) Epoch 14, batch 3900, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4950835.58 frames. ], batch size: 100, lr: 7.72e-03, grad_scale: 32.0 2023-12-22 06:07:15,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=439120.0, ans=0.125 2023-12-22 06:07:18,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=439120.0, ans=0.2 2023-12-22 06:07:57,863 INFO [train.py:886] (0/4) Epoch 14, batch 3950, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4955737.62 frames. ], batch size: 100, lr: 7.72e-03, grad_scale: 32.0 2023-12-22 06:08:05,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=439386.6666666667, ans=0.0 2023-12-22 06:08:15,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=439453.3333333333, ans=0.125 2023-12-22 06:08:17,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=439453.3333333333, ans=0.0 2023-12-22 06:08:18,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. 
limit=10.0 2023-12-22 06:08:22,152 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 2.684e+01 2.825e+01 2.982e+01 3.429e+01, threshold=5.651e+01, percent-clipped=0.0 2023-12-22 06:08:35,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=439586.6666666667, ans=0.125 2023-12-22 06:08:39,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=439653.3333333333, ans=0.1 2023-12-22 06:08:50,484 INFO [train.py:886] (0/4) Epoch 14, batch 4000, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4958285.99 frames. ], batch size: 100, lr: 7.71e-03, grad_scale: 32.0 2023-12-22 06:09:01,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=439786.6666666667, ans=0.125 2023-12-22 06:09:02,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=439786.6666666667, ans=0.1 2023-12-22 06:09:05,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=439786.6666666667, ans=0.1 2023-12-22 06:09:17,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=439853.3333333333, ans=0.125 2023-12-22 06:09:41,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=440053.3333333333, ans=0.0 2023-12-22 06:09:42,994 INFO [train.py:886] (0/4) Epoch 14, batch 4050, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4956632.21 frames. ], batch size: 99, lr: 7.71e-03, grad_scale: 32.0 2023-12-22 06:09:43,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2023-12-22 06:09:51,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=440053.3333333333, ans=0.1 2023-12-22 06:09:53,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.38 vs. limit=15.0 2023-12-22 06:10:06,320 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.404e+01 2.790e+01 2.955e+01 3.071e+01 3.568e+01, threshold=5.910e+01, percent-clipped=0.0 2023-12-22 06:10:19,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=440253.3333333333, ans=0.125 2023-12-22 06:10:33,718 INFO [train.py:886] (0/4) Epoch 14, batch 4100, loss[loss=0.01689, audio_tagging_loss=0.01689, over 24938.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4951223.16 frames. 
], batch size: 100, lr: 7.71e-03, grad_scale: 32.0 2023-12-22 06:10:49,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=440453.3333333333, ans=0.0 2023-12-22 06:10:55,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440520.0, ans=0.1 2023-12-22 06:11:08,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=440586.6666666667, ans=0.0 2023-12-22 06:11:15,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=440653.3333333333, ans=0.0 2023-12-22 06:11:26,748 INFO [train.py:886] (0/4) Epoch 14, batch 4150, loss[loss=0.01938, audio_tagging_loss=0.01938, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4947919.11 frames. ], batch size: 100, lr: 7.70e-03, grad_scale: 32.0 2023-12-22 06:11:35,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=440786.6666666667, ans=0.125 2023-12-22 06:11:38,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440786.6666666667, ans=0.1 2023-12-22 06:11:48,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=440853.3333333333, ans=0.0 2023-12-22 06:11:50,566 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 2.748e+01 2.880e+01 2.984e+01 4.734e+01, threshold=5.760e+01, percent-clipped=0.0 2023-12-22 06:12:01,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=440920.0, ans=0.125 2023-12-22 06:12:03,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=440920.0, ans=0.2 2023-12-22 06:12:04,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=440920.0, ans=0.0 2023-12-22 06:12:09,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=440986.6666666667, ans=0.125 2023-12-22 06:12:09,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=440986.6666666667, ans=0.2 2023-12-22 06:12:17,769 INFO [train.py:886] (0/4) Epoch 14, batch 4200, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4945303.91 frames. ], batch size: 100, lr: 7.70e-03, grad_scale: 32.0 2023-12-22 06:12:44,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441186.6666666667, ans=0.1 2023-12-22 06:12:51,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=441253.3333333333, ans=0.125 2023-12-22 06:12:53,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=441253.3333333333, ans=0.0 2023-12-22 06:12:53,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.25 vs. 
limit=22.5 2023-12-22 06:12:57,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=441253.3333333333, ans=0.125 2023-12-22 06:13:07,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=441320.0, ans=0.07 2023-12-22 06:13:10,160 INFO [train.py:886] (0/4) Epoch 14, batch 4250, loss[loss=0.01376, audio_tagging_loss=0.01376, over 22254.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4949052.12 frames. ], batch size: 107, lr: 7.70e-03, grad_scale: 32.0 2023-12-22 06:13:10,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=441386.6666666667, ans=0.125 2023-12-22 06:13:12,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=441386.6666666667, ans=0.0 2023-12-22 06:13:15,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2023-12-22 06:13:30,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=441520.0, ans=0.125 2023-12-22 06:13:34,376 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.735e+01 2.849e+01 2.986e+01 3.517e+01, threshold=5.698e+01, percent-clipped=0.0 2023-12-22 06:13:34,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=441520.0, ans=0.125 2023-12-22 06:13:41,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.40 vs. limit=12.0 2023-12-22 06:13:42,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=441586.6666666667, ans=0.125 2023-12-22 06:13:46,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=441586.6666666667, ans=0.125 2023-12-22 06:14:01,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2023-12-22 06:14:01,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=441720.0, ans=0.0 2023-12-22 06:14:02,682 INFO [train.py:886] (0/4) Epoch 14, batch 4300, loss[loss=0.01758, audio_tagging_loss=0.01758, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4954306.19 frames. ], batch size: 100, lr: 7.69e-03, grad_scale: 32.0 2023-12-22 06:14:14,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=441786.6666666667, ans=0.0 2023-12-22 06:14:18,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=441786.6666666667, ans=0.125 2023-12-22 06:14:19,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. 
limit=15.0 2023-12-22 06:14:44,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=441986.6666666667, ans=0.2 2023-12-22 06:14:46,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2023-12-22 06:14:53,353 INFO [train.py:886] (0/4) Epoch 14, batch 4350, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4956135.29 frames. ], batch size: 99, lr: 7.69e-03, grad_scale: 32.0 2023-12-22 06:15:10,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=442120.0, ans=0.125 2023-12-22 06:15:16,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.06 vs. limit=15.0 2023-12-22 06:15:17,197 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.541e+01 2.829e+01 2.968e+01 3.125e+01 3.795e+01, threshold=5.936e+01, percent-clipped=0.0 2023-12-22 06:15:18,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=442186.6666666667, ans=0.2 2023-12-22 06:15:23,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.55 vs. limit=10.0 2023-12-22 06:15:27,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=442253.3333333333, ans=0.0 2023-12-22 06:15:44,754 INFO [train.py:886] (0/4) Epoch 14, batch 4400, loss[loss=0.01503, audio_tagging_loss=0.01503, over 24750.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4952999.67 frames. ], batch size: 99, lr: 7.69e-03, grad_scale: 32.0 2023-12-22 06:15:51,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=442386.6666666667, ans=0.0 2023-12-22 06:15:57,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=442453.3333333333, ans=0.0 2023-12-22 06:16:01,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2023-12-22 06:16:03,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.99 vs. limit=10.0 2023-12-22 06:16:13,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.95 vs. limit=22.5 2023-12-22 06:16:15,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=442586.6666666667, ans=0.125 2023-12-22 06:16:35,487 INFO [train.py:886] (0/4) Epoch 14, batch 4450, loss[loss=0.01475, audio_tagging_loss=0.01475, over 24750.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4954114.06 frames. ], batch size: 99, lr: 7.69e-03, grad_scale: 32.0 2023-12-22 06:16:58,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. 
limit=15.0 2023-12-22 06:16:59,495 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.711e+01 2.908e+01 3.101e+01 3.699e+01, threshold=5.817e+01, percent-clipped=0.0 2023-12-22 06:17:14,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=442920.0, ans=0.125 2023-12-22 06:17:19,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=442986.6666666667, ans=0.0 2023-12-22 06:17:27,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=443053.3333333333, ans=0.0 2023-12-22 06:17:27,841 INFO [train.py:886] (0/4) Epoch 14, batch 4500, loss[loss=0.01488, audio_tagging_loss=0.01488, over 24750.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4954018.17 frames. ], batch size: 99, lr: 7.68e-03, grad_scale: 32.0 2023-12-22 06:17:40,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=443120.0, ans=0.2 2023-12-22 06:17:52,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.38 vs. limit=10.0 2023-12-22 06:18:20,090 INFO [train.py:886] (0/4) Epoch 14, batch 4550, loss[loss=0.01582, audio_tagging_loss=0.01582, over 25000.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4960397.53 frames. ], batch size: 100, lr: 7.68e-03, grad_scale: 32.0 2023-12-22 06:18:24,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443386.6666666667, ans=0.1 2023-12-22 06:18:39,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2023-12-22 06:18:43,103 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.709e+01 2.871e+01 3.035e+01 3.525e+01, threshold=5.741e+01, percent-clipped=0.0 2023-12-22 06:19:07,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=443653.3333333333, ans=0.125 2023-12-22 06:19:11,004 INFO [train.py:886] (0/4) Epoch 14, batch 4600, loss[loss=0.01408, audio_tagging_loss=0.01408, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4962106.88 frames. ], batch size: 100, lr: 7.68e-03, grad_scale: 32.0 2023-12-22 06:19:12,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=443720.0, ans=0.125 2023-12-22 06:19:15,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=443720.0, ans=0.125 2023-12-22 06:19:18,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.85 vs. limit=10.0 2023-12-22 06:19:21,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=443786.6666666667, ans=0.0 2023-12-22 06:19:27,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. 
limit=15.0 2023-12-22 06:19:56,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=443986.6666666667, ans=0.0 2023-12-22 06:19:57,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443986.6666666667, ans=0.1 2023-12-22 06:20:04,009 INFO [train.py:886] (0/4) Epoch 14, batch 4650, loss[loss=0.0146, audio_tagging_loss=0.0146, over 24750.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4964719.23 frames. ], batch size: 99, lr: 7.67e-03, grad_scale: 32.0 2023-12-22 06:20:27,276 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.504e+01 2.730e+01 2.882e+01 3.030e+01 3.512e+01, threshold=5.764e+01, percent-clipped=0.0 2023-12-22 06:20:33,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444186.6666666667, ans=0.1 2023-12-22 06:20:40,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=444253.3333333333, ans=0.125 2023-12-22 06:20:53,822 INFO [train.py:886] (0/4) Epoch 14, batch 4700, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24750.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4964147.18 frames. ], batch size: 99, lr: 7.67e-03, grad_scale: 32.0 2023-12-22 06:20:54,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=444386.6666666667, ans=0.04949747468305833 2023-12-22 06:20:54,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=444386.6666666667, ans=0.125 2023-12-22 06:21:02,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=15.0 2023-12-22 06:21:07,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=444453.3333333333, ans=0.1 2023-12-22 06:21:13,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2023-12-22 06:21:39,917 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.705e-02 2023-12-22 06:21:41,571 INFO [train.py:886] (0/4) Epoch 14, batch 4750, loss[loss=0.0162, audio_tagging_loss=0.0162, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4958114.48 frames. ], batch size: 100, lr: 7.67e-03, grad_scale: 32.0 2023-12-22 06:21:53,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=444786.6666666667, ans=0.0 2023-12-22 06:21:56,720 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-14.pt 2023-12-22 06:22:18,551 INFO [train.py:886] (0/4) Epoch 15, batch 0, loss[loss=0.03715, audio_tagging_loss=0.03715, over 21096.00 frames. ], tot_loss[loss=0.03715, audio_tagging_loss=0.03715, over 21096.00 frames. ], batch size: 107, lr: 7.41e-03, grad_scale: 32.0 2023-12-22 06:22:18,552 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 06:22:39,997 INFO [train.py:917] (0/4) Epoch 15, validation: loss=0.03275, audio_tagging_loss=0.03275, over 3737520.00 frames. 
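The validation pass just logged ("Computing validation loss ... Epoch 15, validation: loss=0.03275 ... over 3737520.00 frames") pauses training at the epoch boundary and reports one frame-weighted number for the whole dev set. A minimal sketch of that bookkeeping, assuming the model returns a summed loss plus the frame count it covers (the names and batch layout are hypothetical, not icefall's exact API):

import torch

def validation_loss(model, dev_loader, device):
    # Frame-weighted average: the single number logged per epoch is
    # assumed to be sum(per-batch summed loss) / sum(frames).
    model.eval()
    total_loss, total_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            feats = batch["features"].to(device)    # (N, T, 80) fbank
            labels = batch["labels"].to(device)     # (N, num_events) multi-hot
            loss_sum, num_frames = model(feats, labels)  # hypothetical signature
            total_loss += loss_sum.item()
            total_frames += num_frames
    model.train()
    return total_loss / total_frames

With a fixed dev set the denominator is constant, which is why every validation record in the log reports the same 3737520.00 frames.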
2023-12-22 06:22:39,998 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 06:22:47,358 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.805e+01 2.969e+01 3.103e+01 9.102e+01, threshold=5.939e+01, percent-clipped=6.0 2023-12-22 06:22:48,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=444826.6666666667, ans=0.0 2023-12-22 06:23:02,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=444960.0, ans=0.0 2023-12-22 06:23:04,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=444960.0, ans=0.0 2023-12-22 06:23:30,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=445160.0, ans=0.07 2023-12-22 06:23:31,644 INFO [train.py:886] (0/4) Epoch 15, batch 50, loss[loss=0.01848, audio_tagging_loss=0.01848, over 25000.00 frames. ], tot_loss[loss=0.02233, audio_tagging_loss=0.02233, over 1114982.75 frames. ], batch size: 100, lr: 7.40e-03, grad_scale: 32.0 2023-12-22 06:23:35,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=445160.0, ans=0.125 2023-12-22 06:24:23,177 INFO [train.py:886] (0/4) Epoch 15, batch 100, loss[loss=0.0168, audio_tagging_loss=0.0168, over 25000.00 frames. ], tot_loss[loss=0.01962, audio_tagging_loss=0.01962, over 1968892.48 frames. ], batch size: 100, lr: 7.40e-03, grad_scale: 32.0 2023-12-22 06:24:30,484 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.756e+01 3.116e+01 3.356e+01 3.817e+01 5.461e+01, threshold=6.711e+01, percent-clipped=0.0 2023-12-22 06:25:14,545 INFO [train.py:886] (0/4) Epoch 15, batch 150, loss[loss=0.01532, audio_tagging_loss=0.01532, over 23974.00 frames. ], tot_loss[loss=0.01786, audio_tagging_loss=0.01786, over 2630750.51 frames. ], batch size: 100, lr: 7.40e-03, grad_scale: 32.0 2023-12-22 06:25:26,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=445893.3333333333, ans=0.125 2023-12-22 06:25:37,572 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 06:25:43,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=445960.0, ans=0.2 2023-12-22 06:26:06,062 INFO [train.py:886] (0/4) Epoch 15, batch 200, loss[loss=0.01418, audio_tagging_loss=0.01418, over 24750.00 frames. ], tot_loss[loss=0.01672, audio_tagging_loss=0.01672, over 3147739.60 frames. 
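In the Epoch 15 records above, the tot_loss frame count grows almost linearly over the first few hundred batches (1114982.75 frames at batch 50, 1968892.48 at batch 100, 2630750.51 at batch 150) while the Epoch 14 records hovered near 4.95 million. That pattern is consistent with a frame-weighted aggregate that decays old batches rather than averaging the whole epoch. A sketch under that assumption, with a hypothetical reset_interval of 200 (200 batches of roughly 25000 frames saturates near 5M, matching the log):

class LossTracker:
    # Exponentially decayed, frame-weighted loss aggregate: each batch
    # contributes (summed_loss, num_frames); older batches are decayed by
    # (1 - 1/reset_interval), so the frame count saturates around
    # reset_interval * frames_per_batch.
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float):
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

t = LossTracker()
t.update(batch_loss_sum=0.03715 * 21096, batch_frames=21096)
print(t.value)   # 0.03715 -- a freshly restarted tracker equals the batch's own loss

Restarting the tracker at each epoch also explains why tot_loss at Epoch 15, batch 0 exactly equals that first batch's loss (0.03715 in both fields).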
], batch size: 99, lr: 7.40e-03, grad_scale: 32.0 2023-12-22 06:26:06,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=446160.0, ans=0.02 2023-12-22 06:26:13,374 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.586e+01 2.814e+01 2.983e+01 3.104e+01 3.592e+01, threshold=5.965e+01, percent-clipped=0.0 2023-12-22 06:26:29,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=446293.3333333333, ans=0.0 2023-12-22 06:26:30,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=446293.3333333333, ans=0.5 2023-12-22 06:26:49,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.19 vs. limit=22.5 2023-12-22 06:26:56,925 INFO [train.py:886] (0/4) Epoch 15, batch 250, loss[loss=0.01596, audio_tagging_loss=0.01596, over 25000.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 3545397.90 frames. ], batch size: 100, lr: 7.39e-03, grad_scale: 32.0 2023-12-22 06:27:18,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=446626.6666666667, ans=0.0 2023-12-22 06:27:18,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=22.5 2023-12-22 06:27:21,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.92 vs. limit=15.0 2023-12-22 06:27:27,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=446626.6666666667, ans=22.5 2023-12-22 06:27:32,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=446693.3333333333, ans=0.125 2023-12-22 06:27:43,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=446760.0, ans=0.125 2023-12-22 06:27:46,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446760.0, ans=0.1 2023-12-22 06:27:50,316 INFO [train.py:886] (0/4) Epoch 15, batch 300, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24750.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 3851471.10 frames. ], batch size: 99, lr: 7.39e-03, grad_scale: 32.0 2023-12-22 06:27:54,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=446826.6666666667, ans=0.125 2023-12-22 06:27:57,057 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.460e+01 2.728e+01 2.846e+01 3.000e+01 3.484e+01, threshold=5.691e+01, percent-clipped=0.0 2023-12-22 06:28:01,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=446893.3333333333, ans=0.125 2023-12-22 06:28:07,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2023-12-22 06:28:15,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.50 vs. 
limit=22.5 2023-12-22 06:28:35,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=447093.3333333333, ans=0.5 2023-12-22 06:28:42,186 INFO [train.py:886] (0/4) Epoch 15, batch 350, loss[loss=0.01661, audio_tagging_loss=0.01661, over 24750.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4094188.37 frames. ], batch size: 99, lr: 7.39e-03, grad_scale: 32.0 2023-12-22 06:28:54,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=447226.6666666667, ans=0.2 2023-12-22 06:29:04,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=447293.3333333333, ans=0.0 2023-12-22 06:29:08,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.97 vs. limit=6.0 2023-12-22 06:29:11,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=447293.3333333333, ans=0.125 2023-12-22 06:29:15,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=447360.0, ans=0.0 2023-12-22 06:29:19,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=447360.0, ans=0.125 2023-12-22 06:29:20,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=447360.0, ans=0.125 2023-12-22 06:29:22,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=447360.0, ans=0.0 2023-12-22 06:29:34,141 INFO [train.py:886] (0/4) Epoch 15, batch 400, loss[loss=0.01321, audio_tagging_loss=0.01321, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4281510.14 frames. ], batch size: 100, lr: 7.38e-03, grad_scale: 32.0 2023-12-22 06:29:37,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=447493.3333333333, ans=0.125 2023-12-22 06:29:41,448 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.769e+01 2.874e+01 3.024e+01 3.342e+01, threshold=5.748e+01, percent-clipped=0.0 2023-12-22 06:29:42,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=447493.3333333333, ans=0.1 2023-12-22 06:29:47,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=447560.0, ans=0.0 2023-12-22 06:30:22,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=12.0 2023-12-22 06:30:26,474 INFO [train.py:886] (0/4) Epoch 15, batch 450, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4429967.92 frames. 
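The [scaling.py:1022] "Whitening" lines (e.g. whiten_keys with num_groups=4, num_channels=128, metric=4.97 vs. limit=6.0 above) are diagnostics that compare how non-white a module's activations are against a scheduled limit, with a corrective gradient applied when the metric exceeds it. The exact formula in scaling.py may differ; a plausible sketch of such a metric is the dispersion of the covariance eigenvalues, which equals 1.0 for perfectly white features:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # Illustrative only: split channels into groups, and per group measure
    # mean(eig^2) / mean(eig)^2 of the feature covariance (1.0 == white,
    # larger == energy concentrated in few directions).
    x = x.reshape(-1, x.shape[-1])
    chans = x.shape[1] // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * chans:(g + 1) * chans]
        xg = xg - xg.mean(dim=0)
        cov = (xg.T @ xg) / xg.shape[0]
        eigs = torch.linalg.eigvalsh(cov).clamp(min=0)
        metrics.append(((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)).item())
    return max(metrics)

x = torch.randn(1000, 128)
print(whitening_metric(x, num_groups=4))   # close to 1.0 for white noise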
], batch size: 100, lr: 7.38e-03, grad_scale: 32.0 2023-12-22 06:30:33,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=447826.6666666667, ans=0.0 2023-12-22 06:30:40,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=447893.3333333333, ans=0.05 2023-12-22 06:31:18,202 INFO [train.py:886] (0/4) Epoch 15, batch 500, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4547886.56 frames. ], batch size: 100, lr: 7.38e-03, grad_scale: 32.0 2023-12-22 06:31:25,360 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.715e+01 2.862e+01 2.998e+01 3.574e+01, threshold=5.724e+01, percent-clipped=0.0 2023-12-22 06:31:33,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=448226.6666666667, ans=0.125 2023-12-22 06:31:49,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=448360.0, ans=0.125 2023-12-22 06:31:49,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=448360.0, ans=0.125 2023-12-22 06:31:50,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=448360.0, ans=0.1 2023-12-22 06:32:10,777 INFO [train.py:886] (0/4) Epoch 15, batch 550, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4633341.83 frames. ], batch size: 100, lr: 7.38e-03, grad_scale: 32.0 2023-12-22 06:32:10,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=448493.3333333333, ans=10.0 2023-12-22 06:32:16,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2023-12-22 06:32:16,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=448493.3333333333, ans=0.04949747468305833 2023-12-22 06:32:24,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=448560.0, ans=0.015 2023-12-22 06:32:47,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=448693.3333333333, ans=0.1 2023-12-22 06:32:51,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=448760.0, ans=0.125 2023-12-22 06:33:02,504 INFO [train.py:886] (0/4) Epoch 15, batch 600, loss[loss=0.01452, audio_tagging_loss=0.01452, over 24750.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4700536.88 frames. 
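The recurring [optim.py:484] warnings summarize recent gradient norms. The logged numbers are self-consistent with a threshold of Clipping_scale times the median: in the warning above, 2.0 x 2.862e+01 = 5.724e+01, exactly the reported threshold, and percent-clipped is the share of recent batches whose norm exceeded it. A sketch of that summary (the buffer management in the real optimizer is assumed, not copied):

import torch

def clipping_report(grad_norms, clipping_scale: float = 2.0):
    # Report (min, 25%, 50%, 75%, max) of recent per-batch gradient
    # norms, a threshold of clipping_scale * median, and the percentage
    # of batches above the threshold.
    norms = torch.tensor(grad_norms, dtype=torch.float32)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]
    pct = 100.0 * (norms > threshold).float().mean()
    return q.tolist(), threshold.item(), pct.item()

quartiles, thr, pct = clipping_report([23.8, 27.2, 28.6, 30.0, 35.7])
print(thr, pct)   # 57.2 0.0 -- same shape as the logged warnings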
], batch size: 99, lr: 7.37e-03, grad_scale: 32.0 2023-12-22 06:33:09,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=448826.6666666667, ans=0.0 2023-12-22 06:33:09,764 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.782e+01 2.890e+01 3.094e+01 3.722e+01, threshold=5.781e+01, percent-clipped=0.0 2023-12-22 06:33:29,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=448960.0, ans=0.0 2023-12-22 06:33:31,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0 2023-12-22 06:33:34,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=449026.6666666667, ans=0.125 2023-12-22 06:33:34,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=449026.6666666667, ans=0.125 2023-12-22 06:33:54,129 INFO [train.py:886] (0/4) Epoch 15, batch 650, loss[loss=0.01294, audio_tagging_loss=0.01294, over 24750.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4755057.33 frames. ], batch size: 99, lr: 7.37e-03, grad_scale: 32.0 2023-12-22 06:34:12,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=449226.6666666667, ans=0.0 2023-12-22 06:34:21,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=449293.3333333333, ans=0.0 2023-12-22 06:34:29,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=449360.0, ans=0.125 2023-12-22 06:34:33,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=449360.0, ans=0.05 2023-12-22 06:34:35,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=449426.6666666667, ans=0.125 2023-12-22 06:34:46,662 INFO [train.py:886] (0/4) Epoch 15, batch 700, loss[loss=0.01672, audio_tagging_loss=0.01672, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4799825.59 frames. 
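Most of the [scaling.py:213] lines track ScheduledFloat values: hyperparameters such as skip rates, balancer probabilities, and dropout that are functions of the global batch count. By this point (batch_count around 449000) nearly all of them sit at final values like 0.125, 0.1, or 0.0. A sketch of such a schedule as a clamped piecewise-linear function of batch_count; the breakpoints below are hypothetical, not taken from the recipe:

class ScheduledFloat:
    # A float-valued hyperparameter defined by (batch_count, value)
    # breakpoints, linearly interpolated and clamped at the ends.
    def __init__(self, *points):
        self.points = sorted(points)   # e.g. (0.0, 0.3), (20000.0, 0.1)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(449026.67))   # 0.1 -- fully annealed, like the ans=0.1 entries above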
], batch size: 100, lr: 7.37e-03, grad_scale: 32.0 2023-12-22 06:34:46,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=449493.3333333333, ans=0.125 2023-12-22 06:34:53,989 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.765e+01 2.933e+01 3.097e+01 3.905e+01, threshold=5.867e+01, percent-clipped=0.0 2023-12-22 06:34:55,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=449493.3333333333, ans=0.2 2023-12-22 06:35:06,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=449626.6666666667, ans=0.125 2023-12-22 06:35:08,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=449626.6666666667, ans=0.125 2023-12-22 06:35:16,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=449693.3333333333, ans=0.0 2023-12-22 06:35:19,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=449693.3333333333, ans=0.125 2023-12-22 06:35:21,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=449693.3333333333, ans=0.0 2023-12-22 06:35:24,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2023-12-22 06:35:26,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=449693.3333333333, ans=0.2 2023-12-22 06:35:27,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=449760.0, ans=0.0 2023-12-22 06:35:32,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.57 vs. limit=12.0 2023-12-22 06:35:38,055 INFO [train.py:886] (0/4) Epoch 15, batch 750, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4834392.56 frames. ], batch size: 100, lr: 7.37e-03, grad_scale: 64.0 2023-12-22 06:35:46,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=449826.6666666667, ans=0.125 2023-12-22 06:35:52,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=449893.3333333333, ans=0.125 2023-12-22 06:36:09,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=450026.6666666667, ans=0.2 2023-12-22 06:36:23,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=450093.3333333333, ans=0.125 2023-12-22 06:36:29,743 INFO [train.py:886] (0/4) Epoch 15, batch 800, loss[loss=0.01532, audio_tagging_loss=0.01532, over 25000.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4866342.39 frames. 
], batch size: 100, lr: 7.36e-03, grad_scale: 64.0 2023-12-22 06:36:37,132 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 2.693e+01 2.864e+01 3.009e+01 3.417e+01, threshold=5.729e+01, percent-clipped=0.0 2023-12-22 06:36:52,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=450293.3333333333, ans=0.0 2023-12-22 06:37:08,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=450360.0, ans=0.2 2023-12-22 06:37:16,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=450426.6666666667, ans=0.0 2023-12-22 06:37:22,178 INFO [train.py:886] (0/4) Epoch 15, batch 850, loss[loss=0.01462, audio_tagging_loss=0.01462, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4883320.09 frames. ], batch size: 100, lr: 7.36e-03, grad_scale: 64.0 2023-12-22 06:37:39,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=450560.0, ans=0.0 2023-12-22 06:37:54,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=450693.3333333333, ans=0.07 2023-12-22 06:38:02,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=12.0 2023-12-22 06:38:09,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=450760.0, ans=0.0 2023-12-22 06:38:14,318 INFO [train.py:886] (0/4) Epoch 15, batch 900, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4898700.07 frames. ], batch size: 99, lr: 7.36e-03, grad_scale: 64.0 2023-12-22 06:38:21,712 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+01 2.789e+01 2.921e+01 3.060e+01 3.433e+01, threshold=5.842e+01, percent-clipped=0.0 2023-12-22 06:38:53,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=451026.6666666667, ans=0.0 2023-12-22 06:39:06,213 INFO [train.py:886] (0/4) Epoch 15, batch 950, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4907127.31 frames. 
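The grad_scale field in the batch records doubles from 32.0 to 64.0 around Epoch 15, batch 750, the signature of dynamic fp16 loss scaling: the scale grows after a run of overflow-free steps and is cut back when gradients overflow. A sketch using PyTorch's stock GradScaler; the growth settings are illustrative, not the recipe's actual values:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,
                                   growth_interval=2000)

def training_step(model, optimizer, feats, labels):
    # Loss is scaled before backward so fp16 gradients do not underflow;
    # the scaler unscales before stepping and adjusts the scale afterwards.
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(feats, labels)
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)      # skips the update if gradients overflowed
    scaler.update()             # grows the scale (e.g. 32 -> 64) or halves it
    return loss.detach(), scaler.get_scale()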
], batch size: 99, lr: 7.36e-03, grad_scale: 64.0 2023-12-22 06:39:22,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=451226.6666666667, ans=0.125 2023-12-22 06:39:24,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=451226.6666666667, ans=0.125 2023-12-22 06:39:30,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=451293.3333333333, ans=0.1 2023-12-22 06:39:38,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=451360.0, ans=0.125 2023-12-22 06:39:53,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=451426.6666666667, ans=0.125 2023-12-22 06:39:58,842 INFO [train.py:886] (0/4) Epoch 15, batch 1000, loss[loss=0.01266, audio_tagging_loss=0.01266, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4909830.97 frames. ], batch size: 99, lr: 7.35e-03, grad_scale: 64.0 2023-12-22 06:40:06,053 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.747e+01 2.873e+01 3.000e+01 3.786e+01, threshold=5.747e+01, percent-clipped=0.0 2023-12-22 06:40:20,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=451626.6666666667, ans=0.125 2023-12-22 06:40:24,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=451626.6666666667, ans=0.0 2023-12-22 06:40:37,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=451693.3333333333, ans=10.0 2023-12-22 06:40:42,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=451760.0, ans=0.0 2023-12-22 06:40:48,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=451826.6666666667, ans=0.125 2023-12-22 06:40:49,189 INFO [train.py:886] (0/4) Epoch 15, batch 1050, loss[loss=0.01313, audio_tagging_loss=0.01313, over 24750.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4915400.94 frames. ], batch size: 99, lr: 7.35e-03, grad_scale: 64.0 2023-12-22 06:40:50,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=451826.6666666667, ans=0.025 2023-12-22 06:40:50,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.32 vs. limit=15.0 2023-12-22 06:40:53,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=451826.6666666667, ans=0.0 2023-12-22 06:41:12,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.70 vs. 
limit=12.0 2023-12-22 06:41:35,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=452093.3333333333, ans=0.0 2023-12-22 06:41:36,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=452093.3333333333, ans=0.125 2023-12-22 06:41:38,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=452093.3333333333, ans=0.125 2023-12-22 06:41:39,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452093.3333333333, ans=0.1 2023-12-22 06:41:42,808 INFO [train.py:886] (0/4) Epoch 15, batch 1100, loss[loss=0.01575, audio_tagging_loss=0.01575, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4917779.91 frames. ], batch size: 100, lr: 7.35e-03, grad_scale: 64.0 2023-12-22 06:41:45,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2023-12-22 06:41:49,461 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 2.730e+01 2.835e+01 3.021e+01 3.607e+01, threshold=5.671e+01, percent-clipped=0.0 2023-12-22 06:41:50,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=452160.0, ans=0.0 2023-12-22 06:42:01,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=452226.6666666667, ans=0.2 2023-12-22 06:42:04,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=452293.3333333333, ans=0.125 2023-12-22 06:42:12,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0 2023-12-22 06:42:17,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=452360.0, ans=0.125 2023-12-22 06:42:20,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=452360.0, ans=0.2 2023-12-22 06:42:34,426 INFO [train.py:886] (0/4) Epoch 15, batch 1150, loss[loss=0.01613, audio_tagging_loss=0.01613, over 21377.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4924040.26 frames. ], batch size: 107, lr: 7.34e-03, grad_scale: 64.0 2023-12-22 06:42:46,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. 
limit=22.5 2023-12-22 06:42:50,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452560.0, ans=0.1 2023-12-22 06:43:02,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=452626.6666666667, ans=0.125 2023-12-22 06:43:06,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=452693.3333333333, ans=0.125 2023-12-22 06:43:15,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=452760.0, ans=0.125 2023-12-22 06:43:26,231 INFO [train.py:886] (0/4) Epoch 15, batch 1200, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4933105.55 frames. ], batch size: 100, lr: 7.34e-03, grad_scale: 64.0 2023-12-22 06:43:32,797 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.424e+01 2.691e+01 2.862e+01 3.008e+01 3.701e+01, threshold=5.724e+01, percent-clipped=0.0 2023-12-22 06:43:34,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=452893.3333333333, ans=0.1 2023-12-22 06:43:49,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=452960.0, ans=0.1 2023-12-22 06:44:14,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=453093.3333333333, ans=0.1 2023-12-22 06:44:18,052 INFO [train.py:886] (0/4) Epoch 15, batch 1250, loss[loss=0.01442, audio_tagging_loss=0.01442, over 24750.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4932908.35 frames. ], batch size: 99, lr: 7.34e-03, grad_scale: 64.0 2023-12-22 06:44:34,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=453226.6666666667, ans=0.125 2023-12-22 06:44:43,373 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-68000.pt 2023-12-22 06:44:52,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=453360.0, ans=0.09899494936611666 2023-12-22 06:45:06,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.42 vs. limit=12.0 2023-12-22 06:45:11,722 INFO [train.py:886] (0/4) Epoch 15, batch 1300, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4937059.88 frames. ], batch size: 99, lr: 7.34e-03, grad_scale: 64.0 2023-12-22 06:45:12,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=453493.3333333333, ans=0.125 2023-12-22 06:45:19,293 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.803e+01 2.974e+01 3.116e+01 3.587e+01, threshold=5.949e+01, percent-clipped=0.0 2023-12-22 06:45:43,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. 
limit=15.0 2023-12-22 06:45:45,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453693.3333333333, ans=0.1 2023-12-22 06:45:46,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=453693.3333333333, ans=0.125 2023-12-22 06:45:50,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=453693.3333333333, ans=0.0 2023-12-22 06:45:55,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.70 vs. limit=15.0 2023-12-22 06:46:01,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.60 vs. limit=10.0 2023-12-22 06:46:04,211 INFO [train.py:886] (0/4) Epoch 15, batch 1350, loss[loss=0.015, audio_tagging_loss=0.015, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4941798.31 frames. ], batch size: 100, lr: 7.33e-03, grad_scale: 64.0 2023-12-22 06:46:04,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=453826.6666666667, ans=0.0 2023-12-22 06:46:14,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2023-12-22 06:46:56,191 INFO [train.py:886] (0/4) Epoch 15, batch 1400, loss[loss=0.01418, audio_tagging_loss=0.01418, over 24750.00 frames. ], tot_loss[loss=0.01432, audio_tagging_loss=0.01432, over 4943142.81 frames. ], batch size: 99, lr: 7.33e-03, grad_scale: 64.0 2023-12-22 06:46:59,959 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 06:47:03,509 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.761e+01 2.896e+01 3.038e+01 4.038e+01, threshold=5.792e+01, percent-clipped=0.0 2023-12-22 06:47:16,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=454293.3333333333, ans=0.0 2023-12-22 06:47:29,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=454360.0, ans=0.125 2023-12-22 06:47:31,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=454360.0, ans=0.1 2023-12-22 06:47:44,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=454426.6666666667, ans=0.0 2023-12-22 06:47:47,637 INFO [train.py:886] (0/4) Epoch 15, batch 1450, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4939204.14 frames. 
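Two kinds of checkpoints appear in the log: per-epoch files (zipformer/exp_at_as_full/epoch-14.pt, written just before Epoch 15 began) and batch-interval files (checkpoint-68000.pt, written mid-epoch a few lines above). A sketch of that policy; save_every_n=4000 is an assumption, though it is consistent with 68000 being a multiple of 4000:

from pathlib import Path
import torch

def maybe_save(model, optimizer, exp_dir: Path, batch_idx_train: int,
               epoch: int, end_of_epoch: bool, save_every_n: int = 4000):
    # epoch-N.pt at epoch boundaries, checkpoint-<batch>.pt every
    # save_every_n global batches in between.
    state = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "batch_idx_train": batch_idx_train,
        "epoch": epoch,
    }
    if end_of_epoch:
        torch.save(state, exp_dir / f"epoch-{epoch}.pt")
    elif batch_idx_train % save_every_n == 0:
        torch.save(state, exp_dir / f"checkpoint-{batch_idx_train}.pt")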
], batch size: 100, lr: 7.33e-03, grad_scale: 64.0 2023-12-22 06:47:59,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=454560.0, ans=0.0 2023-12-22 06:48:00,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=454560.0, ans=0.0 2023-12-22 06:48:39,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2023-12-22 06:48:40,092 INFO [train.py:886] (0/4) Epoch 15, batch 1500, loss[loss=0.01488, audio_tagging_loss=0.01488, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4940534.00 frames. ], batch size: 100, lr: 7.33e-03, grad_scale: 64.0 2023-12-22 06:48:43,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=454826.6666666667, ans=0.2 2023-12-22 06:48:47,427 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.703e+01 2.865e+01 3.021e+01 3.763e+01, threshold=5.730e+01, percent-clipped=0.0 2023-12-22 06:49:02,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=454960.0, ans=0.1 2023-12-22 06:49:28,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=455093.3333333333, ans=0.125 2023-12-22 06:49:31,996 INFO [train.py:886] (0/4) Epoch 15, batch 1550, loss[loss=0.01714, audio_tagging_loss=0.01714, over 24750.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4941337.45 frames. ], batch size: 99, lr: 7.32e-03, grad_scale: 64.0 2023-12-22 06:49:32,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=455160.0, ans=0.125 2023-12-22 06:49:38,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=455160.0, ans=0.125 2023-12-22 06:49:39,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=455160.0, ans=0.2 2023-12-22 06:49:52,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.31 vs. limit=15.0 2023-12-22 06:49:53,522 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 06:49:57,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=455293.3333333333, ans=0.0 2023-12-22 06:49:58,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5 2023-12-22 06:50:23,346 INFO [train.py:886] (0/4) Epoch 15, batch 1600, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24750.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4940423.88 frames. 
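The lr field decays smoothly within the epoch (7.41e-03 at Epoch 15, batch 0, down to 7.32e-03 by batch 1600) and also steps down at the epoch boundary. This is consistent with an Eden-style schedule of the kind icefall's zipformer recipes use, where the rate is a product of a batch-dependent and an epoch-dependent factor; the base_lr, lr_batches, and lr_epochs values below are assumptions for illustration:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Both factors decay toward batch**-0.5-like behaviour once the
    # counters are large relative to lr_batches / lr_epochs.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=68000, epoch=14.0))   # ~7.3e-03, the magnitude seen here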
], batch size: 99, lr: 7.32e-03, grad_scale: 64.0 2023-12-22 06:50:25,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=455493.3333333333, ans=0.125 2023-12-22 06:50:26,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=455493.3333333333, ans=0.125 2023-12-22 06:50:29,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=455493.3333333333, ans=0.125 2023-12-22 06:50:29,906 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.369e+01 2.762e+01 2.937e+01 3.082e+01 3.715e+01, threshold=5.874e+01, percent-clipped=0.0 2023-12-22 06:50:42,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=455560.0, ans=0.125 2023-12-22 06:50:43,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=455626.6666666667, ans=0.5 2023-12-22 06:50:45,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=455626.6666666667, ans=0.125 2023-12-22 06:50:46,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=455626.6666666667, ans=0.05 2023-12-22 06:51:01,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=455693.3333333333, ans=0.0 2023-12-22 06:51:09,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-12-22 06:51:14,881 INFO [train.py:886] (0/4) Epoch 15, batch 1650, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4942990.94 frames. ], batch size: 99, lr: 7.32e-03, grad_scale: 64.0 2023-12-22 06:51:15,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=12.0 2023-12-22 06:51:20,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5 2023-12-22 06:51:44,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=455960.0, ans=0.1 2023-12-22 06:51:50,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.58 vs. limit=10.0 2023-12-22 06:51:52,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=456026.6666666667, ans=0.0 2023-12-22 06:52:06,848 INFO [train.py:886] (0/4) Epoch 15, batch 1700, loss[loss=0.01144, audio_tagging_loss=0.01144, over 24750.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4940882.73 frames. 
], batch size: 99, lr: 7.32e-03, grad_scale: 64.0 2023-12-22 06:52:14,101 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.418e+01 2.733e+01 2.857e+01 3.006e+01 3.981e+01, threshold=5.714e+01, percent-clipped=0.0 2023-12-22 06:52:16,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=456160.0, ans=10.0 2023-12-22 06:52:27,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=456293.3333333333, ans=0.125 2023-12-22 06:52:36,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.07 vs. limit=15.0 2023-12-22 06:52:58,976 INFO [train.py:886] (0/4) Epoch 15, batch 1750, loss[loss=0.01452, audio_tagging_loss=0.01452, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4944220.65 frames. ], batch size: 100, lr: 7.31e-03, grad_scale: 64.0 2023-12-22 06:53:06,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=456493.3333333333, ans=0.2 2023-12-22 06:53:10,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=456560.0, ans=0.125 2023-12-22 06:53:10,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=456560.0, ans=0.1 2023-12-22 06:53:14,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=456560.0, ans=0.125 2023-12-22 06:53:18,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.35 vs. limit=22.5 2023-12-22 06:53:23,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=456626.6666666667, ans=0.125 2023-12-22 06:53:25,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=456626.6666666667, ans=0.07 2023-12-22 06:53:34,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=456693.3333333333, ans=0.0 2023-12-22 06:53:41,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=456760.0, ans=0.125 2023-12-22 06:53:46,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=456760.0, ans=0.125 2023-12-22 06:53:50,623 INFO [train.py:886] (0/4) Epoch 15, batch 1800, loss[loss=0.01436, audio_tagging_loss=0.01436, over 22514.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4952389.30 frames. 
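The frames figures in the batch records follow directly from the data pipeline: a full batch of 100 ten-second AudioSet clips shows up as 25000.00 frames, while buckets of shorter cuts pack more of them (batch size 107) yet total fewer frames (e.g. 22514.00 just above). A sketch of the arithmetic, assuming 100 fbank frames per second and a 4x encoder subsampling factor (both assumptions, not read from the recipe):

def expected_frames(durations_sec, frames_per_sec: float = 100.0,
                    subsampling: int = 4) -> float:
    # Each cut contributes duration * frames_per_sec feature frames,
    # reduced by the encoder's subsampling before loss accounting.
    return sum(d * frames_per_sec / subsampling for d in durations_sec)

print(expected_frames([10.0] * 100))   # 25000.0, matching the full-batch records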
], batch size: 107, lr: 7.31e-03, grad_scale: 64.0 2023-12-22 06:53:57,944 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.752e+01 2.915e+01 3.059e+01 3.559e+01, threshold=5.830e+01, percent-clipped=0.0 2023-12-22 06:53:58,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=456826.6666666667, ans=10.0 2023-12-22 06:54:07,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=456893.3333333333, ans=0.125 2023-12-22 06:54:22,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=457026.6666666667, ans=0.0 2023-12-22 06:54:31,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=457093.3333333333, ans=0.125 2023-12-22 06:54:32,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=457093.3333333333, ans=0.0 2023-12-22 06:54:35,480 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2023-12-22 06:54:39,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=457093.3333333333, ans=0.125 2023-12-22 06:54:42,181 INFO [train.py:886] (0/4) Epoch 15, batch 1850, loss[loss=0.01268, audio_tagging_loss=0.01268, over 21192.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4952535.20 frames. ], batch size: 107, lr: 7.31e-03, grad_scale: 64.0 2023-12-22 06:54:57,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=457226.6666666667, ans=0.125 2023-12-22 06:55:04,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=457293.3333333333, ans=0.2 2023-12-22 06:55:09,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5 2023-12-22 06:55:19,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=15.0 2023-12-22 06:55:20,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=457360.0, ans=0.0 2023-12-22 06:55:27,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0 2023-12-22 06:55:34,783 INFO [train.py:886] (0/4) Epoch 15, batch 1900, loss[loss=0.01832, audio_tagging_loss=0.01832, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4945980.72 frames. 
], batch size: 99, lr: 7.30e-03, grad_scale: 64.0
2023-12-22 06:55:41,994 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.819e+01 2.950e+01 3.088e+01 3.539e+01, threshold=5.899e+01, percent-clipped=0.0
2023-12-22 06:55:51,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=457560.0, ans=0.125
2023-12-22 06:55:54,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457626.6666666667, ans=0.1
2023-12-22 06:55:57,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=457626.6666666667, ans=0.125
2023-12-22 06:56:10,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=457693.3333333333, ans=0.125
2023-12-22 06:56:11,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=457693.3333333333, ans=0.1
2023-12-22 06:56:26,090 INFO [train.py:886] (0/4) Epoch 15, batch 1950, loss[loss=0.01436, audio_tagging_loss=0.01436, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4951225.43 frames. ], batch size: 100, lr: 7.30e-03, grad_scale: 64.0
2023-12-22 06:56:35,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.15 vs. limit=15.0
2023-12-22 06:56:46,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=457960.0, ans=0.125
2023-12-22 06:57:18,508 INFO [train.py:886] (0/4) Epoch 15, batch 2000, loss[loss=0.01406, audio_tagging_loss=0.01406, over 23989.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4945838.26 frames. ], batch size: 100, lr: 7.30e-03, grad_scale: 64.0
2023-12-22 06:57:25,040 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.729e+01 2.880e+01 3.053e+01 3.863e+01, threshold=5.759e+01, percent-clipped=0.0
2023-12-22 06:57:26,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=458160.0, ans=0.125
2023-12-22 06:57:54,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.01 vs. limit=22.5
2023-12-22 06:58:01,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=458426.6666666667, ans=0.125
2023-12-22 06:58:10,739 INFO [train.py:886] (0/4) Epoch 15, batch 2050, loss[loss=0.01408, audio_tagging_loss=0.01408, over 25000.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4946975.98 frames. ], batch size: 100, lr: 7.30e-03, grad_scale: 64.0
2023-12-22 06:58:24,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=458560.0, ans=0.125
2023-12-22 06:59:01,465 INFO [train.py:886] (0/4) Epoch 15, batch 2100, loss[loss=0.01691, audio_tagging_loss=0.01691, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4951631.96 frames. ], batch size: 100, lr: 7.29e-03, grad_scale: 64.0
2023-12-22 06:59:09,473 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.721e+01 2.805e+01 2.996e+01 3.469e+01, threshold=5.611e+01, percent-clipped=0.0
2023-12-22 06:59:16,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.69 vs. limit=15.0
2023-12-22 06:59:35,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=459026.6666666667, ans=0.1
2023-12-22 06:59:48,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.97 vs. limit=15.0
2023-12-22 06:59:50,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=459093.3333333333, ans=0.1
2023-12-22 06:59:53,686 INFO [train.py:886] (0/4) Epoch 15, batch 2150, loss[loss=0.01534, audio_tagging_loss=0.01534, over 24750.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4954924.36 frames. ], batch size: 99, lr: 7.29e-03, grad_scale: 64.0
2023-12-22 07:00:04,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=459226.6666666667, ans=0.125
2023-12-22 07:00:15,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=459293.3333333333, ans=0.0
2023-12-22 07:00:16,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=459293.3333333333, ans=0.125
2023-12-22 07:00:21,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459293.3333333333, ans=0.125
2023-12-22 07:00:44,576 INFO [train.py:886] (0/4) Epoch 15, batch 2200, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4948401.93 frames. ], batch size: 99, lr: 7.29e-03, grad_scale: 64.0
2023-12-22 07:00:47,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459493.3333333333, ans=0.125
2023-12-22 07:00:52,730 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.447e+01 2.795e+01 2.951e+01 3.077e+01 3.607e+01, threshold=5.903e+01, percent-clipped=0.0
2023-12-22 07:00:53,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=459493.3333333333, ans=0.0
2023-12-22 07:01:03,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=459560.0, ans=0.0
2023-12-22 07:01:10,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=459626.6666666667, ans=0.0
2023-12-22 07:01:11,823 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 07:01:13,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=459626.6666666667, ans=0.1
2023-12-22 07:01:16,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=459693.3333333333, ans=0.5
2023-12-22 07:01:23,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=459693.3333333333, ans=0.125
2023-12-22 07:01:36,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=459826.6666666667, ans=0.2
2023-12-22 07:01:37,219 INFO [train.py:886] (0/4) Epoch 15, batch 2250, loss[loss=0.01402, audio_tagging_loss=0.01402, over 23983.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4940078.48 frames. ], batch size: 100, lr: 7.29e-03, grad_scale: 64.0
2023-12-22 07:01:38,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=459826.6666666667, ans=0.125
2023-12-22 07:01:55,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=459893.3333333333, ans=0.0
2023-12-22 07:02:20,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=460093.3333333333, ans=0.125
2023-12-22 07:02:24,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=460093.3333333333, ans=0.0
2023-12-22 07:02:28,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=15.0
2023-12-22 07:02:29,285 INFO [train.py:886] (0/4) Epoch 15, batch 2300, loss[loss=0.01586, audio_tagging_loss=0.01586, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4941321.25 frames. ], batch size: 100, lr: 7.28e-03, grad_scale: 64.0
2023-12-22 07:02:36,498 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.419e+01 2.737e+01 2.896e+01 3.072e+01 5.073e+01, threshold=5.791e+01, percent-clipped=0.0
2023-12-22 07:02:47,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=460226.6666666667, ans=0.0
2023-12-22 07:03:06,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0
2023-12-22 07:03:13,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=460426.6666666667, ans=0.0
2023-12-22 07:03:13,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=460426.6666666667, ans=0.0
2023-12-22 07:03:20,217 INFO [train.py:886] (0/4) Epoch 15, batch 2350, loss[loss=0.01297, audio_tagging_loss=0.01297, over 24750.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4945881.03 frames. ], batch size: 99, lr: 7.28e-03, grad_scale: 64.0
2023-12-22 07:03:23,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=460493.3333333333, ans=0.125
2023-12-22 07:03:30,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=460560.0, ans=0.125
2023-12-22 07:03:35,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=460560.0, ans=0.035
2023-12-22 07:03:46,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=460626.6666666667, ans=0.125
2023-12-22 07:03:49,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=460626.6666666667, ans=0.125
2023-12-22 07:03:56,251 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.806e-02
2023-12-22 07:03:56,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=460693.3333333333, ans=0.125
2023-12-22 07:04:13,013 INFO [train.py:886] (0/4) Epoch 15, batch 2400, loss[loss=0.01449, audio_tagging_loss=0.01449, over 24750.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4947260.69 frames. ], batch size: 99, lr: 7.28e-03, grad_scale: 64.0
2023-12-22 07:04:19,756 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+01 2.726e+01 2.841e+01 2.994e+01 3.395e+01, threshold=5.683e+01, percent-clipped=0.0
2023-12-22 07:04:25,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=460893.3333333333, ans=0.04949747468305833
2023-12-22 07:04:34,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=460960.0, ans=0.125
2023-12-22 07:04:40,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=460960.0, ans=0.125
2023-12-22 07:04:46,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=461026.6666666667, ans=0.0
2023-12-22 07:04:55,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=461093.3333333333, ans=0.0
2023-12-22 07:05:01,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
2023-12-22 07:05:04,076 INFO [train.py:886] (0/4) Epoch 15, batch 2450, loss[loss=0.01406, audio_tagging_loss=0.01406, over 24072.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4948795.50 frames. ], batch size: 100, lr: 7.28e-03, grad_scale: 64.0
2023-12-22 07:05:05,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=461160.0, ans=0.1
2023-12-22 07:05:06,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=461160.0, ans=0.125
2023-12-22 07:05:17,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0
2023-12-22 07:05:19,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=461226.6666666667, ans=0.0
2023-12-22 07:05:36,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=461360.0, ans=0.1
2023-12-22 07:05:37,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0
2023-12-22 07:05:41,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=461360.0, ans=0.125
2023-12-22 07:05:48,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=461426.6666666667, ans=0.0
2023-12-22 07:05:53,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=461426.6666666667, ans=0.125
2023-12-22 07:05:56,310 INFO [train.py:886] (0/4) Epoch 15, batch 2500, loss[loss=0.01461, audio_tagging_loss=0.01461, over 25000.00 frames. ], tot_loss[loss=0.01432, audio_tagging_loss=0.01432, over 4946385.31 frames. ], batch size: 100, lr: 7.27e-03, grad_scale: 64.0
2023-12-22 07:06:02,975 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+01 2.837e+01 2.934e+01 3.109e+01 3.960e+01, threshold=5.868e+01, percent-clipped=0.0
2023-12-22 07:06:08,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=461560.0, ans=0.0
2023-12-22 07:06:29,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0
2023-12-22 07:06:44,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=461760.0, ans=0.09899494936611666
2023-12-22 07:06:47,950 INFO [train.py:886] (0/4) Epoch 15, batch 2550, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01432, audio_tagging_loss=0.01432, over 4941164.79 frames. ], batch size: 100, lr: 7.27e-03, grad_scale: 64.0
2023-12-22 07:06:48,139 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 07:06:54,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=461826.6666666667, ans=0.2
2023-12-22 07:07:14,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=461960.0, ans=0.0
2023-12-22 07:07:30,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=462093.3333333333, ans=0.0
2023-12-22 07:07:32,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=462093.3333333333, ans=0.1
2023-12-22 07:07:35,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=462093.3333333333, ans=0.125
2023-12-22 07:07:39,799 INFO [train.py:886] (0/4) Epoch 15, batch 2600, loss[loss=0.01565, audio_tagging_loss=0.01565, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4939416.59 frames. ], batch size: 100, lr: 7.27e-03, grad_scale: 64.0
2023-12-22 07:07:40,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=462160.0, ans=0.0
2023-12-22 07:07:43,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=462160.0, ans=0.0
2023-12-22 07:07:45,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.62 vs. limit=22.5
2023-12-22 07:07:47,084 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.786e+01 2.944e+01 3.057e+01 3.830e+01, threshold=5.888e+01, percent-clipped=0.0
2023-12-22 07:07:58,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=462226.6666666667, ans=0.125
2023-12-22 07:08:02,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.35 vs. limit=10.0
2023-12-22 07:08:04,725 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 07:08:06,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=462293.3333333333, ans=0.2
2023-12-22 07:08:32,451 INFO [train.py:886] (0/4) Epoch 15, batch 2650, loss[loss=0.01703, audio_tagging_loss=0.01703, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4943156.22 frames. ], batch size: 100, lr: 7.27e-03, grad_scale: 64.0
2023-12-22 07:08:48,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=462560.0, ans=0.0
2023-12-22 07:09:14,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=462760.0, ans=0.125
2023-12-22 07:09:16,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.00 vs. limit=22.5
2023-12-22 07:09:17,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=462760.0, ans=0.2
2023-12-22 07:09:24,936 INFO [train.py:886] (0/4) Epoch 15, batch 2700, loss[loss=0.01435, audio_tagging_loss=0.01435, over 25000.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4948240.15 frames. ], batch size: 100, lr: 7.26e-03, grad_scale: 64.0
2023-12-22 07:09:32,224 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.416e+01 2.720e+01 2.866e+01 2.980e+01 3.396e+01, threshold=5.733e+01, percent-clipped=0.0
2023-12-22 07:09:52,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=462960.0, ans=0.125
2023-12-22 07:09:53,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=462960.0, ans=0.125
2023-12-22 07:09:59,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=463026.6666666667, ans=0.2
2023-12-22 07:10:16,478 INFO [train.py:886] (0/4) Epoch 15, batch 2750, loss[loss=0.01605, audio_tagging_loss=0.01605, over 24915.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4953839.85 frames. ], batch size: 100, lr: 7.26e-03, grad_scale: 64.0
2023-12-22 07:10:18,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=463160.0, ans=0.0
2023-12-22 07:10:25,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463160.0, ans=0.1
2023-12-22 07:10:33,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=463226.6666666667, ans=0.125
2023-12-22 07:11:09,253 INFO [train.py:886] (0/4) Epoch 15, batch 2800, loss[loss=0.02097, audio_tagging_loss=0.02097, over 24951.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4952962.16 frames. ], batch size: 100, lr: 7.26e-03, grad_scale: 64.0
2023-12-22 07:11:13,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=463493.3333333333, ans=0.125
2023-12-22 07:11:17,365 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.810e+01 2.949e+01 3.108e+01 3.472e+01, threshold=5.898e+01, percent-clipped=0.0
2023-12-22 07:11:18,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.12 vs. limit=22.5
2023-12-22 07:11:29,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=12.0
2023-12-22 07:11:35,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.36 vs. limit=22.5
2023-12-22 07:12:00,620 INFO [train.py:886] (0/4) Epoch 15, batch 2850, loss[loss=0.01463, audio_tagging_loss=0.01463, over 25000.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4949736.77 frames. ], batch size: 100, lr: 7.26e-03, grad_scale: 64.0
2023-12-22 07:12:04,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0
2023-12-22 07:12:12,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=463893.3333333333, ans=0.0
2023-12-22 07:12:18,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=463893.3333333333, ans=0.125
2023-12-22 07:12:37,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=464026.6666666667, ans=0.015
2023-12-22 07:12:48,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.36 vs. limit=22.5
2023-12-22 07:12:51,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=464160.0, ans=0.125
2023-12-22 07:12:52,599 INFO [train.py:886] (0/4) Epoch 15, batch 2900, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4948691.75 frames. ], batch size: 99, lr: 7.25e-03, grad_scale: 64.0
2023-12-22 07:12:56,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.03 vs. limit=15.0
2023-12-22 07:13:00,912 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.777e+01 2.915e+01 3.046e+01 3.469e+01, threshold=5.830e+01, percent-clipped=0.0
2023-12-22 07:13:04,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=464226.6666666667, ans=0.0
2023-12-22 07:13:09,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=464226.6666666667, ans=0.0
2023-12-22 07:13:23,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.62 vs. limit=5.0
2023-12-22 07:13:28,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=464360.0, ans=0.07
2023-12-22 07:13:44,937 INFO [train.py:886] (0/4) Epoch 15, batch 2950, loss[loss=0.01759, audio_tagging_loss=0.01759, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4943784.55 frames. ], batch size: 100, lr: 7.25e-03, grad_scale: 64.0
2023-12-22 07:14:08,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0
2023-12-22 07:14:12,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=464626.6666666667, ans=0.125
2023-12-22 07:14:16,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=464693.3333333333, ans=0.0
2023-12-22 07:14:23,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=464693.3333333333, ans=0.0
2023-12-22 07:14:36,683 INFO [train.py:886] (0/4) Epoch 15, batch 3000, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24750.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4942427.77 frames. ], batch size: 99, lr: 7.25e-03, grad_scale: 64.0
2023-12-22 07:14:36,685 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 07:14:56,751 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.0447, 3.0203, 3.2333, 3.2436, 3.2400, 3.1945, 2.9480, 2.7491], device='cuda:0')
2023-12-22 07:14:57,490 INFO [train.py:917] (0/4) Epoch 15, validation: loss=0.03387, audio_tagging_loss=0.03387, over 3737520.00 frames.
2023-12-22 07:14:57,491 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-22 07:15:00,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=464826.6666666667, ans=0.2
2023-12-22 07:15:05,581 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.509e+01 2.703e+01 2.839e+01 2.986e+01 3.331e+01, threshold=5.678e+01, percent-clipped=0.0
2023-12-22 07:15:23,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=464960.0, ans=0.125
2023-12-22 07:15:49,504 INFO [train.py:886] (0/4) Epoch 15, batch 3050, loss[loss=0.01657, audio_tagging_loss=0.01657, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4946417.86 frames. ], batch size: 100, lr: 7.24e-03, grad_scale: 64.0
2023-12-22 07:15:52,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=465160.0, ans=0.2
2023-12-22 07:15:55,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=465160.0, ans=0.125
2023-12-22 07:16:01,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.23 vs. limit=15.0
2023-12-22 07:16:01,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=465226.6666666667, ans=0.125
2023-12-22 07:16:16,624 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 07:16:22,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=465360.0, ans=0.125
2023-12-22 07:16:38,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0
2023-12-22 07:16:40,702 INFO [train.py:886] (0/4) Epoch 15, batch 3100, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4947951.63 frames. ], batch size: 100, lr: 7.24e-03, grad_scale: 64.0
2023-12-22 07:16:49,701 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.496e+01 2.762e+01 2.896e+01 3.048e+01 3.549e+01, threshold=5.793e+01, percent-clipped=0.0
2023-12-22 07:17:00,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=465560.0, ans=0.125
2023-12-22 07:17:12,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.50 vs. limit=10.0
2023-12-22 07:17:24,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=465760.0, ans=0.0
2023-12-22 07:17:33,962 INFO [train.py:886] (0/4) Epoch 15, batch 3150, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4943825.33 frames. ], batch size: 99, lr: 7.24e-03, grad_scale: 64.0
2023-12-22 07:17:36,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=465826.6666666667, ans=0.125
2023-12-22 07:17:53,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=465960.0, ans=0.0
2023-12-22 07:17:58,062 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=9.019e-02
2023-12-22 07:18:00,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. limit=6.0
2023-12-22 07:18:25,587 INFO [train.py:886] (0/4) Epoch 15, batch 3200, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4940865.67 frames. ], batch size: 100, lr: 7.24e-03, grad_scale: 64.0
2023-12-22 07:18:29,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466160.0, ans=0.1
2023-12-22 07:18:31,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=466160.0, ans=0.125
2023-12-22 07:18:34,466 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.747e+01 2.900e+01 3.032e+01 3.537e+01, threshold=5.801e+01, percent-clipped=0.0
2023-12-22 07:19:00,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=466360.0, ans=0.125
2023-12-22 07:19:06,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=466426.6666666667, ans=0.125
2023-12-22 07:19:13,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=466426.6666666667, ans=0.125
2023-12-22 07:19:16,943 INFO [train.py:886] (0/4) Epoch 15, batch 3250, loss[loss=0.01771, audio_tagging_loss=0.01771, over 25000.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4941369.68 frames. ], batch size: 100, lr: 7.23e-03, grad_scale: 64.0
2023-12-22 07:19:21,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=466493.3333333333, ans=0.125
2023-12-22 07:19:23,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=466493.3333333333, ans=0.125
2023-12-22 07:19:29,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=466560.0, ans=0.04949747468305833
2023-12-22 07:19:45,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=466626.6666666667, ans=0.0
2023-12-22 07:19:54,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=466693.3333333333, ans=0.2
2023-12-22 07:20:05,804 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 07:20:09,536 INFO [train.py:886] (0/4) Epoch 15, batch 3300, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4941633.99 frames. ], batch size: 100, lr: 7.23e-03, grad_scale: 64.0
2023-12-22 07:20:13,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=466826.6666666667, ans=0.0
2023-12-22 07:20:13,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0
2023-12-22 07:20:14,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=466826.6666666667, ans=0.0
2023-12-22 07:20:17,791 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.385e+01 2.692e+01 2.863e+01 2.982e+01 3.634e+01, threshold=5.726e+01, percent-clipped=0.0
2023-12-22 07:20:18,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0
2023-12-22 07:20:32,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=466960.0, ans=0.125
2023-12-22 07:20:33,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=466960.0, ans=0.07
2023-12-22 07:20:33,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=15.0
2023-12-22 07:20:35,433 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 07:20:52,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=467093.3333333333, ans=0.125
2023-12-22 07:21:00,884 INFO [train.py:886] (0/4) Epoch 15, batch 3350, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4948739.30 frames. ], batch size: 100, lr: 7.23e-03, grad_scale: 64.0
2023-12-22 07:21:22,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=467293.3333333333, ans=0.125
2023-12-22 07:21:36,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=467360.0, ans=0.0
2023-12-22 07:21:53,187 INFO [train.py:886] (0/4) Epoch 15, batch 3400, loss[loss=0.01458, audio_tagging_loss=0.01458, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4951328.72 frames. ], batch size: 100, lr: 7.23e-03, grad_scale: 64.0
2023-12-22 07:22:00,730 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+01 2.741e+01 2.914e+01 3.061e+01 3.505e+01, threshold=5.827e+01, percent-clipped=0.0
2023-12-22 07:22:01,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=467493.3333333333, ans=0.125
2023-12-22 07:22:11,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=467560.0, ans=0.125
2023-12-22 07:22:23,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=467693.3333333333, ans=0.0
2023-12-22 07:22:28,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=467693.3333333333, ans=0.09899494936611666
2023-12-22 07:22:44,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=467826.6666666667, ans=15.0
2023-12-22 07:22:44,532 INFO [train.py:886] (0/4) Epoch 15, batch 3450, loss[loss=0.013, audio_tagging_loss=0.013, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4943872.27 frames. ], batch size: 99, lr: 7.22e-03, grad_scale: 64.0
2023-12-22 07:22:49,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=467826.6666666667, ans=0.125
2023-12-22 07:22:52,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=467826.6666666667, ans=0.125
2023-12-22 07:23:36,046 INFO [train.py:886] (0/4) Epoch 15, batch 3500, loss[loss=0.01531, audio_tagging_loss=0.01531, over 24750.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4945269.87 frames. ], batch size: 99, lr: 7.22e-03, grad_scale: 64.0
2023-12-22 07:23:39,098 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 07:23:40,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=468160.0, ans=0.0
2023-12-22 07:23:43,627 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+01 2.815e+01 2.965e+01 3.096e+01 3.766e+01, threshold=5.930e+01, percent-clipped=0.0
2023-12-22 07:23:44,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=468226.6666666667, ans=0.125
2023-12-22 07:24:06,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=468360.0, ans=0.0
2023-12-22 07:24:13,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=468360.0, ans=0.07
2023-12-22 07:24:16,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=468426.6666666667, ans=0.0
2023-12-22 07:24:28,364 INFO [train.py:886] (0/4) Epoch 15, batch 3550, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4939723.26 frames. ], batch size: 99, lr: 7.22e-03, grad_scale: 64.0
2023-12-22 07:24:28,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=468493.3333333333, ans=0.2
2023-12-22 07:24:34,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=468493.3333333333, ans=0.125
2023-12-22 07:24:38,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=468560.0, ans=0.5
2023-12-22 07:24:39,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=468560.0, ans=0.0
2023-12-22 07:24:50,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.14 vs. limit=15.0
2023-12-22 07:24:56,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=468626.6666666667, ans=0.125
2023-12-22 07:25:05,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5
2023-12-22 07:25:13,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=468760.0, ans=0.2
2023-12-22 07:25:19,405 INFO [train.py:886] (0/4) Epoch 15, batch 3600, loss[loss=0.0119, audio_tagging_loss=0.0119, over 25000.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4946763.48 frames. ], batch size: 100, lr: 7.22e-03, grad_scale: 64.0
2023-12-22 07:25:28,301 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.699e+01 2.856e+01 3.019e+01 3.433e+01, threshold=5.713e+01, percent-clipped=0.0
2023-12-22 07:25:34,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=468893.3333333333, ans=0.125
2023-12-22 07:25:50,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=469026.6666666667, ans=0.125
2023-12-22 07:25:52,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=469026.6666666667, ans=0.125
2023-12-22 07:26:11,393 INFO [train.py:886] (0/4) Epoch 15, batch 3650, loss[loss=0.01386, audio_tagging_loss=0.01386, over 25000.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4954356.18 frames. ], batch size: 100, lr: 7.21e-03, grad_scale: 64.0
2023-12-22 07:26:27,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=469226.6666666667, ans=0.5
2023-12-22 07:26:42,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=469360.0, ans=0.0
2023-12-22 07:26:47,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.59 vs. limit=6.0
2023-12-22 07:26:52,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0
2023-12-22 07:27:01,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=469493.3333333333, ans=0.125
2023-12-22 07:27:02,573 INFO [train.py:886] (0/4) Epoch 15, batch 3700, loss[loss=0.01545, audio_tagging_loss=0.01545, over 24750.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4956817.37 frames. ], batch size: 99, lr: 7.21e-03, grad_scale: 64.0
2023-12-22 07:27:11,637 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 2.741e+01 2.872e+01 3.028e+01 3.371e+01, threshold=5.744e+01, percent-clipped=0.0
2023-12-22 07:27:51,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=469760.0, ans=0.125
2023-12-22 07:27:55,259 INFO [train.py:886] (0/4) Epoch 15, batch 3750, loss[loss=0.01563, audio_tagging_loss=0.01563, over 24750.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4958416.06 frames. ], batch size: 99, lr: 7.21e-03, grad_scale: 64.0
2023-12-22 07:28:04,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.27 vs. limit=22.5
2023-12-22 07:28:32,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.34 vs. limit=10.0
2023-12-22 07:28:34,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=470093.3333333333, ans=0.2
2023-12-22 07:28:41,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0
2023-12-22 07:28:47,224 INFO [train.py:886] (0/4) Epoch 15, batch 3800, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4955093.66 frames. ], batch size: 99, lr: 7.21e-03, grad_scale: 64.0
2023-12-22 07:28:49,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=470160.0, ans=0.125
2023-12-22 07:28:55,413 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+01 2.844e+01 2.987e+01 3.173e+01 3.633e+01, threshold=5.973e+01, percent-clipped=0.0
2023-12-22 07:29:04,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=470226.6666666667, ans=0.0
2023-12-22 07:29:19,528 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 07:29:37,986 INFO [train.py:886] (0/4) Epoch 15, batch 3850, loss[loss=0.01288, audio_tagging_loss=0.01288, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4950231.62 frames. ], batch size: 99, lr: 7.20e-03, grad_scale: 64.0
2023-12-22 07:29:44,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.63 vs. limit=15.0
2023-12-22 07:29:47,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=470493.3333333333, ans=0.02
2023-12-22 07:29:53,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=470560.0, ans=0.0
2023-12-22 07:30:08,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=470693.3333333333, ans=0.125
2023-12-22 07:30:20,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470760.0, ans=0.1
2023-12-22 07:30:30,443 INFO [train.py:886] (0/4) Epoch 15, batch 3900, loss[loss=0.01104, audio_tagging_loss=0.01104, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4952587.08 frames. ], batch size: 99, lr: 7.20e-03, grad_scale: 64.0
2023-12-22 07:30:34,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=470826.6666666667, ans=0.1
2023-12-22 07:30:38,599 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.742e+01 2.857e+01 3.020e+01 3.655e+01, threshold=5.714e+01, percent-clipped=0.0
2023-12-22 07:30:39,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=470893.3333333333, ans=0.125
2023-12-22 07:30:42,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=470893.3333333333, ans=0.2
2023-12-22 07:30:46,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=470893.3333333333, ans=0.07
2023-12-22 07:31:13,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471093.3333333333, ans=0.1
2023-12-22 07:31:22,514 INFO [train.py:886] (0/4) Epoch 15, batch 3950, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4954004.25 frames. ], batch size: 100, lr: 7.20e-03, grad_scale: 64.0
2023-12-22 07:31:26,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=471160.0, ans=0.0
2023-12-22 07:31:27,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=471160.0, ans=0.125
2023-12-22 07:31:27,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=471160.0, ans=0.5
2023-12-22 07:31:35,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=471226.6666666667, ans=0.2
2023-12-22 07:31:40,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=471226.6666666667, ans=0.125
2023-12-22 07:31:45,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471293.3333333333, ans=0.1
2023-12-22 07:31:45,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=471293.3333333333, ans=0.0
2023-12-22 07:32:01,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=471360.0, ans=0.125
2023-12-22 07:32:11,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0
2023-12-22 07:32:13,936 INFO [train.py:886] (0/4) Epoch 15, batch 4000, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4959004.52 frames. ], batch size: 100, lr: 7.20e-03, grad_scale: 64.0
2023-12-22 07:32:15,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=471493.3333333333, ans=0.125
2023-12-22 07:32:20,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=471493.3333333333, ans=0.0
2023-12-22 07:32:21,503 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.461e+01 2.772e+01 2.863e+01 2.977e+01 3.465e+01, threshold=5.726e+01, percent-clipped=0.0
2023-12-22 07:32:25,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=471560.0, ans=0.1
2023-12-22 07:32:29,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=471560.0, ans=0.07
2023-12-22 07:32:41,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=471626.6666666667, ans=0.0
2023-12-22 07:32:48,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5
2023-12-22 07:32:49,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=15.0
2023-12-22 07:32:55,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=471760.0, ans=0.125
2023-12-22 07:32:56,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=471760.0, ans=0.0
2023-12-22 07:33:05,120 INFO [train.py:886] (0/4) Epoch 15, batch 4050, loss[loss=0.01487, audio_tagging_loss=0.01487, over 24750.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4955699.07 frames. ], batch size: 99, lr: 7.19e-03, grad_scale: 64.0
2023-12-22 07:33:12,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0
2023-12-22 07:33:16,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=471893.3333333333, ans=0.2
2023-12-22 07:33:23,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=471893.3333333333, ans=0.2
2023-12-22 07:33:26,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471960.0, ans=0.1
2023-12-22 07:33:31,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=471960.0, ans=0.125
2023-12-22 07:33:32,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=471960.0, ans=0.125
2023-12-22 07:33:38,607 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 07:33:38,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.01 vs. limit=22.5
2023-12-22 07:33:40,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=472026.6666666667, ans=0.0
2023-12-22 07:33:57,426 INFO [train.py:886] (0/4) Epoch 15, batch 4100, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4945644.75 frames. ], batch size: 99, lr: 7.19e-03, grad_scale: 64.0
2023-12-22 07:34:00,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=472160.0, ans=0.125
2023-12-22 07:34:04,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=472160.0, ans=0.125
2023-12-22 07:34:05,959 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 2.831e+01 2.961e+01 3.185e+01 3.752e+01, threshold=5.922e+01, percent-clipped=0.0
2023-12-22 07:34:08,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=472226.6666666667, ans=0.0
2023-12-22 07:34:12,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=472226.6666666667, ans=0.035
2023-12-22 07:34:37,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=472360.0, ans=0.125
2023-12-22 07:34:38,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0
2023-12-22 07:34:40,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=472426.6666666667, ans=0.125
2023-12-22 07:34:46,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=472426.6666666667, ans=0.0
2023-12-22 07:34:48,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0
2023-12-22 07:34:49,349 INFO [train.py:886] (0/4) Epoch 15, batch 4150, loss[loss=0.01495, audio_tagging_loss=0.01495, over 24750.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4945758.17 frames. ], batch size: 99, lr: 7.19e-03, grad_scale: 64.0
2023-12-22 07:34:49,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=472493.3333333333, ans=0.0
2023-12-22 07:34:52,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=472493.3333333333, ans=0.0
2023-12-22 07:35:06,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=472560.0, ans=0.0
2023-12-22 07:35:15,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=472626.6666666667, ans=0.05
2023-12-22 07:35:19,502 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0
2023-12-22 07:35:36,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.43 vs. limit=15.0
2023-12-22 07:35:36,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0
2023-12-22 07:35:40,938 INFO [train.py:886] (0/4) Epoch 15, batch 4200, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4943110.70 frames. ], batch size: 100, lr: 7.19e-03, grad_scale: 64.0
2023-12-22 07:35:42,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=472826.6666666667, ans=0.2
2023-12-22 07:35:48,443 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.360e+01 2.733e+01 2.855e+01 3.027e+01 3.631e+01, threshold=5.711e+01, percent-clipped=0.0
2023-12-22 07:35:52,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=472893.3333333333, ans=0.0
2023-12-22 07:36:05,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0
2023-12-22 07:36:07,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=15.0
2023-12-22 07:36:11,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473026.6666666667, ans=0.1
2023-12-22 07:36:20,246 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=7.986e-03
2023-12-22 07:36:23,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=473093.3333333333, ans=0.125
2023-12-22 07:36:26,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0
2023-12-22 07:36:32,702 INFO [train.py:886] (0/4) Epoch 15, batch 4250, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4947842.48 frames. ], batch size: 99, lr: 7.18e-03, grad_scale: 64.0
2023-12-22 07:36:39,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=473160.0, ans=0.2
2023-12-22 07:36:42,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=473226.6666666667, ans=0.125
2023-12-22 07:36:50,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0
2023-12-22 07:37:05,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.10 vs. limit=22.5
2023-12-22 07:37:24,535 INFO [train.py:886] (0/4) Epoch 15, batch 4300, loss[loss=0.01637, audio_tagging_loss=0.01637, over 25000.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4953809.56 frames. ], batch size: 100, lr: 7.18e-03, grad_scale: 64.0
2023-12-22 07:37:27,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473493.3333333333, ans=0.1
2023-12-22 07:37:32,803 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.452e+01 2.780e+01 2.893e+01 3.020e+01 3.657e+01, threshold=5.787e+01, percent-clipped=0.0
2023-12-22 07:37:40,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=473560.0, ans=0.2
2023-12-22 07:37:46,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=473626.6666666667, ans=0.125
2023-12-22 07:37:47,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.06 vs. limit=10.0
2023-12-22 07:37:54,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473693.3333333333, ans=0.1
2023-12-22 07:38:00,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=473693.3333333333, ans=0.125
2023-12-22 07:38:16,789 INFO [train.py:886] (0/4) Epoch 15, batch 4350, loss[loss=0.01601, audio_tagging_loss=0.01601, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4949120.52 frames. ], batch size: 100, lr: 7.18e-03, grad_scale: 64.0
2023-12-22 07:38:34,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.68 vs. limit=22.5
2023-12-22 07:38:37,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=473960.0, ans=0.125
2023-12-22 07:38:48,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=474026.6666666667, ans=0.95
2023-12-22 07:38:50,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=474026.6666666667, ans=0.0
2023-12-22 07:38:55,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=474026.6666666667, ans=0.125
2023-12-22 07:39:00,250 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.520e-03
2023-12-22 07:39:08,315 INFO [train.py:886] (0/4) Epoch 15, batch 4400, loss[loss=0.01598, audio_tagging_loss=0.01598, over 24750.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4951958.92 frames. ], batch size: 99, lr: 7.18e-03, grad_scale: 64.0
2023-12-22 07:39:16,463 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.580e+01 2.787e+01 2.952e+01 3.076e+01 3.640e+01, threshold=5.903e+01, percent-clipped=0.0
2023-12-22 07:39:23,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=474226.6666666667, ans=0.125
2023-12-22 07:39:37,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=474360.0, ans=0.125
2023-12-22 07:39:55,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=474426.6666666667, ans=0.125
2023-12-22 07:39:59,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.95 vs. limit=22.5
2023-12-22 07:40:00,415 INFO [train.py:886] (0/4) Epoch 15, batch 4450, loss[loss=0.01415, audio_tagging_loss=0.01415, over 24750.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4943901.43 frames. ], batch size: 99, lr: 7.17e-03, grad_scale: 64.0
2023-12-22 07:40:08,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=474493.3333333333, ans=0.125
2023-12-22 07:40:37,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=474693.3333333333, ans=0.0
2023-12-22 07:40:51,810 INFO [train.py:886] (0/4) Epoch 15, batch 4500, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4946248.99 frames. ], batch size: 100, lr: 7.17e-03, grad_scale: 64.0
2023-12-22 07:40:52,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=474826.6666666667, ans=0.5
2023-12-22 07:41:00,678 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+01 2.748e+01 2.874e+01 3.023e+01 3.451e+01, threshold=5.747e+01, percent-clipped=0.0
2023-12-22 07:41:41,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=475093.3333333333, ans=0.125
2023-12-22 07:41:43,286 INFO [train.py:886] (0/4) Epoch 15, batch 4550, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4947125.36 frames. ], batch size: 99, lr: 7.17e-03, grad_scale: 64.0
2023-12-22 07:41:43,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=475160.0, ans=0.2
2023-12-22 07:41:46,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=475160.0, ans=0.04949747468305833
2023-12-22 07:41:56,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=475226.6666666667, ans=0.0
2023-12-22 07:42:03,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=475293.3333333333, ans=0.2
2023-12-22 07:42:06,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.98 vs. limit=22.5
2023-12-22 07:42:07,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=475293.3333333333, ans=0.2
2023-12-22 07:42:14,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0
2023-12-22 07:42:25,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=475426.6666666667, ans=0.0
2023-12-22 07:42:26,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=475426.6666666667, ans=0.0
2023-12-22 07:42:28,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=475426.6666666667, ans=0.0
2023-12-22 07:42:35,157 INFO [train.py:886] (0/4) Epoch 15, batch 4600, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4943187.42 frames. ], batch size: 99, lr: 7.17e-03, grad_scale: 64.0
2023-12-22 07:42:38,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=475493.3333333333, ans=0.125
2023-12-22 07:42:41,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=475493.3333333333, ans=0.125
2023-12-22 07:42:43,595 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.790e+01 2.912e+01 3.073e+01 3.704e+01, threshold=5.824e+01, percent-clipped=0.0
2023-12-22 07:43:15,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=475693.3333333333, ans=0.125
2023-12-22 07:43:24,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475760.0, ans=0.1
2023-12-22 07:43:27,434 INFO [train.py:886] (0/4) Epoch 15, batch 4650, loss[loss=0.01299, audio_tagging_loss=0.01299, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4954250.68 frames. ], batch size: 100, lr: 7.16e-03, grad_scale: 64.0
2023-12-22 07:43:27,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=475826.6666666667, ans=0.125
2023-12-22 07:44:03,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0
2023-12-22 07:44:10,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=476093.3333333333, ans=0.125
2023-12-22 07:44:14,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=476093.3333333333, ans=0.125
2023-12-22 07:44:17,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=476160.0, ans=0.125
2023-12-22 07:44:18,132 INFO [train.py:886] (0/4) Epoch 15, batch 4700, loss[loss=0.01435, audio_tagging_loss=0.01435, over 24750.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4952169.45 frames.
], batch size: 99, lr: 7.16e-03, grad_scale: 64.0 2023-12-22 07:44:26,797 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.835e+01 2.953e+01 3.126e+01 3.694e+01, threshold=5.906e+01, percent-clipped=0.0 2023-12-22 07:44:27,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2023-12-22 07:44:58,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=476426.6666666667, ans=0.125 2023-12-22 07:45:03,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0 2023-12-22 07:45:05,689 INFO [train.py:886] (0/4) Epoch 15, batch 4750, loss[loss=0.01608, audio_tagging_loss=0.01608, over 24750.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4947336.78 frames. ], batch size: 99, lr: 7.16e-03, grad_scale: 128.0 2023-12-22 07:45:05,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=476493.3333333333, ans=0.125 2023-12-22 07:45:08,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.24 vs. limit=15.0 2023-12-22 07:45:12,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=476493.3333333333, ans=0.04949747468305833 2023-12-22 07:45:21,224 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-15.pt 2023-12-22 07:45:42,780 INFO [train.py:886] (0/4) Epoch 16, batch 0, loss[loss=0.03577, audio_tagging_loss=0.03577, over 21300.00 frames. ], tot_loss[loss=0.03577, audio_tagging_loss=0.03577, over 21300.00 frames. ], batch size: 107, lr: 6.93e-03, grad_scale: 32.0 2023-12-22 07:45:42,781 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 07:46:04,063 INFO [train.py:917] (0/4) Epoch 16, validation: loss=0.03318, audio_tagging_loss=0.03318, over 3737520.00 frames. 2023-12-22 07:46:04,064 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 07:46:05,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.00 vs. limit=10.0 2023-12-22 07:46:11,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=476600.0, ans=0.0 2023-12-22 07:46:12,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=476600.0, ans=0.2 2023-12-22 07:46:14,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=476666.6666666667, ans=0.125 2023-12-22 07:46:24,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=476733.3333333333, ans=0.125 2023-12-22 07:46:27,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.80 vs. 
limit=15.0 2023-12-22 07:46:40,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=476800.0, ans=0.125 2023-12-22 07:46:49,656 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+01 2.881e+01 3.132e+01 4.118e+01 9.111e+01, threshold=6.264e+01, percent-clipped=8.0 2023-12-22 07:46:53,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476866.6666666667, ans=0.1 2023-12-22 07:46:53,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=476866.6666666667, ans=0.0 2023-12-22 07:46:55,362 INFO [train.py:886] (0/4) Epoch 16, batch 50, loss[loss=0.02063, audio_tagging_loss=0.02063, over 25000.00 frames. ], tot_loss[loss=0.02226, audio_tagging_loss=0.02226, over 1116953.69 frames. ], batch size: 100, lr: 6.93e-03, grad_scale: 32.0 2023-12-22 07:47:05,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2023-12-22 07:47:41,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=477200.0, ans=0.0 2023-12-22 07:47:47,679 INFO [train.py:886] (0/4) Epoch 16, batch 100, loss[loss=0.01652, audio_tagging_loss=0.01652, over 24750.00 frames. ], tot_loss[loss=0.01943, audio_tagging_loss=0.01943, over 1975676.00 frames. ], batch size: 99, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:48:14,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=477400.0, ans=0.125 2023-12-22 07:48:25,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-12-22 07:48:33,203 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.658e+01 3.003e+01 3.220e+01 3.387e+01 3.937e+01, threshold=6.439e+01, percent-clipped=0.0 2023-12-22 07:48:38,854 INFO [train.py:886] (0/4) Epoch 16, batch 150, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01784, audio_tagging_loss=0.01784, over 2642305.30 frames. ], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:48:46,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=477600.0, ans=0.125 2023-12-22 07:48:59,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=477733.3333333333, ans=0.125 2023-12-22 07:49:06,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=477733.3333333333, ans=0.125 2023-12-22 07:49:16,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=477800.0, ans=15.0 2023-12-22 07:49:31,231 INFO [train.py:886] (0/4) Epoch 16, batch 200, loss[loss=0.01691, audio_tagging_loss=0.01691, over 25000.00 frames. ], tot_loss[loss=0.01678, audio_tagging_loss=0.01678, over 3152438.93 frames. 
], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:49:47,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=478000.0, ans=0.125 2023-12-22 07:49:50,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=478066.6666666667, ans=0.2 2023-12-22 07:50:08,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0 2023-12-22 07:50:09,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478133.3333333333, ans=0.0 2023-12-22 07:50:14,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0 2023-12-22 07:50:16,273 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.531e+01 2.761e+01 2.944e+01 3.069e+01 3.550e+01, threshold=5.887e+01, percent-clipped=0.0 2023-12-22 07:50:16,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=478200.0, ans=0.125 2023-12-22 07:50:22,659 INFO [train.py:886] (0/4) Epoch 16, batch 250, loss[loss=0.01225, audio_tagging_loss=0.01225, over 25000.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 3557944.04 frames. ], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:50:30,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=478266.6666666667, ans=0.125 2023-12-22 07:50:34,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=478333.3333333333, ans=0.2 2023-12-22 07:50:42,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=478333.3333333333, ans=0.2 2023-12-22 07:50:45,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=478400.0, ans=0.125 2023-12-22 07:50:45,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=478400.0, ans=0.2 2023-12-22 07:50:49,736 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.634e-03 2023-12-22 07:50:57,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=478466.6666666667, ans=0.2 2023-12-22 07:50:59,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=478466.6666666667, ans=0.1 2023-12-22 07:51:06,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=478533.3333333333, ans=0.125 2023-12-22 07:51:07,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=478533.3333333333, ans=0.125 2023-12-22 07:51:15,085 INFO [train.py:886] (0/4) Epoch 16, batch 300, loss[loss=0.01659, audio_tagging_loss=0.01659, over 24750.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 3868237.11 frames. 
], batch size: 99, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:51:15,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=478600.0, ans=0.1 2023-12-22 07:51:19,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=478600.0, ans=0.0 2023-12-22 07:51:20,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=478600.0, ans=0.125 2023-12-22 07:51:39,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=478733.3333333333, ans=0.125 2023-12-22 07:52:00,601 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.509e+01 2.872e+01 2.968e+01 3.150e+01 3.579e+01, threshold=5.936e+01, percent-clipped=0.0 2023-12-22 07:52:07,796 INFO [train.py:886] (0/4) Epoch 16, batch 350, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4100231.41 frames. ], batch size: 99, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:52:09,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=478933.3333333333, ans=10.0 2023-12-22 07:52:16,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=478933.3333333333, ans=0.125 2023-12-22 07:52:59,651 INFO [train.py:886] (0/4) Epoch 16, batch 400, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4288398.75 frames. ], batch size: 100, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:53:10,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.07 vs. limit=15.0 2023-12-22 07:53:45,880 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.751e+01 2.885e+01 3.045e+01 3.501e+01, threshold=5.769e+01, percent-clipped=0.0 2023-12-22 07:53:49,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=479533.3333333333, ans=0.125 2023-12-22 07:53:51,539 INFO [train.py:886] (0/4) Epoch 16, batch 450, loss[loss=0.01462, audio_tagging_loss=0.01462, over 23960.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4434296.93 frames. ], batch size: 100, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:54:02,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=479666.6666666667, ans=0.125 2023-12-22 07:54:09,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=479666.6666666667, ans=0.0 2023-12-22 07:54:13,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.40 vs. limit=10.0 2023-12-22 07:54:37,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0 2023-12-22 07:54:43,144 INFO [train.py:886] (0/4) Epoch 16, batch 500, loss[loss=0.01424, audio_tagging_loss=0.01424, over 25000.00 frames. 
], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4549989.89 frames. ], batch size: 100, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:54:43,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=479933.3333333333, ans=0.125 2023-12-22 07:54:47,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=15.0 2023-12-22 07:54:53,292 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-72000.pt 2023-12-22 07:55:01,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=480000.0, ans=0.125 2023-12-22 07:55:31,300 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 2.740e+01 2.884e+01 3.004e+01 3.376e+01, threshold=5.768e+01, percent-clipped=0.0 2023-12-22 07:55:32,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-12-22 07:55:37,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.36 vs. limit=22.5 2023-12-22 07:55:37,646 INFO [train.py:886] (0/4) Epoch 16, batch 550, loss[loss=0.01272, audio_tagging_loss=0.01272, over 24750.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4644055.70 frames. ], batch size: 99, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:55:51,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=12.0 2023-12-22 07:55:57,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=480333.3333333333, ans=0.125 2023-12-22 07:56:04,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-12-22 07:56:08,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=480466.6666666667, ans=0.0 2023-12-22 07:56:13,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=480466.6666666667, ans=0.0 2023-12-22 07:56:29,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=480600.0, ans=0.125 2023-12-22 07:56:29,824 INFO [train.py:886] (0/4) Epoch 16, batch 600, loss[loss=0.01394, audio_tagging_loss=0.01394, over 25000.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4714020.71 frames. ], batch size: 100, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:56:36,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=480600.0, ans=0.0 2023-12-22 07:56:40,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.88 vs. 
limit=22.5 2023-12-22 07:56:43,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=480666.6666666667, ans=0.125 2023-12-22 07:56:44,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=480666.6666666667, ans=0.2 2023-12-22 07:57:02,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.05 vs. limit=15.0 2023-12-22 07:57:14,890 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.350e+01 2.812e+01 2.929e+01 3.069e+01 3.591e+01, threshold=5.857e+01, percent-clipped=0.0 2023-12-22 07:57:21,255 INFO [train.py:886] (0/4) Epoch 16, batch 650, loss[loss=0.0132, audio_tagging_loss=0.0132, over 24750.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4762963.41 frames. ], batch size: 99, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:57:25,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=480933.3333333333, ans=0.125 2023-12-22 07:57:37,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=481000.0, ans=0.0 2023-12-22 07:57:38,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=481000.0, ans=0.125 2023-12-22 07:57:44,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=481066.6666666667, ans=0.125 2023-12-22 07:57:48,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=481066.6666666667, ans=0.125 2023-12-22 07:57:53,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2023-12-22 07:57:55,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=481133.3333333333, ans=0.125 2023-12-22 07:58:07,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2023-12-22 07:58:13,145 INFO [train.py:886] (0/4) Epoch 16, batch 700, loss[loss=0.01787, audio_tagging_loss=0.01787, over 21998.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4797987.35 frames. 
], batch size: 107, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:58:42,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=481400.0, ans=0.0 2023-12-22 07:58:50,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=481466.6666666667, ans=0.125 2023-12-22 07:58:55,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=481533.3333333333, ans=0.125 2023-12-22 07:58:56,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=481533.3333333333, ans=0.2 2023-12-22 07:58:58,829 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.788e+01 2.907e+01 3.038e+01 3.343e+01, threshold=5.814e+01, percent-clipped=0.0 2023-12-22 07:59:05,259 INFO [train.py:886] (0/4) Epoch 16, batch 750, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4833834.83 frames. ], batch size: 100, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 07:59:08,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481600.0, ans=0.1 2023-12-22 07:59:22,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=481666.6666666667, ans=15.0 2023-12-22 07:59:38,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=481800.0, ans=0.0 2023-12-22 07:59:48,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=481866.6666666667, ans=0.2 2023-12-22 07:59:53,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=481866.6666666667, ans=0.125 2023-12-22 07:59:56,145 INFO [train.py:886] (0/4) Epoch 16, batch 800, loss[loss=0.01482, audio_tagging_loss=0.01482, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4861709.51 frames. ], batch size: 100, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 08:00:07,289 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.541e-03 2023-12-22 08:00:18,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2023-12-22 08:00:36,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=482133.3333333333, ans=0.125 2023-12-22 08:00:38,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482200.0, ans=0.1 2023-12-22 08:00:39,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=482200.0, ans=0.125 2023-12-22 08:00:42,818 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.804e+01 2.918e+01 3.049e+01 4.122e+01, threshold=5.837e+01, percent-clipped=0.0 2023-12-22 08:00:49,187 INFO [train.py:886] (0/4) Epoch 16, batch 850, loss[loss=0.01687, audio_tagging_loss=0.01687, over 25000.00 frames. 
], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4885581.00 frames. ], batch size: 100, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 08:00:52,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=12.0 2023-12-22 08:00:53,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.04 vs. limit=10.0 2023-12-22 08:01:02,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=482333.3333333333, ans=0.125 2023-12-22 08:01:08,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482400.0, ans=0.1 2023-12-22 08:01:31,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=482533.3333333333, ans=0.125 2023-12-22 08:01:40,642 INFO [train.py:886] (0/4) Epoch 16, batch 900, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4899804.25 frames. ], batch size: 99, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 08:01:52,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=482666.6666666667, ans=0.125 2023-12-22 08:02:26,294 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 2.779e+01 2.893e+01 3.122e+01 3.718e+01, threshold=5.785e+01, percent-clipped=0.0 2023-12-22 08:02:31,925 INFO [train.py:886] (0/4) Epoch 16, batch 950, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4905800.78 frames. ], batch size: 99, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:02:52,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=483066.6666666667, ans=0.0 2023-12-22 08:03:13,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=483200.0, ans=0.0 2023-12-22 08:03:22,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483200.0, ans=0.1 2023-12-22 08:03:25,262 INFO [train.py:886] (0/4) Epoch 16, batch 1000, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24750.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4910540.30 frames. ], batch size: 99, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:03:29,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=483266.6666666667, ans=0.125 2023-12-22 08:03:38,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483333.3333333333, ans=0.1 2023-12-22 08:04:09,565 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.390e+01 2.722e+01 2.865e+01 3.109e+01 3.615e+01, threshold=5.730e+01, percent-clipped=0.0 2023-12-22 08:04:10,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.19 vs. limit=22.5 2023-12-22 08:04:15,956 INFO [train.py:886] (0/4) Epoch 16, batch 1050, loss[loss=0.01513, audio_tagging_loss=0.01513, over 25000.00 frames. 
], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4923511.55 frames. ], batch size: 100, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:04:17,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483600.0, ans=0.1 2023-12-22 08:04:23,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2023-12-22 08:04:33,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=483666.6666666667, ans=0.0 2023-12-22 08:04:38,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=483733.3333333333, ans=0.0 2023-12-22 08:04:45,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=483800.0, ans=0.125 2023-12-22 08:04:52,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=483800.0, ans=0.125 2023-12-22 08:05:07,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=483933.3333333333, ans=0.125 2023-12-22 08:05:08,399 INFO [train.py:886] (0/4) Epoch 16, batch 1100, loss[loss=0.01574, audio_tagging_loss=0.01574, over 24750.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4928370.14 frames. ], batch size: 99, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:05:15,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=483933.3333333333, ans=0.2 2023-12-22 08:05:15,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=483933.3333333333, ans=0.0 2023-12-22 08:05:16,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=483933.3333333333, ans=0.125 2023-12-22 08:05:16,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=483933.3333333333, ans=0.125 2023-12-22 08:05:21,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=484000.0, ans=0.125 2023-12-22 08:05:25,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=484000.0, ans=0.125 2023-12-22 08:05:25,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=484000.0, ans=0.125 2023-12-22 08:05:42,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=484133.3333333333, ans=0.125 2023-12-22 08:05:48,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=484200.0, ans=0.2 2023-12-22 08:05:53,772 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.525e+01 2.808e+01 2.906e+01 3.022e+01 3.549e+01, threshold=5.811e+01, percent-clipped=0.0 2023-12-22 08:05:59,468 INFO [train.py:886] (0/4) Epoch 16, batch 1150, loss[loss=0.01541, audio_tagging_loss=0.01541, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4935883.92 frames. 
], batch size: 100, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:06:04,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=484266.6666666667, ans=0.125 2023-12-22 08:06:10,236 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:06:20,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-12-22 08:06:21,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2023-12-22 08:06:33,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=484466.6666666667, ans=0.125 2023-12-22 08:06:50,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=484600.0, ans=0.025 2023-12-22 08:06:51,518 INFO [train.py:886] (0/4) Epoch 16, batch 1200, loss[loss=0.01583, audio_tagging_loss=0.01583, over 24750.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4948696.33 frames. ], batch size: 99, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:06:52,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2023-12-22 08:07:00,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484666.6666666667, ans=0.1 2023-12-22 08:07:32,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=484866.6666666667, ans=0.125 2023-12-22 08:07:35,754 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.792e+01 2.945e+01 3.119e+01 3.482e+01, threshold=5.890e+01, percent-clipped=0.0 2023-12-22 08:07:38,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=484866.6666666667, ans=0.0 2023-12-22 08:07:39,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=484866.6666666667, ans=0.2 2023-12-22 08:07:42,913 INFO [train.py:886] (0/4) Epoch 16, batch 1250, loss[loss=0.01885, audio_tagging_loss=0.01885, over 24948.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4946687.74 frames. ], batch size: 100, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:07:47,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. 
limit=15.0 2023-12-22 08:07:49,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=484933.3333333333, ans=0.125 2023-12-22 08:07:52,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=485000.0, ans=0.125 2023-12-22 08:07:53,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=485000.0, ans=0.125 2023-12-22 08:08:18,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=485133.3333333333, ans=0.0 2023-12-22 08:08:33,970 INFO [train.py:886] (0/4) Epoch 16, batch 1300, loss[loss=0.01471, audio_tagging_loss=0.01471, over 22374.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4941144.76 frames. ], batch size: 107, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:08:35,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=485266.6666666667, ans=0.125 2023-12-22 08:08:51,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=485333.3333333333, ans=0.125 2023-12-22 08:09:10,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=485466.6666666667, ans=0.2 2023-12-22 08:09:20,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485533.3333333333, ans=0.1 2023-12-22 08:09:20,874 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.809e+01 2.938e+01 3.092e+01 3.584e+01, threshold=5.876e+01, percent-clipped=0.0 2023-12-22 08:09:25,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=485600.0, ans=0.125 2023-12-22 08:09:26,552 INFO [train.py:886] (0/4) Epoch 16, batch 1350, loss[loss=0.0155, audio_tagging_loss=0.0155, over 25000.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4942011.46 frames. ], batch size: 100, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:09:29,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=485600.0, ans=0.125 2023-12-22 08:09:30,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=485600.0, ans=0.125 2023-12-22 08:09:36,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=485666.6666666667, ans=0.025 2023-12-22 08:10:06,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=485866.6666666667, ans=0.0 2023-12-22 08:10:18,058 INFO [train.py:886] (0/4) Epoch 16, batch 1400, loss[loss=0.01432, audio_tagging_loss=0.01432, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4944376.36 frames. 
], batch size: 99, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:10:18,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485933.3333333333, ans=0.1 2023-12-22 08:10:20,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485933.3333333333, ans=0.1 2023-12-22 08:10:32,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.71 vs. limit=10.0 2023-12-22 08:10:37,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs. limit=10.0 2023-12-22 08:11:00,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486200.0, ans=0.1 2023-12-22 08:11:04,394 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.416e+01 2.787e+01 2.895e+01 3.084e+01 3.427e+01, threshold=5.790e+01, percent-clipped=0.0 2023-12-22 08:11:05,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=486200.0, ans=0.0 2023-12-22 08:11:10,098 INFO [train.py:886] (0/4) Epoch 16, batch 1450, loss[loss=0.01251, audio_tagging_loss=0.01251, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4948844.13 frames. ], batch size: 99, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:11:15,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=486266.6666666667, ans=0.125 2023-12-22 08:11:17,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=486266.6666666667, ans=0.125 2023-12-22 08:11:26,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=486333.3333333333, ans=10.0 2023-12-22 08:11:39,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=486400.0, ans=0.1 2023-12-22 08:11:57,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=486533.3333333333, ans=0.02 2023-12-22 08:12:02,647 INFO [train.py:886] (0/4) Epoch 16, batch 1500, loss[loss=0.01479, audio_tagging_loss=0.01479, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4953687.04 frames. ], batch size: 99, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:12:13,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=486666.6666666667, ans=0.0 2023-12-22 08:12:38,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=486800.0, ans=0.125 2023-12-22 08:12:48,467 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.524e+01 2.830e+01 2.960e+01 3.074e+01 3.564e+01, threshold=5.919e+01, percent-clipped=0.0 2023-12-22 08:12:54,851 INFO [train.py:886] (0/4) Epoch 16, batch 1550, loss[loss=0.01393, audio_tagging_loss=0.01393, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4946695.73 frames. 
], batch size: 99, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:13:06,869 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:13:34,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=487133.3333333333, ans=0.125 2023-12-22 08:13:41,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=487200.0, ans=15.0 2023-12-22 08:13:47,238 INFO [train.py:886] (0/4) Epoch 16, batch 1600, loss[loss=0.0125, audio_tagging_loss=0.0125, over 24750.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4931460.34 frames. ], batch size: 99, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:14:03,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=487333.3333333333, ans=0.1 2023-12-22 08:14:12,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=487400.0, ans=0.05 2023-12-22 08:14:32,411 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.798e+01 2.942e+01 3.066e+01 3.582e+01, threshold=5.884e+01, percent-clipped=0.0 2023-12-22 08:14:34,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.07 vs. limit=10.0 2023-12-22 08:14:38,771 INFO [train.py:886] (0/4) Epoch 16, batch 1650, loss[loss=0.01496, audio_tagging_loss=0.01496, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4930555.07 frames. ], batch size: 99, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:14:57,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487666.6666666667, ans=0.1 2023-12-22 08:15:17,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=487800.0, ans=0.0 2023-12-22 08:15:17,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=487800.0, ans=0.125 2023-12-22 08:15:24,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487866.6666666667, ans=0.1 2023-12-22 08:15:31,038 INFO [train.py:886] (0/4) Epoch 16, batch 1700, loss[loss=0.01316, audio_tagging_loss=0.01316, over 22369.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4927622.64 frames. ], batch size: 107, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:15:42,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=488000.0, ans=0.125 2023-12-22 08:15:44,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=12.0 2023-12-22 08:15:49,887 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:15:57,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. 
limit=6.0 2023-12-22 08:16:11,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=488200.0, ans=0.125 2023-12-22 08:16:13,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=488200.0, ans=0.2 2023-12-22 08:16:16,267 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+01 2.752e+01 2.880e+01 3.018e+01 3.839e+01, threshold=5.760e+01, percent-clipped=0.0 2023-12-22 08:16:20,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=488200.0, ans=0.0 2023-12-22 08:16:22,696 INFO [train.py:886] (0/4) Epoch 16, batch 1750, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4932690.29 frames. ], batch size: 100, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:16:33,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=488333.3333333333, ans=0.035 2023-12-22 08:16:34,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=488333.3333333333, ans=0.1 2023-12-22 08:16:35,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=488333.3333333333, ans=0.125 2023-12-22 08:16:39,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=488333.3333333333, ans=0.07 2023-12-22 08:16:46,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=488400.0, ans=0.125 2023-12-22 08:16:52,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=15.0 2023-12-22 08:16:53,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=488466.6666666667, ans=0.0 2023-12-22 08:16:57,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=488466.6666666667, ans=0.0 2023-12-22 08:17:00,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=488466.6666666667, ans=0.2 2023-12-22 08:17:06,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2023-12-22 08:17:13,905 INFO [train.py:886] (0/4) Epoch 16, batch 1800, loss[loss=0.01436, audio_tagging_loss=0.01436, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4940287.03 frames. ], batch size: 100, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:17:26,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.40 vs. 
limit=15.0 2023-12-22 08:17:47,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=488800.0, ans=0.0 2023-12-22 08:17:49,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=488800.0, ans=0.1 2023-12-22 08:17:57,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=488866.6666666667, ans=0.125 2023-12-22 08:17:59,526 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.777e+01 2.946e+01 3.090e+01 3.583e+01, threshold=5.892e+01, percent-clipped=0.0 2023-12-22 08:18:00,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=488866.6666666667, ans=0.1 2023-12-22 08:18:03,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=488866.6666666667, ans=0.0 2023-12-22 08:18:03,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0 2023-12-22 08:18:05,222 INFO [train.py:886] (0/4) Epoch 16, batch 1850, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4944847.24 frames. ], batch size: 100, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:18:18,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0 2023-12-22 08:18:20,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2023-12-22 08:18:26,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=489066.6666666667, ans=0.125 2023-12-22 08:18:41,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=489133.3333333333, ans=0.125 2023-12-22 08:18:53,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=15.0 2023-12-22 08:18:56,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=489200.0, ans=0.0 2023-12-22 08:18:57,695 INFO [train.py:886] (0/4) Epoch 16, batch 1900, loss[loss=0.01463, audio_tagging_loss=0.01463, over 24750.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4942066.49 frames. 
], batch size: 99, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:19:24,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=489400.0, ans=0.125 2023-12-22 08:19:24,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=489400.0, ans=0.0 2023-12-22 08:19:29,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=489466.6666666667, ans=0.1 2023-12-22 08:19:35,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=489466.6666666667, ans=0.0 2023-12-22 08:19:43,099 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.607e+01 2.822e+01 2.946e+01 3.143e+01 3.927e+01, threshold=5.893e+01, percent-clipped=0.0 2023-12-22 08:19:45,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2023-12-22 08:19:47,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489533.3333333333, ans=0.1 2023-12-22 08:19:49,449 INFO [train.py:886] (0/4) Epoch 16, batch 1950, loss[loss=0.0108, audio_tagging_loss=0.0108, over 25000.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4939852.82 frames. ], batch size: 100, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:19:59,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.47 vs. limit=22.5 2023-12-22 08:20:10,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=489733.3333333333, ans=0.125 2023-12-22 08:20:23,124 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:20:34,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=489866.6666666667, ans=0.09899494936611666 2023-12-22 08:20:41,327 INFO [train.py:886] (0/4) Epoch 16, batch 2000, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4941866.18 frames. ], batch size: 100, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:20:41,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=489933.3333333333, ans=0.05 2023-12-22 08:20:42,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=489933.3333333333, ans=0.0 2023-12-22 08:21:12,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=490133.3333333333, ans=0.035 2023-12-22 08:21:19,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=490133.3333333333, ans=0.125 2023-12-22 08:21:26,213 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.757e+01 2.905e+01 3.071e+01 3.430e+01, threshold=5.811e+01, percent-clipped=0.0 2023-12-22 08:21:33,358 INFO [train.py:886] (0/4) Epoch 16, batch 2050, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. 
], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4944264.79 frames. ], batch size: 100, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:21:55,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=490400.0, ans=0.1 2023-12-22 08:22:07,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=490466.6666666667, ans=0.125 2023-12-22 08:22:09,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=490466.6666666667, ans=0.125 2023-12-22 08:22:13,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=490533.3333333333, ans=0.125 2023-12-22 08:22:15,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=490533.3333333333, ans=0.125 2023-12-22 08:22:16,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=490533.3333333333, ans=0.125 2023-12-22 08:22:23,479 INFO [train.py:886] (0/4) Epoch 16, batch 2100, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4953216.66 frames. ], batch size: 100, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:22:52,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=490733.3333333333, ans=0.125 2023-12-22 08:22:52,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=490733.3333333333, ans=0.0 2023-12-22 08:22:52,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=490733.3333333333, ans=0.025 2023-12-22 08:22:59,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=490800.0, ans=0.125 2023-12-22 08:23:02,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=490800.0, ans=0.125 2023-12-22 08:23:09,915 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.753e+01 2.940e+01 3.068e+01 3.569e+01, threshold=5.880e+01, percent-clipped=0.0 2023-12-22 08:23:15,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0 2023-12-22 08:23:16,227 INFO [train.py:886] (0/4) Epoch 16, batch 2150, loss[loss=0.01275, audio_tagging_loss=0.01275, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4954039.80 frames. ], batch size: 100, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:23:21,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.09 vs. 
limit=15.0 2023-12-22 08:23:27,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=491000.0, ans=0.0 2023-12-22 08:23:35,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=491066.6666666667, ans=0.125 2023-12-22 08:23:39,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=491066.6666666667, ans=0.5 2023-12-22 08:24:07,026 INFO [train.py:886] (0/4) Epoch 16, batch 2200, loss[loss=0.0148, audio_tagging_loss=0.0148, over 24750.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4949079.00 frames. ], batch size: 99, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:24:17,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=12.0 2023-12-22 08:24:34,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=491400.0, ans=0.1 2023-12-22 08:24:41,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2023-12-22 08:24:41,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=491466.6666666667, ans=0.125 2023-12-22 08:24:53,267 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+01 2.816e+01 2.901e+01 3.052e+01 3.527e+01, threshold=5.802e+01, percent-clipped=0.0 2023-12-22 08:24:58,961 INFO [train.py:886] (0/4) Epoch 16, batch 2250, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4944799.63 frames. ], batch size: 100, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:25:02,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=491600.0, ans=0.5 2023-12-22 08:25:10,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=491666.6666666667, ans=0.125 2023-12-22 08:25:27,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.00 vs. limit=22.5 2023-12-22 08:25:30,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2023-12-22 08:25:48,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=491866.6666666667, ans=0.2 2023-12-22 08:25:48,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491866.6666666667, ans=0.1 2023-12-22 08:25:50,136 INFO [train.py:886] (0/4) Epoch 16, batch 2300, loss[loss=0.01488, audio_tagging_loss=0.01488, over 24750.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4951709.90 frames. 
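Each ScheduledFloat record pairs a parameter name with the current batch_count and the value it resolves to (ans=...): skip rates such as conv_skip_rate and attention_skip_rate have annealed to 0.0 by this point, while the balancer prob entries have settled at 0.125. That behaviour is what a piecewise-linear schedule over batch_count produces; the sketch below is an illustrative reading of the mechanism, not a copy of scaling.py:

    def scheduled_float(batch_count, points):
        # points: sorted list of (batch_count, value) breakpoints.
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)   # linear interpolation
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0                                    # flat after the last breakpoint

    # A hypothetical skip-rate schedule that has annealed away long before
    # batch_count ~489k, matching the ans=0.0 entries above:
    assert scheduled_float(489000.0, [(0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0)]) == 0.0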
], batch size: 99, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:26:13,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=492066.6666666667, ans=0.125 2023-12-22 08:26:21,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.44 vs. limit=22.5 2023-12-22 08:26:33,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492200.0, ans=0.1 2023-12-22 08:26:35,516 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.473e+01 2.719e+01 2.858e+01 3.036e+01 3.603e+01, threshold=5.716e+01, percent-clipped=0.0 2023-12-22 08:26:36,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492200.0, ans=0.1 2023-12-22 08:26:40,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=492266.6666666667, ans=0.04949747468305833 2023-12-22 08:26:41,342 INFO [train.py:886] (0/4) Epoch 16, batch 2350, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4949561.88 frames. ], batch size: 100, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:26:46,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=492266.6666666667, ans=0.0 2023-12-22 08:27:03,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=492400.0, ans=0.015 2023-12-22 08:27:06,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2023-12-22 08:27:11,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=492466.6666666667, ans=0.04949747468305833 2023-12-22 08:27:22,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.59 vs. limit=22.5 2023-12-22 08:27:26,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=492533.3333333333, ans=0.125 2023-12-22 08:27:34,562 INFO [train.py:886] (0/4) Epoch 16, batch 2400, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4956461.11 frames. ], batch size: 100, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:27:36,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=492600.0, ans=0.125 2023-12-22 08:27:45,573 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.99 vs. 
limit=22.5 2023-12-22 08:27:54,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=492733.3333333333, ans=0.125 2023-12-22 08:28:05,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=492800.0, ans=0.125 2023-12-22 08:28:11,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=492800.0, ans=0.0 2023-12-22 08:28:15,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.63 vs. limit=22.5 2023-12-22 08:28:16,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=492866.6666666667, ans=0.0 2023-12-22 08:28:18,834 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.512e+01 2.788e+01 2.902e+01 3.058e+01 4.154e+01, threshold=5.803e+01, percent-clipped=0.0 2023-12-22 08:28:25,297 INFO [train.py:886] (0/4) Epoch 16, batch 2450, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4964243.67 frames. ], batch size: 100, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:28:46,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=493066.6666666667, ans=0.0 2023-12-22 08:28:50,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493066.6666666667, ans=0.125 2023-12-22 08:29:14,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=493200.0, ans=0.2 2023-12-22 08:29:17,807 INFO [train.py:886] (0/4) Epoch 16, batch 2500, loss[loss=0.01503, audio_tagging_loss=0.01503, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4962330.91 frames. ], batch size: 99, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:29:18,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=493266.6666666667, ans=0.09899494936611666 2023-12-22 08:29:28,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=493333.3333333333, ans=0.1 2023-12-22 08:29:37,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=493333.3333333333, ans=0.125 2023-12-22 08:29:41,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=493400.0, ans=0.125 2023-12-22 08:29:48,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=493466.6666666667, ans=0.05 2023-12-22 08:30:03,600 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.509e+01 2.809e+01 2.998e+01 3.086e+01 4.150e+01, threshold=5.995e+01, percent-clipped=0.0 2023-12-22 08:30:09,928 INFO [train.py:886] (0/4) Epoch 16, batch 2550, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4956541.44 frames. 
], batch size: 100, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:30:23,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=493666.6666666667, ans=0.2 2023-12-22 08:30:40,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=493800.0, ans=0.125 2023-12-22 08:30:53,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=493866.6666666667, ans=0.015 2023-12-22 08:31:01,087 INFO [train.py:886] (0/4) Epoch 16, batch 2600, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4955178.26 frames. ], batch size: 100, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:31:17,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=494000.0, ans=0.125 2023-12-22 08:31:23,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494066.6666666667, ans=0.1 2023-12-22 08:31:24,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494066.6666666667, ans=0.1 2023-12-22 08:31:46,508 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+01 2.802e+01 2.926e+01 3.072e+01 4.080e+01, threshold=5.852e+01, percent-clipped=0.0 2023-12-22 08:31:47,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=494200.0, ans=0.125 2023-12-22 08:31:51,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=494266.6666666667, ans=0.0 2023-12-22 08:31:52,131 INFO [train.py:886] (0/4) Epoch 16, batch 2650, loss[loss=0.01406, audio_tagging_loss=0.01406, over 24750.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4951740.72 frames. ], batch size: 99, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:31:55,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0 2023-12-22 08:32:05,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=494333.3333333333, ans=0.125 2023-12-22 08:32:28,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=494466.6666666667, ans=0.0 2023-12-22 08:32:32,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=494533.3333333333, ans=0.0 2023-12-22 08:32:43,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=494600.0, ans=10.0 2023-12-22 08:32:44,381 INFO [train.py:886] (0/4) Epoch 16, batch 2700, loss[loss=0.01137, audio_tagging_loss=0.01137, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4953574.28 frames. 
], batch size: 100, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:32:51,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=494600.0, ans=0.125 2023-12-22 08:32:56,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=494666.6666666667, ans=0.025 2023-12-22 08:33:05,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.70 vs. limit=10.0 2023-12-22 08:33:05,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=494733.3333333333, ans=0.0 2023-12-22 08:33:06,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=494733.3333333333, ans=0.125 2023-12-22 08:33:10,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=494733.3333333333, ans=0.0 2023-12-22 08:33:29,567 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.769e+01 2.925e+01 3.079e+01 4.143e+01, threshold=5.850e+01, percent-clipped=0.0 2023-12-22 08:33:30,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=494866.6666666667, ans=0.125 2023-12-22 08:33:34,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=494866.6666666667, ans=0.125 2023-12-22 08:33:35,989 INFO [train.py:886] (0/4) Epoch 16, batch 2750, loss[loss=0.01587, audio_tagging_loss=0.01587, over 25000.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4961929.47 frames. ], batch size: 100, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:33:42,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=494933.3333333333, ans=0.0 2023-12-22 08:33:51,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.57 vs. limit=22.5 2023-12-22 08:34:05,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=495133.3333333333, ans=0.125 2023-12-22 08:34:28,191 INFO [train.py:886] (0/4) Epoch 16, batch 2800, loss[loss=0.01469, audio_tagging_loss=0.01469, over 24750.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4962390.78 frames. ], batch size: 99, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:34:40,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=495333.3333333333, ans=0.125 2023-12-22 08:34:51,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.07 vs. 
limit=10.0 2023-12-22 08:35:13,347 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.512e+01 2.809e+01 2.977e+01 3.124e+01 3.459e+01, threshold=5.955e+01, percent-clipped=0.0 2023-12-22 08:35:13,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=495533.3333333333, ans=0.0 2023-12-22 08:35:18,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2023-12-22 08:35:19,681 INFO [train.py:886] (0/4) Epoch 16, batch 2850, loss[loss=0.01459, audio_tagging_loss=0.01459, over 25000.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4956016.94 frames. ], batch size: 100, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:35:39,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=495733.3333333333, ans=0.2 2023-12-22 08:36:00,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=495800.0, ans=0.04949747468305833 2023-12-22 08:36:06,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2023-12-22 08:36:11,113 INFO [train.py:886] (0/4) Epoch 16, batch 2900, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4954965.93 frames. ], batch size: 100, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:36:29,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=496000.0, ans=0.125 2023-12-22 08:36:31,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=496066.6666666667, ans=0.0 2023-12-22 08:36:36,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=496066.6666666667, ans=0.125 2023-12-22 08:36:40,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=496066.6666666667, ans=0.1 2023-12-22 08:36:56,680 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.734e+01 2.897e+01 3.006e+01 3.535e+01, threshold=5.795e+01, percent-clipped=0.0 2023-12-22 08:36:58,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=496200.0, ans=0.2 2023-12-22 08:37:02,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=496266.6666666667, ans=0.0 2023-12-22 08:37:03,076 INFO [train.py:886] (0/4) Epoch 16, batch 2950, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4948782.89 frames. ], batch size: 100, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:37:13,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=496333.3333333333, ans=0.2 2023-12-22 08:37:18,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.47 vs. 
limit=22.5 2023-12-22 08:37:35,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.62 vs. limit=10.0 2023-12-22 08:37:38,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=496466.6666666667, ans=0.125 2023-12-22 08:37:46,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=496533.3333333333, ans=0.05 2023-12-22 08:37:50,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2023-12-22 08:37:51,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=496533.3333333333, ans=0.0 2023-12-22 08:37:54,010 INFO [train.py:886] (0/4) Epoch 16, batch 3000, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4949882.71 frames. ], batch size: 100, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:37:54,012 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 08:38:14,834 INFO [train.py:917] (0/4) Epoch 16, validation: loss=0.0344, audio_tagging_loss=0.0344, over 3737520.00 frames. 2023-12-22 08:38:14,835 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 08:38:24,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=496600.0, ans=0.2 2023-12-22 08:38:36,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=496733.3333333333, ans=0.1 2023-12-22 08:39:01,433 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.766e+01 2.892e+01 3.038e+01 3.392e+01, threshold=5.783e+01, percent-clipped=0.0 2023-12-22 08:39:07,796 INFO [train.py:886] (0/4) Epoch 16, batch 3050, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4953112.59 frames. ], batch size: 100, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:39:10,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2023-12-22 08:39:15,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2023-12-22 08:39:18,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2023-12-22 08:39:36,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.22 vs. limit=22.5 2023-12-22 08:39:47,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=497200.0, ans=0.05 2023-12-22 08:39:53,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.77 vs. limit=22.5 2023-12-22 08:39:58,675 INFO [train.py:886] (0/4) Epoch 16, batch 3100, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. 
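At batch 3000 the training loop pauses, evaluates the dev set (3737520 frames), and reports the validation loss together with the peak CUDA allocation (14759MB here). A sketch of such a mid-epoch validation pass; the batch fields and model interface are assumptions, not train.py's actual code:

    import torch

    def compute_validation_loss(model, valid_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                feats = batch["inputs"].to(device)       # fbank features (N, T, 80)
                labels = batch["labels"].to(device)      # multi-hot event targets
                loss, num_frames = model(feats, labels)  # assumed return signature
                tot_loss += loss.item() * num_frames     # loss assumed per-frame
                tot_frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated() // 2**20
        print(f"validation: loss={tot_loss / tot_frames:.4f}, "
              f"over {tot_frames:.2f} frames; max memory {mem_mb}MB")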
], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4956556.51 frames. ], batch size: 100, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:40:16,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=22.5 2023-12-22 08:40:17,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-22 08:40:26,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=12.0 2023-12-22 08:40:45,392 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.521e+01 2.809e+01 2.944e+01 3.089e+01 3.657e+01, threshold=5.888e+01, percent-clipped=0.0 2023-12-22 08:40:51,052 INFO [train.py:886] (0/4) Epoch 16, batch 3150, loss[loss=0.01575, audio_tagging_loss=0.01575, over 24750.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4949666.45 frames. ], batch size: 99, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:40:52,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0 2023-12-22 08:40:59,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=497666.6666666667, ans=0.125 2023-12-22 08:41:00,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=15.0 2023-12-22 08:41:08,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=497666.6666666667, ans=0.125 2023-12-22 08:41:10,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=497733.3333333333, ans=0.0 2023-12-22 08:41:15,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2023-12-22 08:41:40,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=497866.6666666667, ans=0.125 2023-12-22 08:41:41,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=497933.3333333333, ans=0.125 2023-12-22 08:41:42,565 INFO [train.py:886] (0/4) Epoch 16, batch 3200, loss[loss=0.01576, audio_tagging_loss=0.01576, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4947385.29 frames. 
], batch size: 100, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:41:52,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=498000.0, ans=0.125 2023-12-22 08:41:57,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498000.0, ans=0.1 2023-12-22 08:41:59,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=498000.0, ans=0.0 2023-12-22 08:42:13,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=498133.3333333333, ans=0.125 2023-12-22 08:42:14,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.56 vs. limit=22.5 2023-12-22 08:42:24,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=498200.0, ans=0.0 2023-12-22 08:42:27,664 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.481e+01 2.733e+01 2.858e+01 3.051e+01 3.454e+01, threshold=5.717e+01, percent-clipped=0.0 2023-12-22 08:42:28,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=498200.0, ans=0.125 2023-12-22 08:42:29,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=498200.0, ans=0.2 2023-12-22 08:42:30,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=22.5 2023-12-22 08:42:33,350 INFO [train.py:886] (0/4) Epoch 16, batch 3250, loss[loss=0.015, audio_tagging_loss=0.015, over 22421.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4944824.09 frames. ], batch size: 107, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:42:45,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=36.94 vs. limit=22.5 2023-12-22 08:42:47,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=498333.3333333333, ans=0.1 2023-12-22 08:42:49,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=12.0 2023-12-22 08:43:04,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=498466.6666666667, ans=0.0 2023-12-22 08:43:07,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2023-12-22 08:43:15,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.45 vs. 
limit=15.0 2023-12-22 08:43:15,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=498533.3333333333, ans=0.125 2023-12-22 08:43:19,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=498533.3333333333, ans=0.125 2023-12-22 08:43:26,555 INFO [train.py:886] (0/4) Epoch 16, batch 3300, loss[loss=0.01736, audio_tagging_loss=0.01736, over 24903.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4944277.45 frames. ], batch size: 100, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:43:26,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=498600.0, ans=0.125 2023-12-22 08:44:11,335 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.738e+01 2.869e+01 3.079e+01 3.471e+01, threshold=5.738e+01, percent-clipped=0.0 2023-12-22 08:44:14,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=498866.6666666667, ans=0.125 2023-12-22 08:44:17,633 INFO [train.py:886] (0/4) Epoch 16, batch 3350, loss[loss=0.01351, audio_tagging_loss=0.01351, over 25000.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4948620.48 frames. ], batch size: 100, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:44:40,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=499066.6666666667, ans=0.0 2023-12-22 08:45:06,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-12-22 08:45:08,410 INFO [train.py:886] (0/4) Epoch 16, batch 3400, loss[loss=0.01543, audio_tagging_loss=0.01543, over 25000.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4951320.52 frames. ], batch size: 100, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:45:24,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.99 vs. limit=15.0 2023-12-22 08:45:38,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0 2023-12-22 08:45:53,379 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.507e+01 2.800e+01 2.974e+01 3.102e+01 3.785e+01, threshold=5.949e+01, percent-clipped=0.0 2023-12-22 08:45:59,706 INFO [train.py:886] (0/4) Epoch 16, batch 3450, loss[loss=0.01652, audio_tagging_loss=0.01652, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4948679.95 frames. ], batch size: 99, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:46:02,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=499600.0, ans=0.05 2023-12-22 08:46:08,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=499600.0, ans=0.0 2023-12-22 08:46:10,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. 
limit=10.0 2023-12-22 08:46:17,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=499666.6666666667, ans=0.125 2023-12-22 08:46:26,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499733.3333333333, ans=0.1 2023-12-22 08:46:39,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0 2023-12-22 08:46:46,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=499866.6666666667, ans=0.09899494936611666 2023-12-22 08:46:51,492 INFO [train.py:886] (0/4) Epoch 16, batch 3500, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4946995.13 frames. ], batch size: 99, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:47:00,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=499933.3333333333, ans=0.1 2023-12-22 08:47:05,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=12.0 2023-12-22 08:47:07,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=500000.0, ans=0.125 2023-12-22 08:47:14,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=500066.6666666667, ans=0.2 2023-12-22 08:47:34,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500200.0, ans=0.1 2023-12-22 08:47:38,980 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.455e+01 2.826e+01 2.952e+01 3.104e+01 3.647e+01, threshold=5.905e+01, percent-clipped=0.0 2023-12-22 08:47:45,461 INFO [train.py:886] (0/4) Epoch 16, batch 3550, loss[loss=0.01561, audio_tagging_loss=0.01561, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4948787.24 frames. ], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:47:58,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=500333.3333333333, ans=0.2 2023-12-22 08:48:01,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=500333.3333333333, ans=0.125 2023-12-22 08:48:03,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=500333.3333333333, ans=0.125 2023-12-22 08:48:18,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=500466.6666666667, ans=0.09899494936611666 2023-12-22 08:48:36,975 INFO [train.py:886] (0/4) Epoch 16, batch 3600, loss[loss=0.01478, audio_tagging_loss=0.01478, over 25000.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4950452.61 frames. 
], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:48:42,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=500600.0, ans=0.125 2023-12-22 08:48:44,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2023-12-22 08:48:45,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.11 vs. limit=22.5 2023-12-22 08:48:47,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=500666.6666666667, ans=0.04949747468305833 2023-12-22 08:49:22,451 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 2.766e+01 2.870e+01 3.088e+01 3.484e+01, threshold=5.741e+01, percent-clipped=0.0 2023-12-22 08:49:28,828 INFO [train.py:886] (0/4) Epoch 16, batch 3650, loss[loss=0.01311, audio_tagging_loss=0.01311, over 21380.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4947635.86 frames. ], batch size: 107, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:49:33,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.62 vs. limit=22.5 2023-12-22 08:49:43,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=501000.0, ans=0.0 2023-12-22 08:49:45,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=15.0 2023-12-22 08:49:52,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=501066.6666666667, ans=0.0 2023-12-22 08:49:52,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=501066.6666666667, ans=0.2 2023-12-22 08:50:01,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=501133.3333333333, ans=0.035 2023-12-22 08:50:13,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=501200.0, ans=0.0 2023-12-22 08:50:21,099 INFO [train.py:886] (0/4) Epoch 16, batch 3700, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4953580.17 frames. ], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:50:28,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=501266.6666666667, ans=0.0 2023-12-22 08:51:06,605 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.832e+01 2.951e+01 3.083e+01 3.444e+01, threshold=5.901e+01, percent-clipped=0.0 2023-12-22 08:51:13,051 INFO [train.py:886] (0/4) Epoch 16, batch 3750, loss[loss=0.01458, audio_tagging_loss=0.01458, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4952180.14 frames. 
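Each batch summary carries two figures: loss[...] for the current batch (about 25000 frames) and tot_loss[...] over roughly 4.95 million frames, i.e. about 200 batches' worth. So tot_loss reads as a frame-weighted running aggregate over the recent past rather than a whole-epoch mean. One decayed-accumulator sketch that reproduces those frame counts (the exact form in train.py may differ):

    class RunningLoss:
        # Frame-weighted aggregate over roughly the last `horizon` batches.
        def __init__(self, horizon=200):
            self.decay = 1.0 - 1.0 / horizon
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frames, 1.0)

At steady state the effective frame count settles near horizon * batch_frames, and 200 batches of ~24750-25000 frames lands exactly on the ~4.95e6 figures printed above.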
], batch size: 99, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:51:14,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=501600.0, ans=0.125 2023-12-22 08:51:21,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=501600.0, ans=0.0 2023-12-22 08:51:47,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=501800.0, ans=0.025 2023-12-22 08:52:04,565 INFO [train.py:886] (0/4) Epoch 16, batch 3800, loss[loss=0.009696, audio_tagging_loss=0.009696, over 24016.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4947033.49 frames. ], batch size: 100, lr: 6.75e-03, grad_scale: 64.0 2023-12-22 08:52:07,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=501933.3333333333, ans=0.1 2023-12-22 08:52:20,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=502000.0, ans=0.05 2023-12-22 08:52:23,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=502000.0, ans=0.125 2023-12-22 08:52:42,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.27 vs. limit=22.5 2023-12-22 08:52:46,683 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:52:46,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=502200.0, ans=0.2 2023-12-22 08:52:50,281 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.369e+01 2.815e+01 2.951e+01 3.109e+01 3.599e+01, threshold=5.902e+01, percent-clipped=0.0 2023-12-22 08:52:56,723 INFO [train.py:886] (0/4) Epoch 16, batch 3850, loss[loss=0.01011, audio_tagging_loss=0.01011, over 24057.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4947350.30 frames. ], batch size: 100, lr: 6.75e-03, grad_scale: 64.0 2023-12-22 08:52:59,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=502266.6666666667, ans=0.04949747468305833 2023-12-22 08:53:08,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=502333.3333333333, ans=0.1 2023-12-22 08:53:11,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=502333.3333333333, ans=0.2 2023-12-22 08:53:12,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=502333.3333333333, ans=0.125 2023-12-22 08:53:14,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2023-12-22 08:53:24,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=502400.0, ans=0.125 2023-12-22 08:53:32,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.72 vs. 
limit=22.5 2023-12-22 08:53:48,102 INFO [train.py:886] (0/4) Epoch 16, batch 3900, loss[loss=0.01215, audio_tagging_loss=0.01215, over 24019.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4947965.14 frames. ], batch size: 100, lr: 6.75e-03, grad_scale: 64.0 2023-12-22 08:53:48,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=502600.0, ans=0.125 2023-12-22 08:54:15,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2023-12-22 08:54:23,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=502800.0, ans=0.125 2023-12-22 08:54:25,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2023-12-22 08:54:32,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=502866.6666666667, ans=0.0 2023-12-22 08:54:34,023 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.745e+01 2.917e+01 3.077e+01 3.660e+01, threshold=5.833e+01, percent-clipped=0.0 2023-12-22 08:54:40,367 INFO [train.py:886] (0/4) Epoch 16, batch 3950, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4950715.53 frames. ], batch size: 100, lr: 6.75e-03, grad_scale: 64.0 2023-12-22 08:54:45,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=502933.3333333333, ans=0.0 2023-12-22 08:54:49,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=503000.0, ans=0.0 2023-12-22 08:54:58,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.78 vs. limit=5.0 2023-12-22 08:55:31,486 INFO [train.py:886] (0/4) Epoch 16, batch 4000, loss[loss=0.01545, audio_tagging_loss=0.01545, over 24750.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4952363.46 frames. 
], batch size: 99, lr: 6.74e-03, grad_scale: 128.0 2023-12-22 08:55:36,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=503266.6666666667, ans=0.2 2023-12-22 08:55:40,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=503266.6666666667, ans=0.125 2023-12-22 08:55:41,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503333.3333333333, ans=0.1 2023-12-22 08:56:07,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=503466.6666666667, ans=0.0 2023-12-22 08:56:14,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=503533.3333333333, ans=0.0 2023-12-22 08:56:18,186 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.822e+01 2.939e+01 3.066e+01 3.812e+01, threshold=5.877e+01, percent-clipped=0.0 2023-12-22 08:56:18,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=503533.3333333333, ans=0.2 2023-12-22 08:56:20,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503533.3333333333, ans=0.1 2023-12-22 08:56:22,957 INFO [train.py:886] (0/4) Epoch 16, batch 4050, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4953692.44 frames. ], batch size: 99, lr: 6.74e-03, grad_scale: 64.0 2023-12-22 08:56:27,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503600.0, ans=0.1 2023-12-22 08:56:41,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=503666.6666666667, ans=0.125 2023-12-22 08:56:50,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=503733.3333333333, ans=0.125 2023-12-22 08:56:59,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503800.0, ans=0.1 2023-12-22 08:57:04,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=503866.6666666667, ans=0.125 2023-12-22 08:57:13,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2023-12-22 08:57:15,704 INFO [train.py:886] (0/4) Epoch 16, batch 4100, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4949055.84 frames. 
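The grad_scale field in the batch summaries is not static: it doubles from 64.0 to 128.0 at batch 4000 and is back at 64.0 by batch 4050. That is the signature of dynamic loss scaling for fp16 training: the scale grows after a run of overflow-free steps and is halved when a scaled gradient overflows. In stock PyTorch the same behaviour looks like this (a generic rendition, not necessarily how train.py wires it up internally):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0,        # assumed starting scale
        growth_factor=2.0,     # 64.0 -> 128.0, as at batch 4000
        backoff_factor=0.5,    # 128.0 -> 64.0 after an overflowing step
        growth_interval=2000)  # assumed number of steps between growth attempts

    def fp16_step(model, optimizer, feats, labels):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(feats, labels)
        scaler.scale(loss).backward()  # backward through the scaled loss
        scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
        scaler.update()                # grow or back off the scale
        return loss.detach(), scaler.get_scale()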
], batch size: 99, lr: 6.74e-03, grad_scale: 64.0 2023-12-22 08:57:19,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=503933.3333333333, ans=0.125 2023-12-22 08:57:19,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=503933.3333333333, ans=0.125 2023-12-22 08:57:23,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2023-12-22 08:57:26,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=504000.0, ans=10.0 2023-12-22 08:57:29,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0 2023-12-22 08:57:30,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=504000.0, ans=0.2 2023-12-22 08:57:34,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=504066.6666666667, ans=0.0 2023-12-22 08:57:40,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=504066.6666666667, ans=0.0 2023-12-22 08:57:43,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=504066.6666666667, ans=15.0 2023-12-22 08:57:52,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=504133.3333333333, ans=0.0 2023-12-22 08:57:56,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=504200.0, ans=0.125 2023-12-22 08:58:01,236 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.494e+01 2.833e+01 2.971e+01 3.147e+01 3.748e+01, threshold=5.941e+01, percent-clipped=0.0 2023-12-22 08:58:04,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.34 vs. limit=10.0 2023-12-22 08:58:04,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=504200.0, ans=10.0 2023-12-22 08:58:06,726 INFO [train.py:886] (0/4) Epoch 16, batch 4150, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4945673.32 frames. ], batch size: 99, lr: 6.74e-03, grad_scale: 64.0 2023-12-22 08:58:11,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.48 vs. 
limit=10.0 2023-12-22 08:58:14,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=504266.6666666667, ans=0.0 2023-12-22 08:58:28,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=504400.0, ans=0.125 2023-12-22 08:58:31,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=504400.0, ans=0.0 2023-12-22 08:58:36,407 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.817e-01 2023-12-22 08:58:58,051 INFO [train.py:886] (0/4) Epoch 16, batch 4200, loss[loss=0.01329, audio_tagging_loss=0.01329, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4952974.58 frames. ], batch size: 100, lr: 6.74e-03, grad_scale: 64.0 2023-12-22 08:59:02,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=504600.0, ans=0.09899494936611666 2023-12-22 08:59:09,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=504666.6666666667, ans=0.05 2023-12-22 08:59:31,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.50 vs. limit=10.0 2023-12-22 08:59:43,948 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.796e+01 2.928e+01 3.041e+01 3.614e+01, threshold=5.855e+01, percent-clipped=0.0 2023-12-22 08:59:45,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=504866.6666666667, ans=0.0 2023-12-22 08:59:48,744 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:59:49,380 INFO [train.py:886] (0/4) Epoch 16, batch 4250, loss[loss=0.01421, audio_tagging_loss=0.01421, over 24750.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4950812.57 frames. ], batch size: 99, lr: 6.73e-03, grad_scale: 64.0 2023-12-22 09:00:01,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=505000.0, ans=0.2 2023-12-22 09:00:08,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=505000.0, ans=0.1 2023-12-22 09:00:22,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-22 09:00:38,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=505200.0, ans=0.07 2023-12-22 09:00:40,928 INFO [train.py:886] (0/4) Epoch 16, batch 4300, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4950880.94 frames. 
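The Whitening lines report, for a named activation, a measured metric against that module's limit (e.g. feed_forward2.out_whiten above at metric=14.41 vs. limit=15.0). The metric is naturally read as how far the channel covariance is from white: 1.0 when the covariance is proportional to the identity, larger as energy concentrates in a few directions. A hedged sketch of such a statistic (scaling.py's exact formula may differ):

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (batch, channels). Returns 1.0 for perfectly 'white' features.
        batch, channels = x.shape
        x = x.reshape(batch, num_groups, channels // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)              # per-group mean removal
        cov = x.transpose(1, 2) @ x / batch              # (groups, c, c) covariance
        d = cov.shape[-1]
        mean_eig = torch.diagonal(cov, dim1=-2, dim2=-1).mean(dim=-1)
        mean_sq_eig = (cov * cov).sum(dim=(-2, -1)) / d  # = mean of squared eigenvalues
        return (mean_sq_eig / mean_eig.clamp(min=1e-20) ** 2).mean()

On this reading the limits printed across modules (6.0, 10.0, 12.0, 15.0, 22.5) act as per-module targets for how non-white each activation's covariance is allowed to become, which is why the log flags names whose metric approaches or crosses the limit.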
], batch size: 100, lr: 6.73e-03, grad_scale: 64.0 2023-12-22 09:01:06,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=505400.0, ans=0.0 2023-12-22 09:01:08,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=505400.0, ans=0.2 2023-12-22 09:01:11,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=505466.6666666667, ans=0.0 2023-12-22 09:01:12,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=505466.6666666667, ans=0.0 2023-12-22 09:01:16,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=505466.6666666667, ans=0.125 2023-12-22 09:01:18,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=505466.6666666667, ans=0.0 2023-12-22 09:01:18,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=505466.6666666667, ans=0.125 2023-12-22 09:01:27,805 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.412e+01 2.822e+01 2.946e+01 3.100e+01 3.669e+01, threshold=5.892e+01, percent-clipped=0.0 2023-12-22 09:01:29,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=505533.3333333333, ans=0.125 2023-12-22 09:01:33,243 INFO [train.py:886] (0/4) Epoch 16, batch 4350, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4955629.77 frames. ], batch size: 100, lr: 6.73e-03, grad_scale: 64.0 2023-12-22 09:01:47,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505666.6666666667, ans=0.1 2023-12-22 09:01:47,731 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:02:23,184 INFO [train.py:886] (0/4) Epoch 16, batch 4400, loss[loss=0.01654, audio_tagging_loss=0.01654, over 24750.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4950880.54 frames. ], batch size: 99, lr: 6.73e-03, grad_scale: 64.0 2023-12-22 09:02:25,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=505933.3333333333, ans=0.2 2023-12-22 09:02:26,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0 2023-12-22 09:02:28,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=505933.3333333333, ans=0.125 2023-12-22 09:02:30,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. 
limit=15.0 2023-12-22 09:02:38,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=506000.0, ans=0.2 2023-12-22 09:02:43,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=506066.6666666667, ans=0.125 2023-12-22 09:02:46,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2023-12-22 09:02:47,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=506066.6666666667, ans=0.1 2023-12-22 09:02:56,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=506133.3333333333, ans=0.0 2023-12-22 09:03:08,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=506200.0, ans=0.125 2023-12-22 09:03:10,654 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+01 2.860e+01 2.974e+01 3.134e+01 3.649e+01, threshold=5.949e+01, percent-clipped=0.0 2023-12-22 09:03:15,412 INFO [train.py:886] (0/4) Epoch 16, batch 4450, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4945087.28 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0 2023-12-22 09:03:15,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=506266.6666666667, ans=15.0 2023-12-22 09:03:21,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. limit=10.0 2023-12-22 09:03:26,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=506333.3333333333, ans=0.125 2023-12-22 09:04:00,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=506533.3333333333, ans=0.125 2023-12-22 09:04:06,914 INFO [train.py:886] (0/4) Epoch 16, batch 4500, loss[loss=0.01664, audio_tagging_loss=0.01664, over 25000.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4949015.64 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0 2023-12-22 09:04:10,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506600.0, ans=0.1 2023-12-22 09:04:14,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.80 vs. 
limit=15.0 2023-12-22 09:04:17,202 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-76000.pt 2023-12-22 09:04:55,934 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.520e+01 2.756e+01 2.905e+01 3.063e+01 3.487e+01, threshold=5.810e+01, percent-clipped=0.0 2023-12-22 09:04:56,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=506866.6666666667, ans=0.0 2023-12-22 09:04:57,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=506866.6666666667, ans=0.125 2023-12-22 09:04:58,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=506866.6666666667, ans=0.125 2023-12-22 09:05:00,644 INFO [train.py:886] (0/4) Epoch 16, batch 4550, loss[loss=0.01483, audio_tagging_loss=0.01483, over 24750.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4953307.19 frames. ], batch size: 99, lr: 6.72e-03, grad_scale: 64.0 2023-12-22 09:05:00,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=506933.3333333333, ans=0.2 2023-12-22 09:05:17,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=507000.0, ans=10.0 2023-12-22 09:05:22,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.36 vs. limit=22.5 2023-12-22 09:05:40,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=507133.3333333333, ans=0.0 2023-12-22 09:05:48,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=507200.0, ans=0.0 2023-12-22 09:05:51,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2023-12-22 09:05:53,164 INFO [train.py:886] (0/4) Epoch 16, batch 4600, loss[loss=0.01579, audio_tagging_loss=0.01579, over 25000.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4958298.87 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0 2023-12-22 09:06:13,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507400.0, ans=0.1 2023-12-22 09:06:24,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=507466.6666666667, ans=0.0 2023-12-22 09:06:39,328 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+01 2.771e+01 2.902e+01 3.112e+01 3.389e+01, threshold=5.805e+01, percent-clipped=0.0 2023-12-22 09:06:41,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=507533.3333333333, ans=0.125 2023-12-22 09:06:44,717 INFO [train.py:886] (0/4) Epoch 16, batch 4650, loss[loss=0.0145, audio_tagging_loss=0.0145, over 24750.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4961362.06 frames. 
], batch size: 99, lr: 6.72e-03, grad_scale: 64.0 2023-12-22 09:06:51,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=507600.0, ans=0.125 2023-12-22 09:06:53,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=507600.0, ans=0.0 2023-12-22 09:07:01,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=507666.6666666667, ans=0.125 2023-12-22 09:07:24,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.79 vs. limit=10.0 2023-12-22 09:07:30,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.59 vs. limit=15.0 2023-12-22 09:07:31,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=507866.6666666667, ans=0.125 2023-12-22 09:07:36,002 INFO [train.py:886] (0/4) Epoch 16, batch 4700, loss[loss=0.01483, audio_tagging_loss=0.01483, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4962239.75 frames. ], batch size: 99, lr: 6.71e-03, grad_scale: 64.0 2023-12-22 09:07:54,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=508066.6666666667, ans=0.0 2023-12-22 09:07:59,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=508066.6666666667, ans=0.125 2023-12-22 09:08:10,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=508133.3333333333, ans=0.0 2023-12-22 09:08:18,087 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.851e+01 3.014e+01 3.171e+01 4.011e+01, threshold=6.029e+01, percent-clipped=0.0 2023-12-22 09:08:19,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.99 vs. limit=10.0 2023-12-22 09:08:23,112 INFO [train.py:886] (0/4) Epoch 16, batch 4750, loss[loss=0.01583, audio_tagging_loss=0.01583, over 25000.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4955333.18 frames. ], batch size: 100, lr: 6.71e-03, grad_scale: 64.0 2023-12-22 09:08:35,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=508333.3333333333, ans=0.125 2023-12-22 09:08:38,825 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-16.pt 2023-12-22 09:08:59,116 INFO [train.py:886] (0/4) Epoch 17, batch 0, loss[loss=0.0321, audio_tagging_loss=0.0321, over 24004.00 frames. ], tot_loss[loss=0.0321, audio_tagging_loss=0.0321, over 24004.00 frames. ], batch size: 100, lr: 6.51e-03, grad_scale: 64.0 2023-12-22 09:08:59,117 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 09:09:08,293 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2170, 3.4618, 3.4929, 3.4797], device='cuda:0') 2023-12-22 09:09:20,288 INFO [train.py:917] (0/4) Epoch 17, validation: loss=0.03195, audio_tagging_loss=0.03195, over 3737520.00 frames. 
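[The three sketches below are editor-added illustrations, not part of the original log.]

The train.py:886 records throughout this stretch have a fixed shape: per-batch loss, running tot_loss over frames, batch size, learning rate, and grad scale. As a rough illustration only, the sketch below pulls those fields out of a saved copy of this log so the loss curve can be inspected; the path train.log and the regex are the editor's assumptions, not part of the icefall recipe.

```python
import re

# Shape of the per-batch records emitted at train.py:886 (see the log above).
RECORD = re.compile(
    r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), "
    r"loss\[loss=(?P<loss>[\d.]+).*?"
    r"tot_loss\[loss=(?P<tot>[\d.]+).*?"
    r"lr: (?P<lr>[\d.e+-]+)"
)

def iter_records(path="train.log"):  # placeholder path, editor's assumption
    """Yield (epoch, batch, batch_loss, running_loss, lr) per training record."""
    with open(path) as f:
        for line in f:
            # finditer, not search: extraction may leave several records per line
            for m in RECORD.finditer(line):
                yield (int(m["epoch"]), int(m["batch"]),
                       float(m["loss"]), float(m["tot"]), float(m["lr"]))

# e.g. the running loss trajectory for epoch 17:
# [tot for ep, _, _, tot, _ in iter_records() if ep == 17]
```

Validation records ("Epoch N, validation: ...") deliberately do not match the regex, since they carry no batch index; they would need a second pattern.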
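The optim.py:484 warnings in this section consistently satisfy threshold = Clipping_scale × median of the reported grad-norm quantiles (the five numbers read as min, 25%, 50%, 75%, max): 2 × 2.928e+01 = 5.856e+01 against a reported 5.855e+01 (rounding), and 2 × 3.576e+01 = 7.152e+01 exactly, on the one warning above where percent-clipped reaches 8.0. A minimal monitor that reproduces that relationship is sketched below; the window size and the way percent-clipped is computed over the window are the editor's assumptions, and icefall's actual logic in optim.py may differ.

```python
from collections import deque
import numpy as np

class GradNormMonitor:
    """Tracks recent gradient norms; threshold = clipping_scale * median."""

    def __init__(self, window: int = 100, clipping_scale: float = 2.0):
        self.norms = deque(maxlen=window)   # window size: editor's guess
        self.clipping_scale = clipping_scale

    def report(self, grad_norm: float) -> str:
        self.norms.append(grad_norm)
        arr = np.asarray(self.norms)
        q = np.quantile(arr, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = self.clipping_scale * q[2]          # scale x median
        pct = 100.0 * float(np.mean(arr > threshold))   # share over threshold
        return ("Clipping_scale=%.1f, grad-norm quartiles %s, "
                "threshold=%.3e, percent-clipped=%.1f"
                % (self.clipping_scale,
                   " ".join("%.3e" % v for v in q),
                   threshold, pct))
```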
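The scaling.py:213 lines record ScheduledFloat values (skip rates, balancer probabilities, dropout_p) as functions of batch_count; by this point in training most skip rates have annealed to ans=0.0. To the editor's understanding these are piecewise-linear schedules over batch_count in icefall's scaling.py; the toy version below captures the idea, but the breakpoints shown are invented for illustration.

```python
import bisect

class PiecewiseLinear:
    """Value that varies piecewise-linearly with batch_count."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.x = [p[0] for p in points]
        self.y = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.x[0]:
            return self.y[0]
        if batch_count >= self.x[-1]:
            return self.y[-1]
        i = bisect.bisect_right(self.x, batch_count)
        t = (batch_count - self.x[i - 1]) / (self.x[i] - self.x[i - 1])
        return self.y[i - 1] + t * (self.y[i] - self.y[i - 1])

# Invented breakpoints: a skip rate annealing to zero early in training,
# consistent with the late-training entries above reading ans=0.0.
skip_rate = PiecewiseLinear((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate(504600.0))  # -> 0.0
```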
2023-12-22 09:09:20,288 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 09:09:28,716 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.182e-03 2023-12-22 09:09:39,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=508506.6666666667, ans=0.125 2023-12-22 09:09:40,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=508506.6666666667, ans=0.0 2023-12-22 09:09:40,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=508506.6666666667, ans=0.1 2023-12-22 09:09:40,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=508506.6666666667, ans=0.0 2023-12-22 09:09:58,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=508573.3333333333, ans=0.0 2023-12-22 09:09:59,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=508573.3333333333, ans=0.0 2023-12-22 09:10:10,893 INFO [train.py:886] (0/4) Epoch 17, batch 50, loss[loss=0.01816, audio_tagging_loss=0.01816, over 25000.00 frames. ], tot_loss[loss=0.02241, audio_tagging_loss=0.02241, over 1114726.63 frames. ], batch size: 100, lr: 6.51e-03, grad_scale: 64.0 2023-12-22 09:10:24,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2023-12-22 09:10:29,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=508773.3333333333, ans=22.5 2023-12-22 09:10:32,258 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:10:35,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=508840.0, ans=0.125 2023-12-22 09:10:40,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=508840.0, ans=0.025 2023-12-22 09:10:41,307 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.691e+01 3.315e+01 3.576e+01 4.102e+01 9.303e+01, threshold=7.152e+01, percent-clipped=8.0 2023-12-22 09:10:58,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=508973.3333333333, ans=0.1 2023-12-22 09:11:03,183 INFO [train.py:886] (0/4) Epoch 17, batch 100, loss[loss=0.01677, audio_tagging_loss=0.01677, over 25000.00 frames. ], tot_loss[loss=0.01906, audio_tagging_loss=0.01906, over 1966456.25 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0 2023-12-22 09:11:23,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2023-12-22 09:11:28,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. 
limit=15.0 2023-12-22 09:11:54,541 INFO [train.py:886] (0/4) Epoch 17, batch 150, loss[loss=0.01394, audio_tagging_loss=0.01394, over 25000.00 frames. ], tot_loss[loss=0.01726, audio_tagging_loss=0.01726, over 2634537.03 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0 2023-12-22 09:11:56,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=509373.3333333333, ans=0.125 2023-12-22 09:12:02,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0 2023-12-22 09:12:24,686 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 2.920e+01 3.033e+01 3.237e+01 3.873e+01, threshold=6.065e+01, percent-clipped=0.0 2023-12-22 09:12:30,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-12-22 09:12:37,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=509640.0, ans=0.0 2023-12-22 09:12:42,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=509640.0, ans=0.1 2023-12-22 09:12:42,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509640.0, ans=0.1 2023-12-22 09:12:46,837 INFO [train.py:886] (0/4) Epoch 17, batch 200, loss[loss=0.01686, audio_tagging_loss=0.01686, over 25000.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 3150316.84 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0 2023-12-22 09:12:49,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=509706.6666666667, ans=0.0 2023-12-22 09:12:53,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=509706.6666666667, ans=0.125 2023-12-22 09:13:00,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509773.3333333333, ans=0.1 2023-12-22 09:13:04,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=509773.3333333333, ans=0.125 2023-12-22 09:13:10,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=509840.0, ans=0.125 2023-12-22 09:13:27,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=509973.3333333333, ans=0.125 2023-12-22 09:13:39,426 INFO [train.py:886] (0/4) Epoch 17, batch 250, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 3551727.31 frames. 
], batch size: 100, lr: 6.50e-03, grad_scale: 64.0 2023-12-22 09:13:40,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=510040.0, ans=0.125 2023-12-22 09:14:08,572 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+01 2.772e+01 2.917e+01 3.041e+01 3.552e+01, threshold=5.833e+01, percent-clipped=0.0 2023-12-22 09:14:10,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=510240.0, ans=0.05 2023-12-22 09:14:25,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=510306.6666666667, ans=0.0 2023-12-22 09:14:26,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510306.6666666667, ans=0.1 2023-12-22 09:14:30,814 INFO [train.py:886] (0/4) Epoch 17, batch 300, loss[loss=0.01588, audio_tagging_loss=0.01588, over 25000.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 3862468.40 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0 2023-12-22 09:14:42,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=510440.0, ans=0.125 2023-12-22 09:14:43,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=510440.0, ans=0.5 2023-12-22 09:14:51,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=12.0 2023-12-22 09:14:55,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2023-12-22 09:15:23,906 INFO [train.py:886] (0/4) Epoch 17, batch 350, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4100106.66 frames. ], batch size: 99, lr: 6.49e-03, grad_scale: 64.0 2023-12-22 09:15:38,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=510773.3333333333, ans=0.125 2023-12-22 09:15:40,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0 2023-12-22 09:15:52,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0 2023-12-22 09:15:54,267 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+01 2.827e+01 3.010e+01 3.113e+01 3.707e+01, threshold=6.020e+01, percent-clipped=0.0 2023-12-22 09:16:09,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=510973.3333333333, ans=0.125 2023-12-22 09:16:15,124 INFO [train.py:886] (0/4) Epoch 17, batch 400, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4285535.83 frames. 
], batch size: 100, lr: 6.49e-03, grad_scale: 64.0 2023-12-22 09:16:17,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=511040.0, ans=0.125 2023-12-22 09:16:25,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=511106.6666666667, ans=0.0 2023-12-22 09:16:32,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=511106.6666666667, ans=22.5 2023-12-22 09:16:35,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5 2023-12-22 09:16:50,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=511240.0, ans=0.125 2023-12-22 09:17:04,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=511306.6666666667, ans=0.2 2023-12-22 09:17:07,483 INFO [train.py:886] (0/4) Epoch 17, batch 450, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4429710.16 frames. ], batch size: 100, lr: 6.49e-03, grad_scale: 64.0 2023-12-22 09:17:09,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=511373.3333333333, ans=0.0 2023-12-22 09:17:20,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=511440.0, ans=0.125 2023-12-22 09:17:31,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=12.0 2023-12-22 09:17:38,009 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.481e+01 2.743e+01 2.910e+01 3.047e+01 3.599e+01, threshold=5.820e+01, percent-clipped=0.0 2023-12-22 09:17:42,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0 2023-12-22 09:18:00,501 INFO [train.py:886] (0/4) Epoch 17, batch 500, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4542260.16 frames. ], batch size: 99, lr: 6.49e-03, grad_scale: 64.0 2023-12-22 09:18:10,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=511773.3333333333, ans=0.0 2023-12-22 09:18:11,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=511773.3333333333, ans=0.125 2023-12-22 09:18:12,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=511773.3333333333, ans=0.125 2023-12-22 09:18:24,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=511840.0, ans=0.07 2023-12-22 09:18:26,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=511840.0, ans=0.125 2023-12-22 09:18:28,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.89 vs. 
limit=10.0 2023-12-22 09:18:51,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=512040.0, ans=0.125 2023-12-22 09:18:52,183 INFO [train.py:886] (0/4) Epoch 17, batch 550, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4639160.07 frames. ], batch size: 100, lr: 6.49e-03, grad_scale: 64.0 2023-12-22 09:19:02,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=512106.6666666667, ans=0.125 2023-12-22 09:19:08,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=512106.6666666667, ans=0.125 2023-12-22 09:19:13,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=512173.3333333333, ans=10.0 2023-12-22 09:19:15,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=512173.3333333333, ans=0.125 2023-12-22 09:19:22,635 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.788e+01 2.954e+01 3.144e+01 3.570e+01, threshold=5.909e+01, percent-clipped=0.0 2023-12-22 09:19:23,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=512240.0, ans=0.2 2023-12-22 09:19:36,687 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:19:44,637 INFO [train.py:886] (0/4) Epoch 17, batch 600, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4712083.94 frames. ], batch size: 99, lr: 6.48e-03, grad_scale: 64.0 2023-12-22 09:19:55,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=25.54 vs. limit=15.0 2023-12-22 09:20:03,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=512440.0, ans=0.0 2023-12-22 09:20:03,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2023-12-22 09:20:04,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=512506.6666666667, ans=0.125 2023-12-22 09:20:07,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=512506.6666666667, ans=0.0 2023-12-22 09:20:17,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=512573.3333333333, ans=0.125 2023-12-22 09:20:20,806 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.083e+00 2023-12-22 09:20:36,500 INFO [train.py:886] (0/4) Epoch 17, batch 650, loss[loss=0.01136, audio_tagging_loss=0.01136, over 24750.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4761452.60 frames. 
], batch size: 99, lr: 6.48e-03, grad_scale: 64.0 2023-12-22 09:20:43,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=512706.6666666667, ans=0.0 2023-12-22 09:21:04,866 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+01 2.843e+01 2.927e+01 3.090e+01 3.715e+01, threshold=5.854e+01, percent-clipped=0.0 2023-12-22 09:21:14,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=512906.6666666667, ans=0.125 2023-12-22 09:21:27,344 INFO [train.py:886] (0/4) Epoch 17, batch 700, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4801633.19 frames. ], batch size: 100, lr: 6.48e-03, grad_scale: 64.0 2023-12-22 09:21:29,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513040.0, ans=0.1 2023-12-22 09:21:35,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=513040.0, ans=0.125 2023-12-22 09:21:46,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=513106.6666666667, ans=0.0 2023-12-22 09:21:59,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=513240.0, ans=0.125 2023-12-22 09:22:14,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=513306.6666666667, ans=0.0 2023-12-22 09:22:19,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=513373.3333333333, ans=0.125 2023-12-22 09:22:19,886 INFO [train.py:886] (0/4) Epoch 17, batch 750, loss[loss=0.01208, audio_tagging_loss=0.01208, over 25000.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4826715.94 frames. ], batch size: 100, lr: 6.48e-03, grad_scale: 64.0 2023-12-22 09:22:24,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=513373.3333333333, ans=0.0 2023-12-22 09:22:29,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2023-12-22 09:22:47,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=513506.6666666667, ans=0.0 2023-12-22 09:22:50,394 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+01 2.839e+01 2.963e+01 3.097e+01 3.616e+01, threshold=5.927e+01, percent-clipped=0.0 2023-12-22 09:22:52,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=513573.3333333333, ans=15.0 2023-12-22 09:22:58,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.12 vs. limit=10.0 2023-12-22 09:23:10,507 INFO [train.py:886] (0/4) Epoch 17, batch 800, loss[loss=0.01366, audio_tagging_loss=0.01366, over 25000.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4857793.51 frames. 
], batch size: 100, lr: 6.47e-03, grad_scale: 64.0 2023-12-22 09:23:29,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=513773.3333333333, ans=0.0 2023-12-22 09:23:54,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=513973.3333333333, ans=0.125 2023-12-22 09:23:58,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=513973.3333333333, ans=0.1 2023-12-22 09:24:03,865 INFO [train.py:886] (0/4) Epoch 17, batch 850, loss[loss=0.01291, audio_tagging_loss=0.01291, over 25000.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4880739.37 frames. ], batch size: 100, lr: 6.47e-03, grad_scale: 64.0 2023-12-22 09:24:11,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2023-12-22 09:24:12,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=514106.6666666667, ans=0.0 2023-12-22 09:24:34,346 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.468e+01 2.760e+01 2.887e+01 3.062e+01 3.648e+01, threshold=5.774e+01, percent-clipped=0.0 2023-12-22 09:24:41,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=514240.0, ans=0.125 2023-12-22 09:24:42,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=514240.0, ans=0.09899494936611666 2023-12-22 09:24:48,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=514306.6666666667, ans=0.1 2023-12-22 09:24:55,667 INFO [train.py:886] (0/4) Epoch 17, batch 900, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4894819.04 frames. ], batch size: 99, lr: 6.47e-03, grad_scale: 64.0 2023-12-22 09:24:55,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=514373.3333333333, ans=0.0 2023-12-22 09:25:07,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=514440.0, ans=0.125 2023-12-22 09:25:14,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=514440.0, ans=0.125 2023-12-22 09:25:19,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514506.6666666667, ans=0.1 2023-12-22 09:25:22,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=514506.6666666667, ans=0.0 2023-12-22 09:25:39,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=514640.0, ans=0.125 2023-12-22 09:25:45,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=514640.0, ans=12.0 2023-12-22 09:25:47,435 INFO [train.py:886] (0/4) Epoch 17, batch 950, loss[loss=0.01541, audio_tagging_loss=0.01541, over 24750.00 frames. 
], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4904338.93 frames. ], batch size: 99, lr: 6.47e-03, grad_scale: 64.0 2023-12-22 09:25:55,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=514706.6666666667, ans=0.2 2023-12-22 09:26:18,084 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.614e+01 2.797e+01 2.941e+01 3.077e+01 3.617e+01, threshold=5.883e+01, percent-clipped=0.0 2023-12-22 09:26:24,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=514906.6666666667, ans=0.2 2023-12-22 09:26:32,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=514973.3333333333, ans=0.125 2023-12-22 09:26:41,021 INFO [train.py:886] (0/4) Epoch 17, batch 1000, loss[loss=0.01456, audio_tagging_loss=0.01456, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4909996.33 frames. ], batch size: 99, lr: 6.47e-03, grad_scale: 64.0 2023-12-22 09:26:47,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=515040.0, ans=0.125 2023-12-22 09:26:51,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=515106.6666666667, ans=0.125 2023-12-22 09:27:03,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=515173.3333333333, ans=0.2 2023-12-22 09:27:07,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.28 vs. limit=22.5 2023-12-22 09:27:09,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=515173.3333333333, ans=0.125 2023-12-22 09:27:09,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2023-12-22 09:27:31,108 INFO [train.py:886] (0/4) Epoch 17, batch 1050, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4915110.10 frames. ], batch size: 99, lr: 6.46e-03, grad_scale: 64.0 2023-12-22 09:27:36,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=515373.3333333333, ans=0.07 2023-12-22 09:27:38,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.21 vs. limit=15.0 2023-12-22 09:27:41,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=515440.0, ans=0.125 2023-12-22 09:28:01,249 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.471e+01 2.759e+01 2.948e+01 3.113e+01 4.038e+01, threshold=5.895e+01, percent-clipped=0.0 2023-12-22 09:28:03,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=515573.3333333333, ans=0.125 2023-12-22 09:28:04,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.29 vs. 
limit=15.0 2023-12-22 09:28:06,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=515573.3333333333, ans=0.07 2023-12-22 09:28:07,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.57 vs. limit=15.0 2023-12-22 09:28:24,312 INFO [train.py:886] (0/4) Epoch 17, batch 1100, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4922149.41 frames. ], batch size: 100, lr: 6.46e-03, grad_scale: 64.0 2023-12-22 09:28:27,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=515706.6666666667, ans=0.0 2023-12-22 09:28:35,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=515773.3333333333, ans=0.125 2023-12-22 09:28:57,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=515906.6666666667, ans=0.125 2023-12-22 09:28:59,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=515906.6666666667, ans=0.1 2023-12-22 09:29:00,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=515906.6666666667, ans=0.0 2023-12-22 09:29:14,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=515973.3333333333, ans=0.0 2023-12-22 09:29:17,771 INFO [train.py:886] (0/4) Epoch 17, batch 1150, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4924888.07 frames. ], batch size: 100, lr: 6.46e-03, grad_scale: 64.0 2023-12-22 09:29:18,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516040.0, ans=0.1 2023-12-22 09:29:25,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=516040.0, ans=0.125 2023-12-22 09:29:43,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=516173.3333333333, ans=0.125 2023-12-22 09:29:47,300 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.790e+01 2.920e+01 3.045e+01 3.393e+01, threshold=5.839e+01, percent-clipped=0.0 2023-12-22 09:29:47,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=516240.0, ans=0.125 2023-12-22 09:30:08,762 INFO [train.py:886] (0/4) Epoch 17, batch 1200, loss[loss=0.01611, audio_tagging_loss=0.01611, over 25000.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4932762.64 frames. 
], batch size: 100, lr: 6.46e-03, grad_scale: 64.0 2023-12-22 09:30:10,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=516373.3333333333, ans=0.2 2023-12-22 09:30:14,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=516373.3333333333, ans=15.0 2023-12-22 09:30:27,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=516440.0, ans=10.0 2023-12-22 09:30:31,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2023-12-22 09:30:40,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=516573.3333333333, ans=0.125 2023-12-22 09:30:50,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0 2023-12-22 09:30:55,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=516640.0, ans=0.07 2023-12-22 09:31:01,203 INFO [train.py:886] (0/4) Epoch 17, batch 1250, loss[loss=0.0161, audio_tagging_loss=0.0161, over 25000.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4935085.43 frames. ], batch size: 100, lr: 6.46e-03, grad_scale: 64.0 2023-12-22 09:31:31,602 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.867e+01 2.981e+01 3.113e+01 3.868e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 09:31:41,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=516906.6666666667, ans=0.125 2023-12-22 09:31:44,148 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:31:47,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2023-12-22 09:31:53,080 INFO [train.py:886] (0/4) Epoch 17, batch 1300, loss[loss=0.01502, audio_tagging_loss=0.01502, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4936246.32 frames. ], batch size: 100, lr: 6.45e-03, grad_scale: 128.0 2023-12-22 09:31:59,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2023-12-22 09:32:04,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.24 vs. limit=15.0 2023-12-22 09:32:06,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. limit=10.0 2023-12-22 09:32:45,488 INFO [train.py:886] (0/4) Epoch 17, batch 1350, loss[loss=0.01365, audio_tagging_loss=0.01365, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4933507.91 frames. 
], batch size: 100, lr: 6.45e-03, grad_scale: 64.0 2023-12-22 09:32:48,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=517373.3333333333, ans=0.2 2023-12-22 09:32:52,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=517373.3333333333, ans=0.125 2023-12-22 09:32:54,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=517440.0, ans=0.1 2023-12-22 09:33:16,146 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.812e+01 2.965e+01 3.178e+01 3.888e+01, threshold=5.930e+01, percent-clipped=0.0 2023-12-22 09:33:27,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=517640.0, ans=0.0 2023-12-22 09:33:30,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=517640.0, ans=0.125 2023-12-22 09:33:35,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5 2023-12-22 09:33:38,010 INFO [train.py:886] (0/4) Epoch 17, batch 1400, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4939568.86 frames. ], batch size: 100, lr: 6.45e-03, grad_scale: 64.0 2023-12-22 09:33:43,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=517706.6666666667, ans=0.125 2023-12-22 09:33:44,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=517706.6666666667, ans=0.2 2023-12-22 09:33:46,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=517773.3333333333, ans=0.125 2023-12-22 09:34:19,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=517973.3333333333, ans=0.025 2023-12-22 09:34:24,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517973.3333333333, ans=0.1 2023-12-22 09:34:26,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=517973.3333333333, ans=0.0 2023-12-22 09:34:29,381 INFO [train.py:886] (0/4) Epoch 17, batch 1450, loss[loss=0.01634, audio_tagging_loss=0.01634, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4943918.49 frames. ], batch size: 100, lr: 6.45e-03, grad_scale: 64.0 2023-12-22 09:35:00,422 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.798e+01 2.926e+01 3.112e+01 3.814e+01, threshold=5.853e+01, percent-clipped=0.0 2023-12-22 09:35:20,875 INFO [train.py:886] (0/4) Epoch 17, batch 1500, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4941820.07 frames. 
], batch size: 100, lr: 6.45e-03, grad_scale: 64.0 2023-12-22 09:35:23,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518373.3333333333, ans=0.1 2023-12-22 09:35:25,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=518373.3333333333, ans=0.1 2023-12-22 09:35:39,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5 2023-12-22 09:35:48,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=518506.6666666667, ans=0.0 2023-12-22 09:35:55,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=518573.3333333333, ans=0.125 2023-12-22 09:36:02,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=518640.0, ans=0.125 2023-12-22 09:36:02,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2023-12-22 09:36:12,823 INFO [train.py:886] (0/4) Epoch 17, batch 1550, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4945822.57 frames. ], batch size: 100, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:36:34,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=518840.0, ans=0.2 2023-12-22 09:36:38,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=518840.0, ans=0.125 2023-12-22 09:36:44,194 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.447e+01 2.815e+01 2.986e+01 3.108e+01 3.824e+01, threshold=5.972e+01, percent-clipped=0.0 2023-12-22 09:36:56,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=518973.3333333333, ans=0.0 2023-12-22 09:36:57,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=518973.3333333333, ans=0.1 2023-12-22 09:37:00,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=518973.3333333333, ans=0.0 2023-12-22 09:37:03,979 INFO [train.py:886] (0/4) Epoch 17, batch 1600, loss[loss=0.01352, audio_tagging_loss=0.01352, over 25000.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4943743.45 frames. ], batch size: 100, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:37:23,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. 
limit=10.0 2023-12-22 09:37:37,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=519240.0, ans=0.125 2023-12-22 09:37:37,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=519240.0, ans=0.2 2023-12-22 09:37:39,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=519240.0, ans=0.0 2023-12-22 09:37:42,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519240.0, ans=0.1 2023-12-22 09:37:56,806 INFO [train.py:886] (0/4) Epoch 17, batch 1650, loss[loss=0.01432, audio_tagging_loss=0.01432, over 22060.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4942955.92 frames. ], batch size: 107, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:38:01,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=519373.3333333333, ans=0.0 2023-12-22 09:38:07,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=519440.0, ans=0.125 2023-12-22 09:38:08,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=519440.0, ans=0.0 2023-12-22 09:38:10,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=519440.0, ans=0.0 2023-12-22 09:38:18,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=519506.6666666667, ans=0.0 2023-12-22 09:38:25,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=519506.6666666667, ans=0.0 2023-12-22 09:38:28,292 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.852e+01 2.976e+01 3.149e+01 3.480e+01, threshold=5.952e+01, percent-clipped=0.0 2023-12-22 09:38:43,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=519640.0, ans=0.2 2023-12-22 09:38:48,467 INFO [train.py:886] (0/4) Epoch 17, batch 1700, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24750.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4944719.37 frames. ], batch size: 99, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:38:51,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519706.6666666667, ans=0.1 2023-12-22 09:39:05,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=519773.3333333333, ans=0.025 2023-12-22 09:39:13,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=519840.0, ans=0.125 2023-12-22 09:39:27,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519906.6666666667, ans=0.1 2023-12-22 09:39:40,250 INFO [train.py:886] (0/4) Epoch 17, batch 1750, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4951787.68 frames. 
], batch size: 100, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:39:46,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=520040.0, ans=0.0 2023-12-22 09:39:52,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=520106.6666666667, ans=0.125 2023-12-22 09:39:53,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=520106.6666666667, ans=0.0 2023-12-22 09:39:56,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=520106.6666666667, ans=0.0 2023-12-22 09:39:57,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2023-12-22 09:40:11,185 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 2.793e+01 2.974e+01 3.087e+01 3.674e+01, threshold=5.948e+01, percent-clipped=0.0 2023-12-22 09:40:33,488 INFO [train.py:886] (0/4) Epoch 17, batch 1800, loss[loss=0.01545, audio_tagging_loss=0.01545, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4953647.47 frames. ], batch size: 100, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:40:34,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=520373.3333333333, ans=0.0 2023-12-22 09:40:34,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=520373.3333333333, ans=0.1 2023-12-22 09:40:40,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=520373.3333333333, ans=0.2 2023-12-22 09:40:49,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=520440.0, ans=0.125 2023-12-22 09:40:58,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=520506.6666666667, ans=15.0 2023-12-22 09:41:06,628 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:41:09,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520573.3333333333, ans=0.1 2023-12-22 09:41:22,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.83 vs. limit=22.5 2023-12-22 09:41:23,651 INFO [train.py:886] (0/4) Epoch 17, batch 1850, loss[loss=0.01542, audio_tagging_loss=0.01542, over 24958.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4956007.76 frames. 
], batch size: 100, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:41:24,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=520706.6666666667, ans=0.07 2023-12-22 09:41:31,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=520706.6666666667, ans=0.125 2023-12-22 09:41:38,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=520773.3333333333, ans=0.0 2023-12-22 09:41:42,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=520773.3333333333, ans=0.0 2023-12-22 09:41:42,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=520773.3333333333, ans=0.125 2023-12-22 09:41:54,302 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.549e+01 2.846e+01 3.000e+01 3.166e+01 3.525e+01, threshold=6.000e+01, percent-clipped=0.0 2023-12-22 09:42:15,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2023-12-22 09:42:15,611 INFO [train.py:886] (0/4) Epoch 17, batch 1900, loss[loss=0.01623, audio_tagging_loss=0.01623, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4947267.89 frames. ], batch size: 99, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:43:00,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=521306.6666666667, ans=0.125 2023-12-22 09:43:06,666 INFO [train.py:886] (0/4) Epoch 17, batch 1950, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4948499.65 frames. ], batch size: 100, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:43:15,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=521373.3333333333, ans=0.0 2023-12-22 09:43:16,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=521440.0, ans=0.1 2023-12-22 09:43:20,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=521440.0, ans=0.07 2023-12-22 09:43:23,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2023-12-22 09:43:25,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521440.0, ans=0.1 2023-12-22 09:43:32,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. 
2023-12-22 09:43:37,152 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+01 2.757e+01 2.929e+01 3.098e+01 3.613e+01, threshold=5.858e+01, percent-clipped=0.0
2023-12-22 09:43:54,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=521640.0, ans=0.125
2023-12-22 09:43:55,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=521640.0, ans=0.1
2023-12-22 09:43:57,562 INFO [train.py:886] (0/4) Epoch 17, batch 2000, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4946073.75 frames. ], batch size: 99, lr: 6.43e-03, grad_scale: 64.0
2023-12-22 09:44:06,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=521773.3333333333, ans=0.125
2023-12-22 09:44:24,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=521840.0, ans=0.125
2023-12-22 09:44:31,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=521906.6666666667, ans=0.2
2023-12-22 09:44:35,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.82 vs. limit=15.0
2023-12-22 09:44:49,073 INFO [train.py:886] (0/4) Epoch 17, batch 2050, loss[loss=0.0172, audio_tagging_loss=0.0172, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4947226.28 frames. ], batch size: 100, lr: 6.42e-03, grad_scale: 64.0
2023-12-22 09:44:52,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0
2023-12-22 09:45:05,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=522106.6666666667, ans=0.2
2023-12-22 09:45:19,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=522240.0, ans=0.125
2023-12-22 09:45:20,212 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.771e+01 2.888e+01 3.046e+01 3.628e+01, threshold=5.776e+01, percent-clipped=0.0
2023-12-22 09:45:22,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=522240.0, ans=0.1
2023-12-22 09:45:40,159 INFO [train.py:886] (0/4) Epoch 17, batch 2100, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4952733.31 frames. ], batch size: 100, lr: 6.42e-03, grad_scale: 64.0
2023-12-22 09:45:43,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=522373.3333333333, ans=0.125
2023-12-22 09:45:48,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=522373.3333333333, ans=0.0
2023-12-22 09:45:49,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=522373.3333333333, ans=0.0
2023-12-22 09:45:51,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522440.0, ans=0.1
2023-12-22 09:46:03,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=522506.6666666667, ans=0.125
2023-12-22 09:46:28,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=522640.0, ans=0.0
2023-12-22 09:46:33,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=522706.6666666667, ans=15.0
2023-12-22 09:46:33,798 INFO [train.py:886] (0/4) Epoch 17, batch 2150, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4953996.08 frames. ], batch size: 100, lr: 6.42e-03, grad_scale: 64.0
2023-12-22 09:46:48,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=522773.3333333333, ans=0.04949747468305833
2023-12-22 09:47:01,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=522840.0, ans=0.0
2023-12-22 09:47:04,313 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.575e+01 2.873e+01 2.993e+01 3.098e+01 3.427e+01, threshold=5.985e+01, percent-clipped=0.0
2023-12-22 09:47:25,533 INFO [train.py:886] (0/4) Epoch 17, batch 2200, loss[loss=0.01558, audio_tagging_loss=0.01558, over 24750.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4943539.74 frames. ], batch size: 99, lr: 6.42e-03, grad_scale: 64.0
2023-12-22 09:47:44,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=523106.6666666667, ans=0.125
2023-12-22 09:47:47,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=523173.3333333333, ans=0.125
2023-12-22 09:47:53,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=523173.3333333333, ans=0.0
2023-12-22 09:48:17,372 INFO [train.py:886] (0/4) Epoch 17, batch 2250, loss[loss=0.01407, audio_tagging_loss=0.01407, over 24750.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4945317.50 frames. ], batch size: 99, lr: 6.42e-03, grad_scale: 64.0
2023-12-22 09:48:20,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.10 vs. limit=22.5
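The scaling.py:1022 Whitening entries compare a statistic of a module's output covariance against a limit, and appear to be printed only when the metric approaches or exceeds that limit. The exact formula is not in the log; the sketch below is one scale-invariant possibility that equals 1.0 when the per-group covariance is proportional to the identity (perfectly "white") and grows toward channels_per_group as the covariance collapses to rank one:

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (num_frames, num_channels) activations.
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups  # channels per group
        xg = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        cov = torch.matmul(xg.transpose(1, 2), xg)   # (groups, cpg, cpg)
        num = cpg * (cov ** 2).sum(dim=(1, 2))
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2
        # 1.0 for an identity-like covariance, up to cpg for rank-1.
        return (num / den).mean().item()

Under this reading, a constraint like "metric=21.99 vs. limit=22.5" on a 512-channel self-attention output only intervenes once the covariance is quite anisotropic, which would fit how selectively these lines fire.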
2023-12-22 09:48:48,950 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.799e+01 2.927e+01 3.060e+01 4.133e+01, threshold=5.854e+01, percent-clipped=0.0
2023-12-22 09:49:00,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=523640.0, ans=0.1
2023-12-22 09:49:04,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5
2023-12-22 09:49:10,371 INFO [train.py:886] (0/4) Epoch 17, batch 2300, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4944648.23 frames. ], batch size: 100, lr: 6.41e-03, grad_scale: 64.0
2023-12-22 09:49:11,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0
2023-12-22 09:49:23,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=523773.3333333333, ans=0.0
2023-12-22 09:49:33,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0
2023-12-22 09:49:43,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5
2023-12-22 09:49:45,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=523906.6666666667, ans=0.125
2023-12-22 09:50:00,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=523973.3333333333, ans=0.125
2023-12-22 09:50:02,426 INFO [train.py:886] (0/4) Epoch 17, batch 2350, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4948036.19 frames. ], batch size: 100, lr: 6.41e-03, grad_scale: 64.0
2023-12-22 09:50:12,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=524106.6666666667, ans=0.0
2023-12-22 09:50:22,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=524173.3333333333, ans=0.0
2023-12-22 09:50:29,568 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 09:50:32,463 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0
2023-12-22 09:50:33,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=524240.0, ans=0.0
2023-12-22 09:50:33,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=524240.0, ans=0.0
2023-12-22 09:50:33,832 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.768e+01 2.920e+01 3.076e+01 3.528e+01, threshold=5.841e+01, percent-clipped=0.0
2023-12-22 09:50:34,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=524240.0, ans=0.125
2023-12-22 09:50:47,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=524306.6666666666, ans=0.125
2023-12-22 09:50:49,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=524306.6666666666, ans=0.125
2023-12-22 09:50:53,533 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 09:50:54,130 INFO [train.py:886] (0/4) Epoch 17, batch 2400, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4950235.16 frames. ], batch size: 100, lr: 6.41e-03, grad_scale: 64.0
2023-12-22 09:51:12,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=524440.0, ans=0.125
2023-12-22 09:51:13,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=524440.0, ans=0.125
2023-12-22 09:51:44,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=524640.0, ans=0.125
2023-12-22 09:51:46,792 INFO [train.py:886] (0/4) Epoch 17, batch 2450, loss[loss=0.01398, audio_tagging_loss=0.01398, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4954229.36 frames. ], batch size: 100, lr: 6.41e-03, grad_scale: 64.0
2023-12-22 09:51:51,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=524706.6666666666, ans=0.125
2023-12-22 09:51:53,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=524706.6666666666, ans=0.125
2023-12-22 09:52:18,067 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+01 2.810e+01 2.991e+01 3.131e+01 3.656e+01, threshold=5.983e+01, percent-clipped=0.0
2023-12-22 09:52:30,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=12.0
2023-12-22 09:52:38,606 INFO [train.py:886] (0/4) Epoch 17, batch 2500, loss[loss=0.01503, audio_tagging_loss=0.01503, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4951117.85 frames. ], batch size: 99, lr: 6.40e-03, grad_scale: 64.0
2023-12-22 09:52:44,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=22.5
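The lr field shrinks very slowly across this stretch (6.44e-03 at the top, 6.33e-03 by the end). That behaviour is consistent with an Eden-style schedule driven by the base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 settings recorded in the config dump at the start of this log; the formula below is a sketch of that schedule, not a transcript of optim.py:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Polynomial decay in both the global batch index and the number
        # of completed epochs.
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    # eden_lr(0.045, batch=80000, epoch=16) -> ~6.4e-03, close to the values
    # logged here (checkpoint-80000.pt is saved a little further down).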
2023-12-22 09:52:50,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=525106.6666666666, ans=0.125
2023-12-22 09:52:51,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.52 vs. limit=10.0
2023-12-22 09:52:53,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=525106.6666666666, ans=0.0
2023-12-22 09:52:57,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=525106.6666666666, ans=0.125
2023-12-22 09:53:25,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=525306.6666666666, ans=0.0
2023-12-22 09:53:26,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=525306.6666666666, ans=0.125
2023-12-22 09:53:27,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525306.6666666666, ans=0.1
2023-12-22 09:53:30,988 INFO [train.py:886] (0/4) Epoch 17, batch 2550, loss[loss=0.01337, audio_tagging_loss=0.01337, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4950488.22 frames. ], batch size: 99, lr: 6.40e-03, grad_scale: 64.0
2023-12-22 09:53:31,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0
2023-12-22 09:53:35,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525373.3333333334, ans=0.1
2023-12-22 09:53:46,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=525440.0, ans=0.05
2023-12-22 09:54:02,059 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.555e+01 2.815e+01 2.969e+01 3.145e+01 4.179e+01, threshold=5.937e+01, percent-clipped=0.0
2023-12-22 09:54:23,104 INFO [train.py:886] (0/4) Epoch 17, batch 2600, loss[loss=0.01775, audio_tagging_loss=0.01775, over 25000.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4951561.72 frames. ], batch size: 100, lr: 6.40e-03, grad_scale: 64.0
2023-12-22 09:54:40,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=525773.3333333334, ans=0.125
2023-12-22 09:54:55,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=525906.6666666666, ans=0.125
2023-12-22 09:55:01,057 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.36 vs. limit=12.0
2023-12-22 09:55:13,769 INFO [train.py:886] (0/4) Epoch 17, batch 2650, loss[loss=0.01254, audio_tagging_loss=0.01254, over 21048.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4949140.19 frames. ], batch size: 107, lr: 6.40e-03, grad_scale: 64.0
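Names like nonlin_attention.balancer.min_positive=0.05 and the many balancer...prob=0.125 entries above belong to activation balancers: modules that, with probability prob per batch, compare per-channel statistics (fraction of positive values, mean absolute value) against configured bounds and nudge violating channels back in range through the backward pass. A rough sketch of just the measurement side, with illustrative bounds; the real module's correction mechanism is not shown in the log:

    import torch

    def balancer_violation(x, min_positive=0.05, max_abs=10.0):
        # x: (num_frames, num_channels). Returns how far each channel's
        # statistics fall outside the target range; zero when in range.
        frac_positive = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        return ((min_positive - frac_positive).clamp(min=0).sum()
                + (mean_abs - max_abs).clamp(min=0).sum())

Checking only a random eighth of batches (prob=0.125) keeps the overhead of these constraints small.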
2023-12-22 09:55:15,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=526040.0, ans=0.0
2023-12-22 09:55:16,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=526040.0, ans=0.0
2023-12-22 09:55:27,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0
2023-12-22 09:55:36,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=526173.3333333334, ans=0.0
2023-12-22 09:55:41,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=526173.3333333334, ans=0.125
2023-12-22 09:55:43,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526240.0, ans=0.1
2023-12-22 09:55:44,587 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.500e+01 2.764e+01 2.889e+01 3.029e+01 3.511e+01, threshold=5.779e+01, percent-clipped=0.0
2023-12-22 09:55:45,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526240.0, ans=0.1
2023-12-22 09:55:53,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=526240.0, ans=0.0
2023-12-22 09:56:06,531 INFO [train.py:886] (0/4) Epoch 17, batch 2700, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4954565.47 frames. ], batch size: 100, lr: 6.40e-03, grad_scale: 64.0
2023-12-22 09:56:19,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0
2023-12-22 09:56:21,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=526440.0, ans=0.125
2023-12-22 09:56:23,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=526440.0, ans=12.0
2023-12-22 09:56:28,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=22.5
2023-12-22 09:56:35,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=526506.6666666666, ans=0.125
2023-12-22 09:56:57,710 INFO [train.py:886] (0/4) Epoch 17, batch 2750, loss[loss=0.01395, audio_tagging_loss=0.01395, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4956765.74 frames. ], batch size: 100, lr: 6.39e-03, grad_scale: 64.0
2023-12-22 09:56:59,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=526706.6666666666, ans=0.2
2023-12-22 09:57:06,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=526706.6666666666, ans=0.0
2023-12-22 09:57:14,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.52 vs. limit=22.5
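The bypass.scale_min, bypass.skip_rate and bypass_mid entries refer to Zipformer's residual bypass around each layer (plus a mid-stack variant). One common formulation, assumed here rather than copied from scaling.py, is output = input + scale * (layer_out - input), with the learned scale clamped from below by scale_min and, during training, whole batches occasionally skipping the layer with probability skip_rate:

    import torch

    def bypass(src, layer_out, scale, scale_min=0.2, skip_rate=0.0,
               training=True):
        # scale: learned per-channel mixing weight, kept in [scale_min, 1].
        scale = scale.clamp(min=scale_min, max=1.0)
        if training and float(torch.rand(())) < skip_rate:
            return src  # occasionally bypass the layer entirely
        return src + scale * (layer_out - src)

A floor like scale_min=0.2 keeps every layer contributing at least a fixed fraction of its output, so no layer can be silently switched off early in training.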
2023-12-22 09:57:28,231 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.564e+01 2.829e+01 2.970e+01 3.111e+01 3.407e+01, threshold=5.940e+01, percent-clipped=0.0
2023-12-22 09:57:28,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=526906.6666666666, ans=0.125
2023-12-22 09:57:40,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=526973.3333333334, ans=0.0
2023-12-22 09:57:45,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.97 vs. limit=12.0
2023-12-22 09:57:50,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0
2023-12-22 09:57:50,852 INFO [train.py:886] (0/4) Epoch 17, batch 2800, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4955543.91 frames. ], batch size: 99, lr: 6.39e-03, grad_scale: 64.0
2023-12-22 09:58:16,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=527173.3333333334, ans=0.125
2023-12-22 09:58:19,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=527173.3333333334, ans=0.0
2023-12-22 09:58:21,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=527240.0, ans=0.09899494936611666
2023-12-22 09:58:23,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=527240.0, ans=0.125
2023-12-22 09:58:26,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=527240.0, ans=0.125
2023-12-22 09:58:31,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.86 vs. limit=22.5
2023-12-22 09:58:43,620 INFO [train.py:886] (0/4) Epoch 17, batch 2850, loss[loss=0.01341, audio_tagging_loss=0.01341, over 24750.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4949936.07 frames. ], batch size: 99, lr: 6.39e-03, grad_scale: 64.0
2023-12-22 09:58:54,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=15.0
2023-12-22 09:58:54,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=527440.0, ans=0.125
2023-12-22 09:58:55,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=527440.0, ans=0.1
2023-12-22 09:59:02,536 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.065e-01
2023-12-22 09:59:02,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=527506.6666666666, ans=0.0
2023-12-22 09:59:13,548 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.610e-02
2023-12-22 09:59:14,236 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.860e+01 2.998e+01 3.153e+01 3.451e+01, threshold=5.997e+01, percent-clipped=0.0
2023-12-22 09:59:30,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=527640.0, ans=0.0
2023-12-22 09:59:34,623 INFO [train.py:886] (0/4) Epoch 17, batch 2900, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4947390.79 frames. ], batch size: 100, lr: 6.39e-03, grad_scale: 64.0
2023-12-22 09:59:39,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527706.6666666666, ans=0.1
2023-12-22 09:59:42,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=527706.6666666666, ans=0.0
2023-12-22 09:59:53,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=527773.3333333334, ans=0.2
2023-12-22 10:00:10,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.47 vs. limit=10.0
2023-12-22 10:00:27,714 INFO [train.py:886] (0/4) Epoch 17, batch 2950, loss[loss=0.01379, audio_tagging_loss=0.01379, over 24750.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4948725.83 frames. ], batch size: 99, lr: 6.39e-03, grad_scale: 64.0
2023-12-22 10:00:42,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=528106.6666666666, ans=0.0
2023-12-22 10:00:53,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=528173.3333333334, ans=0.0
2023-12-22 10:00:59,346 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+01 2.819e+01 2.962e+01 3.151e+01 3.518e+01, threshold=5.925e+01, percent-clipped=0.0
2023-12-22 10:01:05,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=528240.0, ans=0.125
2023-12-22 10:01:06,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=528240.0, ans=0.125
2023-12-22 10:01:09,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=528306.6666666666, ans=0.0
2023-12-22 10:01:19,362 INFO [train.py:886] (0/4) Epoch 17, batch 3000, loss[loss=0.01457, audio_tagging_loss=0.01457, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4948747.11 frames. ], batch size: 99, lr: 6.38e-03, grad_scale: 64.0
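The scaling.py:1118 WithLoss entries attach a running loss-sum to specific self_attn_weights tensors; it is usually 0.000e+00 but occasionally positive (7.065e-01 and 1.610e-02 above). A plausible reading, offered here only as a guess, is an auxiliary penalty that is nonzero only when the attention weights drift outside a comfortable range:

    import torch

    def attention_penalty(attn_weights, limit):
        # Hypothetical: penalize only the amount by which attention weights
        # exceed `limit`; the scalar total is what would be logged as
        # loss-sum, so 0.000e+00 means nothing is currently out of range.
        excess = (attn_weights.abs() - limit).clamp(min=0.0)
        return excess.sum()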
2023-12-22 10:01:19,377 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 10:01:40,064 INFO [train.py:917] (0/4) Epoch 17, validation: loss=0.03336, audio_tagging_loss=0.03336, over 3737520.00 frames.
2023-12-22 10:01:40,065 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-22 10:01:44,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=528373.3333333334, ans=0.125
2023-12-22 10:01:48,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=528373.3333333334, ans=0.125
2023-12-22 10:01:51,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=12.0
2023-12-22 10:01:52,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=528440.0, ans=0.02
2023-12-22 10:01:54,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=528440.0, ans=0.125
2023-12-22 10:02:03,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=528506.6666666666, ans=0.125
2023-12-22 10:02:11,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.22 vs. limit=22.5
2023-12-22 10:02:20,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=528573.3333333334, ans=0.2
2023-12-22 10:02:33,307 INFO [train.py:886] (0/4) Epoch 17, batch 3050, loss[loss=0.01618, audio_tagging_loss=0.01618, over 25000.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4955688.47 frames. ], batch size: 100, lr: 6.38e-03, grad_scale: 64.0
2023-12-22 10:02:45,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=528773.3333333334, ans=0.0
2023-12-22 10:03:03,845 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+01 2.845e+01 2.964e+01 3.082e+01 4.137e+01, threshold=5.927e+01, percent-clipped=0.0
2023-12-22 10:03:05,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528906.6666666666, ans=0.1
2023-12-22 10:03:10,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.20 vs. limit=10.0
2023-12-22 10:03:15,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=528973.3333333334, ans=0.1
2023-12-22 10:03:25,018 INFO [train.py:886] (0/4) Epoch 17, batch 3100, loss[loss=0.01715, audio_tagging_loss=0.01715, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4960906.21 frames. ], batch size: 100, lr: 6.38e-03, grad_scale: 64.0
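Per the valid_interval=3000 setting in the config dump, training pauses every 3000 batches for a full pass over the dev set; here it reports loss=0.03336 over 3737520 frames, a fair bit above the running training loss of about 0.014. A sketch of that loop (loss_fn stands in for the recipe's actual loss computation and is not icefall's API):

    import torch

    def compute_validation_loss(model, valid_dl, loss_fn):
        # loss_fn(model, batch) -> (summed_loss, num_frames) for one batch.
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                loss_sum, num_frames = loss_fn(model, batch)
                tot_loss += float(loss_sum)
                tot_frames += float(num_frames)
        model.train()
        return tot_loss / tot_frames  # frame-weighted, as in the entry above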
2023-12-22 10:03:29,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=529040.0, ans=0.2
2023-12-22 10:03:49,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=529173.3333333334, ans=0.125
2023-12-22 10:03:58,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=12.0
2023-12-22 10:04:00,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0
2023-12-22 10:04:11,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529306.6666666666, ans=0.1
2023-12-22 10:04:17,583 INFO [train.py:886] (0/4) Epoch 17, batch 3150, loss[loss=0.01497, audio_tagging_loss=0.01497, over 24750.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4956890.19 frames. ], batch size: 99, lr: 6.38e-03, grad_scale: 64.0
2023-12-22 10:04:47,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=529506.6666666666, ans=0.2
2023-12-22 10:04:49,161 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.844e+01 2.967e+01 3.134e+01 3.611e+01, threshold=5.935e+01, percent-clipped=0.0
2023-12-22 10:05:09,794 INFO [train.py:886] (0/4) Epoch 17, batch 3200, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4950360.12 frames. ], batch size: 100, lr: 6.38e-03, grad_scale: 64.0
2023-12-22 10:05:12,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0
2023-12-22 10:05:17,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=529706.6666666666, ans=0.5
2023-12-22 10:05:19,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=529773.3333333334, ans=0.07
2023-12-22 10:05:26,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=529773.3333333334, ans=0.125
2023-12-22 10:05:27,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=529773.3333333334, ans=0.125
2023-12-22 10:05:33,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=529840.0, ans=0.125
2023-12-22 10:05:47,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=529906.6666666666, ans=0.125
2023-12-22 10:05:54,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=529973.3333333334, ans=0.0
2023-12-22 10:06:01,798 INFO [train.py:886] (0/4) Epoch 17, batch 3250, loss[loss=0.01438, audio_tagging_loss=0.01438, over 24750.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4948688.98 frames. ], batch size: 99, lr: 6.37e-03, grad_scale: 64.0
2023-12-22 10:06:04,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0
2023-12-22 10:06:17,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=530106.6666666666, ans=0.125
2023-12-22 10:06:32,360 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.816e+01 2.913e+01 3.117e+01 3.932e+01, threshold=5.825e+01, percent-clipped=0.0
2023-12-22 10:06:38,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=530240.0, ans=0.1
2023-12-22 10:06:42,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=530306.6666666666, ans=0.0
2023-12-22 10:06:48,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=530306.6666666666, ans=0.125
2023-12-22 10:06:49,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=530306.6666666666, ans=0.125
2023-12-22 10:06:53,267 INFO [train.py:886] (0/4) Epoch 17, batch 3300, loss[loss=0.01501, audio_tagging_loss=0.01501, over 24750.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4948519.55 frames. ], batch size: 99, lr: 6.37e-03, grad_scale: 64.0
2023-12-22 10:06:53,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=530373.3333333334, ans=0.125
2023-12-22 10:07:17,062 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.031e-02
2023-12-22 10:07:18,332 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5
2023-12-22 10:07:36,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=530640.0, ans=0.125
2023-12-22 10:07:45,503 INFO [train.py:886] (0/4) Epoch 17, batch 3350, loss[loss=0.009965, audio_tagging_loss=0.009965, over 23953.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4951616.52 frames. ], batch size: 100, lr: 6.37e-03, grad_scale: 128.0
2023-12-22 10:07:53,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=530706.6666666666, ans=0.0
2023-12-22 10:08:04,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0
2023-12-22 10:08:07,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=12.0
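The grad_scale field is the mixed-precision loss scale: it sits at 64.0, doubles to 128.0 at batch 3350 above, and is back at 64.0 by batch 3400 shortly below. Doubling after a stretch of overflow-free steps and halving when an inf/nan gradient appears is exactly how dynamic loss scaling behaves; with use_fp16=True this is what torch.cuda.amp provides. The constructor arguments below are illustrative guesses, not the recipe's actual settings:

    from torch.cuda.amp import GradScaler

    # Scale grows by growth_factor every growth_interval clean steps and is
    # multiplied by backoff_factor whenever an overflow is detected.
    scaler = GradScaler(init_scale=64.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=2000)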
2023-12-22 10:08:08,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=530840.0, ans=0.125
2023-12-22 10:08:17,807 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.375e+01 2.799e+01 2.966e+01 3.159e+01 3.514e+01, threshold=5.932e+01, percent-clipped=0.0
2023-12-22 10:08:25,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=530906.6666666666, ans=0.125
2023-12-22 10:08:27,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=530973.3333333334, ans=0.1
2023-12-22 10:08:32,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=530973.3333333334, ans=0.0
2023-12-22 10:08:36,652 INFO [train.py:886] (0/4) Epoch 17, batch 3400, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4956144.33 frames. ], batch size: 100, lr: 6.37e-03, grad_scale: 64.0
2023-12-22 10:09:01,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=15.0
2023-12-22 10:09:07,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=531240.0, ans=0.0
2023-12-22 10:09:13,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=531240.0, ans=10.0
2023-12-22 10:09:27,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=531306.6666666666, ans=0.125
2023-12-22 10:09:29,999 INFO [train.py:886] (0/4) Epoch 17, batch 3450, loss[loss=0.01522, audio_tagging_loss=0.01522, over 21317.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4945621.82 frames. ], batch size: 107, lr: 6.37e-03, grad_scale: 64.0
2023-12-22 10:09:40,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=531440.0, ans=0.05
2023-12-22 10:09:43,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=531440.0, ans=0.0
2023-12-22 10:09:53,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=531506.6666666666, ans=0.0
2023-12-22 10:10:02,505 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.535e+01 2.890e+01 3.022e+01 3.190e+01 3.509e+01, threshold=6.045e+01, percent-clipped=0.0
2023-12-22 10:10:11,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=531640.0, ans=0.125
2023-12-22 10:10:23,151 INFO [train.py:886] (0/4) Epoch 17, batch 3500, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4936018.64 frames. ], batch size: 100, lr: 6.37e-03, grad_scale: 64.0
2023-12-22 10:10:23,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=531706.6666666666, ans=0.125
2023-12-22 10:10:26,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531706.6666666666, ans=0.1
2023-12-22 10:10:30,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=531706.6666666666, ans=0.125
2023-12-22 10:10:33,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=531773.3333333334, ans=0.125
2023-12-22 10:10:46,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=531840.0, ans=0.0
2023-12-22 10:10:55,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=531906.6666666666, ans=0.5
2023-12-22 10:10:58,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=531906.6666666666, ans=0.0
2023-12-22 10:10:59,717 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.028e-02
2023-12-22 10:11:02,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=12.0
2023-12-22 10:11:06,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=531973.3333333334, ans=0.035
2023-12-22 10:11:11,109 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 10:11:13,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=531973.3333333334, ans=0.125
2023-12-22 10:11:14,707 INFO [train.py:886] (0/4) Epoch 17, batch 3550, loss[loss=0.01358, audio_tagging_loss=0.01358, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4937272.22 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0
2023-12-22 10:11:35,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5
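Each train.py:886 line carries two figures: loss for the current batch and tot_loss, an aggregate over roughly the last five million frames, which is why tot_loss hovers near 0.014 while per-batch values scatter from about 0.010 to 0.018. For this recipe the criterion is presumably a multi-label binary cross-entropy over the 527 AudioSet event classes; both pieces are sketched below with illustrative names:

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits, targets):
        # logits, targets: (batch, 527); targets are multi-hot event labels.
        # loss and audio_tagging_loss coincide in these entries because
        # tagging is the only loss term.
        return F.binary_cross_entropy_with_logits(logits, targets,
                                                  reduction="sum")

    def update_tot_loss(tot, batch_loss_sum, batch_frames, decay=0.995):
        # Hypothetical decayed running sums; decay=0.995 is chosen so the
        # steady-state frame count, ~25000 / (1 - 0.995) = 5e6, matches the
        # "over ~4.95e6 frames" figures printed with tot_loss above.
        tot["loss"] = decay * tot["loss"] + batch_loss_sum
        tot["frames"] = decay * tot["frames"] + batch_frames
        return tot["loss"] / tot["frames"]  # the printed tot_loss value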
2023-12-22 10:11:38,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=532173.3333333334, ans=0.0
2023-12-22 10:11:39,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=532173.3333333334, ans=0.125
2023-12-22 10:11:42,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=532173.3333333334, ans=0.125
2023-12-22 10:11:43,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=532173.3333333334, ans=0.125
2023-12-22 10:11:46,398 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.431e+01 2.820e+01 2.970e+01 3.113e+01 4.129e+01, threshold=5.940e+01, percent-clipped=0.0
2023-12-22 10:11:47,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=532240.0, ans=0.09899494936611666
2023-12-22 10:11:56,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=532306.6666666666, ans=0.2
2023-12-22 10:12:06,046 INFO [train.py:886] (0/4) Epoch 17, batch 3600, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4932980.35 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0
2023-12-22 10:12:30,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=532506.6666666666, ans=0.125
2023-12-22 10:12:31,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=532506.6666666666, ans=0.125
2023-12-22 10:12:55,946 INFO [train.py:886] (0/4) Epoch 17, batch 3650, loss[loss=0.01227, audio_tagging_loss=0.01227, over 24015.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4943880.10 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0
2023-12-22 10:13:01,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=532706.6666666666, ans=0.125
2023-12-22 10:13:08,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=532773.3333333334, ans=0.2
2023-12-22 10:13:18,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=532840.0, ans=0.0
2023-12-22 10:13:27,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0
2023-12-22 10:13:27,617 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.518e+01 2.770e+01 2.907e+01 3.018e+01 3.500e+01, threshold=5.815e+01, percent-clipped=0.0
2023-12-22 10:13:33,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=532906.6666666666, ans=0.0
2023-12-22 10:13:48,596 INFO [train.py:886] (0/4) Epoch 17, batch 3700, loss[loss=0.01413, audio_tagging_loss=0.01413, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4945590.20 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0
2023-12-22 10:13:49,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=533040.0, ans=0.125
2023-12-22 10:13:50,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=533040.0, ans=0.125
2023-12-22 10:13:51,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533040.0, ans=0.1
2023-12-22 10:13:54,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=533040.0, ans=0.0
2023-12-22 10:13:57,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=533106.6666666666, ans=0.0
2023-12-22 10:14:00,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.44 vs. limit=22.5
2023-12-22 10:14:32,582 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-80000.pt
2023-12-22 10:14:42,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533373.3333333334, ans=0.1
2023-12-22 10:14:43,265 INFO [train.py:886] (0/4) Epoch 17, batch 3750, loss[loss=0.0154, audio_tagging_loss=0.0154, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4946447.30 frames. ], batch size: 99, lr: 6.36e-03, grad_scale: 64.0
2023-12-22 10:14:58,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0
2023-12-22 10:15:12,444 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.605e-03
2023-12-22 10:15:14,891 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.851e+01 2.968e+01 3.092e+01 3.872e+01, threshold=5.935e+01, percent-clipped=0.0
2023-12-22 10:15:24,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533640.0, ans=0.1
2023-12-22 10:15:30,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=533640.0, ans=0.0
2023-12-22 10:15:33,565 INFO [train.py:886] (0/4) Epoch 17, batch 3800, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4945040.82 frames. ], batch size: 99, lr: 6.35e-03, grad_scale: 64.0
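The checkpoint.py:75 entry above lands on a round global batch index: with save_every_n=4000 in the config, checkpoint-80000.pt is the save at the 80000th training batch overall (the global counter, not the per-epoch batch number shown on the train.py lines). A sketch of that periodic save, with names chosen for illustration:

    import torch

    def maybe_save_checkpoint(model, optimizer, batch_idx_train,
                              exp_dir="zipformer/exp_at_as_full",
                              save_every_n=4000):
        # Fires every save_every_n global batches: 80000 = 20 * 4000.
        if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
            torch.save(
                {"model": model.state_dict(),
                 "optimizer": optimizer.state_dict(),
                 "batch_idx_train": batch_idx_train},
                f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
            )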
], batch size: 99, lr: 6.35e-03, grad_scale: 64.0 2023-12-22 10:15:34,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=533706.6666666666, ans=0.0 2023-12-22 10:15:37,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=533706.6666666666, ans=12.0 2023-12-22 10:15:51,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533773.3333333334, ans=0.1 2023-12-22 10:16:19,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=533973.3333333334, ans=0.125 2023-12-22 10:16:21,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=533973.3333333334, ans=0.0 2023-12-22 10:16:26,367 INFO [train.py:886] (0/4) Epoch 17, batch 3850, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4943912.94 frames. ], batch size: 99, lr: 6.35e-03, grad_scale: 64.0 2023-12-22 10:16:58,073 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.795e+01 2.953e+01 3.065e+01 3.551e+01, threshold=5.907e+01, percent-clipped=0.0 2023-12-22 10:17:16,794 INFO [train.py:886] (0/4) Epoch 17, batch 3900, loss[loss=0.0132, audio_tagging_loss=0.0132, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4943451.31 frames. ], batch size: 99, lr: 6.35e-03, grad_scale: 64.0 2023-12-22 10:17:29,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.37 vs. limit=15.0 2023-12-22 10:17:45,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2023-12-22 10:18:08,576 INFO [train.py:886] (0/4) Epoch 17, batch 3950, loss[loss=0.01592, audio_tagging_loss=0.01592, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4948341.01 frames. 
], batch size: 100, lr: 6.35e-03, grad_scale: 64.0 2023-12-22 10:18:09,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=534706.6666666666, ans=0.2 2023-12-22 10:18:12,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=534706.6666666666, ans=0.125 2023-12-22 10:18:12,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=534706.6666666666, ans=0.1 2023-12-22 10:18:14,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=534706.6666666666, ans=0.125 2023-12-22 10:18:21,032 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.069e-02 2023-12-22 10:18:22,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=534773.3333333334, ans=0.0 2023-12-22 10:18:39,647 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.778e+01 2.901e+01 3.094e+01 3.779e+01, threshold=5.801e+01, percent-clipped=0.0 2023-12-22 10:18:56,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2023-12-22 10:18:56,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=534973.3333333334, ans=0.2 2023-12-22 10:18:58,644 INFO [train.py:886] (0/4) Epoch 17, batch 4000, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4947518.14 frames. ], batch size: 100, lr: 6.35e-03, grad_scale: 64.0 2023-12-22 10:19:06,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=535040.0, ans=0.125 2023-12-22 10:19:45,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=535306.6666666666, ans=0.0 2023-12-22 10:19:49,776 INFO [train.py:886] (0/4) Epoch 17, batch 4050, loss[loss=0.01568, audio_tagging_loss=0.01568, over 24750.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4948527.90 frames. ], batch size: 99, lr: 6.34e-03, grad_scale: 64.0 2023-12-22 10:20:16,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=535506.6666666666, ans=0.2 2023-12-22 10:20:22,058 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 2.893e+01 2.988e+01 3.133e+01 3.578e+01, threshold=5.975e+01, percent-clipped=0.0 2023-12-22 10:20:25,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=535573.3333333334, ans=0.0 2023-12-22 10:20:34,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=535640.0, ans=0.125 2023-12-22 10:20:38,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.49 vs. limit=22.5 2023-12-22 10:20:42,133 INFO [train.py:886] (0/4) Epoch 17, batch 4100, loss[loss=0.01528, audio_tagging_loss=0.01528, over 24082.00 frames. 
], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4945326.44 frames. ], batch size: 100, lr: 6.34e-03, grad_scale: 64.0 2023-12-22 10:20:48,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=535706.6666666666, ans=0.125 2023-12-22 10:21:03,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2023-12-22 10:21:32,759 INFO [train.py:886] (0/4) Epoch 17, batch 4150, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24094.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4939624.22 frames. ], batch size: 100, lr: 6.34e-03, grad_scale: 64.0 2023-12-22 10:21:33,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=536040.0, ans=0.1 2023-12-22 10:21:51,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=536106.6666666666, ans=0.0 2023-12-22 10:21:54,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=536173.3333333334, ans=0.125 2023-12-22 10:22:03,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=536240.0, ans=0.125 2023-12-22 10:22:04,548 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.507e+01 2.856e+01 2.974e+01 3.122e+01 3.730e+01, threshold=5.949e+01, percent-clipped=0.0 2023-12-22 10:22:05,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=536240.0, ans=0.125 2023-12-22 10:22:14,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=536306.6666666666, ans=0.125 2023-12-22 10:22:24,184 INFO [train.py:886] (0/4) Epoch 17, batch 4200, loss[loss=0.01607, audio_tagging_loss=0.01607, over 24750.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4946018.87 frames. ], batch size: 99, lr: 6.34e-03, grad_scale: 64.0 2023-12-22 10:22:33,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2023-12-22 10:22:50,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=536506.6666666666, ans=0.125 2023-12-22 10:22:53,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=536573.3333333334, ans=0.125 2023-12-22 10:22:58,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. 
limit=6.0 2023-12-22 10:23:02,597 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:23:06,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=536640.0, ans=0.125 2023-12-22 10:23:11,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536640.0, ans=0.1 2023-12-22 10:23:15,969 INFO [train.py:886] (0/4) Epoch 17, batch 4250, loss[loss=0.01104, audio_tagging_loss=0.01104, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4949268.41 frames. ], batch size: 99, lr: 6.34e-03, grad_scale: 64.0 2023-12-22 10:23:18,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=536706.6666666666, ans=0.125 2023-12-22 10:23:36,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=536840.0, ans=0.04949747468305833 2023-12-22 10:23:42,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=536840.0, ans=0.125 2023-12-22 10:23:47,886 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.588e+01 2.846e+01 2.960e+01 3.073e+01 3.606e+01, threshold=5.921e+01, percent-clipped=0.0 2023-12-22 10:23:51,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=536906.6666666666, ans=0.2 2023-12-22 10:23:57,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.14 vs. limit=15.0 2023-12-22 10:24:06,816 INFO [train.py:886] (0/4) Epoch 17, batch 4300, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24085.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4951817.21 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0 2023-12-22 10:24:13,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537040.0, ans=0.1 2023-12-22 10:24:19,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=537106.6666666666, ans=0.125 2023-12-22 10:24:21,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=537106.6666666666, ans=0.0 2023-12-22 10:24:24,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=537106.6666666666, ans=0.2 2023-12-22 10:24:33,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. 
2023-12-22 10:24:33,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0
2023-12-22 10:24:38,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=537240.0, ans=0.125
2023-12-22 10:24:45,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=537240.0, ans=0.125
2023-12-22 10:24:51,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=537306.6666666666, ans=0.125
2023-12-22 10:24:54,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=537306.6666666666, ans=0.125
2023-12-22 10:24:59,302 INFO [train.py:886] (0/4) Epoch 17, batch 4350, loss[loss=0.01544, audio_tagging_loss=0.01544, over 25000.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4950264.75 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0
2023-12-22 10:25:20,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537506.6666666666, ans=0.1
2023-12-22 10:25:30,832 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+01 2.833e+01 2.974e+01 3.137e+01 3.666e+01, threshold=5.948e+01, percent-clipped=0.0
2023-12-22 10:25:33,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=537573.3333333334, ans=0.5
2023-12-22 10:25:44,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537640.0, ans=0.1
2023-12-22 10:25:48,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=537640.0, ans=0.2
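
The optim.py:484 warnings summarize recent gradient norms as five quartiles (min / 25% / median / 75% / max). Note that threshold is consistently twice the median, matching Clipping_scale=2.0 (for example 2 x 2.974e+01 = 5.948e+01 against threshold=5.948e+01 just above), and percent-clipped reports how often recent steps exceeded it. A hedged sketch of that bookkeeping; the names and history length are illustrative, and the real ScaledAdam logic in icefall's optim.py is more involved:

from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)   # recent global gradient norms

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2 x median, as in the log
        if norm > threshold:                           # counted by percent-clipped
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
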
2023-12-22 10:25:50,429 INFO [train.py:886] (0/4) Epoch 17, batch 4400, loss[loss=0.01431, audio_tagging_loss=0.01431, over 24750.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4944520.61 frames. ], batch size: 99, lr: 6.33e-03, grad_scale: 64.0
2023-12-22 10:25:57,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=537706.6666666666, ans=0.125
2023-12-22 10:25:58,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=537706.6666666666, ans=0.5
2023-12-22 10:26:18,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=537840.0, ans=0.0
2023-12-22 10:26:19,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=537840.0, ans=0.0
2023-12-22 10:26:19,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=537840.0, ans=0.125
2023-12-22 10:26:23,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=537906.6666666666, ans=0.125
2023-12-22 10:26:33,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=537973.3333333334, ans=0.07
2023-12-22 10:26:35,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=537973.3333333334, ans=0.125
2023-12-22 10:26:35,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=537973.3333333334, ans=0.125
2023-12-22 10:26:35,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0
2023-12-22 10:26:41,595 INFO [train.py:886] (0/4) Epoch 17, batch 4450, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4943822.05 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0
2023-12-22 10:27:02,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=538173.3333333334, ans=0.0
2023-12-22 10:27:13,176 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.528e+01 2.814e+01 2.976e+01 3.129e+01 3.598e+01, threshold=5.951e+01, percent-clipped=0.0
2023-12-22 10:27:17,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=538240.0, ans=0.125
2023-12-22 10:27:25,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0
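
In every train.py:886 entry, loss and audio_tagging_loss are identical because audio tagging is the only training objective in this run: the model predicts the recipe's 527 AudioSet event classes as independent binary decisions. A plausible minimal version of such a multi-label criterion (a sketch; the reduction choice is an assumption, not the recipe's exact code):

import torch
import torch.nn.functional as F

def audio_tagging_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Independent binary cross-entropy per event class, summed over the 527
    # classes and averaged over the batch (assumed reduction).
    return F.binary_cross_entropy_with_logits(
        logits, targets, reduction="sum") / logits.shape[0]

logits = torch.randn(100, 527)                    # batch of 100 clips, 527 event logits
targets = (torch.rand(100, 527) > 0.95).float()   # sparse multi-hot labels
print(audio_tagging_loss(logits, targets))
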
2023-12-22 10:27:32,758 INFO [train.py:886] (0/4) Epoch 17, batch 4500, loss[loss=0.01525, audio_tagging_loss=0.01525, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4945663.08 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0
2023-12-22 10:27:32,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=538373.3333333334, ans=0.0
2023-12-22 10:27:32,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538373.3333333334, ans=0.125
2023-12-22 10:27:33,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538373.3333333334, ans=0.1
2023-12-22 10:27:50,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0
2023-12-22 10:27:58,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0
2023-12-22 10:28:24,638 INFO [train.py:886] (0/4) Epoch 17, batch 4550, loss[loss=0.01641, audio_tagging_loss=0.01641, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4946433.56 frames. ], batch size: 100, lr: 6.32e-03, grad_scale: 64.0
2023-12-22 10:28:28,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0
2023-12-22 10:28:37,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538773.3333333334, ans=0.125
2023-12-22 10:28:53,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=538906.6666666666, ans=0.125
2023-12-22 10:28:57,008 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.472e+01 2.801e+01 2.924e+01 3.059e+01 3.634e+01, threshold=5.849e+01, percent-clipped=0.0
2023-12-22 10:28:57,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=538906.6666666666, ans=0.125
2023-12-22 10:29:07,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=538973.3333333334, ans=0.2
2023-12-22 10:29:09,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=538973.3333333334, ans=0.0
2023-12-22 10:29:16,002 INFO [train.py:886] (0/4) Epoch 17, batch 4600, loss[loss=0.01658, audio_tagging_loss=0.01658, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4952000.32 frames. ], batch size: 99, lr: 6.32e-03, grad_scale: 64.0
2023-12-22 10:29:17,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=539040.0, ans=0.125
2023-12-22 10:29:35,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=539106.6666666666, ans=0.0
2023-12-22 10:29:38,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=539173.3333333334, ans=0.125
2023-12-22 10:29:50,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=539240.0, ans=0.125
2023-12-22 10:30:05,420 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.76 vs. limit=15.0
2023-12-22 10:30:08,711 INFO [train.py:886] (0/4) Epoch 17, batch 4650, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4950891.37 frames. ], batch size: 100, lr: 6.32e-03, grad_scale: 64.0
2023-12-22 10:30:13,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=539373.3333333334, ans=0.1
2023-12-22 10:30:28,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=539506.6666666666, ans=0.125
2023-12-22 10:30:30,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=539506.6666666666, ans=0.125
2023-12-22 10:30:37,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=539506.6666666666, ans=0.0
2023-12-22 10:30:41,878 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.569e+01 2.874e+01 3.014e+01 3.111e+01 3.634e+01, threshold=6.028e+01, percent-clipped=0.0
2023-12-22 10:30:46,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=539573.3333333334, ans=0.2
2023-12-22 10:31:00,409 INFO [train.py:886] (0/4) Epoch 17, batch 4700, loss[loss=0.01427, audio_tagging_loss=0.01427, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4949387.87 frames. ], batch size: 99, lr: 6.32e-03, grad_scale: 64.0
2023-12-22 10:31:33,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0
2023-12-22 10:31:39,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=539973.3333333334, ans=0.0
2023-12-22 10:31:45,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=539973.3333333334, ans=0.0
2023-12-22 10:31:48,493 INFO [train.py:886] (0/4) Epoch 17, batch 4750, loss[loss=0.01605, audio_tagging_loss=0.01605, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4944459.14 frames. ], batch size: 99, lr: 6.32e-03, grad_scale: 64.0
2023-12-22 10:31:53,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=540040.0, ans=0.125
2023-12-22 10:31:53,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=540040.0, ans=0.125
2023-12-22 10:32:04,160 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-17.pt
2023-12-22 10:32:26,596 INFO [train.py:886] (0/4) Epoch 18, batch 0, loss[loss=0.02899, audio_tagging_loss=0.02899, over 25000.00 frames. ], tot_loss[loss=0.02899, audio_tagging_loss=0.02899, over 25000.00 frames. ], batch size: 100, lr: 6.14e-03, grad_scale: 32.0
2023-12-22 10:32:26,598 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 10:32:47,814 INFO [train.py:917] (0/4) Epoch 18, validation: loss=0.03336, audio_tagging_loss=0.03336, over 3737520.00 frames.
2023-12-22 10:32:47,815 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-22 10:32:50,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=540146.6666666666, ans=0.125
2023-12-22 10:32:52,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=540146.6666666666, ans=0.125
2023-12-22 10:33:02,899 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.634e+01 2.896e+01 3.092e+01 3.384e+01 9.418e+01, threshold=6.184e+01, percent-clipped=7.0
2023-12-22 10:33:23,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=540346.6666666666, ans=0.0
2023-12-22 10:33:31,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.85 vs. limit=10.0
2023-12-22 10:33:32,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=540413.3333333334, ans=0.0
2023-12-22 10:33:34,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=540413.3333333334, ans=0.04949747468305833
2023-12-22 10:33:36,986 INFO [train.py:886] (0/4) Epoch 18, batch 50, loss[loss=0.0176, audio_tagging_loss=0.0176, over 25000.00 frames. ], tot_loss[loss=0.02206, audio_tagging_loss=0.02206, over 1122667.80 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0
2023-12-22 10:34:03,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=540613.3333333334, ans=0.0
2023-12-22 10:34:08,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=540680.0, ans=0.0
2023-12-22 10:34:08,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0
2023-12-22 10:34:23,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=540746.6666666666, ans=0.2
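
The checkpoint.py:75 line above marks the epoch boundary: the full training state is written to zipformer/exp_at_as_full/epoch-17.pt, the running tot_loss statistics reset (batch 0 of epoch 18 has tot_loss equal to its own batch loss), and a validation pass runs. A sketch of what such a checkpoint plausibly contains; the exact field names in icefall's checkpoint.py are assumptions:

import torch
from pathlib import Path

def save_checkpoint(exp_dir: Path, epoch: int, model, optimizer, scheduler, scaler):
    # Bundle everything needed to resume training after this epoch.
    filename = exp_dir / f"epoch-{epoch}.pt"
    torch.save(
        {
            "epoch": epoch,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "grad_scaler": scaler.state_dict(),  # the grad_scale: 32.0/64.0 above lives here
        },
        filename,
    )
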
2023-12-22 10:34:30,062 INFO [train.py:886] (0/4) Epoch 18, batch 100, loss[loss=0.01418, audio_tagging_loss=0.01418, over 25000.00 frames. ], tot_loss[loss=0.01902, audio_tagging_loss=0.01902, over 1971772.49 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0
2023-12-22 10:34:45,958 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.772e+01 3.148e+01 3.407e+01 3.744e+01 5.066e+01, threshold=6.815e+01, percent-clipped=0.0
2023-12-22 10:34:54,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=540946.6666666666, ans=0.125
2023-12-22 10:34:56,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=540946.6666666666, ans=0.125
2023-12-22 10:34:56,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0
2023-12-22 10:34:59,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=540946.6666666666, ans=0.2
2023-12-22 10:35:00,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=541013.3333333334, ans=0.125
2023-12-22 10:35:20,936 INFO [train.py:886] (0/4) Epoch 18, batch 150, loss[loss=0.0123, audio_tagging_loss=0.0123, over 22583.00 frames. ], tot_loss[loss=0.01726, audio_tagging_loss=0.01726, over 2633359.98 frames. ], batch size: 107, lr: 6.13e-03, grad_scale: 32.0
2023-12-22 10:35:22,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=541146.6666666666, ans=0.0
2023-12-22 10:35:33,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=541213.3333333334, ans=0.09899494936611666
2023-12-22 10:35:37,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=541213.3333333334, ans=0.04949747468305833
2023-12-22 10:35:38,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=541213.3333333334, ans=10.0
2023-12-22 10:35:48,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=541280.0, ans=0.0
2023-12-22 10:35:50,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=541346.6666666666, ans=0.125
2023-12-22 10:35:52,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=541346.6666666666, ans=0.125
2023-12-22 10:35:58,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=541346.6666666666, ans=0.1
2023-12-22 10:36:04,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=541413.3333333334, ans=0.2
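
The learning rate steps from 6.32e-03 at the end of epoch 17 down to 6.14e-03 at epoch 18, batch 0: the recipe's Eden scheduler decays the LR in both a batch and an epoch dimension. To the best of my recollection of icefall's optim.py (treat the formula and the step count below as approximations), with base_lr=0.045, lr_batches=7500, lr_epochs=3.5:

def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden-style decay; formula reproduced from memory of icefall's optim.py.
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# The logged drop 6.14e-03 / 6.32e-03 ~= 0.97 matches the epoch-factor ratio
# between epoch 18 and epoch 17 (~0.973). step=85000 is an assumed count of
# optimizer steps, not the batch_count printed by the scaling.py entries.
print(eden_lr(0.045, step=85000, epoch=18))  # ~6e-03, the right ballpark
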
2023-12-22 10:36:12,538 INFO [train.py:886] (0/4) Epoch 18, batch 200, loss[loss=0.01524, audio_tagging_loss=0.01524, over 25000.00 frames. ], tot_loss[loss=0.01635, audio_tagging_loss=0.01635, over 3156143.07 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0
2023-12-22 10:36:29,159 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.557e+01 2.889e+01 3.014e+01 3.175e+01 3.778e+01, threshold=6.028e+01, percent-clipped=0.0
2023-12-22 10:36:46,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0
2023-12-22 10:36:55,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=541746.6666666666, ans=0.125
2023-12-22 10:37:04,251 INFO [train.py:886] (0/4) Epoch 18, batch 250, loss[loss=0.01157, audio_tagging_loss=0.01157, over 23943.00 frames. ], tot_loss[loss=0.01561, audio_tagging_loss=0.01561, over 3558904.29 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0
2023-12-22 10:37:19,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=541880.0, ans=0.125
2023-12-22 10:37:21,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=541880.0, ans=0.125
2023-12-22 10:37:34,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=542013.3333333334, ans=0.0
2023-12-22 10:37:50,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=542080.0, ans=0.2
2023-12-22 10:37:50,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=542080.0, ans=0.125
2023-12-22 10:37:52,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=542080.0, ans=0.1
2023-12-22 10:37:56,125 INFO [train.py:886] (0/4) Epoch 18, batch 300, loss[loss=0.01635, audio_tagging_loss=0.01635, over 24750.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 3868283.84 frames. ], batch size: 99, lr: 6.12e-03, grad_scale: 32.0
2023-12-22 10:38:01,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=15.0
2023-12-22 10:38:14,087 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+01 2.870e+01 3.034e+01 3.182e+01 3.757e+01, threshold=6.068e+01, percent-clipped=0.0
2023-12-22 10:38:14,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.87 vs. limit=10.0
2023-12-22 10:38:47,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0
2023-12-22 10:38:47,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.52 vs. limit=6.0
2023-12-22 10:38:48,506 INFO [train.py:886] (0/4) Epoch 18, batch 350, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24750.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4099207.65 frames. ], batch size: 99, lr: 6.12e-03, grad_scale: 32.0
2023-12-22 10:38:53,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=542480.0, ans=0.0
2023-12-22 10:39:02,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5
2023-12-22 10:39:11,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=542613.3333333334, ans=0.125
2023-12-22 10:39:37,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=542746.6666666666, ans=0.125
2023-12-22 10:39:39,181 INFO [train.py:886] (0/4) Epoch 18, batch 400, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24750.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4288403.23 frames. ], batch size: 99, lr: 6.12e-03, grad_scale: 32.0
2023-12-22 10:39:45,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=542813.3333333334, ans=0.125
2023-12-22 10:39:55,844 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.510e+01 2.811e+01 2.915e+01 3.061e+01 3.426e+01, threshold=5.831e+01, percent-clipped=0.0
2023-12-22 10:40:15,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.76 vs. limit=6.0
2023-12-22 10:40:23,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0
2023-12-22 10:40:31,251 INFO [train.py:886] (0/4) Epoch 18, batch 450, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4436219.08 frames. ], batch size: 100, lr: 6.12e-03, grad_scale: 32.0
2023-12-22 10:40:35,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.91 vs. limit=22.5
2023-12-22 10:41:07,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=543346.6666666666, ans=0.125
2023-12-22 10:41:09,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0
2023-12-22 10:41:19,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=543413.3333333334, ans=0.125
2023-12-22 10:41:22,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=543480.0, ans=0.125
2023-12-22 10:41:23,711 INFO [train.py:886] (0/4) Epoch 18, batch 500, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4557187.97 frames. ], batch size: 100, lr: 6.12e-03, grad_scale: 32.0
2023-12-22 10:41:39,746 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.753e+01 2.847e+01 2.993e+01 3.573e+01, threshold=5.694e+01, percent-clipped=0.0
2023-12-22 10:41:40,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=543546.6666666666, ans=0.07
2023-12-22 10:41:48,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=543613.3333333334, ans=0.125
2023-12-22 10:42:00,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=543680.0, ans=0.0
2023-12-22 10:42:15,105 INFO [train.py:886] (0/4) Epoch 18, batch 550, loss[loss=0.01763, audio_tagging_loss=0.01763, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4645875.18 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0
2023-12-22 10:42:26,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=543880.0, ans=0.07
2023-12-22 10:42:32,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=543880.0, ans=0.1
2023-12-22 10:42:54,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=544013.3333333334, ans=0.125
2023-12-22 10:42:55,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=22.5
2023-12-22 10:42:59,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=544080.0, ans=0.025
2023-12-22 10:43:00,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=544080.0, ans=0.0
2023-12-22 10:43:03,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=544080.0, ans=0.125
2023-12-22 10:43:03,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=22.5
2023-12-22 10:43:07,220 INFO [train.py:886] (0/4) Epoch 18, batch 600, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4709864.69 frames. ], batch size: 99, lr: 6.11e-03, grad_scale: 32.0
2023-12-22 10:43:21,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=544213.3333333334, ans=0.0
2023-12-22 10:43:23,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.28 vs. limit=10.0
2023-12-22 10:43:24,355 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.858e+01 2.981e+01 3.093e+01 3.735e+01, threshold=5.962e+01, percent-clipped=0.0
2023-12-22 10:43:26,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=544213.3333333334, ans=0.0
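
The config's use_fp16: True together with the grad_scale field explains why the scale sits at 64.0 late in epoch 17, restarts at 32.0 in epoch 18, and later climbs back to 64.0: dynamic loss scaling halves the scale when inf/nan gradients appear and doubles it after a stretch of overflow-free steps. A generic torch.cuda.amp pattern illustrating the mechanism (the recipe's actual growth interval and wiring are assumptions):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, criterion, optimizer, features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():       # fp16 forward, matching use_fp16: True
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()         # scaled loss -> scaled gradients
    scaler.step(optimizer)                # unscales grads, skips step on inf/nan
    scaler.update()                       # doubles the scale after enough good
                                          # steps, halves it on overflow
    return loss.detach()
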
2023-12-22 10:43:30,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=22.5
2023-12-22 10:43:59,128 INFO [train.py:886] (0/4) Epoch 18, batch 650, loss[loss=0.01536, audio_tagging_loss=0.01536, over 23943.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4753980.68 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0
2023-12-22 10:44:26,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=544613.3333333334, ans=0.1
2023-12-22 10:44:29,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=544680.0, ans=0.125
2023-12-22 10:44:31,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=544680.0, ans=0.0
2023-12-22 10:44:39,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.78 vs. limit=12.0
2023-12-22 10:44:40,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=544746.6666666666, ans=0.125
2023-12-22 10:44:40,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=544746.6666666666, ans=0.0
2023-12-22 10:44:50,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=544813.3333333334, ans=0.125
2023-12-22 10:44:51,152 INFO [train.py:886] (0/4) Epoch 18, batch 700, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4787351.71 frames. ], batch size: 99, lr: 6.11e-03, grad_scale: 32.0
2023-12-22 10:44:52,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=544813.3333333334, ans=0.0
2023-12-22 10:45:04,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=544880.0, ans=0.125
2023-12-22 10:45:08,782 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+01 2.825e+01 2.944e+01 3.061e+01 3.906e+01, threshold=5.887e+01, percent-clipped=0.0
2023-12-22 10:45:20,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=544946.6666666666, ans=0.1
2023-12-22 10:45:21,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=544946.6666666666, ans=0.2
2023-12-22 10:45:23,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545013.3333333334, ans=0.1
2023-12-22 10:45:34,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5
2023-12-22 10:45:37,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=545080.0, ans=0.09899494936611666
2023-12-22 10:45:43,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.91 vs. limit=10.0
2023-12-22 10:45:44,165 INFO [train.py:886] (0/4) Epoch 18, batch 750, loss[loss=0.01407, audio_tagging_loss=0.01407, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4825238.97 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0
2023-12-22 10:46:20,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=545346.6666666666, ans=0.0
2023-12-22 10:46:36,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=545480.0, ans=15.0
2023-12-22 10:46:36,894 INFO [train.py:886] (0/4) Epoch 18, batch 800, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4858513.36 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0
2023-12-22 10:46:45,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=545480.0, ans=0.09899494936611666
2023-12-22 10:46:52,749 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.515e+01 2.820e+01 2.937e+01 3.093e+01 3.443e+01, threshold=5.874e+01, percent-clipped=0.0
2023-12-22 10:47:02,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=545613.3333333334, ans=0.125
2023-12-22 10:47:25,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=545746.6666666666, ans=0.125
2023-12-22 10:47:27,806 INFO [train.py:886] (0/4) Epoch 18, batch 850, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4886348.47 frames. ], batch size: 100, lr: 6.10e-03, grad_scale: 32.0
2023-12-22 10:47:46,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=12.0
2023-12-22 10:47:54,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=545946.6666666666, ans=0.0
2023-12-22 10:47:55,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=545946.6666666666, ans=0.1
2023-12-22 10:47:57,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.43 vs. limit=22.5
2023-12-22 10:48:08,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=546080.0, ans=0.2
2023-12-22 10:48:19,669 INFO [train.py:886] (0/4) Epoch 18, batch 900, loss[loss=0.01422, audio_tagging_loss=0.01422, over 24750.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4901751.12 frames. ], batch size: 99, lr: 6.10e-03, grad_scale: 32.0
2023-12-22 10:48:30,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=546213.3333333334, ans=0.125
2023-12-22 10:48:32,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=546213.3333333334, ans=0.125
2023-12-22 10:48:35,718 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.627e+01 2.855e+01 2.963e+01 3.115e+01 3.581e+01, threshold=5.925e+01, percent-clipped=0.0
2023-12-22 10:48:45,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=546280.0, ans=6.0
2023-12-22 10:48:45,420 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0
2023-12-22 10:48:48,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=546280.0, ans=0.1
2023-12-22 10:48:49,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.64 vs. limit=10.0
2023-12-22 10:48:57,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=546346.6666666666, ans=0.0
2023-12-22 10:49:10,080 INFO [train.py:886] (0/4) Epoch 18, batch 950, loss[loss=0.01315, audio_tagging_loss=0.01315, over 24750.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4897734.56 frames. ], batch size: 99, lr: 6.10e-03, grad_scale: 32.0
2023-12-22 10:49:36,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=546613.3333333334, ans=0.09899494936611666
2023-12-22 10:49:53,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=546746.6666666666, ans=0.125
2023-12-22 10:50:02,897 INFO [train.py:886] (0/4) Epoch 18, batch 1000, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4906764.46 frames. ], batch size: 99, lr: 6.10e-03, grad_scale: 32.0
2023-12-22 10:50:16,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=546880.0, ans=0.0
2023-12-22 10:50:17,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=546880.0, ans=0.2
2023-12-22 10:50:19,629 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.455e+01 2.848e+01 2.998e+01 3.205e+01 5.243e+01, threshold=5.996e+01, percent-clipped=0.0
2023-12-22 10:50:22,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=546946.6666666666, ans=10.0
2023-12-22 10:50:51,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0
2023-12-22 10:50:53,943 INFO [train.py:886] (0/4) Epoch 18, batch 1050, loss[loss=0.01526, audio_tagging_loss=0.01526, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4914859.29 frames. ], batch size: 100, lr: 6.10e-03, grad_scale: 32.0
2023-12-22 10:50:55,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=547146.6666666666, ans=0.125
2023-12-22 10:50:56,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=547146.6666666666, ans=0.1
2023-12-22 10:51:09,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=12.0
2023-12-22 10:51:17,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=547280.0, ans=0.125
2023-12-22 10:51:17,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=547280.0, ans=0.125
2023-12-22 10:51:38,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0
2023-12-22 10:51:44,388 INFO [train.py:886] (0/4) Epoch 18, batch 1100, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4922912.61 frames. ], batch size: 100, lr: 6.09e-03, grad_scale: 32.0
2023-12-22 10:51:55,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.66 vs. limit=5.0
2023-12-22 10:52:02,014 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.837e+01 2.977e+01 3.116e+01 4.656e+01, threshold=5.954e+01, percent-clipped=0.0
2023-12-22 10:52:20,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=547680.0, ans=0.0
2023-12-22 10:52:36,579 INFO [train.py:886] (0/4) Epoch 18, batch 1150, loss[loss=0.01191, audio_tagging_loss=0.01191, over 23953.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4933303.84 frames. ], batch size: 100, lr: 6.09e-03, grad_scale: 32.0
2023-12-22 10:52:39,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.41 vs. limit=15.0
2023-12-22 10:52:58,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=547946.6666666666, ans=0.1
2023-12-22 10:53:02,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0
2023-12-22 10:53:25,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=548080.0, ans=0.125
2023-12-22 10:53:27,147 INFO [train.py:886] (0/4) Epoch 18, batch 1200, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4940177.00 frames. ], batch size: 100, lr: 6.09e-03, grad_scale: 32.0
2023-12-22 10:53:27,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=548146.6666666666, ans=0.125
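
Note that tot_loss is not the current batch's loss but a slow-moving, frame-weighted running average, which is why it is reported "over" millions of frames while the per-batch loss[...] jumps around. A toy version of that bookkeeping; the decay constant is an assumption, not something stated in the log, and the recipe's actual MetricsTracker may differ:

class RunningLoss:
    # Frame-weighted running average in the spirit of the tot_loss[...] fields.
    def __init__(self, decay: float = 0.995):   # assumed smoothing constant
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        # Decay old statistics, then fold in the new batch. Fractional frame
        # counts like 'over 4914859.29 frames' arise naturally from the decay,
        # and with ~25k frames/batch the steady state is ~25000/(1-0.995) = 5M
        # frames, the same order as the ~4.9M reported above.
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames

tracker = RunningLoss()
for loss, frames in [(0.01526, 25000.0), (0.01317, 25000.0)]:
    print(f"tot_loss={tracker.update(loss, frames):.5f} over {tracker.frames:.2f} frames")
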
2023-12-22 10:53:37,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.88 vs. limit=15.0
2023-12-22 10:53:45,236 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 2.823e+01 2.963e+01 3.147e+01 3.546e+01, threshold=5.926e+01, percent-clipped=0.0
2023-12-22 10:54:17,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=548413.3333333334, ans=0.09899494936611666
2023-12-22 10:54:20,537 INFO [train.py:886] (0/4) Epoch 18, batch 1250, loss[loss=0.0142, audio_tagging_loss=0.0142, over 24750.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4944748.64 frames. ], batch size: 99, lr: 6.09e-03, grad_scale: 32.0
2023-12-22 10:54:27,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=548480.0, ans=0.0
2023-12-22 10:54:39,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=548546.6666666666, ans=0.125
2023-12-22 10:54:53,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=548680.0, ans=0.0
2023-12-22 10:55:13,226 INFO [train.py:886] (0/4) Epoch 18, batch 1300, loss[loss=0.01384, audio_tagging_loss=0.01384, over 22510.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4937390.37 frames. ], batch size: 107, lr: 6.09e-03, grad_scale: 32.0
2023-12-22 10:55:29,145 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 2.885e+01 3.017e+01 3.228e+01 3.822e+01, threshold=6.035e+01, percent-clipped=0.0
2023-12-22 10:55:35,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.70 vs. limit=10.0
2023-12-22 10:55:51,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=549013.3333333334, ans=0.125
2023-12-22 10:56:03,804 INFO [train.py:886] (0/4) Epoch 18, batch 1350, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24750.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4935053.60 frames. ], batch size: 99, lr: 6.08e-03, grad_scale: 32.0
2023-12-22 10:56:20,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=549213.3333333334, ans=0.0
2023-12-22 10:56:33,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=549280.0, ans=0.125
2023-12-22 10:56:46,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=549413.3333333334, ans=0.07
2023-12-22 10:56:57,167 INFO [train.py:886] (0/4) Epoch 18, batch 1400, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4937879.62 frames. ], batch size: 100, lr: 6.08e-03, grad_scale: 32.0
2023-12-22 10:57:04,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549480.0, ans=0.1
2023-12-22 10:57:12,423 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.790e+01 2.898e+01 3.128e+01 3.665e+01, threshold=5.796e+01, percent-clipped=0.0
2023-12-22 10:57:17,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0
2023-12-22 10:57:42,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0
2023-12-22 10:57:48,209 INFO [train.py:886] (0/4) Epoch 18, batch 1450, loss[loss=0.01548, audio_tagging_loss=0.01548, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4941580.67 frames. ], batch size: 100, lr: 6.08e-03, grad_scale: 32.0
2023-12-22 10:58:24,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=550013.3333333334, ans=0.125
2023-12-22 10:58:40,457 INFO [train.py:886] (0/4) Epoch 18, batch 1500, loss[loss=0.01176, audio_tagging_loss=0.01176, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4945511.01 frames. ], batch size: 99, lr: 6.08e-03, grad_scale: 32.0
2023-12-22 10:58:50,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0
2023-12-22 10:58:56,298 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+01 2.873e+01 2.986e+01 3.160e+01 3.608e+01, threshold=5.972e+01, percent-clipped=0.0
2023-12-22 10:59:08,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=550280.0, ans=0.0
2023-12-22 10:59:15,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0
2023-12-22 10:59:22,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=550413.3333333334, ans=0.2
2023-12-22 10:59:31,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.87 vs. limit=15.0
2023-12-22 10:59:31,564 INFO [train.py:886] (0/4) Epoch 18, batch 1550, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4943492.49 frames. ], batch size: 99, lr: 6.08e-03, grad_scale: 32.0
2023-12-22 10:59:40,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=550480.0, ans=0.1
2023-12-22 10:59:44,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=550546.6666666666, ans=10.0
2023-12-22 10:59:59,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=550613.3333333334, ans=0.035
2023-12-22 11:00:11,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=550680.0, ans=0.125
2023-12-22 11:00:19,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=12.0
2023-12-22 11:00:23,828 INFO [train.py:886] (0/4) Epoch 18, batch 1600, loss[loss=0.0163, audio_tagging_loss=0.0163, over 24078.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4935319.64 frames. ], batch size: 100, lr: 6.08e-03, grad_scale: 32.0
2023-12-22 11:00:36,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=550880.0, ans=0.0
2023-12-22 11:00:37,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0
2023-12-22 11:00:39,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=550880.0, ans=0.0
2023-12-22 11:00:40,562 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 2.899e+01 3.029e+01 3.152e+01 3.450e+01, threshold=6.059e+01, percent-clipped=0.0
2023-12-22 11:00:40,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=550880.0, ans=0.1
2023-12-22 11:00:48,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=550946.6666666666, ans=0.125
2023-12-22 11:00:53,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=551013.3333333334, ans=0.125
2023-12-22 11:01:15,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=551146.6666666666, ans=0.125
2023-12-22 11:01:15,673 INFO [train.py:886] (0/4) Epoch 18, batch 1650, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4942297.17 frames. ], batch size: 99, lr: 6.07e-03, grad_scale: 32.0
2023-12-22 11:01:32,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=551213.3333333334, ans=0.125
2023-12-22 11:01:45,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0
2023-12-22 11:01:46,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=12.0
2023-12-22 11:01:47,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0
2023-12-22 11:02:01,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=551413.3333333334, ans=0.0
2023-12-22 11:02:07,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=551480.0, ans=0.0
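
Entries such as conv_skip_rate, attention_skip_rate and ff3_skip_rate with ans=0.0 point to stochastic sub-module skipping whose probability has been annealed to zero by this stage of training: early on, whole residual branches are occasionally dropped as regularization. A toy forward pass with such a skip; the layout and names are illustrative, not scaling.py code:

import torch
import torch.nn as nn

def maybe_apply(module, x: torch.Tensor, skip_rate: float, training: bool) -> torch.Tensor:
    # With probability skip_rate, drop this sub-module's contribution for the
    # whole batch; at batch_count ~5.5e5 the schedules above have decayed the
    # rate to 0.0, so the branch always runs.
    if training and torch.rand(()) < skip_rate:
        return x                # skip: the residual branch contributes nothing
    return x + module(x)        # normal residual update

ff = nn.Linear(256, 256)
y = maybe_apply(ff, torch.randn(10, 256), skip_rate=0.0, training=True)
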
2023-12-22 11:02:08,342 INFO [train.py:886] (0/4) Epoch 18, batch 1700, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4938843.21 frames. ], batch size: 100, lr: 6.07e-03, grad_scale: 32.0
2023-12-22 11:02:16,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=551480.0, ans=0.2
2023-12-22 11:02:24,925 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.844e+01 2.991e+01 3.158e+01 3.709e+01, threshold=5.982e+01, percent-clipped=0.0
2023-12-22 11:02:35,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=551613.3333333334, ans=0.0
2023-12-22 11:02:36,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=551613.3333333334, ans=0.2
2023-12-22 11:02:51,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=551746.6666666666, ans=0.125
2023-12-22 11:02:57,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.28 vs. limit=15.0
2023-12-22 11:02:59,971 INFO [train.py:886] (0/4) Epoch 18, batch 1750, loss[loss=0.01607, audio_tagging_loss=0.01607, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4937548.66 frames. ], batch size: 100, lr: 6.07e-03, grad_scale: 32.0
2023-12-22 11:03:25,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.05 vs. limit=10.0
2023-12-22 11:03:30,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=552013.3333333334, ans=0.125
2023-12-22 11:03:35,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=552013.3333333334, ans=0.0
2023-12-22 11:03:41,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0
2023-12-22 11:03:49,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=552080.0, ans=0.125
2023-12-22 11:03:50,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=552080.0, ans=0.0
2023-12-22 11:03:51,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.38 vs. limit=22.5
2023-12-22 11:03:52,487 INFO [train.py:886] (0/4) Epoch 18, batch 1800, loss[loss=0.01105, audio_tagging_loss=0.01105, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4940799.18 frames. ], batch size: 100, lr: 6.07e-03, grad_scale: 32.0
2023-12-22 11:04:01,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=552213.3333333334, ans=0.0
2023-12-22 11:04:04,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=552213.3333333334, ans=0.0
2023-12-22 11:04:08,382 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.424e+01 2.840e+01 2.988e+01 3.166e+01 4.187e+01, threshold=5.976e+01, percent-clipped=0.0
2023-12-22 11:04:12,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=552280.0, ans=0.125
2023-12-22 11:04:18,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=552280.0, ans=0.0
2023-12-22 11:04:44,141 INFO [train.py:886] (0/4) Epoch 18, batch 1850, loss[loss=0.01337, audio_tagging_loss=0.01337, over 24750.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4947931.12 frames. ], batch size: 99, lr: 6.07e-03, grad_scale: 32.0
2023-12-22 11:04:50,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=552480.0, ans=0.0
2023-12-22 11:05:09,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=552613.3333333334, ans=0.0
2023-12-22 11:05:10,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=552613.3333333334, ans=0.0
2023-12-22 11:05:21,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.14 vs. limit=10.0
2023-12-22 11:05:23,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0
2023-12-22 11:05:23,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=552680.0, ans=0.0
2023-12-22 11:05:35,437 INFO [train.py:886] (0/4) Epoch 18, batch 1900, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4948956.76 frames. ], batch size: 99, lr: 6.06e-03, grad_scale: 32.0
2023-12-22 11:05:52,661 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.849e+01 3.028e+01 3.160e+01 3.565e+01, threshold=6.056e+01, percent-clipped=0.0
2023-12-22 11:05:55,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.91 vs. limit=15.0
2023-12-22 11:06:16,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=15.0
2023-12-22 11:06:18,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=553080.0, ans=0.0
2023-12-22 11:06:21,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=553080.0, ans=0.0
], batch size: 99, lr: 6.06e-03, grad_scale: 32.0 2023-12-22 11:06:36,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=553146.6666666666, ans=0.125 2023-12-22 11:06:38,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-12-22 11:06:40,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=553213.3333333334, ans=0.0 2023-12-22 11:07:02,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=553346.6666666666, ans=0.1 2023-12-22 11:07:14,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=553413.3333333334, ans=0.0 2023-12-22 11:07:18,702 INFO [train.py:886] (0/4) Epoch 18, batch 2000, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4947285.91 frames. ], batch size: 100, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:07:21,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=553480.0, ans=0.125 2023-12-22 11:07:26,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.65 vs. limit=15.0 2023-12-22 11:07:35,963 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.475e+01 2.820e+01 2.955e+01 3.133e+01 3.622e+01, threshold=5.910e+01, percent-clipped=0.0 2023-12-22 11:07:45,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=553613.3333333334, ans=0.0 2023-12-22 11:07:46,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=553613.3333333334, ans=0.125 2023-12-22 11:07:48,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=553680.0, ans=0.1 2023-12-22 11:08:11,137 INFO [train.py:886] (0/4) Epoch 18, batch 2050, loss[loss=0.01594, audio_tagging_loss=0.01594, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4945170.38 frames. ], batch size: 100, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:08:12,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=22.5 2023-12-22 11:08:16,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.18 vs. limit=12.0 2023-12-22 11:08:32,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=553946.6666666666, ans=0.0 2023-12-22 11:08:47,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554013.3333333334, ans=0.1 2023-12-22 11:08:56,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. 
limit=12.0 2023-12-22 11:09:02,567 INFO [train.py:886] (0/4) Epoch 18, batch 2100, loss[loss=0.01229, audio_tagging_loss=0.01229, over 25000.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4952805.81 frames. ], batch size: 100, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:09:18,377 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.520e+01 2.806e+01 2.962e+01 3.157e+01 3.752e+01, threshold=5.925e+01, percent-clipped=0.0 2023-12-22 11:09:34,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.78 vs. limit=12.0 2023-12-22 11:09:42,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. limit=6.0 2023-12-22 11:09:53,741 INFO [train.py:886] (0/4) Epoch 18, batch 2150, loss[loss=0.0171, audio_tagging_loss=0.0171, over 24948.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4954585.88 frames. ], batch size: 100, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:10:01,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=554480.0, ans=0.0 2023-12-22 11:10:06,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=554546.6666666666, ans=0.035 2023-12-22 11:10:13,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=554546.6666666666, ans=0.125 2023-12-22 11:10:26,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2023-12-22 11:10:47,194 INFO [train.py:886] (0/4) Epoch 18, batch 2200, loss[loss=0.01441, audio_tagging_loss=0.01441, over 24750.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4943143.24 frames. ], batch size: 99, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:10:50,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=554813.3333333334, ans=0.0 2023-12-22 11:10:54,109 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:10:58,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=554880.0, ans=15.0 2023-12-22 11:10:59,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=554880.0, ans=0.0 2023-12-22 11:10:59,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.15 vs. 
limit=22.5 2023-12-22 11:11:02,285 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.880e+01 3.019e+01 3.160e+01 3.670e+01, threshold=6.038e+01, percent-clipped=0.0 2023-12-22 11:11:03,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=554880.0, ans=0.125 2023-12-22 11:11:14,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=554946.6666666666, ans=0.07 2023-12-22 11:11:21,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=555013.3333333334, ans=0.0 2023-12-22 11:11:21,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=555013.3333333334, ans=0.05 2023-12-22 11:11:24,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=555013.3333333334, ans=0.2 2023-12-22 11:11:31,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=555080.0, ans=0.0 2023-12-22 11:11:38,051 INFO [train.py:886] (0/4) Epoch 18, batch 2250, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4943902.92 frames. ], batch size: 100, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:11:47,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=555146.6666666666, ans=0.0 2023-12-22 11:12:05,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=555280.0, ans=0.125 2023-12-22 11:12:25,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=555413.3333333334, ans=0.125 2023-12-22 11:12:28,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=555413.3333333334, ans=0.0 2023-12-22 11:12:30,208 INFO [train.py:886] (0/4) Epoch 18, batch 2300, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4943262.00 frames. ], batch size: 99, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:12:46,751 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+01 2.767e+01 2.927e+01 3.038e+01 3.631e+01, threshold=5.853e+01, percent-clipped=0.0 2023-12-22 11:13:01,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2023-12-22 11:13:21,707 INFO [train.py:886] (0/4) Epoch 18, batch 2350, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4948213.34 frames. 
], batch size: 100, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:13:30,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=555813.3333333334, ans=0.125 2023-12-22 11:13:48,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=555946.6666666666, ans=0.0 2023-12-22 11:13:51,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=556013.3333333334, ans=0.125 2023-12-22 11:13:53,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=556013.3333333334, ans=0.05 2023-12-22 11:14:01,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=556080.0, ans=0.125 2023-12-22 11:14:12,063 INFO [train.py:886] (0/4) Epoch 18, batch 2400, loss[loss=0.01731, audio_tagging_loss=0.01731, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4950001.19 frames. ], batch size: 100, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:14:16,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=556146.6666666666, ans=0.125 2023-12-22 11:14:23,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=556213.3333333334, ans=0.125 2023-12-22 11:14:29,874 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+01 2.855e+01 2.962e+01 3.074e+01 3.618e+01, threshold=5.924e+01, percent-clipped=0.0 2023-12-22 11:14:34,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0 2023-12-22 11:14:41,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=556280.0, ans=0.0 2023-12-22 11:14:41,570 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:14:48,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=556346.6666666666, ans=0.07 2023-12-22 11:14:59,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=556413.3333333334, ans=0.125 2023-12-22 11:15:05,106 INFO [train.py:886] (0/4) Epoch 18, batch 2450, loss[loss=0.01588, audio_tagging_loss=0.01588, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4953104.33 frames. ], batch size: 100, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:15:14,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0 2023-12-22 11:15:21,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.58 vs. 
limit=12.0 2023-12-22 11:15:23,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=556546.6666666666, ans=0.125 2023-12-22 11:15:23,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=556546.6666666666, ans=0.07 2023-12-22 11:15:29,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=556613.3333333334, ans=0.125 2023-12-22 11:15:45,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.21 vs. limit=15.0 2023-12-22 11:15:56,166 INFO [train.py:886] (0/4) Epoch 18, batch 2500, loss[loss=0.01601, audio_tagging_loss=0.01601, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4951569.71 frames. ], batch size: 99, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:16:00,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=556813.3333333334, ans=0.2 2023-12-22 11:16:09,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=556880.0, ans=0.0 2023-12-22 11:16:10,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-12-22 11:16:13,368 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.591e+01 2.911e+01 3.040e+01 3.167e+01 3.837e+01, threshold=6.080e+01, percent-clipped=0.0 2023-12-22 11:16:48,283 INFO [train.py:886] (0/4) Epoch 18, batch 2550, loss[loss=0.01474, audio_tagging_loss=0.01474, over 22596.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4945481.91 frames. ], batch size: 107, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:16:49,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=557146.6666666666, ans=0.025 2023-12-22 11:16:57,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=557213.3333333334, ans=0.0 2023-12-22 11:16:59,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=557213.3333333334, ans=0.0 2023-12-22 11:17:00,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=557213.3333333334, ans=0.1 2023-12-22 11:17:13,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=557280.0, ans=0.1 2023-12-22 11:17:40,763 INFO [train.py:886] (0/4) Epoch 18, batch 2600, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4943586.74 frames. 
], batch size: 100, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:17:52,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=557546.6666666666, ans=0.1 2023-12-22 11:17:56,618 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.461e+01 2.828e+01 2.988e+01 3.124e+01 3.523e+01, threshold=5.975e+01, percent-clipped=0.0 2023-12-22 11:18:02,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=557613.3333333334, ans=0.05 2023-12-22 11:18:07,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=557613.3333333334, ans=0.125 2023-12-22 11:18:07,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=557613.3333333334, ans=0.125 2023-12-22 11:18:09,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=557613.3333333334, ans=0.2 2023-12-22 11:18:10,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=557680.0, ans=0.125 2023-12-22 11:18:29,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-12-22 11:18:32,520 INFO [train.py:886] (0/4) Epoch 18, batch 2650, loss[loss=0.0142, audio_tagging_loss=0.0142, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4951095.34 frames. ], batch size: 100, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:18:34,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2023-12-22 11:18:36,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=557813.3333333334, ans=0.125 2023-12-22 11:18:44,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-12-22 11:19:02,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=558013.3333333334, ans=0.125 2023-12-22 11:19:07,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=558013.3333333334, ans=0.125 2023-12-22 11:19:19,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=558080.0, ans=0.1 2023-12-22 11:19:24,356 INFO [train.py:886] (0/4) Epoch 18, batch 2700, loss[loss=0.01451, audio_tagging_loss=0.01451, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4951810.17 frames. 
], batch size: 100, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:19:38,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=558213.3333333334, ans=0.1 2023-12-22 11:19:40,794 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+01 2.868e+01 2.973e+01 3.116e+01 3.614e+01, threshold=5.947e+01, percent-clipped=0.0 2023-12-22 11:20:13,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=558413.3333333334, ans=0.125 2023-12-22 11:20:13,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-12-22 11:20:16,864 INFO [train.py:886] (0/4) Epoch 18, batch 2750, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4957576.84 frames. ], batch size: 100, lr: 6.03e-03, grad_scale: 64.0 2023-12-22 11:20:35,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2023-12-22 11:20:56,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=558680.0, ans=0.125 2023-12-22 11:21:08,765 INFO [train.py:886] (0/4) Epoch 18, batch 2800, loss[loss=0.01095, audio_tagging_loss=0.01095, over 24750.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4960611.30 frames. ], batch size: 99, lr: 6.03e-03, grad_scale: 64.0 2023-12-22 11:21:12,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=558813.3333333334, ans=0.125 2023-12-22 11:21:15,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=558813.3333333334, ans=0.125 2023-12-22 11:21:18,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558880.0, ans=0.1 2023-12-22 11:21:20,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=558880.0, ans=10.0 2023-12-22 11:21:26,347 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+01 2.878e+01 3.022e+01 3.166e+01 3.737e+01, threshold=6.045e+01, percent-clipped=0.0 2023-12-22 11:21:41,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=559013.3333333334, ans=0.125 2023-12-22 11:22:00,827 INFO [train.py:886] (0/4) Epoch 18, batch 2850, loss[loss=0.01126, audio_tagging_loss=0.01126, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4950796.53 frames. 
], batch size: 100, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:22:01,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=559146.6666666666, ans=0.125 2023-12-22 11:22:06,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=559146.6666666666, ans=0.125 2023-12-22 11:22:25,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=559280.0, ans=0.125 2023-12-22 11:22:25,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2023-12-22 11:22:26,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-12-22 11:22:34,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0 2023-12-22 11:22:52,671 INFO [train.py:886] (0/4) Epoch 18, batch 2900, loss[loss=0.01353, audio_tagging_loss=0.01353, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4949747.67 frames. ], batch size: 100, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:22:57,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=559480.0, ans=0.125 2023-12-22 11:23:00,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=559480.0, ans=0.125 2023-12-22 11:23:10,143 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.547e+01 2.901e+01 3.031e+01 3.155e+01 3.612e+01, threshold=6.062e+01, percent-clipped=0.0 2023-12-22 11:23:14,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=559613.3333333334, ans=0.125 2023-12-22 11:23:32,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=559680.0, ans=0.125 2023-12-22 11:23:44,254 INFO [train.py:886] (0/4) Epoch 18, batch 2950, loss[loss=0.0158, audio_tagging_loss=0.0158, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4948172.12 frames. ], batch size: 100, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:23:45,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.43 vs. limit=12.0 2023-12-22 11:23:56,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=559880.0, ans=0.2 2023-12-22 11:23:59,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=559880.0, ans=0.125 2023-12-22 11:24:12,753 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-84000.pt 2023-12-22 11:24:19,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=560013.3333333334, ans=0.0 2023-12-22 11:24:34,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. 
limit=6.0 2023-12-22 11:24:36,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.91 vs. limit=22.5 2023-12-22 11:24:37,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=15.0 2023-12-22 11:24:38,499 INFO [train.py:886] (0/4) Epoch 18, batch 3000, loss[loss=0.0148, audio_tagging_loss=0.0148, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4953762.37 frames. ], batch size: 100, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:24:38,501 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 11:24:55,484 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5777, 2.8027, 4.0665, 3.6825], device='cuda:0') 2023-12-22 11:24:59,975 INFO [train.py:917] (0/4) Epoch 18, validation: loss=0.03414, audio_tagging_loss=0.03414, over 3737520.00 frames. 2023-12-22 11:24:59,975 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 11:25:16,889 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.569e+01 2.845e+01 2.970e+01 3.099e+01 3.862e+01, threshold=5.940e+01, percent-clipped=0.0 2023-12-22 11:25:36,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=560346.6666666666, ans=15.0 2023-12-22 11:25:39,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=560413.3333333334, ans=0.125 2023-12-22 11:25:50,261 INFO [train.py:886] (0/4) Epoch 18, batch 3050, loss[loss=0.01468, audio_tagging_loss=0.01468, over 24750.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4948796.63 frames. ], batch size: 99, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:25:54,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560480.0, ans=0.1 2023-12-22 11:26:03,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. limit=6.0 2023-12-22 11:26:06,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560546.6666666666, ans=0.1 2023-12-22 11:26:07,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.89 vs. limit=10.0 2023-12-22 11:26:38,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=560746.6666666666, ans=0.0 2023-12-22 11:26:42,422 INFO [train.py:886] (0/4) Epoch 18, batch 3100, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4947719.76 frames. 
], batch size: 99, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:26:47,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=560813.3333333334, ans=0.0 2023-12-22 11:26:49,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=560813.3333333334, ans=0.2 2023-12-22 11:26:52,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=560880.0, ans=0.2 2023-12-22 11:26:59,219 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.547e+01 2.848e+01 3.018e+01 3.166e+01 3.503e+01, threshold=6.036e+01, percent-clipped=0.0 2023-12-22 11:27:04,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=560946.6666666666, ans=0.125 2023-12-22 11:27:23,326 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:27:32,484 INFO [train.py:886] (0/4) Epoch 18, batch 3150, loss[loss=0.01289, audio_tagging_loss=0.01289, over 24750.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4945547.40 frames. ], batch size: 99, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:27:33,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=561146.6666666666, ans=0.2 2023-12-22 11:27:49,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=561213.3333333334, ans=0.125 2023-12-22 11:27:53,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=561280.0, ans=15.0 2023-12-22 11:27:58,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2023-12-22 11:28:19,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.60 vs. limit=22.5 2023-12-22 11:28:22,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0 2023-12-22 11:28:25,740 INFO [train.py:886] (0/4) Epoch 18, batch 3200, loss[loss=0.01531, audio_tagging_loss=0.01531, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4940953.87 frames. 
], batch size: 100, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:28:26,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=561480.0, ans=0.0 2023-12-22 11:28:42,516 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 2.819e+01 2.957e+01 3.105e+01 3.510e+01, threshold=5.913e+01, percent-clipped=0.0 2023-12-22 11:28:58,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=561680.0, ans=0.05 2023-12-22 11:29:09,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=561746.6666666666, ans=0.0 2023-12-22 11:29:10,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=561746.6666666666, ans=0.0 2023-12-22 11:29:14,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=561746.6666666666, ans=0.125 2023-12-22 11:29:16,274 INFO [train.py:886] (0/4) Epoch 18, batch 3250, loss[loss=0.01064, audio_tagging_loss=0.01064, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4946687.78 frames. ], batch size: 100, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:29:36,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.16 vs. limit=22.5 2023-12-22 11:29:45,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5 2023-12-22 11:30:07,129 INFO [train.py:886] (0/4) Epoch 18, batch 3300, loss[loss=0.01364, audio_tagging_loss=0.01364, over 24750.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4947923.61 frames. ], batch size: 99, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:30:22,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=562213.3333333334, ans=0.125 2023-12-22 11:30:24,545 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+01 2.841e+01 2.981e+01 3.120e+01 3.559e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 11:30:44,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=562346.6666666666, ans=0.0 2023-12-22 11:30:45,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.37 vs. limit=22.5 2023-12-22 11:30:47,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=562413.3333333334, ans=0.125 2023-12-22 11:30:58,506 INFO [train.py:886] (0/4) Epoch 18, batch 3350, loss[loss=0.01279, audio_tagging_loss=0.01279, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4953098.96 frames. 
], batch size: 100, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:31:01,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=562480.0, ans=0.125 2023-12-22 11:31:01,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=562480.0, ans=0.125 2023-12-22 11:31:02,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=562480.0, ans=0.2 2023-12-22 11:31:47,318 INFO [train.py:886] (0/4) Epoch 18, batch 3400, loss[loss=0.01367, audio_tagging_loss=0.01367, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4958588.36 frames. ], batch size: 100, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:31:49,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=562813.3333333334, ans=0.125 2023-12-22 11:31:52,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=562813.3333333334, ans=0.125 2023-12-22 11:31:59,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=562880.0, ans=0.125 2023-12-22 11:32:04,796 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.564e+01 2.896e+01 3.047e+01 3.185e+01 3.707e+01, threshold=6.093e+01, percent-clipped=0.0 2023-12-22 11:32:20,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0 2023-12-22 11:32:34,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.70 vs. limit=15.0 2023-12-22 11:32:38,972 INFO [train.py:886] (0/4) Epoch 18, batch 3450, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24750.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4952505.68 frames. ], batch size: 99, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:32:46,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=563146.6666666666, ans=0.1 2023-12-22 11:32:50,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.95 vs. limit=15.0 2023-12-22 11:32:51,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2023-12-22 11:32:56,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. 
limit=15.0 2023-12-22 11:33:23,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=563413.3333333334, ans=0.125 2023-12-22 11:33:26,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=563413.3333333334, ans=0.025 2023-12-22 11:33:26,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=563413.3333333334, ans=0.0 2023-12-22 11:33:29,156 INFO [train.py:886] (0/4) Epoch 18, batch 3500, loss[loss=0.01331, audio_tagging_loss=0.01331, over 24750.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4951382.33 frames. ], batch size: 99, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:33:42,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563546.6666666666, ans=0.1 2023-12-22 11:33:46,473 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.620e+01 2.893e+01 3.039e+01 3.166e+01 3.781e+01, threshold=6.078e+01, percent-clipped=0.0 2023-12-22 11:33:48,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=563613.3333333334, ans=0.125 2023-12-22 11:33:52,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=563613.3333333334, ans=0.125 2023-12-22 11:33:55,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=563613.3333333334, ans=0.0 2023-12-22 11:33:58,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2023-12-22 11:34:05,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=563680.0, ans=0.2 2023-12-22 11:34:10,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.88 vs. limit=10.0 2023-12-22 11:34:20,059 INFO [train.py:886] (0/4) Epoch 18, batch 3550, loss[loss=0.01323, audio_tagging_loss=0.01323, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4948196.05 frames. ], batch size: 100, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:34:23,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-12-22 11:34:25,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=15.0 2023-12-22 11:34:26,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=563813.3333333334, ans=0.0 2023-12-22 11:34:26,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=563813.3333333334, ans=0.125 2023-12-22 11:34:28,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2023-12-22 11:34:37,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. 
limit=15.0 2023-12-22 11:34:50,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=22.5 2023-12-22 11:35:11,880 INFO [train.py:886] (0/4) Epoch 18, batch 3600, loss[loss=0.01512, audio_tagging_loss=0.01512, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4945049.73 frames. ], batch size: 100, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:35:18,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=564146.6666666666, ans=0.125 2023-12-22 11:35:19,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=564146.6666666666, ans=0.07 2023-12-22 11:35:22,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=564213.3333333334, ans=0.2 2023-12-22 11:35:24,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=564213.3333333334, ans=0.125 2023-12-22 11:35:28,817 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.848e+01 2.990e+01 3.158e+01 3.756e+01, threshold=5.981e+01, percent-clipped=0.0 2023-12-22 11:35:32,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.11 vs. limit=6.0 2023-12-22 11:35:42,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=564346.6666666666, ans=0.125 2023-12-22 11:35:43,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2023-12-22 11:36:03,829 INFO [train.py:886] (0/4) Epoch 18, batch 3650, loss[loss=0.01801, audio_tagging_loss=0.01801, over 25000.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4948513.97 frames. ], batch size: 100, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:36:07,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=22.5 2023-12-22 11:36:30,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=564613.3333333334, ans=0.125 2023-12-22 11:36:44,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=564746.6666666666, ans=0.0 2023-12-22 11:36:47,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=564746.6666666666, ans=0.0 2023-12-22 11:36:56,208 INFO [train.py:886] (0/4) Epoch 18, batch 3700, loss[loss=0.01807, audio_tagging_loss=0.01807, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4957726.23 frames. ], batch size: 100, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:37:02,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. 
limit=15.0 2023-12-22 11:37:13,791 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.550e+01 2.838e+01 2.944e+01 3.115e+01 3.574e+01, threshold=5.888e+01, percent-clipped=0.0 2023-12-22 11:37:36,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=565080.0, ans=0.0 2023-12-22 11:37:43,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=565080.0, ans=0.07 2023-12-22 11:37:48,052 INFO [train.py:886] (0/4) Epoch 18, batch 3750, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4958725.25 frames. ], batch size: 99, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:38:05,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2023-12-22 11:38:20,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2023-12-22 11:38:32,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=565413.3333333334, ans=0.2 2023-12-22 11:38:39,876 INFO [train.py:886] (0/4) Epoch 18, batch 3800, loss[loss=0.01534, audio_tagging_loss=0.01534, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4956518.52 frames. ], batch size: 99, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:38:52,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=565546.6666666666, ans=0.0 2023-12-22 11:38:57,910 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+01 2.848e+01 2.976e+01 3.119e+01 3.728e+01, threshold=5.951e+01, percent-clipped=0.0 2023-12-22 11:38:58,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=565546.6666666666, ans=0.2 2023-12-22 11:39:29,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=565746.6666666666, ans=0.125 2023-12-22 11:39:31,784 INFO [train.py:886] (0/4) Epoch 18, batch 3850, loss[loss=0.01353, audio_tagging_loss=0.01353, over 24750.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4952346.78 frames. ], batch size: 99, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:39:52,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.86 vs. limit=15.0 2023-12-22 11:39:54,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=565946.6666666666, ans=0.125 2023-12-22 11:40:00,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565946.6666666666, ans=0.1 2023-12-22 11:40:03,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=12.0 2023-12-22 11:40:03,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. 
limit=15.0 2023-12-22 11:40:18,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.95 vs. limit=15.0 2023-12-22 11:40:22,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=566080.0, ans=0.125 2023-12-22 11:40:24,660 INFO [train.py:886] (0/4) Epoch 18, batch 3900, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4956020.03 frames. ], batch size: 100, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:40:24,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=566146.6666666666, ans=0.125 2023-12-22 11:40:41,626 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.561e+01 2.851e+01 2.964e+01 3.170e+01 3.515e+01, threshold=5.928e+01, percent-clipped=0.0 2023-12-22 11:40:53,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=566280.0, ans=0.1 2023-12-22 11:41:01,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=566346.6666666666, ans=0.125 2023-12-22 11:41:01,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566346.6666666666, ans=0.1 2023-12-22 11:41:15,952 INFO [train.py:886] (0/4) Epoch 18, batch 3950, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4956950.81 frames. ], batch size: 100, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:41:18,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=566480.0, ans=0.125 2023-12-22 11:41:49,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=566680.0, ans=0.125 2023-12-22 11:42:09,134 INFO [train.py:886] (0/4) Epoch 18, batch 4000, loss[loss=0.01194, audio_tagging_loss=0.01194, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4958084.64 frames. ], batch size: 100, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:42:17,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.63 vs. limit=10.0 2023-12-22 11:42:25,329 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.905e+01 3.054e+01 3.173e+01 3.628e+01, threshold=6.109e+01, percent-clipped=0.0 2023-12-22 11:42:26,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.69 vs. 
limit=15.0 2023-12-22 11:42:28,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=566946.6666666666, ans=0.125 2023-12-22 11:42:34,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=566946.6666666666, ans=0.0 2023-12-22 11:42:52,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=567080.0, ans=0.125 2023-12-22 11:42:53,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.28 vs. limit=22.5 2023-12-22 11:42:59,842 INFO [train.py:886] (0/4) Epoch 18, batch 4050, loss[loss=0.01532, audio_tagging_loss=0.01532, over 24750.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4957323.75 frames. ], batch size: 99, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:43:09,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2023-12-22 11:43:15,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=567213.3333333334, ans=0.0 2023-12-22 11:43:45,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=567413.3333333334, ans=0.125 2023-12-22 11:43:52,207 INFO [train.py:886] (0/4) Epoch 18, batch 4100, loss[loss=0.01366, audio_tagging_loss=0.01366, over 24750.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4952864.50 frames. ], batch size: 99, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:43:58,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=567480.0, ans=0.0 2023-12-22 11:44:02,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=567546.6666666666, ans=0.0 2023-12-22 11:44:08,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=567546.6666666666, ans=0.2 2023-12-22 11:44:09,652 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.648e+01 2.912e+01 3.022e+01 3.164e+01 3.761e+01, threshold=6.044e+01, percent-clipped=0.0 2023-12-22 11:44:30,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=567680.0, ans=0.125 2023-12-22 11:44:33,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0 2023-12-22 11:44:38,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=567746.6666666666, ans=0.2 2023-12-22 11:44:42,160 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:44:43,777 INFO [train.py:886] (0/4) Epoch 18, batch 4150, loss[loss=0.01513, audio_tagging_loss=0.01513, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4950358.13 frames. 
], batch size: 100, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:44:45,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=12.0 2023-12-22 11:45:00,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.67 vs. limit=5.0 2023-12-22 11:45:02,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=567946.6666666666, ans=0.125 2023-12-22 11:45:09,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=567946.6666666666, ans=0.125 2023-12-22 11:45:11,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=567946.6666666666, ans=0.0 2023-12-22 11:45:16,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=568013.3333333334, ans=0.05 2023-12-22 11:45:30,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=568080.0, ans=0.125 2023-12-22 11:45:33,869 INFO [train.py:886] (0/4) Epoch 18, batch 4200, loss[loss=0.01494, audio_tagging_loss=0.01494, over 24750.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4952036.56 frames. ], batch size: 99, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:45:36,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2023-12-22 11:45:52,748 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.822e+01 2.922e+01 3.153e+01 3.772e+01, threshold=5.845e+01, percent-clipped=0.0 2023-12-22 11:45:55,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=568280.0, ans=0.0 2023-12-22 11:46:01,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=568280.0, ans=10.0 2023-12-22 11:46:06,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=568346.6666666666, ans=0.125 2023-12-22 11:46:27,019 INFO [train.py:886] (0/4) Epoch 18, batch 4250, loss[loss=0.01546, audio_tagging_loss=0.01546, over 22384.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4953440.51 frames. ], batch size: 107, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:46:28,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.99 vs. limit=10.0 2023-12-22 11:46:35,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=568546.6666666666, ans=0.0 2023-12-22 11:46:36,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. 
limit=6.0 2023-12-22 11:46:50,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=568613.3333333334, ans=0.0 2023-12-22 11:46:50,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568613.3333333334, ans=0.1 2023-12-22 11:47:01,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=568680.0, ans=0.125 2023-12-22 11:47:01,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=568680.0, ans=0.0 2023-12-22 11:47:08,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=568746.6666666666, ans=0.125 2023-12-22 11:47:17,859 INFO [train.py:886] (0/4) Epoch 18, batch 4300, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4958851.06 frames. ], batch size: 100, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:47:36,147 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.510e+01 2.847e+01 2.955e+01 3.077e+01 3.608e+01, threshold=5.910e+01, percent-clipped=0.0 2023-12-22 11:47:37,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=568880.0, ans=0.125 2023-12-22 11:47:38,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2023-12-22 11:48:10,211 INFO [train.py:886] (0/4) Epoch 18, batch 4350, loss[loss=0.01454, audio_tagging_loss=0.01454, over 25000.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4962966.28 frames. ], batch size: 100, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:48:13,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=569146.6666666666, ans=0.1 2023-12-22 11:48:15,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=569146.6666666666, ans=0.0 2023-12-22 11:48:16,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-12-22 11:48:28,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=569213.3333333334, ans=0.2 2023-12-22 11:48:42,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=569346.6666666666, ans=10.0 2023-12-22 11:48:48,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=569346.6666666666, ans=0.125 2023-12-22 11:48:51,334 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.049e-02 2023-12-22 11:48:53,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=569413.3333333334, ans=0.2 2023-12-22 11:49:01,968 INFO [train.py:886] (0/4) Epoch 18, batch 4400, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4955876.27 frames. 
], batch size: 99, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:49:19,484 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.944e+01 3.084e+01 3.234e+01 4.247e+01, threshold=6.167e+01, percent-clipped=0.0 2023-12-22 11:49:20,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=569546.6666666666, ans=10.0 2023-12-22 11:49:28,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=569613.3333333334, ans=0.5 2023-12-22 11:49:41,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569680.0, ans=0.1 2023-12-22 11:49:52,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569813.3333333334, ans=0.1 2023-12-22 11:49:53,660 INFO [train.py:886] (0/4) Epoch 18, batch 4450, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4952530.52 frames. ], batch size: 99, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:49:54,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=569813.3333333334, ans=0.2 2023-12-22 11:50:01,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=569813.3333333334, ans=0.125 2023-12-22 11:50:45,778 INFO [train.py:886] (0/4) Epoch 18, batch 4500, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4952407.28 frames. ], batch size: 100, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:50:45,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570146.6666666666, ans=0.1 2023-12-22 11:51:03,230 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.526e+01 2.863e+01 2.984e+01 3.180e+01 3.715e+01, threshold=5.969e+01, percent-clipped=0.0 2023-12-22 11:51:06,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=570280.0, ans=0.125 2023-12-22 11:51:20,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=570346.6666666666, ans=0.2 2023-12-22 11:51:22,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2023-12-22 11:51:37,351 INFO [train.py:886] (0/4) Epoch 18, batch 4550, loss[loss=0.01414, audio_tagging_loss=0.01414, over 25000.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4949899.38 frames. 
], batch size: 100, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:51:57,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=570613.3333333334, ans=0.2 2023-12-22 11:52:08,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=570680.0, ans=0.1 2023-12-22 11:52:12,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=570680.0, ans=0.0 2023-12-22 11:52:22,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.20 vs. limit=12.0 2023-12-22 11:52:29,116 INFO [train.py:886] (0/4) Epoch 18, batch 4600, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4954545.11 frames. ], batch size: 100, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:52:30,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=570813.3333333334, ans=0.0 2023-12-22 11:52:33,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=570813.3333333334, ans=0.0 2023-12-22 11:52:34,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=570813.3333333334, ans=0.125 2023-12-22 11:52:35,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=15.0 2023-12-22 11:52:35,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=570813.3333333334, ans=0.125 2023-12-22 11:52:42,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=570880.0, ans=0.125 2023-12-22 11:52:46,539 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.514e+01 2.915e+01 3.083e+01 3.222e+01 3.663e+01, threshold=6.165e+01, percent-clipped=0.0 2023-12-22 11:52:51,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=570946.6666666666, ans=0.125 2023-12-22 11:52:51,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.69 vs. limit=22.5 2023-12-22 11:52:54,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2023-12-22 11:53:00,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=571013.3333333334, ans=0.0 2023-12-22 11:53:20,548 INFO [train.py:886] (0/4) Epoch 18, batch 4650, loss[loss=0.01504, audio_tagging_loss=0.01504, over 25000.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4963091.99 frames. 
], batch size: 100, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:53:23,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=571146.6666666666, ans=0.125 2023-12-22 11:53:34,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=571213.3333333334, ans=0.1 2023-12-22 11:53:38,480 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.46 vs. limit=15.0 2023-12-22 11:53:50,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=571346.6666666666, ans=0.0 2023-12-22 11:53:55,541 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.149e-02 2023-12-22 11:54:11,472 INFO [train.py:886] (0/4) Epoch 18, batch 4700, loss[loss=0.01573, audio_tagging_loss=0.01573, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4961519.23 frames. ], batch size: 99, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:54:26,976 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.832e+01 2.995e+01 3.118e+01 3.636e+01, threshold=5.991e+01, percent-clipped=0.0 2023-12-22 11:54:29,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=571613.3333333334, ans=0.125 2023-12-22 11:54:50,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=571746.6666666666, ans=0.125 2023-12-22 11:54:54,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=571746.6666666666, ans=0.125 2023-12-22 11:54:58,481 INFO [train.py:886] (0/4) Epoch 18, batch 4750, loss[loss=0.01462, audio_tagging_loss=0.01462, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4953626.60 frames. ], batch size: 99, lr: 5.96e-03, grad_scale: 32.0 2023-12-22 11:55:05,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=571813.3333333334, ans=0.1 2023-12-22 11:55:13,881 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-18.pt 2023-12-22 11:55:34,698 INFO [train.py:886] (0/4) Epoch 19, batch 0, loss[loss=0.03639, audio_tagging_loss=0.03639, over 21171.00 frames. ], tot_loss[loss=0.03639, audio_tagging_loss=0.03639, over 21171.00 frames. ], batch size: 107, lr: 5.80e-03, grad_scale: 32.0 2023-12-22 11:55:34,699 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 11:55:56,080 INFO [train.py:917] (0/4) Epoch 19, validation: loss=0.03209, audio_tagging_loss=0.03209, over 3737520.00 frames. 
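The recurring "ScheduledFloat: name=..., batch_count=..., ans=..." entries come from zipformer's scaling.py, where many regularization knobs (bypass skip rates, balancer probabilities, feed-forward dropout) are not constants but floats interpolated piecewise-linearly in the global batch count; "ans" is the schedule's current value. A minimal sketch of that idea, assuming the (batch_count, value) breakpoints are given up front; the class and method names here are illustrative, not icefall's actual API:

    from bisect import bisect_right

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over the global batch count."""
        def __init__(self, *points):          # points: (batch_count, value) pairs
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            i = bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]             # before the first breakpoint
            if i == len(self.xs):
                return self.ys[-1]            # past the last breakpoint
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate that decays from 0.5 to 0.0 over the first 4000 batches:
    skip_rate = ScheduledFloatSketch((0, 0.5), (4000, 0.0))
    print(skip_rate.value(568080.0))          # long past the ramp -> 0.0

By batch_count of roughly 5.7e5 most schedules have long since flattened out, which is consistent with so many of the logged "ans" values sitting at endpoints such as 0.125, 0.1, or 0.0.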
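The periodic optim.py:484 warnings summarize the optimizer's adaptive gradient clipping: the five "grad-norm quartiles" track the 0/25/50/75/100th percentiles of recently observed gradient norms, and in every entry in this section the printed threshold is (to rounding) 2.0 times the middle value, i.e. Clipping_scale times the median; percent-clipped reports how often a batch actually exceeded it (7.0% right after the epoch-19 restart, 0.0% elsewhere). A simplified stand-in for that rule, assuming the norm history is passed in explicitly; not ScaledAdam's real implementation, which tracks these statistics inside the optimizer:

    import torch

    def clip_by_recent_median(params, recent_norms, clipping_scale=2.0):
        # Quartiles of recently observed grad norms, as printed in the log.
        norms = sorted(recent_norms)
        quartiles = [norms[int(q * (len(norms) - 1))]
                     for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = clipping_scale * quartiles[2]      # 2.0 x median
        grads = [p.grad for p in params if p.grad is not None]
        total = torch.sqrt(sum(((g.detach() ** 2).sum() for g in grads),
                               torch.tensor(0.0)))
        clipped = bool(total > threshold)
        if clipped:
            for g in grads:
                g.mul_(threshold / (total + 1e-20))    # rescale, don't zero
        return quartiles, threshold, clipped

Because the threshold floats with the model's own recent gradient history, it adapts over the course of training without any manual retuning of a fixed clip value.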
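The scaling.py:1022 "Whitening" lines are a decorrelation diagnostic: a module's activations are split channel-wise into num_groups groups, each group's covariance is compared against a multiple of the identity, and a corrective penalty engages when the logged metric exceeds the limit. One plausible form for such a metric, equal to 1.0 for perfectly white features and growing as the covariance spectrum spreads; this reconstruction is an assumption for illustration, not the exact formula in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        # x: (num_frames, num_channels); channels split evenly into groups.
        frames, channels = x.shape
        dim = channels // num_groups
        x = x.reshape(frames, num_groups, dim).transpose(0, 1)   # (g, frames, dim)
        covar = torch.matmul(x.transpose(1, 2), x) / frames      # (g, dim, dim)
        trace_c = covar.diagonal(dim1=1, dim2=2).sum(-1)         # trace(C)
        trace_c_sq = (covar * covar).sum(dim=(1, 2))             # trace(C @ C), C symmetric
        # dim * tr(C^2) / tr(C)^2 == mean(eig^2) / mean(eig)^2, always >= 1.0
        return (dim * trace_c_sq / trace_c ** 2).mean().item()

    x = torch.randn(1000, 512)
    print(whitening_metric(x, num_groups=1))   # ~1.0 for white noise

Read that way, an entry like "metric=15.83 vs. limit=15.0" says the layer's activations are far from isotropic and just past the allowed limit, so the whitening penalty is active there.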
2023-12-22 11:55:56,081 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 11:55:56,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=571920.0, ans=0.2 2023-12-22 11:56:04,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=571986.6666666666, ans=0.125 2023-12-22 11:56:30,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572120.0, ans=0.1 2023-12-22 11:56:33,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=572120.0, ans=0.0 2023-12-22 11:56:42,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=572186.6666666666, ans=0.125 2023-12-22 11:56:44,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572186.6666666666, ans=0.1 2023-12-22 11:56:46,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.06 vs. limit=10.0 2023-12-22 11:56:46,850 INFO [train.py:886] (0/4) Epoch 19, batch 50, loss[loss=0.0153, audio_tagging_loss=0.0153, over 24128.00 frames. ], tot_loss[loss=0.02226, audio_tagging_loss=0.02226, over 1118649.69 frames. ], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:56:47,742 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.638e+01 3.056e+01 3.497e+01 4.227e+01 9.985e+01, threshold=6.993e+01, percent-clipped=7.0 2023-12-22 11:56:52,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=572253.3333333334, ans=0.1 2023-12-22 11:56:54,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=572253.3333333334, ans=0.09899494936611666 2023-12-22 11:56:54,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=572253.3333333334, ans=0.0 2023-12-22 11:57:05,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=572320.0, ans=0.125 2023-12-22 11:57:05,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=572320.0, ans=0.0 2023-12-22 11:57:09,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=6.0 2023-12-22 11:57:16,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572453.3333333334, ans=0.1 2023-12-22 11:57:19,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=572453.3333333334, ans=0.05 2023-12-22 11:57:27,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=572520.0, ans=0.125 2023-12-22 11:57:38,769 INFO [train.py:886] (0/4) Epoch 19, batch 100, loss[loss=0.01466, audio_tagging_loss=0.01466, over 25000.00 frames. 
], tot_loss[loss=0.01934, audio_tagging_loss=0.01934, over 1974288.24 frames. ], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:57:38,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=572586.6666666666, ans=0.0 2023-12-22 11:57:40,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=572586.6666666666, ans=0.125 2023-12-22 11:57:44,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=572586.6666666666, ans=0.0 2023-12-22 11:57:49,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=572653.3333333334, ans=0.1 2023-12-22 11:58:01,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572720.0, ans=0.1 2023-12-22 11:58:01,323 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.529e-03 2023-12-22 11:58:02,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=572720.0, ans=0.125 2023-12-22 11:58:15,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=572786.6666666666, ans=0.0 2023-12-22 11:58:15,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=572786.6666666666, ans=0.125 2023-12-22 11:58:30,338 INFO [train.py:886] (0/4) Epoch 19, batch 150, loss[loss=0.01521, audio_tagging_loss=0.01521, over 24750.00 frames. ], tot_loss[loss=0.01734, audio_tagging_loss=0.01734, over 2634589.91 frames. ], batch size: 99, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:58:31,284 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.740e+01 3.036e+01 3.224e+01 3.385e+01 4.253e+01, threshold=6.449e+01, percent-clipped=0.0 2023-12-22 11:58:37,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2023-12-22 11:58:41,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=572986.6666666666, ans=0.0 2023-12-22 11:59:05,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=573120.0, ans=0.0 2023-12-22 11:59:06,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=573120.0, ans=0.2 2023-12-22 11:59:07,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=12.0 2023-12-22 11:59:10,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=573120.0, ans=0.07 2023-12-22 11:59:15,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2023-12-22 11:59:17,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.24 vs. 
limit=22.5 2023-12-22 11:59:18,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=573186.6666666666, ans=0.125 2023-12-22 11:59:22,105 INFO [train.py:886] (0/4) Epoch 19, batch 200, loss[loss=0.01128, audio_tagging_loss=0.01128, over 25000.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 3151003.68 frames. ], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:59:27,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=573253.3333333334, ans=0.125 2023-12-22 11:59:28,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.35 vs. limit=12.0 2023-12-22 11:59:32,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-12-22 11:59:34,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=573320.0, ans=0.125 2023-12-22 11:59:34,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=573320.0, ans=0.125 2023-12-22 12:00:03,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=573520.0, ans=0.09899494936611666 2023-12-22 12:00:14,374 INFO [train.py:886] (0/4) Epoch 19, batch 250, loss[loss=0.01439, audio_tagging_loss=0.01439, over 25000.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 3548541.44 frames. ], batch size: 100, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:00:15,323 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.661e+01 2.866e+01 3.015e+01 3.184e+01 3.683e+01, threshold=6.030e+01, percent-clipped=0.0 2023-12-22 12:00:24,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2023-12-22 12:00:58,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=573853.3333333334, ans=0.0 2023-12-22 12:01:00,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=573853.3333333334, ans=0.0 2023-12-22 12:01:01,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=573853.3333333334, ans=0.0 2023-12-22 12:01:04,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=573920.0, ans=0.125 2023-12-22 12:01:06,091 INFO [train.py:886] (0/4) Epoch 19, batch 300, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 3857339.82 frames. ], batch size: 99, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:01:09,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.44 vs. limit=15.0 2023-12-22 12:01:13,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.34 vs. 
limit=10.0 2023-12-22 12:01:20,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.80 vs. limit=22.5 2023-12-22 12:01:24,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=573986.6666666666, ans=0.07 2023-12-22 12:01:43,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=574120.0, ans=0.2 2023-12-22 12:01:46,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.20 vs. limit=22.5 2023-12-22 12:01:57,894 INFO [train.py:886] (0/4) Epoch 19, batch 350, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4097943.87 frames. ], batch size: 99, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:01:58,806 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.631e+01 2.896e+01 3.022e+01 3.147e+01 3.983e+01, threshold=6.044e+01, percent-clipped=0.0 2023-12-22 12:02:00,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=574253.3333333334, ans=0.0 2023-12-22 12:02:03,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=15.0 2023-12-22 12:02:09,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574320.0, ans=0.1 2023-12-22 12:02:29,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=574453.3333333334, ans=0.05 2023-12-22 12:02:31,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=574453.3333333334, ans=0.125 2023-12-22 12:02:36,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=574453.3333333334, ans=0.05 2023-12-22 12:02:46,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=574520.0, ans=0.0 2023-12-22 12:02:46,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=574520.0, ans=0.0 2023-12-22 12:02:47,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=574520.0, ans=0.2 2023-12-22 12:02:50,300 INFO [train.py:886] (0/4) Epoch 19, batch 400, loss[loss=0.01486, audio_tagging_loss=0.01486, over 25000.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4281889.35 frames. ], batch size: 100, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:02:54,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=574586.6666666666, ans=0.125 2023-12-22 12:02:54,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. 
limit=10.0 2023-12-22 12:02:58,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=574586.6666666666, ans=0.125 2023-12-22 12:02:58,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=574586.6666666666, ans=0.125 2023-12-22 12:03:02,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2023-12-22 12:03:42,115 INFO [train.py:886] (0/4) Epoch 19, batch 450, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4425615.92 frames. ], batch size: 99, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:03:42,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=574920.0, ans=0.125 2023-12-22 12:03:43,022 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.821e+01 2.989e+01 3.118e+01 4.084e+01, threshold=5.979e+01, percent-clipped=0.0 2023-12-22 12:04:07,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=575053.3333333334, ans=0.125 2023-12-22 12:04:08,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=575053.3333333334, ans=0.125 2023-12-22 12:04:09,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=575053.3333333334, ans=0.0 2023-12-22 12:04:18,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=575120.0, ans=0.125 2023-12-22 12:04:18,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=575120.0, ans=0.125 2023-12-22 12:04:26,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2023-12-22 12:04:34,407 INFO [train.py:886] (0/4) Epoch 19, batch 500, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4547529.70 frames. ], batch size: 100, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:04:37,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=575253.3333333334, ans=0.09899494936611666 2023-12-22 12:04:44,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=575320.0, ans=0.2 2023-12-22 12:05:25,832 INFO [train.py:886] (0/4) Epoch 19, batch 550, loss[loss=0.01556, audio_tagging_loss=0.01556, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4644212.60 frames. 
], batch size: 100, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:05:27,459 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.644e+01 2.851e+01 2.999e+01 3.143e+01 3.549e+01, threshold=5.998e+01, percent-clipped=0.0 2023-12-22 12:05:38,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=575653.3333333334, ans=0.125 2023-12-22 12:05:47,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=575720.0, ans=0.125 2023-12-22 12:06:14,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=575853.3333333334, ans=0.125 2023-12-22 12:06:17,523 INFO [train.py:886] (0/4) Epoch 19, batch 600, loss[loss=0.01291, audio_tagging_loss=0.01291, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4717151.28 frames. ], batch size: 99, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:06:18,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=575920.0, ans=0.125 2023-12-22 12:06:34,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2023-12-22 12:06:38,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5 2023-12-22 12:06:40,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=576053.3333333334, ans=0.05 2023-12-22 12:06:53,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=576120.0, ans=0.0 2023-12-22 12:06:59,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2023-12-22 12:07:10,171 INFO [train.py:886] (0/4) Epoch 19, batch 650, loss[loss=0.01299, audio_tagging_loss=0.01299, over 24750.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4764444.24 frames. ], batch size: 99, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:07:11,730 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.622e+01 2.907e+01 3.049e+01 3.128e+01 3.545e+01, threshold=6.097e+01, percent-clipped=0.0 2023-12-22 12:07:23,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=576320.0, ans=0.0 2023-12-22 12:07:35,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=576386.6666666666, ans=0.0 2023-12-22 12:07:57,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=576520.0, ans=0.025 2023-12-22 12:08:01,306 INFO [train.py:886] (0/4) Epoch 19, batch 700, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4805429.80 frames. 
], batch size: 99, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:08:15,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=576653.3333333334, ans=0.125 2023-12-22 12:08:19,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2023-12-22 12:08:22,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-12-22 12:08:23,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=576720.0, ans=0.125 2023-12-22 12:08:25,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=576720.0, ans=0.0 2023-12-22 12:08:29,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=15.0 2023-12-22 12:08:46,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=576853.3333333334, ans=0.125 2023-12-22 12:08:53,205 INFO [train.py:886] (0/4) Epoch 19, batch 750, loss[loss=0.01428, audio_tagging_loss=0.01428, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4839027.55 frames. ], batch size: 100, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:08:54,163 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+01 2.842e+01 3.000e+01 3.145e+01 3.640e+01, threshold=6.000e+01, percent-clipped=0.0 2023-12-22 12:08:59,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=576920.0, ans=0.2 2023-12-22 12:09:02,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=576986.6666666666, ans=0.2 2023-12-22 12:09:07,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2023-12-22 12:09:09,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=576986.6666666666, ans=0.0 2023-12-22 12:09:19,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0 2023-12-22 12:09:20,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.11 vs. limit=15.0 2023-12-22 12:09:23,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=577120.0, ans=0.0 2023-12-22 12:09:30,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577120.0, ans=0.1 2023-12-22 12:09:36,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577186.6666666666, ans=0.1 2023-12-22 12:09:44,885 INFO [train.py:886] (0/4) Epoch 19, batch 800, loss[loss=0.01428, audio_tagging_loss=0.01428, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4859850.89 frames. 
], batch size: 100, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:09:52,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0 2023-12-22 12:09:54,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2023-12-22 12:09:55,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577320.0, ans=0.1 2023-12-22 12:10:28,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=577520.0, ans=0.05 2023-12-22 12:10:36,763 INFO [train.py:886] (0/4) Epoch 19, batch 850, loss[loss=0.01207, audio_tagging_loss=0.01207, over 25000.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4887672.34 frames. ], batch size: 100, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:10:37,688 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.538e+01 2.855e+01 2.981e+01 3.141e+01 3.623e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 12:10:43,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2023-12-22 12:10:46,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577653.3333333334, ans=0.1 2023-12-22 12:10:55,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=577653.3333333334, ans=0.125 2023-12-22 12:11:10,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=577786.6666666666, ans=0.0 2023-12-22 12:11:15,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=577786.6666666666, ans=0.125 2023-12-22 12:11:27,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=577853.3333333334, ans=0.125 2023-12-22 12:11:29,931 INFO [train.py:886] (0/4) Epoch 19, batch 900, loss[loss=0.01262, audio_tagging_loss=0.01262, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4904088.90 frames. ], batch size: 99, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:11:41,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=577986.6666666666, ans=0.125 2023-12-22 12:11:44,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=577986.6666666666, ans=0.0 2023-12-22 12:11:55,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=578053.3333333334, ans=0.0 2023-12-22 12:12:19,887 INFO [train.py:886] (0/4) Epoch 19, batch 950, loss[loss=0.01453, audio_tagging_loss=0.01453, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4895962.46 frames. 
], batch size: 99, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:12:20,790 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.573e+01 2.852e+01 2.989e+01 3.124e+01 4.073e+01, threshold=5.979e+01, percent-clipped=0.0 2023-12-22 12:12:25,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=578253.3333333334, ans=0.125 2023-12-22 12:12:31,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=578320.0, ans=0.125 2023-12-22 12:12:36,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-12-22 12:12:37,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=578320.0, ans=0.125 2023-12-22 12:12:39,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=15.0 2023-12-22 12:12:53,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=578453.3333333334, ans=0.125 2023-12-22 12:13:11,967 INFO [train.py:886] (0/4) Epoch 19, batch 1000, loss[loss=0.01416, audio_tagging_loss=0.01416, over 24750.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4903270.21 frames. ], batch size: 99, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:13:13,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=578586.6666666666, ans=0.0 2023-12-22 12:13:15,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=578586.6666666666, ans=0.125 2023-12-22 12:13:19,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.73 vs. limit=15.0 2023-12-22 12:13:19,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=578586.6666666666, ans=0.0 2023-12-22 12:14:00,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=578853.3333333334, ans=0.2 2023-12-22 12:14:04,559 INFO [train.py:886] (0/4) Epoch 19, batch 1050, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4910581.11 frames. ], batch size: 99, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:14:05,502 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.555e+01 2.856e+01 2.980e+01 3.114e+01 3.633e+01, threshold=5.960e+01, percent-clipped=0.0 2023-12-22 12:14:40,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-12-22 12:14:43,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.69 vs. 
limit=5.0 2023-12-22 12:14:45,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=579186.6666666666, ans=0.125 2023-12-22 12:14:53,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=579186.6666666666, ans=0.0 2023-12-22 12:14:55,538 INFO [train.py:886] (0/4) Epoch 19, batch 1100, loss[loss=0.01295, audio_tagging_loss=0.01295, over 24750.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4920329.71 frames. ], batch size: 99, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:14:57,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.15 vs. limit=22.5 2023-12-22 12:15:12,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2023-12-22 12:15:19,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=579386.6666666666, ans=0.0 2023-12-22 12:15:21,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=12.0 2023-12-22 12:15:25,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=579453.3333333334, ans=0.125 2023-12-22 12:15:47,810 INFO [train.py:886] (0/4) Epoch 19, batch 1150, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4930297.80 frames. ], batch size: 100, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:15:49,403 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.419e+01 2.855e+01 2.981e+01 3.079e+01 3.493e+01, threshold=5.963e+01, percent-clipped=0.0 2023-12-22 12:16:07,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579720.0, ans=0.1 2023-12-22 12:16:08,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=579720.0, ans=15.0 2023-12-22 12:16:28,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=579853.3333333334, ans=0.0 2023-12-22 12:16:39,353 INFO [train.py:886] (0/4) Epoch 19, batch 1200, loss[loss=0.01528, audio_tagging_loss=0.01528, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4938466.34 frames. ], batch size: 100, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:16:41,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579920.0, ans=0.1 2023-12-22 12:16:50,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=579986.6666666666, ans=0.125 2023-12-22 12:17:00,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. 
limit=15.0 2023-12-22 12:17:03,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=580053.3333333334, ans=0.04949747468305833 2023-12-22 12:17:16,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=580120.0, ans=0.125 2023-12-22 12:17:19,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5 2023-12-22 12:17:20,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=580120.0, ans=0.125 2023-12-22 12:17:21,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=580186.6666666666, ans=0.0 2023-12-22 12:17:30,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=15.0 2023-12-22 12:17:31,954 INFO [train.py:886] (0/4) Epoch 19, batch 1250, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24750.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4934194.42 frames. ], batch size: 99, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:17:32,869 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.968e+01 3.088e+01 3.270e+01 3.810e+01, threshold=6.176e+01, percent-clipped=0.0 2023-12-22 12:17:40,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=15.0 2023-12-22 12:17:59,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=580386.6666666666, ans=0.0 2023-12-22 12:18:04,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=580453.3333333334, ans=0.1 2023-12-22 12:18:04,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5 2023-12-22 12:18:10,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=580453.3333333334, ans=0.125 2023-12-22 12:18:11,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=580453.3333333334, ans=0.0 2023-12-22 12:18:24,357 INFO [train.py:886] (0/4) Epoch 19, batch 1300, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4932911.39 frames. ], batch size: 100, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:18:43,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. 
limit=15.0 2023-12-22 12:18:45,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=580720.0, ans=0.125 2023-12-22 12:18:50,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=580720.0, ans=0.0 2023-12-22 12:18:51,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=580720.0, ans=0.2 2023-12-22 12:19:16,597 INFO [train.py:886] (0/4) Epoch 19, batch 1350, loss[loss=0.01625, audio_tagging_loss=0.01625, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4936718.49 frames. ], batch size: 100, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:19:17,516 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.933e+01 3.068e+01 3.210e+01 4.169e+01, threshold=6.137e+01, percent-clipped=0.0 2023-12-22 12:19:25,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=580920.0, ans=0.0 2023-12-22 12:19:52,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=581120.0, ans=0.125 2023-12-22 12:20:00,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=12.0 2023-12-22 12:20:03,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.02 vs. limit=22.5 2023-12-22 12:20:05,767 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0 2023-12-22 12:20:08,143 INFO [train.py:886] (0/4) Epoch 19, batch 1400, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4941273.56 frames. ], batch size: 100, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:20:09,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=581253.3333333334, ans=0.0 2023-12-22 12:20:17,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=581320.0, ans=0.125 2023-12-22 12:20:56,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2023-12-22 12:20:57,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=581520.0, ans=0.125 2023-12-22 12:21:00,403 INFO [train.py:886] (0/4) Epoch 19, batch 1450, loss[loss=0.01226, audio_tagging_loss=0.01226, over 22354.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4944723.47 frames. 
], batch size: 107, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:21:01,343 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.498e+01 2.831e+01 2.949e+01 3.105e+01 4.075e+01, threshold=5.898e+01, percent-clipped=0.0 2023-12-22 12:21:12,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=581653.3333333334, ans=0.125 2023-12-22 12:21:15,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=581653.3333333334, ans=0.05 2023-12-22 12:21:22,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=581720.0, ans=0.125 2023-12-22 12:21:26,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=581720.0, ans=0.125 2023-12-22 12:21:28,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=581720.0, ans=0.125 2023-12-22 12:21:38,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-12-22 12:21:46,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=581853.3333333334, ans=0.125 2023-12-22 12:21:46,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=581853.3333333334, ans=0.1 2023-12-22 12:21:51,566 INFO [train.py:886] (0/4) Epoch 19, batch 1500, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4950283.46 frames. ], batch size: 100, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:22:22,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2023-12-22 12:22:22,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=582120.0, ans=0.125 2023-12-22 12:22:24,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=582120.0, ans=0.04949747468305833 2023-12-22 12:22:35,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=582186.6666666666, ans=0.125 2023-12-22 12:22:42,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=582186.6666666666, ans=0.125 2023-12-22 12:22:44,276 INFO [train.py:886] (0/4) Epoch 19, batch 1550, loss[loss=0.01425, audio_tagging_loss=0.01425, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4946538.99 frames. ], batch size: 99, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:22:45,195 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.569e+01 2.901e+01 3.016e+01 3.205e+01 3.562e+01, threshold=6.032e+01, percent-clipped=0.0 2023-12-22 12:22:50,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.91 vs. 
limit=10.0 2023-12-22 12:23:02,231 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.27 vs. limit=22.5 2023-12-22 12:23:13,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=582386.6666666666, ans=0.125 2023-12-22 12:23:16,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.97 vs. limit=15.0 2023-12-22 12:23:17,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.01 vs. limit=6.0 2023-12-22 12:23:35,105 INFO [train.py:886] (0/4) Epoch 19, batch 1600, loss[loss=0.01207, audio_tagging_loss=0.01207, over 24750.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4937597.88 frames. ], batch size: 99, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:23:45,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-12-22 12:23:46,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=12.0 2023-12-22 12:23:50,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582653.3333333334, ans=0.1 2023-12-22 12:23:51,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=582653.3333333334, ans=0.125 2023-12-22 12:24:03,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=582720.0, ans=0.0 2023-12-22 12:24:10,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=582786.6666666666, ans=0.0 2023-12-22 12:24:19,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=582853.3333333334, ans=0.1 2023-12-22 12:24:26,979 INFO [train.py:886] (0/4) Epoch 19, batch 1650, loss[loss=0.01517, audio_tagging_loss=0.01517, over 22353.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4932974.02 frames. ], batch size: 107, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:24:27,921 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.591e+01 2.832e+01 3.010e+01 3.174e+01 4.519e+01, threshold=6.020e+01, percent-clipped=0.0 2023-12-22 12:24:50,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=583053.3333333334, ans=0.1 2023-12-22 12:24:52,775 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:24:57,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=583120.0, ans=0.2 2023-12-22 12:24:58,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=583120.0, ans=0.0 2023-12-22 12:25:20,025 INFO [train.py:886] (0/4) Epoch 19, batch 1700, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4936097.90 frames. 
], batch size: 100, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:25:20,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=583253.3333333334, ans=0.0 2023-12-22 12:25:37,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=583320.0, ans=0.125 2023-12-22 12:25:39,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583386.6666666666, ans=0.1 2023-12-22 12:26:05,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=583520.0, ans=0.0 2023-12-22 12:26:10,341 INFO [train.py:886] (0/4) Epoch 19, batch 1750, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4942618.78 frames. ], batch size: 100, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:26:11,958 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.867e+01 2.991e+01 3.170e+01 3.971e+01, threshold=5.982e+01, percent-clipped=0.0 2023-12-22 12:26:13,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=15.0 2023-12-22 12:26:19,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=583586.6666666666, ans=0.04949747468305833 2023-12-22 12:26:43,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2023-12-22 12:26:45,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=583786.6666666666, ans=0.0 2023-12-22 12:26:55,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.83 vs. limit=15.0 2023-12-22 12:26:56,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2023-12-22 12:27:00,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=583853.3333333334, ans=0.125 2023-12-22 12:27:01,890 INFO [train.py:886] (0/4) Epoch 19, batch 1800, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4953107.94 frames. ], batch size: 100, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:27:05,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2023-12-22 12:27:06,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=583920.0, ans=0.125 2023-12-22 12:27:08,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=583920.0, ans=0.125 2023-12-22 12:27:20,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. 
limit=15.0 2023-12-22 12:27:35,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2023-12-22 12:27:38,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=584120.0, ans=0.0 2023-12-22 12:27:39,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=584120.0, ans=0.0 2023-12-22 12:27:51,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=584186.6666666666, ans=0.09899494936611666 2023-12-22 12:27:54,645 INFO [train.py:886] (0/4) Epoch 19, batch 1850, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4948486.83 frames. ], batch size: 99, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:27:55,545 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.889e+01 3.007e+01 3.145e+01 3.702e+01, threshold=6.015e+01, percent-clipped=0.0 2023-12-22 12:27:58,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-12-22 12:28:04,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.92 vs. limit=10.0 2023-12-22 12:28:12,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=15.0 2023-12-22 12:28:13,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=584320.0, ans=0.125 2023-12-22 12:28:46,471 INFO [train.py:886] (0/4) Epoch 19, batch 1900, loss[loss=0.01499, audio_tagging_loss=0.01499, over 24750.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4941804.01 frames. ], batch size: 99, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:28:57,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584653.3333333334, ans=0.1 2023-12-22 12:29:13,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=584720.0, ans=0.125 2023-12-22 12:29:21,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=584786.6666666666, ans=10.0 2023-12-22 12:29:21,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=584786.6666666666, ans=0.125 2023-12-22 12:29:39,773 INFO [train.py:886] (0/4) Epoch 19, batch 1950, loss[loss=0.01415, audio_tagging_loss=0.01415, over 25000.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4943232.76 frames. 
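[Annotation] The recurring `WARNING [optim.py:484]` entries print the quartiles of recent gradient norms plus a clipping threshold, and in each one the threshold is almost exactly `Clipping_scale` (2.0) times the printed median (just above: median 3.007e+01, threshold 6.015e+01). That is consistent with clipping the global gradient norm against a running median of recent norms. A sketch of that rule, assuming a simple norm history rather than icefall's actual optimizer internals:

```python
from collections import deque
import torch

class MedianGradClipper:
    """Clip the global grad norm to clipping_scale * median of recent norms.

    Sketch of the behaviour suggested by the log's warnings; the real
    optimizer keeps different bookkeeping, this only mirrors the
    threshold = clipping_scale * median rule visible in the printed quartiles.
    """

    def __init__(self, clipping_scale: float = 2.0, history: int = 1024):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)

    def __call__(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        # global L2 norm over all parameter gradients
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads])
        ).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)  # rescale in place
        return norm
```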
], batch size: 100, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:29:40,703 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.861e+01 2.979e+01 3.122e+01 3.684e+01, threshold=5.958e+01, percent-clipped=0.0 2023-12-22 12:29:59,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=585053.3333333334, ans=0.125 2023-12-22 12:30:06,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=585053.3333333334, ans=0.125 2023-12-22 12:30:14,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=585120.0, ans=0.0 2023-12-22 12:30:26,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=585186.6666666666, ans=0.125 2023-12-22 12:30:30,751 INFO [train.py:886] (0/4) Epoch 19, batch 2000, loss[loss=0.01342, audio_tagging_loss=0.01342, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4943843.28 frames. ], batch size: 100, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:30:30,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=585253.3333333334, ans=0.0 2023-12-22 12:30:50,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=585320.0, ans=0.125 2023-12-22 12:31:06,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=585453.3333333334, ans=0.1 2023-12-22 12:31:08,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=585453.3333333334, ans=0.125 2023-12-22 12:31:17,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.69 vs. limit=15.0 2023-12-22 12:31:18,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=585520.0, ans=0.0 2023-12-22 12:31:22,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585586.6666666666, ans=0.1 2023-12-22 12:31:23,488 INFO [train.py:886] (0/4) Epoch 19, batch 2050, loss[loss=0.01263, audio_tagging_loss=0.01263, over 24024.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4942318.15 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 128.0 2023-12-22 12:31:24,384 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.636e+01 2.887e+01 3.027e+01 3.183e+01 3.649e+01, threshold=6.055e+01, percent-clipped=0.0 2023-12-22 12:31:33,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2023-12-22 12:31:48,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=585720.0, ans=0.125 2023-12-22 12:31:55,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.08 vs. 
limit=15.0 2023-12-22 12:32:01,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2023-12-22 12:32:02,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=585786.6666666666, ans=0.5 2023-12-22 12:32:04,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=585853.3333333334, ans=0.015 2023-12-22 12:32:10,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=585853.3333333334, ans=0.0 2023-12-22 12:32:14,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=585920.0, ans=0.0 2023-12-22 12:32:15,254 INFO [train.py:886] (0/4) Epoch 19, batch 2100, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4946945.94 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 128.0 2023-12-22 12:32:21,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=585920.0, ans=0.125 2023-12-22 12:32:27,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=585986.6666666666, ans=0.125 2023-12-22 12:32:33,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=585986.6666666666, ans=0.0 2023-12-22 12:32:33,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585986.6666666666, ans=0.1 2023-12-22 12:32:48,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=586120.0, ans=0.05 2023-12-22 12:33:05,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=586186.6666666666, ans=0.0 2023-12-22 12:33:06,965 INFO [train.py:886] (0/4) Epoch 19, batch 2150, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4955969.98 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 128.0 2023-12-22 12:33:07,903 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.855e+01 2.979e+01 3.117e+01 3.544e+01, threshold=5.958e+01, percent-clipped=0.0 2023-12-22 12:33:58,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=586586.6666666666, ans=0.125 2023-12-22 12:33:59,117 INFO [train.py:886] (0/4) Epoch 19, batch 2200, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24053.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4953249.57 frames. 
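[Annotation] `grad_scale` doubled from 64.0 to 128.0 between batches 2000 and 2050, which is the signature of dynamic fp16 loss scaling: the scaler multiplies the scale by its growth factor after a fixed run of overflow-free steps and halves it on overflow. A standard `torch.cuda.amp` training step showing that mechanism (the model, optimizer and criterion are placeholders, and the scaler constants here are illustrative):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# The scale doubles every `growth_interval` overflow-free steps
# (64 -> 128 as seen in the log) and is halved when inf/nan appears.
scaler = GradScaler(init_scale=2.0, growth_factor=2.0, growth_interval=500)

def train_step(model, optimizer, batch, targets, criterion):
    optimizer.zero_grad(set_to_none=True)
    with autocast():                      # fp16 forward pass
        loss = criterion(model(batch), targets)
    scaler.scale(loss).backward()         # backward on the scaled loss
    scaler.step(optimizer)                # unscales; skips the step on inf/nan
    scaler.update()                       # grow or shrink the scale
    return loss.detach(), scaler.get_scale()
```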
], batch size: 100, lr: 5.73e-03, grad_scale: 128.0 2023-12-22 12:34:05,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=586586.6666666666, ans=0.07 2023-12-22 12:34:10,372 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-88000.pt 2023-12-22 12:34:12,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=586653.3333333334, ans=0.125 2023-12-22 12:34:25,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=586720.0, ans=0.0 2023-12-22 12:34:39,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=586786.6666666666, ans=0.125 2023-12-22 12:34:42,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=586853.3333333334, ans=0.125 2023-12-22 12:34:53,647 INFO [train.py:886] (0/4) Epoch 19, batch 2250, loss[loss=0.01602, audio_tagging_loss=0.01602, over 24750.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4943669.00 frames. ], batch size: 99, lr: 5.73e-03, grad_scale: 64.0 2023-12-22 12:34:55,530 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.594e+01 2.899e+01 3.031e+01 3.221e+01 3.742e+01, threshold=6.061e+01, percent-clipped=0.0 2023-12-22 12:34:56,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=586920.0, ans=0.125 2023-12-22 12:35:05,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=586986.6666666666, ans=0.125 2023-12-22 12:35:12,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=586986.6666666666, ans=0.0 2023-12-22 12:35:20,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.91 vs. limit=22.5 2023-12-22 12:35:24,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=587120.0, ans=0.0 2023-12-22 12:35:40,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=587186.6666666666, ans=6.0 2023-12-22 12:35:45,207 INFO [train.py:886] (0/4) Epoch 19, batch 2300, loss[loss=0.01621, audio_tagging_loss=0.01621, over 24750.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4945555.16 frames. ], batch size: 99, lr: 5.73e-03, grad_scale: 64.0 2023-12-22 12:36:03,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=587320.0, ans=15.0 2023-12-22 12:36:15,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=587453.3333333334, ans=0.125 2023-12-22 12:36:34,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=587520.0, ans=0.125 2023-12-22 12:36:37,597 INFO [train.py:886] (0/4) Epoch 19, batch 2350, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. 
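[Annotation] The `Saving checkpoint to zipformer/exp_at_as_full/checkpoint-88000.pt` entry above fires mid-epoch, so checkpoints in this run are keyed to the global batch counter rather than to epoch boundaries. A minimal sketch of batch-indexed checkpointing; the function and argument names are illustrative, not the recipe's:

```python
from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: Path, save_every_n: int) -> None:
    """Save a batch-indexed checkpoint every `save_every_n` training batches,
    producing filenames like checkpoint-88000.pt as in the log (sketch only)."""
    if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
        ckpt = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train,
            },
            ckpt,
        )
```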
], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4949646.68 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:36:39,493 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+01 2.873e+01 2.999e+01 3.144e+01 3.595e+01, threshold=5.997e+01, percent-clipped=0.0 2023-12-22 12:36:40,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.52 vs. limit=15.0 2023-12-22 12:36:56,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=587653.3333333334, ans=15.0 2023-12-22 12:37:02,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=587720.0, ans=0.0 2023-12-22 12:37:03,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=587720.0, ans=0.035 2023-12-22 12:37:09,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=587786.6666666666, ans=0.2 2023-12-22 12:37:11,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=587786.6666666666, ans=0.125 2023-12-22 12:37:26,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2023-12-22 12:37:29,344 INFO [train.py:886] (0/4) Epoch 19, batch 2400, loss[loss=0.01632, audio_tagging_loss=0.01632, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4951561.57 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:37:52,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2023-12-22 12:37:56,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=588053.3333333334, ans=0.125 2023-12-22 12:38:16,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=588186.6666666666, ans=0.2 2023-12-22 12:38:20,928 INFO [train.py:886] (0/4) Epoch 19, batch 2450, loss[loss=0.0131, audio_tagging_loss=0.0131, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4958697.10 frames. 
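[Annotation] The dense `scaling.py:213` entries report regularisation hyperparameters (dropout probabilities, skip rates, balancer limits) whose current value `ans` is a function of `batch_count`; icefall calls these `ScheduledFloat`s. As I understand them, they interpolate piecewise-linearly between (batch_count, value) knots, which the stand-in class below reproduces (it is a sketch, not the library's class):

```python
class PiecewiseLinearSchedule:
    """A float that depends on the global batch count, in the spirit of
    icefall's ScheduledFloat. Built from (batch_count, value) knots; values
    are interpolated linearly between knots and clamped outside them."""

    def __init__(self, *knots):
        self.knots = sorted(knots)  # sort by batch_count

    def __call__(self, batch_count: float) -> float:
        ks = self.knots
        if batch_count <= ks[0][0]:
            return ks[0][1]
        if batch_count >= ks[-1][0]:
            return ks[-1][1]
        for (x0, y0), (x1, y1) in zip(ks, ks[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches,
# then stays flat; by batch_count ~585k it would log ans=0.1.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(585253.0))  # 0.1
```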
], batch size: 100, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:38:22,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=588253.3333333334, ans=0.125 2023-12-22 12:38:22,760 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.448e+01 2.869e+01 3.044e+01 3.154e+01 3.723e+01, threshold=6.089e+01, percent-clipped=0.0 2023-12-22 12:38:23,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=588253.3333333334, ans=0.2 2023-12-22 12:38:28,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=588253.3333333334, ans=0.0 2023-12-22 12:38:46,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=588386.6666666666, ans=0.125 2023-12-22 12:38:59,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=588453.3333333334, ans=0.0 2023-12-22 12:39:03,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.48 vs. limit=15.0 2023-12-22 12:39:10,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=588520.0, ans=0.2 2023-12-22 12:39:10,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=588520.0, ans=0.0 2023-12-22 12:39:12,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=588586.6666666666, ans=0.125 2023-12-22 12:39:13,118 INFO [train.py:886] (0/4) Epoch 19, batch 2500, loss[loss=0.01622, audio_tagging_loss=0.01622, over 24951.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4960780.11 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:39:18,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=588586.6666666666, ans=0.125 2023-12-22 12:39:55,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=588853.3333333334, ans=0.2 2023-12-22 12:39:58,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=588853.3333333334, ans=0.2 2023-12-22 12:40:03,873 INFO [train.py:886] (0/4) Epoch 19, batch 2550, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24059.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4953653.04 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:40:06,706 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.587e+01 2.976e+01 3.078e+01 3.235e+01 3.985e+01, threshold=6.155e+01, percent-clipped=0.0 2023-12-22 12:40:24,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2023-12-22 12:40:27,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. 
limit=6.0 2023-12-22 12:40:32,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=589053.3333333334, ans=0.125 2023-12-22 12:40:42,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.56 vs. limit=22.5 2023-12-22 12:40:56,965 INFO [train.py:886] (0/4) Epoch 19, batch 2600, loss[loss=0.01523, audio_tagging_loss=0.01523, over 23995.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4947545.98 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:41:00,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=589253.3333333334, ans=0.125 2023-12-22 12:41:02,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0 2023-12-22 12:41:15,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=589320.0, ans=0.05 2023-12-22 12:41:28,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=589453.3333333334, ans=0.025 2023-12-22 12:41:48,938 INFO [train.py:886] (0/4) Epoch 19, batch 2650, loss[loss=0.01662, audio_tagging_loss=0.01662, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4946160.37 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:41:51,499 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.856e+01 2.983e+01 3.159e+01 3.716e+01, threshold=5.966e+01, percent-clipped=0.0 2023-12-22 12:41:51,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=589586.6666666666, ans=0.125 2023-12-22 12:42:14,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.85 vs. limit=15.0 2023-12-22 12:42:15,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2023-12-22 12:42:16,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=589720.0, ans=0.125 2023-12-22 12:42:24,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=589786.6666666666, ans=0.125 2023-12-22 12:42:41,021 INFO [train.py:886] (0/4) Epoch 19, batch 2700, loss[loss=0.01125, audio_tagging_loss=0.01125, over 25000.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4951013.65 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:42:53,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589986.6666666666, ans=0.1 2023-12-22 12:43:28,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=590186.6666666666, ans=0.2 2023-12-22 12:43:33,971 INFO [train.py:886] (0/4) Epoch 19, batch 2750, loss[loss=0.01376, audio_tagging_loss=0.01376, over 25000.00 frames. 
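[Annotation] The `Whitening` entries compare a per-module `metric` against a `limit`; the module only intervenes, via a gradient penalty, when the metric exceeds the limit. A metric with the right behaviour is d * ||C||_F^2 / trace(C)^2 for the channel covariance C: it equals 1.0 when the covariance is a multiple of the identity (perfectly "white" features) and grows with the eigenvalue spread. Whether this is icefall's exact formula is an assumption on my part; the sketch below only shows the flavour:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """How far the feature covariance is from a multiple of the identity.

    For the covariance C of d channels, returns d * ||C||_F^2 / trace(C)^2,
    which is 1.0 for perfectly white features (C = c*I) and grows as the
    eigenvalue spread grows, matching the log's "metric=... vs. limit=..."
    pattern. One plausible metric, not necessarily icefall's exact formula.
    """
    x = x.reshape(-1, x.shape[-1])          # (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]            # (d, d) covariance
    d = cov.shape[0]
    return d * (cov ** 2).sum() / cov.diagonal().sum() ** 2

print(whitening_metric(torch.randn(1000, 384)))  # ~1 for white noise
```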
], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4953281.43 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:43:35,869 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.593e+01 2.856e+01 3.011e+01 3.165e+01 3.589e+01, threshold=6.021e+01, percent-clipped=0.0 2023-12-22 12:43:36,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590253.3333333334, ans=0.1 2023-12-22 12:43:37,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=590253.3333333334, ans=0.125 2023-12-22 12:43:39,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590253.3333333334, ans=0.1 2023-12-22 12:43:40,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=590253.3333333334, ans=0.125 2023-12-22 12:44:07,108 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.748e-03 2023-12-22 12:44:18,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=590520.0, ans=0.125 2023-12-22 12:44:24,047 INFO [train.py:886] (0/4) Epoch 19, batch 2800, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4952450.12 frames. ], batch size: 99, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:44:49,041 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:45:09,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=590853.3333333334, ans=10.0 2023-12-22 12:45:16,446 INFO [train.py:886] (0/4) Epoch 19, batch 2850, loss[loss=0.01439, audio_tagging_loss=0.01439, over 24750.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4947797.87 frames. ], batch size: 99, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:45:18,392 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.657e+01 2.937e+01 3.059e+01 3.223e+01 3.901e+01, threshold=6.118e+01, percent-clipped=0.0 2023-12-22 12:45:35,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2023-12-22 12:45:44,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=591053.3333333334, ans=0.125 2023-12-22 12:45:49,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=591120.0, ans=0.0 2023-12-22 12:45:52,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=591120.0, ans=0.09899494936611666 2023-12-22 12:46:08,810 INFO [train.py:886] (0/4) Epoch 19, batch 2900, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24750.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4945464.44 frames. 
], batch size: 99, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:46:11,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=591253.3333333334, ans=0.1 2023-12-22 12:46:39,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.19 vs. limit=22.5 2023-12-22 12:46:51,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=591520.0, ans=0.125 2023-12-22 12:47:00,311 INFO [train.py:886] (0/4) Epoch 19, batch 2950, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4946100.56 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:47:02,195 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+01 2.834e+01 2.936e+01 3.112e+01 3.649e+01, threshold=5.872e+01, percent-clipped=0.0 2023-12-22 12:47:03,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=591586.6666666666, ans=0.0 2023-12-22 12:47:26,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591720.0, ans=0.1 2023-12-22 12:47:30,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=591720.0, ans=0.125 2023-12-22 12:47:33,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=591786.6666666666, ans=0.95 2023-12-22 12:47:54,103 INFO [train.py:886] (0/4) Epoch 19, batch 3000, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4951363.34 frames. ], batch size: 100, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:47:54,104 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 12:48:15,450 INFO [train.py:917] (0/4) Epoch 19, validation: loss=0.0333, audio_tagging_loss=0.0333, over 3737520.00 frames. 2023-12-22 12:48:15,451 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 12:48:23,947 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:48:28,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=591986.6666666666, ans=0.125 2023-12-22 12:48:36,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592053.3333333334, ans=0.1 2023-12-22 12:48:47,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=592120.0, ans=0.125 2023-12-22 12:49:06,691 INFO [train.py:886] (0/4) Epoch 19, batch 3050, loss[loss=0.01128, audio_tagging_loss=0.01128, over 25000.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4957329.75 frames. 
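[Annotation] Periodically (here at batch 3000) the trainer pauses (`Computing validation loss`), scores the dev set in eval mode, and then reports the peak CUDA memory. A sketch of such a validation pass using standard PyTorch APIs; the loader, model and criterion are placeholders:

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, criterion, device) -> float:
    """Periodic validation pass as suggested by the log: switch to eval mode,
    average the loss over the dev set, then report peak memory (sketch)."""
    model.eval()
    total, count = 0.0, 0
    for batch, targets in valid_loader:
        loss = criterion(model(batch.to(device)), targets.to(device))
        total += loss.item() * batch.shape[0]
        count += batch.shape[0]
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={total / count:.4f}; max memory {mem_mb}MB")
    return total / count
```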
], batch size: 100, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:49:08,539 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 2.886e+01 3.017e+01 3.136e+01 3.625e+01, threshold=6.033e+01, percent-clipped=0.0 2023-12-22 12:49:08,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=592253.3333333334, ans=0.125 2023-12-22 12:49:27,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=592386.6666666666, ans=0.0 2023-12-22 12:49:43,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=592453.3333333334, ans=0.05 2023-12-22 12:49:59,757 INFO [train.py:886] (0/4) Epoch 19, batch 3100, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4953184.70 frames. ], batch size: 100, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:50:07,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=592586.6666666666, ans=0.125 2023-12-22 12:50:12,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=592653.3333333334, ans=22.5 2023-12-22 12:50:22,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0 2023-12-22 12:50:31,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-12-22 12:50:43,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=592853.3333333334, ans=0.1 2023-12-22 12:50:47,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=592853.3333333334, ans=0.125 2023-12-22 12:50:50,364 INFO [train.py:886] (0/4) Epoch 19, batch 3150, loss[loss=0.01358, audio_tagging_loss=0.01358, over 24750.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4947351.74 frames. ], batch size: 99, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:50:52,268 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.598e+01 2.871e+01 2.985e+01 3.126e+01 3.979e+01, threshold=5.970e+01, percent-clipped=0.0 2023-12-22 12:50:52,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=592920.0, ans=0.0 2023-12-22 12:50:52,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=592920.0, ans=0.125 2023-12-22 12:51:22,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=593120.0, ans=0.125 2023-12-22 12:51:31,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=593186.6666666666, ans=0.125 2023-12-22 12:51:42,994 INFO [train.py:886] (0/4) Epoch 19, batch 3200, loss[loss=0.01283, audio_tagging_loss=0.01283, over 24750.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4942184.02 frames. 
], batch size: 99, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:51:49,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. limit=10.0 2023-12-22 12:51:53,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. limit=6.0 2023-12-22 12:51:54,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.79 vs. limit=15.0 2023-12-22 12:51:57,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=593320.0, ans=0.125 2023-12-22 12:52:03,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2023-12-22 12:52:06,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=593386.6666666666, ans=0.1 2023-12-22 12:52:07,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=593386.6666666666, ans=0.0 2023-12-22 12:52:21,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=593453.3333333334, ans=0.1 2023-12-22 12:52:35,661 INFO [train.py:886] (0/4) Epoch 19, batch 3250, loss[loss=0.01471, audio_tagging_loss=0.01471, over 24750.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4941151.45 frames. ], batch size: 99, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:52:37,601 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+01 2.874e+01 3.008e+01 3.221e+01 3.622e+01, threshold=6.016e+01, percent-clipped=0.0 2023-12-22 12:52:48,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2023-12-22 12:53:13,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=593786.6666666666, ans=0.125 2023-12-22 12:53:25,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.02 vs. limit=22.5 2023-12-22 12:53:27,373 INFO [train.py:886] (0/4) Epoch 19, batch 3300, loss[loss=0.01708, audio_tagging_loss=0.01708, over 24918.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4949283.72 frames. 
], batch size: 100, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:53:27,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=593920.0, ans=0.0 2023-12-22 12:53:32,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=593920.0, ans=15.0 2023-12-22 12:53:42,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=593986.6666666666, ans=0.0 2023-12-22 12:54:18,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594253.3333333334, ans=0.1 2023-12-22 12:54:19,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=594253.3333333334, ans=0.0 2023-12-22 12:54:19,660 INFO [train.py:886] (0/4) Epoch 19, batch 3350, loss[loss=0.01371, audio_tagging_loss=0.01371, over 25000.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4956619.65 frames. ], batch size: 100, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:54:21,563 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.652e+01 2.848e+01 3.000e+01 3.148e+01 3.687e+01, threshold=5.999e+01, percent-clipped=0.0 2023-12-22 12:54:29,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5 2023-12-22 12:54:42,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=594386.6666666666, ans=0.0 2023-12-22 12:54:48,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=594386.6666666666, ans=0.0 2023-12-22 12:54:53,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=594453.3333333334, ans=0.0 2023-12-22 12:54:54,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=594453.3333333334, ans=0.1 2023-12-22 12:55:03,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=594520.0, ans=0.0 2023-12-22 12:55:06,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=594520.0, ans=0.125 2023-12-22 12:55:10,457 INFO [train.py:886] (0/4) Epoch 19, batch 3400, loss[loss=0.01506, audio_tagging_loss=0.01506, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4957086.44 frames. ], batch size: 100, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:55:27,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.03 vs. limit=22.5 2023-12-22 12:55:51,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=594786.6666666666, ans=0.2 2023-12-22 12:55:56,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=12.0 2023-12-22 12:56:03,787 INFO [train.py:886] (0/4) Epoch 19, batch 3450, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4950182.91 frames. 
], batch size: 99, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:56:05,666 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.655e+01 2.881e+01 2.999e+01 3.150e+01 3.664e+01, threshold=5.998e+01, percent-clipped=0.0 2023-12-22 12:56:10,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=594920.0, ans=0.125 2023-12-22 12:56:31,480 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:56:37,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=595120.0, ans=0.0 2023-12-22 12:56:55,973 INFO [train.py:886] (0/4) Epoch 19, batch 3500, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4943644.96 frames. ], batch size: 99, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:57:02,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=595253.3333333334, ans=0.125 2023-12-22 12:57:03,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=595253.3333333334, ans=0.5 2023-12-22 12:57:20,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=595386.6666666666, ans=0.09899494936611666 2023-12-22 12:57:46,954 INFO [train.py:886] (0/4) Epoch 19, batch 3550, loss[loss=0.01411, audio_tagging_loss=0.01411, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4945941.26 frames. ], batch size: 100, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:57:49,607 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.664e+01 2.876e+01 3.031e+01 3.189e+01 3.844e+01, threshold=6.062e+01, percent-clipped=0.0 2023-12-22 12:57:49,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=595586.6666666666, ans=0.1 2023-12-22 12:57:50,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595586.6666666666, ans=0.1 2023-12-22 12:58:22,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.58 vs. limit=22.5 2023-12-22 12:58:39,944 INFO [train.py:886] (0/4) Epoch 19, batch 3600, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4948137.30 frames. 
], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 12:58:57,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=595986.6666666666, ans=0.125 2023-12-22 12:59:13,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=596120.0, ans=0.1 2023-12-22 12:59:21,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=596186.6666666666, ans=0.0 2023-12-22 12:59:22,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=596186.6666666666, ans=0.0 2023-12-22 12:59:26,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=596186.6666666666, ans=0.125 2023-12-22 12:59:28,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=596186.6666666666, ans=0.125 2023-12-22 12:59:32,374 INFO [train.py:886] (0/4) Epoch 19, batch 3650, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4953327.56 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 12:59:35,050 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.592e+01 2.803e+01 2.946e+01 3.056e+01 3.583e+01, threshold=5.891e+01, percent-clipped=0.0 2023-12-22 12:59:35,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5 2023-12-22 12:59:37,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596253.3333333334, ans=0.1 2023-12-22 12:59:38,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0 2023-12-22 12:59:41,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=596253.3333333334, ans=0.125 2023-12-22 12:59:49,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=596320.0, ans=0.0 2023-12-22 12:59:51,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596386.6666666666, ans=0.1 2023-12-22 12:59:54,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596386.6666666666, ans=0.1 2023-12-22 13:00:23,286 INFO [train.py:886] (0/4) Epoch 19, batch 3700, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4951181.19 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:00:42,345 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:00:54,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=596786.6666666666, ans=0.5 2023-12-22 13:01:15,903 INFO [train.py:886] (0/4) Epoch 19, batch 3750, loss[loss=0.01353, audio_tagging_loss=0.01353, over 24750.00 frames. 
], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4945299.08 frames. ], batch size: 99, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:01:17,774 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 2.917e+01 3.055e+01 3.190e+01 3.624e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 13:01:32,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=596986.6666666666, ans=0.1 2023-12-22 13:01:33,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=596986.6666666666, ans=0.0 2023-12-22 13:01:41,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597053.3333333334, ans=0.1 2023-12-22 13:01:48,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.30 vs. limit=15.0 2023-12-22 13:02:06,085 INFO [train.py:886] (0/4) Epoch 19, batch 3800, loss[loss=0.01278, audio_tagging_loss=0.01278, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4945408.73 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:02:07,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=597253.3333333334, ans=0.125 2023-12-22 13:02:13,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=597253.3333333334, ans=0.125 2023-12-22 13:02:14,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=597253.3333333334, ans=0.05 2023-12-22 13:02:26,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=597386.6666666666, ans=0.1 2023-12-22 13:02:35,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=597453.3333333334, ans=0.0 2023-12-22 13:02:41,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597453.3333333334, ans=0.1 2023-12-22 13:02:47,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2023-12-22 13:02:48,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=597520.0, ans=0.125 2023-12-22 13:02:56,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=597586.6666666666, ans=0.125 2023-12-22 13:02:57,568 INFO [train.py:886] (0/4) Epoch 19, batch 3850, loss[loss=0.0127, audio_tagging_loss=0.0127, over 24750.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4946559.78 frames. 
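[Annotation] Many of the scheduled values above belong to `balancer` modules (`min_positive`, `max_positive`, `min_abs`, `prob`, ...). As I understand icefall's balancers, they are identity functions in the forward pass that, with probability `prob`, add a small corrective gradient in the backward pass to push per-channel activation statistics back into a target range. A sketch of that idea for the fraction-of-positive-values constraint only; the constants and class names are illustrative:

```python
import random
import torch

class BalancerFunction(torch.autograd.Function):
    """Identity in forward; in backward, adds a small gradient that pushes
    each channel's fraction of positive activations toward a target range.
    Assumes x has shape (batch, num_channels)."""

    @staticmethod
    def forward(ctx, x, min_positive, max_positive, grad_scale):
        ctx.save_for_backward(x)
        ctx.cfg = (min_positive, max_positive, grad_scale)
        return x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        min_positive, max_positive, grad_scale = ctx.cfg
        pos = (x > 0).float().mean(dim=0)   # fraction positive per channel
        # push down channels that are too often positive, up those too rarely
        direction = (pos > max_positive).float() - (pos < min_positive).float()
        extra = grad_scale * grad_output.abs().mean() * direction
        return grad_output + extra, None, None, None

class Balancer(torch.nn.Module):
    """Applied with probability `prob` per forward pass, as suggested by the
    log's `balancer...prob` schedules (sketch; not icefall's actual code)."""

    def __init__(self, min_positive=0.05, max_positive=0.95,
                 prob=0.125, grad_scale=0.01):
        super().__init__()
        self.cfg = (min_positive, max_positive, grad_scale)
        self.prob = prob

    def forward(self, x):
        if self.training and random.random() < self.prob:
            return BalancerFunction.apply(x, *self.cfg)
        return x
```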
], batch size: 99, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:02:57,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=597586.6666666666, ans=0.0 2023-12-22 13:02:59,400 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.693e+01 2.917e+01 3.058e+01 3.165e+01 3.564e+01, threshold=6.116e+01, percent-clipped=0.0 2023-12-22 13:03:09,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-12-22 13:03:20,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=597720.0, ans=0.05 2023-12-22 13:03:28,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=597786.6666666666, ans=0.05 2023-12-22 13:03:43,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=597853.3333333334, ans=0.125 2023-12-22 13:03:49,339 INFO [train.py:886] (0/4) Epoch 19, batch 3900, loss[loss=0.01112, audio_tagging_loss=0.01112, over 25000.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4947270.37 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:04:02,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=597986.6666666666, ans=0.0 2023-12-22 13:04:06,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=597986.6666666666, ans=0.0 2023-12-22 13:04:09,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=598053.3333333334, ans=0.125 2023-12-22 13:04:16,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=598053.3333333334, ans=0.0 2023-12-22 13:04:39,092 INFO [train.py:886] (0/4) Epoch 19, batch 3950, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4951032.57 frames. ], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:04:40,997 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.494e+01 2.903e+01 2.997e+01 3.155e+01 3.492e+01, threshold=5.994e+01, percent-clipped=0.0 2023-12-22 13:04:56,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=598320.0, ans=0.125 2023-12-22 13:05:04,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=598386.6666666666, ans=0.1 2023-12-22 13:05:14,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=598453.3333333334, ans=0.2 2023-12-22 13:05:14,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2023-12-22 13:05:19,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=598520.0, ans=0.0 2023-12-22 13:05:31,256 INFO [train.py:886] (0/4) Epoch 19, batch 4000, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. 
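[Annotation] The learning rate drifts down slowly (5.75e-03 near the top of this stretch, 5.67e-03 here) even within the epoch, which matches a schedule that depends on both the batch counter and the epoch. Icefall's Zipformer recipes use the Eden scheduler for this; to the best of my knowledge its decay factor is the product below, though the exact form should be checked against the recipe, and the parameter values come from the run configuration:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    """Eden-style learning rate: a product of batch- and epoch-dependent
    power-law decay factors (sketch of the schedule as I understand it).

        lr = base_lr * ((batch/lr_batches)^2 + 1)^-0.25
                     * ((epoch/lr_epochs)^2 + 1)^-0.25
    """
    batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
    return base_lr * batch_factor * epoch_factor
```

With representative mid-training inputs this lands in the few-times-1e-3 range logged here, and it decays a little with every batch, which is why consecutive log lines show slightly different `lr` values.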
], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4949289.05 frames. ], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:05:33,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598586.6666666666, ans=0.1 2023-12-22 13:05:52,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=598720.0, ans=0.07 2023-12-22 13:06:00,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=22.5 2023-12-22 13:06:14,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=598853.3333333334, ans=0.2 2023-12-22 13:06:14,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=598853.3333333334, ans=0.1 2023-12-22 13:06:21,982 INFO [train.py:886] (0/4) Epoch 19, batch 4050, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4952752.87 frames. ], batch size: 99, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:06:24,465 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 2.925e+01 3.047e+01 3.151e+01 3.558e+01, threshold=6.093e+01, percent-clipped=0.0 2023-12-22 13:06:29,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=598920.0, ans=0.0 2023-12-22 13:06:32,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=598986.6666666666, ans=0.2 2023-12-22 13:06:46,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=599053.3333333334, ans=0.09899494936611666 2023-12-22 13:07:00,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2023-12-22 13:07:04,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=599186.6666666666, ans=0.0 2023-12-22 13:07:14,254 INFO [train.py:886] (0/4) Epoch 19, batch 4100, loss[loss=0.01205, audio_tagging_loss=0.01205, over 23997.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4942347.26 frames. ], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:08:06,015 INFO [train.py:886] (0/4) Epoch 19, batch 4150, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4939890.49 frames. 
], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:08:07,964 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 2.927e+01 3.054e+01 3.225e+01 3.796e+01, threshold=6.108e+01, percent-clipped=0.0 2023-12-22 13:08:15,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=599653.3333333334, ans=0.125 2023-12-22 13:08:17,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=599653.3333333334, ans=0.0 2023-12-22 13:08:28,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=599720.0, ans=0.0 2023-12-22 13:08:55,594 INFO [train.py:886] (0/4) Epoch 19, batch 4200, loss[loss=0.01592, audio_tagging_loss=0.01592, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4945327.56 frames. ], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:09:27,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=600120.0, ans=0.0 2023-12-22 13:09:27,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-12-22 13:09:42,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=600186.6666666666, ans=0.1 2023-12-22 13:09:46,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=600186.6666666666, ans=0.125 2023-12-22 13:09:48,159 INFO [train.py:886] (0/4) Epoch 19, batch 4250, loss[loss=0.0138, audio_tagging_loss=0.0138, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4944178.62 frames. ], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:09:50,967 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.636e+01 2.908e+01 3.025e+01 3.145e+01 3.602e+01, threshold=6.050e+01, percent-clipped=0.0 2023-12-22 13:10:24,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=600453.3333333334, ans=0.2 2023-12-22 13:10:25,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=600453.3333333334, ans=0.2 2023-12-22 13:10:26,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=600453.3333333334, ans=0.125 2023-12-22 13:10:27,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=600453.3333333334, ans=0.0 2023-12-22 13:10:39,401 INFO [train.py:886] (0/4) Epoch 19, batch 4300, loss[loss=0.01465, audio_tagging_loss=0.01465, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4953133.53 frames. ], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:10:48,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=600586.6666666666, ans=0.02 2023-12-22 13:10:52,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.16 vs. 
limit=22.5 2023-12-22 13:10:55,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600653.3333333334, ans=0.1 2023-12-22 13:11:00,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=600720.0, ans=0.07 2023-12-22 13:11:18,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=600786.6666666666, ans=0.125 2023-12-22 13:11:20,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=600786.6666666666, ans=0.0 2023-12-22 13:11:31,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2023-12-22 13:11:32,038 INFO [train.py:886] (0/4) Epoch 19, batch 4350, loss[loss=0.01559, audio_tagging_loss=0.01559, over 24750.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4956040.42 frames. ], batch size: 99, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:11:34,859 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.519e+01 2.892e+01 3.027e+01 3.169e+01 3.833e+01, threshold=6.053e+01, percent-clipped=0.0 2023-12-22 13:11:37,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.23 vs. limit=22.5 2023-12-22 13:11:43,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=600986.6666666666, ans=0.0 2023-12-22 13:11:50,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. limit=10.0 2023-12-22 13:11:51,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=600986.6666666666, ans=0.1 2023-12-22 13:12:03,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2023-12-22 13:12:07,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=601120.0, ans=0.95 2023-12-22 13:12:17,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=601186.6666666666, ans=0.0 2023-12-22 13:12:23,986 INFO [train.py:886] (0/4) Epoch 19, batch 4400, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24032.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4950526.13 frames. 
], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:12:41,138 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:12:50,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=601386.6666666666, ans=0.07 2023-12-22 13:13:03,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=601453.3333333334, ans=0.125 2023-12-22 13:13:03,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=601453.3333333334, ans=0.125 2023-12-22 13:13:06,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=601520.0, ans=0.125 2023-12-22 13:13:06,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=601520.0, ans=0.0 2023-12-22 13:13:07,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=601520.0, ans=0.95 2023-12-22 13:13:14,608 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:13:15,228 INFO [train.py:886] (0/4) Epoch 19, batch 4450, loss[loss=0.01207, audio_tagging_loss=0.01207, over 25000.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4938949.25 frames. ], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:13:19,445 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.870e+01 3.001e+01 3.186e+01 3.529e+01, threshold=6.001e+01, percent-clipped=0.0 2023-12-22 13:13:31,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=601653.3333333334, ans=0.125 2023-12-22 13:13:49,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=601786.6666666666, ans=0.0 2023-12-22 13:13:55,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=601786.6666666666, ans=10.0 2023-12-22 13:13:56,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=601786.6666666666, ans=0.0 2023-12-22 13:13:57,457 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-12-22 13:14:07,302 INFO [train.py:886] (0/4) Epoch 19, batch 4500, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4942564.05 frames. 
], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:14:27,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=602053.3333333334, ans=0.125 2023-12-22 13:14:29,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=602053.3333333334, ans=0.0 2023-12-22 13:14:48,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=602186.6666666666, ans=0.1 2023-12-22 13:14:48,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2023-12-22 13:14:59,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=602253.3333333334, ans=0.125 2023-12-22 13:14:59,669 INFO [train.py:886] (0/4) Epoch 19, batch 4550, loss[loss=0.01447, audio_tagging_loss=0.01447, over 25000.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4941928.38 frames. ], batch size: 100, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:15:02,431 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.607e+01 2.855e+01 2.993e+01 3.155e+01 3.667e+01, threshold=5.986e+01, percent-clipped=0.0 2023-12-22 13:15:03,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.07 vs. limit=22.5 2023-12-22 13:15:31,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=602453.3333333334, ans=0.125 2023-12-22 13:15:49,756 INFO [train.py:886] (0/4) Epoch 19, batch 4600, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4942871.95 frames. ], batch size: 100, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:15:58,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=602586.6666666666, ans=0.2 2023-12-22 13:16:01,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=602653.3333333334, ans=0.125 2023-12-22 13:16:41,041 INFO [train.py:886] (0/4) Epoch 19, batch 4650, loss[loss=0.01561, audio_tagging_loss=0.01561, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4947653.33 frames. ], batch size: 100, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:16:43,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0 2023-12-22 13:16:43,878 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.645e+01 2.933e+01 3.063e+01 3.173e+01 3.851e+01, threshold=6.126e+01, percent-clipped=0.0 2023-12-22 13:16:58,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=602986.6666666666, ans=0.125 2023-12-22 13:17:00,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. 
limit=6.0 2023-12-22 13:17:01,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=603053.3333333334, ans=0.125 2023-12-22 13:17:09,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=603053.3333333334, ans=0.125 2023-12-22 13:17:28,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=603186.6666666666, ans=0.0 2023-12-22 13:17:28,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.45 vs. limit=22.5 2023-12-22 13:17:29,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=603253.3333333334, ans=0.0 2023-12-22 13:17:30,329 INFO [train.py:886] (0/4) Epoch 19, batch 4700, loss[loss=0.01648, audio_tagging_loss=0.01648, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4942147.85 frames. ], batch size: 100, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:17:37,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=603253.3333333334, ans=0.125 2023-12-22 13:17:40,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=603320.0, ans=0.2 2023-12-22 13:17:43,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=603320.0, ans=0.09899494936611666 2023-12-22 13:17:55,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603386.6666666666, ans=0.1 2023-12-22 13:18:07,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=603453.3333333334, ans=0.2 2023-12-22 13:18:18,156 INFO [train.py:886] (0/4) Epoch 19, batch 4750, loss[loss=0.0151, audio_tagging_loss=0.0151, over 24750.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4935072.78 frames. ], batch size: 99, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:18:20,768 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.666e+01 2.986e+01 3.070e+01 3.231e+01 3.664e+01, threshold=6.140e+01, percent-clipped=0.0 2023-12-22 13:18:23,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=603586.6666666666, ans=0.0 2023-12-22 13:18:29,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=603653.3333333334, ans=0.125 2023-12-22 13:18:33,297 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-19.pt 2023-12-22 13:18:51,858 INFO [train.py:886] (0/4) Epoch 20, batch 0, loss[loss=0.03009, audio_tagging_loss=0.03009, over 25000.00 frames. ], tot_loss[loss=0.03009, audio_tagging_loss=0.03009, over 25000.00 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:18:51,859 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 13:19:12,479 INFO [train.py:917] (0/4) Epoch 20, validation: loss=0.03315, audio_tagging_loss=0.03315, over 3737520.00 frames. 
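The two loss figures on each training line can be read as follows: loss[...] is the frame-averaged loss of the current batch, while tot_loss[...] is a frame-weighted running average with decay, which is why its frame counts come out fractional (e.g. "over 4949289.05 frames"); the validation figure, by contrast, is a plain average over the fixed dev set, hence the constant "over 3737520.00 frames" at every epoch boundary. Below is a minimal sketch of that bookkeeping; the class name RunningLoss and the decay constant 0.999 are illustrative assumptions, not icefall's actual MetricsTracker code.

```python
# Sketch only: reproduces the shape of the "loss[...]" / "tot_loss[...]"
# figures in this log, under assumed (not actual) bookkeeping.

class RunningLoss:
    def __init__(self, decay: float = 0.999) -> None:
        self.decay = decay       # forgetting factor applied every batch (assumed)
        self.loss_sum = 0.0      # decayed, frame-weighted loss sum
        self.num_frames = 0.0    # decayed frame count; this is why the log
                                 # shows fractional counts like 4949289.05

    def update(self, batch_loss: float, batch_frames: int) -> None:
        # Decay the history, then fold in the current batch.
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.num_frames = self.num_frames * self.decay + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.num_frames, 1.0)


tracker = RunningLoss()
tracker.update(batch_loss=0.01656, batch_frames=25000)  # e.g. epoch 20, batch 50
print(f"tot_loss[loss={tracker.value:.5f}, over {tracker.num_frames:.2f} frames.]")
```

With a decay just below 1.0 the effective window spans thousands of batches, which matches how slowly tot_loss drifts between the log lines above while the per-batch loss[...] values jump around.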
2023-12-22 13:19:12,480 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 13:19:23,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=603760.0, ans=0.125 2023-12-22 13:19:45,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603893.3333333334, ans=0.1 2023-12-22 13:19:50,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=603893.3333333334, ans=15.0 2023-12-22 13:19:56,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=603960.0, ans=0.125 2023-12-22 13:19:59,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=603960.0, ans=0.05 2023-12-22 13:20:04,375 INFO [train.py:886] (0/4) Epoch 20, batch 50, loss[loss=0.01656, audio_tagging_loss=0.01656, over 25000.00 frames. ], tot_loss[loss=0.02156, audio_tagging_loss=0.02156, over 1115875.82 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:20:14,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=604093.3333333334, ans=0.0 2023-12-22 13:20:27,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=15.0 2023-12-22 13:20:33,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=604226.6666666666, ans=0.125 2023-12-22 13:20:42,979 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.780e+01 3.377e+01 3.713e+01 4.293e+01 9.552e+01, threshold=7.426e+01, percent-clipped=7.0 2023-12-22 13:20:46,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2023-12-22 13:20:48,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.71 vs. limit=12.0 2023-12-22 13:20:54,299 INFO [train.py:886] (0/4) Epoch 20, batch 100, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.01881, audio_tagging_loss=0.01881, over 1968714.69 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:21:04,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=15.0 2023-12-22 13:21:23,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=604493.3333333334, ans=0.125 2023-12-22 13:21:39,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=604626.6666666666, ans=0.1 2023-12-22 13:21:46,237 INFO [train.py:886] (0/4) Epoch 20, batch 150, loss[loss=0.01793, audio_tagging_loss=0.01793, over 25000.00 frames. ], tot_loss[loss=0.01716, audio_tagging_loss=0.01716, over 2638023.09 frames. 
], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:22:14,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=604826.6666666666, ans=0.1 2023-12-22 13:22:24,477 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.643e+01 2.963e+01 3.102e+01 3.243e+01 3.699e+01, threshold=6.204e+01, percent-clipped=0.0 2023-12-22 13:22:29,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=22.5 2023-12-22 13:22:33,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=604960.0, ans=0.1 2023-12-22 13:22:35,853 INFO [train.py:886] (0/4) Epoch 20, batch 200, loss[loss=0.01487, audio_tagging_loss=0.01487, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 3151132.61 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:22:55,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=605160.0, ans=0.125 2023-12-22 13:23:03,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2023-12-22 13:23:05,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=15.0 2023-12-22 13:23:05,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605226.6666666666, ans=0.1 2023-12-22 13:23:13,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=605226.6666666666, ans=0.2 2023-12-22 13:23:16,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=605293.3333333334, ans=0.0 2023-12-22 13:23:20,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=605293.3333333334, ans=0.125 2023-12-22 13:23:25,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=605293.3333333334, ans=0.0 2023-12-22 13:23:27,217 INFO [train.py:886] (0/4) Epoch 20, batch 250, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 3554146.53 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:23:54,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.43 vs. limit=10.0 2023-12-22 13:23:59,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=605560.0, ans=0.125 2023-12-22 13:24:01,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.54 vs. 
limit=15.0 2023-12-22 13:24:03,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=605560.0, ans=0.1 2023-12-22 13:24:05,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605560.0, ans=0.1 2023-12-22 13:24:05,955 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+01 2.907e+01 3.036e+01 3.184e+01 3.948e+01, threshold=6.071e+01, percent-clipped=0.0 2023-12-22 13:24:07,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=605626.6666666666, ans=0.0 2023-12-22 13:24:10,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=605626.6666666666, ans=0.125 2023-12-22 13:24:17,995 INFO [train.py:886] (0/4) Epoch 20, batch 300, loss[loss=0.01404, audio_tagging_loss=0.01404, over 21884.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 3856897.68 frames. ], batch size: 107, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:24:28,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=605760.0, ans=0.0 2023-12-22 13:24:28,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=12.0 2023-12-22 13:25:08,343 INFO [train.py:886] (0/4) Epoch 20, batch 350, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24750.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4093875.08 frames. ], batch size: 99, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:25:26,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=606093.3333333334, ans=0.125 2023-12-22 13:25:47,579 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.469e+01 2.893e+01 2.999e+01 3.154e+01 3.798e+01, threshold=5.997e+01, percent-clipped=0.0 2023-12-22 13:25:50,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=606293.3333333334, ans=0.0 2023-12-22 13:25:59,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606360.0, ans=0.1 2023-12-22 13:26:00,538 INFO [train.py:886] (0/4) Epoch 20, batch 400, loss[loss=0.01437, audio_tagging_loss=0.01437, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4284026.33 frames. 
], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:26:07,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=606360.0, ans=0.025 2023-12-22 13:26:15,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=606426.6666666666, ans=0.125 2023-12-22 13:26:31,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=606560.0, ans=0.125 2023-12-22 13:26:34,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=606560.0, ans=0.95 2023-12-22 13:26:50,002 INFO [train.py:886] (0/4) Epoch 20, batch 450, loss[loss=0.0124, audio_tagging_loss=0.0124, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4436490.33 frames. ], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:26:57,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-12-22 13:27:11,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2023-12-22 13:27:11,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0 2023-12-22 13:27:14,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=606826.6666666666, ans=0.125 2023-12-22 13:27:20,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.97 vs. limit=6.0 2023-12-22 13:27:29,402 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+01 2.863e+01 2.962e+01 3.098e+01 3.891e+01, threshold=5.924e+01, percent-clipped=0.0 2023-12-22 13:27:41,506 INFO [train.py:886] (0/4) Epoch 20, batch 500, loss[loss=0.0131, audio_tagging_loss=0.0131, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4546492.56 frames. ], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:27:42,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=607026.6666666666, ans=0.2 2023-12-22 13:27:56,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-12-22 13:28:15,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.63 vs. limit=22.5 2023-12-22 13:28:33,274 INFO [train.py:886] (0/4) Epoch 20, batch 550, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4642529.80 frames. ], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:29:05,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. 
limit=15.0 2023-12-22 13:29:12,561 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+01 2.894e+01 3.022e+01 3.158e+01 3.590e+01, threshold=6.043e+01, percent-clipped=0.0 2023-12-22 13:29:23,966 INFO [train.py:886] (0/4) Epoch 20, batch 600, loss[loss=0.01418, audio_tagging_loss=0.01418, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4714599.62 frames. ], batch size: 99, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:29:33,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=607760.0, ans=0.125 2023-12-22 13:29:34,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607760.0, ans=0.1 2023-12-22 13:29:36,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=607760.0, ans=0.125 2023-12-22 13:29:45,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=15.0 2023-12-22 13:29:53,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=607893.3333333334, ans=0.125 2023-12-22 13:29:59,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.96 vs. limit=15.0 2023-12-22 13:30:04,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2023-12-22 13:30:15,642 INFO [train.py:886] (0/4) Epoch 20, batch 650, loss[loss=0.01361, audio_tagging_loss=0.01361, over 24750.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4760435.66 frames. ], batch size: 99, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:30:31,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.52 vs. limit=15.0 2023-12-22 13:30:31,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=608093.3333333334, ans=0.0 2023-12-22 13:30:31,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=608093.3333333334, ans=0.125 2023-12-22 13:30:33,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=608093.3333333334, ans=0.1 2023-12-22 13:30:54,377 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.917e+01 3.051e+01 3.204e+01 3.727e+01, threshold=6.102e+01, percent-clipped=0.0 2023-12-22 13:31:06,493 INFO [train.py:886] (0/4) Epoch 20, batch 700, loss[loss=0.01313, audio_tagging_loss=0.01313, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4801671.60 frames. 
], batch size: 99, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:31:09,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=608360.0, ans=0.0 2023-12-22 13:31:13,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=608360.0, ans=0.125 2023-12-22 13:31:14,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=608360.0, ans=0.05 2023-12-22 13:31:37,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=608560.0, ans=0.125 2023-12-22 13:31:37,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-12-22 13:31:47,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=608626.6666666666, ans=0.1 2023-12-22 13:31:47,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=608626.6666666666, ans=0.0 2023-12-22 13:31:48,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=608626.6666666666, ans=0.2 2023-12-22 13:31:53,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=608626.6666666666, ans=0.0 2023-12-22 13:31:57,662 INFO [train.py:886] (0/4) Epoch 20, batch 750, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4838060.71 frames. ], batch size: 99, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:32:12,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=22.5 2023-12-22 13:32:15,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=15.0 2023-12-22 13:32:17,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=608826.6666666666, ans=0.125 2023-12-22 13:32:28,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=608893.3333333334, ans=0.1 2023-12-22 13:32:31,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=608893.3333333334, ans=0.0 2023-12-22 13:32:31,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=608893.3333333334, ans=0.0 2023-12-22 13:32:34,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=608893.3333333334, ans=0.0 2023-12-22 13:32:35,944 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.906e+01 3.017e+01 3.126e+01 3.757e+01, threshold=6.033e+01, percent-clipped=0.0 2023-12-22 13:32:49,588 INFO [train.py:886] (0/4) Epoch 20, batch 800, loss[loss=0.01282, audio_tagging_loss=0.01282, over 25000.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4864173.29 frames. 
], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:33:01,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=609093.3333333334, ans=0.09899494936611666 2023-12-22 13:33:03,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=609093.3333333334, ans=0.0 2023-12-22 13:33:08,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=609160.0, ans=0.1 2023-12-22 13:33:20,341 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:33:20,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=609226.6666666666, ans=0.07 2023-12-22 13:33:21,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-12-22 13:33:40,121 INFO [train.py:886] (0/4) Epoch 20, batch 850, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4888400.86 frames. ], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:33:46,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=609360.0, ans=0.02 2023-12-22 13:33:56,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=609426.6666666666, ans=0.2 2023-12-22 13:34:04,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=609493.3333333334, ans=0.95 2023-12-22 13:34:05,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=609493.3333333334, ans=0.125 2023-12-22 13:34:11,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.47 vs. limit=10.0 2023-12-22 13:34:19,522 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.642e+01 2.973e+01 3.095e+01 3.258e+01 3.569e+01, threshold=6.190e+01, percent-clipped=0.0 2023-12-22 13:34:19,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=609560.0, ans=0.125 2023-12-22 13:34:24,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.00 vs. limit=22.5 2023-12-22 13:34:28,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=609626.6666666666, ans=0.125 2023-12-22 13:34:30,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0 2023-12-22 13:34:32,285 INFO [train.py:886] (0/4) Epoch 20, batch 900, loss[loss=0.01357, audio_tagging_loss=0.01357, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4902382.72 frames. 
], batch size: 99, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:34:42,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=609760.0, ans=0.0 2023-12-22 13:35:03,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.10 vs. limit=15.0 2023-12-22 13:35:17,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. limit=6.0 2023-12-22 13:35:24,647 INFO [train.py:886] (0/4) Epoch 20, batch 950, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4906655.17 frames. ], batch size: 100, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:35:27,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2023-12-22 13:35:28,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=610026.6666666666, ans=0.125 2023-12-22 13:35:55,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. limit=10.0 2023-12-22 13:36:03,994 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 2.893e+01 3.045e+01 3.239e+01 3.538e+01, threshold=6.090e+01, percent-clipped=0.0 2023-12-22 13:36:10,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=610293.3333333334, ans=0.125 2023-12-22 13:36:16,082 INFO [train.py:886] (0/4) Epoch 20, batch 1000, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4916725.35 frames. ], batch size: 100, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:36:19,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=610360.0, ans=0.1 2023-12-22 13:36:19,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=610360.0, ans=0.125 2023-12-22 13:36:22,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=610360.0, ans=0.125 2023-12-22 13:36:35,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=610426.6666666666, ans=0.125 2023-12-22 13:36:59,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=610626.6666666666, ans=0.125 2023-12-22 13:37:03,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=610626.6666666666, ans=0.125 2023-12-22 13:37:08,805 INFO [train.py:886] (0/4) Epoch 20, batch 1050, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4922061.99 frames. 
], batch size: 100, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:37:16,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=610693.3333333334, ans=0.1 2023-12-22 13:37:47,458 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.594e+01 2.869e+01 3.046e+01 3.181e+01 3.757e+01, threshold=6.091e+01, percent-clipped=0.0 2023-12-22 13:38:00,258 INFO [train.py:886] (0/4) Epoch 20, batch 1100, loss[loss=0.01339, audio_tagging_loss=0.01339, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4932494.89 frames. ], batch size: 100, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:38:20,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=12.0 2023-12-22 13:38:20,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=611160.0, ans=0.125 2023-12-22 13:38:21,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5 2023-12-22 13:38:33,398 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.272e-02 2023-12-22 13:38:34,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611226.6666666666, ans=0.1 2023-12-22 13:38:50,981 INFO [train.py:886] (0/4) Epoch 20, batch 1150, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24906.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4936723.33 frames. ], batch size: 100, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:38:56,862 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:39:01,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2023-12-22 13:39:08,762 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.620e-03 2023-12-22 13:39:29,145 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+01 2.913e+01 3.020e+01 3.168e+01 3.585e+01, threshold=6.039e+01, percent-clipped=0.0 2023-12-22 13:39:30,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=611626.6666666666, ans=0.125 2023-12-22 13:39:31,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=611626.6666666666, ans=0.2 2023-12-22 13:39:42,697 INFO [train.py:886] (0/4) Epoch 20, batch 1200, loss[loss=0.01807, audio_tagging_loss=0.01807, over 24946.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4946475.09 frames. 
], batch size: 100, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:40:08,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=611826.6666666666, ans=0.125 2023-12-22 13:40:15,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=611893.3333333334, ans=0.2 2023-12-22 13:40:23,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=611960.0, ans=0.2 2023-12-22 13:40:31,872 INFO [train.py:886] (0/4) Epoch 20, batch 1250, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24750.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4946041.74 frames. ], batch size: 99, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:40:36,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=612026.6666666666, ans=0.0 2023-12-22 13:41:10,086 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.713e+01 2.956e+01 3.075e+01 3.204e+01 3.822e+01, threshold=6.150e+01, percent-clipped=0.0 2023-12-22 13:41:15,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=612293.3333333334, ans=0.0 2023-12-22 13:41:22,073 INFO [train.py:886] (0/4) Epoch 20, batch 1300, loss[loss=0.01516, audio_tagging_loss=0.01516, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4945684.27 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:41:33,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2023-12-22 13:41:34,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=612426.6666666666, ans=0.0 2023-12-22 13:42:03,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=612626.6666666666, ans=0.1 2023-12-22 13:42:13,394 INFO [train.py:886] (0/4) Epoch 20, batch 1350, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4944123.02 frames. ], batch size: 99, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:42:17,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=612693.3333333334, ans=0.125 2023-12-22 13:42:33,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=612826.6666666666, ans=0.05 2023-12-22 13:42:37,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-12-22 13:42:45,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612893.3333333334, ans=0.1 2023-12-22 13:42:45,420 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. 
limit=15.0 2023-12-22 13:42:51,496 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+01 2.862e+01 2.982e+01 3.143e+01 3.555e+01, threshold=5.964e+01, percent-clipped=0.0 2023-12-22 13:42:54,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=612960.0, ans=0.0 2023-12-22 13:43:02,830 INFO [train.py:886] (0/4) Epoch 20, batch 1400, loss[loss=0.0146, audio_tagging_loss=0.0146, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4948058.48 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:43:15,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=613093.3333333334, ans=0.0 2023-12-22 13:43:16,805 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:43:18,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=613093.3333333334, ans=0.1 2023-12-22 13:43:30,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=613160.0, ans=0.0 2023-12-22 13:43:38,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=613226.6666666666, ans=0.0 2023-12-22 13:43:40,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=613226.6666666666, ans=0.07 2023-12-22 13:43:50,037 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-92000.pt 2023-12-22 13:43:56,510 INFO [train.py:886] (0/4) Epoch 20, batch 1450, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4946873.39 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:44:00,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=613360.0, ans=0.2 2023-12-22 13:44:03,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=613360.0, ans=0.0 2023-12-22 13:44:05,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=613426.6666666666, ans=0.125 2023-12-22 13:44:08,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=613426.6666666666, ans=0.0 2023-12-22 13:44:08,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2023-12-22 13:44:22,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=613493.3333333334, ans=0.0 2023-12-22 13:44:30,006 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:44:31,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.00 vs. 
limit=12.0 2023-12-22 13:44:34,560 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+01 2.829e+01 3.018e+01 3.145e+01 3.579e+01, threshold=6.037e+01, percent-clipped=0.0 2023-12-22 13:44:45,942 INFO [train.py:886] (0/4) Epoch 20, batch 1500, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4953982.22 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:44:47,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.09 vs. limit=15.0 2023-12-22 13:44:58,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=613760.0, ans=0.0 2023-12-22 13:45:16,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=613893.3333333334, ans=0.1 2023-12-22 13:45:17,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=613893.3333333334, ans=0.07 2023-12-22 13:45:20,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=613893.3333333334, ans=0.125 2023-12-22 13:45:31,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=613960.0, ans=0.125 2023-12-22 13:45:38,427 INFO [train.py:886] (0/4) Epoch 20, batch 1550, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4950067.45 frames. ], batch size: 99, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:45:42,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=614026.6666666666, ans=0.1 2023-12-22 13:45:43,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=22.5 2023-12-22 13:45:57,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=614160.0, ans=0.2 2023-12-22 13:46:09,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=614226.6666666666, ans=0.2 2023-12-22 13:46:16,249 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.562e+01 2.942e+01 3.051e+01 3.184e+01 5.064e+01, threshold=6.103e+01, percent-clipped=0.0 2023-12-22 13:46:22,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=614293.3333333334, ans=0.05 2023-12-22 13:46:29,797 INFO [train.py:886] (0/4) Epoch 20, batch 1600, loss[loss=0.01237, audio_tagging_loss=0.01237, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4947473.51 frames. 
], batch size: 99, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:46:34,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=614360.0, ans=0.0 2023-12-22 13:46:54,366 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.095e-03 2023-12-22 13:46:59,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=614560.0, ans=0.5 2023-12-22 13:47:15,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=614626.6666666666, ans=0.125 2023-12-22 13:47:20,709 INFO [train.py:886] (0/4) Epoch 20, batch 1650, loss[loss=0.01578, audio_tagging_loss=0.01578, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4944706.49 frames. ], batch size: 100, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:47:31,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=614760.0, ans=0.2 2023-12-22 13:47:34,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=614760.0, ans=0.125 2023-12-22 13:47:35,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=614760.0, ans=0.125 2023-12-22 13:47:39,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-12-22 13:47:43,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=614826.6666666666, ans=0.0 2023-12-22 13:47:43,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=614826.6666666666, ans=15.0 2023-12-22 13:47:52,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614893.3333333334, ans=0.0 2023-12-22 13:47:59,615 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.531e+01 2.897e+01 3.020e+01 3.160e+01 3.964e+01, threshold=6.040e+01, percent-clipped=0.0 2023-12-22 13:48:06,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2023-12-22 13:48:12,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-12-22 13:48:13,735 INFO [train.py:886] (0/4) Epoch 20, batch 1700, loss[loss=0.01131, audio_tagging_loss=0.01131, over 21521.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4936052.57 frames. ], batch size: 107, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:48:24,292 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:48:26,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615093.3333333334, ans=0.1 2023-12-22 13:48:27,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. 
limit=15.0 2023-12-22 13:48:31,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=615160.0, ans=0.2 2023-12-22 13:48:40,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=615160.0, ans=0.125 2023-12-22 13:48:45,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=615226.6666666666, ans=0.125 2023-12-22 13:48:53,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0 2023-12-22 13:49:02,946 INFO [train.py:886] (0/4) Epoch 20, batch 1750, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4945008.48 frames. ], batch size: 100, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:49:18,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2023-12-22 13:49:18,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=22.5 2023-12-22 13:49:27,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=615493.3333333334, ans=0.125 2023-12-22 13:49:31,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=615493.3333333334, ans=22.5 2023-12-22 13:49:31,714 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:49:35,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2023-12-22 13:49:41,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=615560.0, ans=0.125 2023-12-22 13:49:42,938 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+01 2.888e+01 2.963e+01 3.094e+01 5.510e+01, threshold=5.927e+01, percent-clipped=0.0 2023-12-22 13:49:44,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=615626.6666666666, ans=0.125 2023-12-22 13:49:49,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=615626.6666666666, ans=0.2 2023-12-22 13:49:54,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=615693.3333333334, ans=0.0 2023-12-22 13:49:54,993 INFO [train.py:886] (0/4) Epoch 20, batch 1800, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4951914.87 frames. 
2023-12-22 13:49:54,993 INFO [train.py:886] (0/4) Epoch 20, batch 1800, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4951914.87 frames. ], batch size: 100, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:50:06,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=615760.0, ans=0.0 2023-12-22 13:50:06,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=615760.0, ans=0.125 2023-12-22 13:50:24,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=615826.6666666666, ans=0.2 2023-12-22 13:50:34,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=615960.0, ans=0.1 2023-12-22 13:50:47,407 INFO [train.py:886] (0/4) Epoch 20, batch 1850, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4951688.59 frames. ], batch size: 99, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:50:52,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=616026.6666666666, ans=0.125 2023-12-22 13:50:52,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=616026.6666666666, ans=0.125 2023-12-22 13:50:52,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2023-12-22 13:51:12,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=616160.0, ans=0.0 2023-12-22 13:51:17,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=616226.6666666666, ans=0.0 2023-12-22 13:51:25,863 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.662e+01 2.959e+01 3.073e+01 3.205e+01 3.995e+01, threshold=6.146e+01, percent-clipped=0.0 2023-12-22 13:51:34,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.85 vs. limit=10.0 2023-12-22 13:51:38,111 INFO [train.py:886] (0/4) Epoch 20, batch 1900, loss[loss=0.0146, audio_tagging_loss=0.0146, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4945972.76 frames. ], batch size: 99, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:51:49,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=616426.6666666666, ans=0.0 2023-12-22 13:51:52,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=616426.6666666666, ans=0.2 2023-12-22 13:52:01,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=616493.3333333334, ans=0.0 2023-12-22 13:52:11,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=616560.0, ans=0.1 2023-12-22 13:52:22,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. limit=6.0
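The `optim.py:484` warnings summarize the optimizer's gradient-clipping state: five values that read as the (min, 25%, median, 75%, max) of recently observed gradient norms, plus the active threshold. In every such line here the threshold equals `Clipping_scale` times the logged median (e.g. 2.0 × 3.073e+01 = 6.146e+01 just above), and `percent-clipped=0.0` says no recent step exceeded it. A sketch consistent with that arithmetic; the bookkeeping details are assumptions:

```python
import torch

def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartiles of the recent grad-norm history, as printed in the log.
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                      # scale the median
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, percent_clipped

norms = 30.0 + 2.0 * torch.randn(200).abs()  # fake history of gradient norms
q, thr, pct = clipping_stats(norms)
print(q.tolist(), float(thr), float(pct))    # thr ~ 60, pct ~ 0.0
```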
2023-12-22 13:52:30,204 INFO [train.py:886] (0/4) Epoch 20, batch 1950, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4943173.57 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 32.0 2023-12-22 13:52:30,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=616693.3333333334, ans=0.2 2023-12-22 13:52:42,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=616760.0, ans=0.125 2023-12-22 13:52:43,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=616760.0, ans=0.2 2023-12-22 13:52:58,692 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.644e-03 2023-12-22 13:53:00,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.79 vs. limit=10.0 2023-12-22 13:53:09,468 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.904e+01 3.050e+01 3.172e+01 4.406e+01, threshold=6.100e+01, percent-clipped=0.0 2023-12-22 13:53:11,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=616960.0, ans=0.125 2023-12-22 13:53:19,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616960.0, ans=0.1 2023-12-22 13:53:22,151 INFO [train.py:886] (0/4) Epoch 20, batch 2000, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4941769.61 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:53:25,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=617026.6666666666, ans=0.125 2023-12-22 13:53:48,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.85 vs. limit=22.5 2023-12-22 13:53:50,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=617160.0, ans=0.05 2023-12-22 13:54:03,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=617293.3333333334, ans=0.125 2023-12-22 13:54:05,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=617293.3333333334, ans=0.125
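Note `grad_scale` stepping from 32.0 to 64.0 at batch 2000 above (and to 128.0 at batch 4000 further down): with fp16 training, dynamic loss scaling doubles the scale after a fixed run of overflow-free steps and backs off on overflow. A framework-free sketch under that assumption; the growth interval of 2000 is read off this log, the rest is illustrative:

```python
class LossScaler:
    """Toy dynamic loss scaler: double after `growth_interval` clean steps,
    halve on overflow (the same shape of policy as torch.cuda.amp.GradScaler)."""
    def __init__(self, init_scale=32.0, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def update(self, found_inf: bool):
        if found_inf:
            self.scale *= 0.5      # overflow: back off and restart the count
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps % self.growth_interval == 0:
                self.scale *= 2.0  # e.g. 32.0 -> 64.0 after 2000 clean steps

scaler = LossScaler()
for _ in range(2000):
    scaler.update(found_inf=False)
print(scaler.scale)  # 64.0
```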
2023-12-22 13:54:14,208 INFO [train.py:886] (0/4) Epoch 20, batch 2050, loss[loss=0.01462, audio_tagging_loss=0.01462, over 25000.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4946994.95 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:54:21,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=617360.0, ans=0.125 2023-12-22 13:54:32,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=617426.6666666666, ans=0.2 2023-12-22 13:54:37,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=617493.3333333334, ans=0.1 2023-12-22 13:54:40,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=15.0 2023-12-22 13:54:42,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=617493.3333333334, ans=0.05 2023-12-22 13:54:47,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=617560.0, ans=0.125 2023-12-22 13:54:53,074 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.697e+01 2.903e+01 3.029e+01 3.164e+01 3.600e+01, threshold=6.057e+01, percent-clipped=0.0 2023-12-22 13:55:06,640 INFO [train.py:886] (0/4) Epoch 20, batch 2100, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4949634.99 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:55:29,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=617826.6666666666, ans=0.2 2023-12-22 13:55:32,696 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:55:35,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2023-12-22 13:55:57,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.02 vs. limit=6.0 2023-12-22 13:55:57,516 INFO [train.py:886] (0/4) Epoch 20, batch 2150, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4952634.07 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:56:26,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=618160.0, ans=0.125 2023-12-22 13:56:31,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2023-12-22 13:56:37,616 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+01 2.945e+01 3.054e+01 3.212e+01 3.650e+01, threshold=6.109e+01, percent-clipped=0.0 2023-12-22 13:56:49,059 INFO [train.py:886] (0/4) Epoch 20, batch 2200, loss[loss=0.01675, audio_tagging_loss=0.01675, over 24750.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4946772.47 frames.
], batch size: 99, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:57:14,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=618493.3333333334, ans=0.2 2023-12-22 13:57:16,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0 2023-12-22 13:57:24,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618560.0, ans=0.1 2023-12-22 13:57:36,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=618626.6666666666, ans=0.1 2023-12-22 13:57:40,635 INFO [train.py:886] (0/4) Epoch 20, batch 2250, loss[loss=0.01225, audio_tagging_loss=0.01225, over 25000.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4945122.26 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:57:44,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=618693.3333333334, ans=0.0 2023-12-22 13:57:51,231 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:57:54,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=12.0 2023-12-22 13:58:02,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=618826.6666666666, ans=0.125 2023-12-22 13:58:06,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=618826.6666666666, ans=0.2 2023-12-22 13:58:18,003 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.647e+01 2.946e+01 3.059e+01 3.211e+01 3.803e+01, threshold=6.118e+01, percent-clipped=0.0 2023-12-22 13:58:19,119 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:58:22,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=618960.0, ans=0.2 2023-12-22 13:58:22,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618960.0, ans=0.1 2023-12-22 13:58:25,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=618960.0, ans=0.125 2023-12-22 13:58:25,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=618960.0, ans=0.0 2023-12-22 13:58:26,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=618960.0, ans=0.2 2023-12-22 13:58:29,455 INFO [train.py:886] (0/4) Epoch 20, batch 2300, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4946409.12 frames. 
], batch size: 99, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 13:58:29,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=619026.6666666666, ans=0.0 2023-12-22 13:58:32,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2023-12-22 13:58:44,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619093.3333333334, ans=0.1 2023-12-22 13:58:46,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=619093.3333333334, ans=0.0 2023-12-22 13:59:21,881 INFO [train.py:886] (0/4) Epoch 20, batch 2350, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4946205.26 frames. ], batch size: 99, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 13:59:47,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=619493.3333333334, ans=0.125 2023-12-22 13:59:49,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=15.0 2023-12-22 13:59:54,242 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:00:00,826 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.549e+01 2.868e+01 3.010e+01 3.128e+01 3.736e+01, threshold=6.020e+01, percent-clipped=0.0 2023-12-22 14:00:05,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=619626.6666666666, ans=0.2 2023-12-22 14:00:12,308 INFO [train.py:886] (0/4) Epoch 20, batch 2400, loss[loss=0.01417, audio_tagging_loss=0.01417, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4954220.22 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 14:00:23,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=619760.0, ans=0.04949747468305833 2023-12-22 14:00:26,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.18 vs. limit=15.0 2023-12-22 14:00:41,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=619893.3333333334, ans=0.0 2023-12-22 14:00:43,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=619893.3333333334, ans=0.2 2023-12-22 14:00:58,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=619960.0, ans=0.125 2023-12-22 14:01:03,797 INFO [train.py:886] (0/4) Epoch 20, batch 2450, loss[loss=0.01436, audio_tagging_loss=0.01436, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4952758.05 frames. 
], batch size: 100, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 14:01:04,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=620026.6666666666, ans=0.2 2023-12-22 14:01:11,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0 2023-12-22 14:01:32,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=620160.0, ans=0.5 2023-12-22 14:01:36,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2023-12-22 14:01:39,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=620226.6666666666, ans=0.0 2023-12-22 14:01:41,190 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.937e+01 3.082e+01 3.201e+01 3.649e+01, threshold=6.165e+01, percent-clipped=0.0 2023-12-22 14:01:44,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620293.3333333334, ans=0.1 2023-12-22 14:01:54,862 INFO [train.py:886] (0/4) Epoch 20, batch 2500, loss[loss=0.01525, audio_tagging_loss=0.01525, over 24750.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4955756.60 frames. ], batch size: 99, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 14:02:02,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-12-22 14:02:10,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=620426.6666666666, ans=15.0 2023-12-22 14:02:12,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=620493.3333333334, ans=0.0 2023-12-22 14:02:33,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=620560.0, ans=0.2 2023-12-22 14:02:34,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=620626.6666666666, ans=0.125 2023-12-22 14:02:37,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=620626.6666666666, ans=0.125 2023-12-22 14:02:44,679 INFO [train.py:886] (0/4) Epoch 20, batch 2550, loss[loss=0.01425, audio_tagging_loss=0.01425, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4946546.61 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 14:02:48,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=620693.3333333334, ans=0.0 2023-12-22 14:03:10,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=620826.6666666666, ans=0.1 2023-12-22 14:03:12,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.62 vs. 
limit=22.5 2023-12-22 14:03:19,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=620893.3333333334, ans=0.125 2023-12-22 14:03:25,336 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.633e+01 2.957e+01 3.094e+01 3.249e+01 3.777e+01, threshold=6.188e+01, percent-clipped=0.0 2023-12-22 14:03:35,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=620960.0, ans=0.1 2023-12-22 14:03:37,539 INFO [train.py:886] (0/4) Epoch 20, batch 2600, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4947318.60 frames. ], batch size: 99, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 14:03:49,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=621093.3333333334, ans=0.0 2023-12-22 14:04:11,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=621226.6666666666, ans=0.1 2023-12-22 14:04:11,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=621226.6666666666, ans=0.2 2023-12-22 14:04:15,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=621226.6666666666, ans=0.0 2023-12-22 14:04:16,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621226.6666666666, ans=0.1 2023-12-22 14:04:21,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=621293.3333333334, ans=0.0 2023-12-22 14:04:27,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=621293.3333333334, ans=0.0 2023-12-22 14:04:30,330 INFO [train.py:886] (0/4) Epoch 20, batch 2650, loss[loss=0.01528, audio_tagging_loss=0.01528, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4945349.28 frames. ], batch size: 99, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:04:35,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=621360.0, ans=0.0 2023-12-22 14:04:39,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=621426.6666666666, ans=0.04949747468305833 2023-12-22 14:04:45,481 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:05:09,979 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.585e+01 2.862e+01 3.001e+01 3.149e+01 4.006e+01, threshold=6.003e+01, percent-clipped=0.0 2023-12-22 14:05:15,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2023-12-22 14:05:15,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=621626.6666666666, ans=0.125
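The learning rate in these lines decays only from 5.46e-03 to about 5.42e-03 over a couple of thousand batches, the near-flat late-training regime of an Eden-style schedule (the scheduler Zipformer recipes describe), where the base rate is shrunk by a batch factor and an epoch factor. A hedged sketch; the formula is my reading of Eden, and the base_lr/lr_batches/lr_epochs values below are assumptions, not read from this log:

```python
def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # lr = base_lr * ((step^2+B^2)/B^2)^-0.25 * ((epoch^2+E^2)/E^2)^-0.25
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor

# Late in training both factors change very slowly, hence the near-flat lr:
print(eden_lr(0.045, step=90000, epoch=20))   # ~5.4e-03
print(eden_lr(0.045, step=92500, epoch=20))   # marginally smaller
```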
2023-12-22 14:05:21,415 INFO [train.py:886] (0/4) Epoch 20, batch 2700, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4948956.58 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:05:31,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=621760.0, ans=0.125 2023-12-22 14:05:36,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621760.0, ans=0.1 2023-12-22 14:05:43,606 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:06:11,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2023-12-22 14:06:12,930 INFO [train.py:886] (0/4) Epoch 20, batch 2750, loss[loss=0.01556, audio_tagging_loss=0.01556, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4954052.81 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:06:15,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=622026.6666666666, ans=0.125 2023-12-22 14:06:23,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=622093.3333333334, ans=0.125 2023-12-22 14:06:27,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=622093.3333333334, ans=0.2 2023-12-22 14:06:37,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=622160.0, ans=0.125 2023-12-22 14:06:37,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.42 vs. limit=22.5 2023-12-22 14:06:40,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=622160.0, ans=0.1 2023-12-22 14:06:52,647 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.580e+01 2.935e+01 3.098e+01 3.194e+01 3.617e+01, threshold=6.197e+01, percent-clipped=0.0 2023-12-22 14:07:04,114 INFO [train.py:886] (0/4) Epoch 20, batch 2800, loss[loss=0.01409, audio_tagging_loss=0.01409, over 24750.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4953833.50 frames. ], batch size: 99, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:07:04,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=622360.0, ans=0.0 2023-12-22 14:07:08,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=622360.0, ans=0.0 2023-12-22 14:07:15,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=622426.6666666666, ans=0.125 2023-12-22 14:07:16,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.64 vs.
limit=10.0 2023-12-22 14:07:19,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=622426.6666666666, ans=0.04949747468305833 2023-12-22 14:07:24,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=622493.3333333334, ans=0.0 2023-12-22 14:07:41,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=622560.0, ans=0.125 2023-12-22 14:07:41,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=622560.0, ans=0.125 2023-12-22 14:07:43,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=622560.0, ans=0.125 2023-12-22 14:07:52,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=622626.6666666666, ans=0.125 2023-12-22 14:07:54,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=622626.6666666666, ans=0.0 2023-12-22 14:07:56,124 INFO [train.py:886] (0/4) Epoch 20, batch 2850, loss[loss=0.01565, audio_tagging_loss=0.01565, over 25000.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4949319.62 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:07:59,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.57 vs. limit=15.0 2023-12-22 14:08:05,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=622760.0, ans=0.0 2023-12-22 14:08:06,874 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.949e-02 2023-12-22 14:08:34,137 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.595e+01 2.924e+01 3.080e+01 3.230e+01 3.669e+01, threshold=6.161e+01, percent-clipped=0.0 2023-12-22 14:08:34,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=622893.3333333334, ans=0.125 2023-12-22 14:08:36,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=622960.0, ans=0.1 2023-12-22 14:08:37,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=622960.0, ans=0.2 2023-12-22 14:08:46,422 INFO [train.py:886] (0/4) Epoch 20, batch 2900, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4946969.23 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:09:12,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2023-12-22 14:09:26,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.20 vs. 
limit=10.0 2023-12-22 14:09:33,102 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:09:36,745 INFO [train.py:886] (0/4) Epoch 20, batch 2950, loss[loss=0.01539, audio_tagging_loss=0.01539, over 25000.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4945772.12 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:09:36,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=623360.0, ans=0.125 2023-12-22 14:09:41,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=623360.0, ans=0.125 2023-12-22 14:09:45,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.49 vs. limit=6.0 2023-12-22 14:09:58,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=15.0 2023-12-22 14:10:04,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=623493.3333333334, ans=0.0 2023-12-22 14:10:08,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=623560.0, ans=0.1 2023-12-22 14:10:10,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=623560.0, ans=0.0 2023-12-22 14:10:14,857 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.630e+01 2.870e+01 3.046e+01 3.169e+01 3.607e+01, threshold=6.091e+01, percent-clipped=0.0 2023-12-22 14:10:28,855 INFO [train.py:886] (0/4) Epoch 20, batch 3000, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4948080.50 frames. ], batch size: 100, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:10:28,857 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 14:10:38,199 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2178, 3.9526, 3.8977, 3.7188], device='cuda:0') 2023-12-22 14:10:46,594 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.3602, 3.2112, 3.4076, 3.4787, 3.4630, 3.6044, 2.6511, 2.7059], device='cuda:0') 2023-12-22 14:10:50,341 INFO [train.py:917] (0/4) Epoch 20, validation: loss=0.03313, audio_tagging_loss=0.03313, over 3737520.00 frames. 2023-12-22 14:10:50,341 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 14:10:50,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=623693.3333333334, ans=0.0 2023-12-22 14:10:54,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=623693.3333333334, ans=0.125 2023-12-22 14:10:54,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=623693.3333333334, ans=0.0
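During the validation pass above (batch 3000), `zipformer.py:1858` dumps `attn_weights_entropy` per self-attention module, one value per head by the look of the tensor shapes; values near log(num_keys) indicate diffuse attention, lower values sharper attention. A hedged sketch of that diagnostic; names and shapes here are assumptions, not zipformer.py's API:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """attn: (num_heads, num_queries, num_keys), rows summing to 1.
    Returns the mean Shannon entropy per head, as in the logged tensors."""
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # entropy per (head, query)
    return ent.mean(dim=-1)                          # average over queries

attn = torch.softmax(torch.randn(8, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # somewhat below log(50) ~ 3.9 for random heads
```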
2023-12-22 14:11:10,473 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2023-12-22 14:11:25,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=15.0 2023-12-22 14:11:29,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2023-12-22 14:11:34,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=623960.0, ans=15.0 2023-12-22 14:11:36,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2023-12-22 14:11:40,485 INFO [train.py:886] (0/4) Epoch 20, batch 3050, loss[loss=0.01645, audio_tagging_loss=0.01645, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4948611.10 frames. ], batch size: 100, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:11:56,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.71 vs. limit=10.0 2023-12-22 14:12:14,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2023-12-22 14:12:18,570 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.616e+01 2.950e+01 3.022e+01 3.122e+01 3.698e+01, threshold=6.045e+01, percent-clipped=0.0 2023-12-22 14:12:30,781 INFO [train.py:886] (0/4) Epoch 20, batch 3100, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4950342.69 frames. ], batch size: 99, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:12:39,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=624360.0, ans=0.125 2023-12-22 14:12:44,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=624426.6666666666, ans=0.05 2023-12-22 14:12:50,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=624493.3333333334, ans=0.125 2023-12-22 14:13:00,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624560.0, ans=0.1 2023-12-22 14:13:07,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=624560.0, ans=0.04949747468305833 2023-12-22 14:13:11,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2023-12-22 14:13:19,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=624626.6666666666, ans=0.125 2023-12-22 14:13:20,714 INFO [train.py:886] (0/4) Epoch 20, batch 3150, loss[loss=0.01464, audio_tagging_loss=0.01464, over 24947.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4947947.31 frames.
], batch size: 100, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:13:27,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=624693.3333333334, ans=0.125 2023-12-22 14:13:32,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=624760.0, ans=0.2 2023-12-22 14:13:38,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=624760.0, ans=0.0 2023-12-22 14:13:39,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=624760.0, ans=0.125 2023-12-22 14:13:41,413 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2023-12-22 14:13:42,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=624826.6666666666, ans=0.2 2023-12-22 14:13:43,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624826.6666666666, ans=0.1 2023-12-22 14:13:47,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624826.6666666666, ans=0.1 2023-12-22 14:13:51,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=624893.3333333334, ans=0.2 2023-12-22 14:14:00,106 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.588e+01 2.956e+01 3.114e+01 3.316e+01 3.696e+01, threshold=6.228e+01, percent-clipped=0.0 2023-12-22 14:14:11,505 INFO [train.py:886] (0/4) Epoch 20, batch 3200, loss[loss=0.01451, audio_tagging_loss=0.01451, over 24750.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4947304.17 frames. ], batch size: 99, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:14:11,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=625026.6666666666, ans=0.0 2023-12-22 14:14:30,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=625093.3333333334, ans=0.125 2023-12-22 14:14:31,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=625160.0, ans=0.07 2023-12-22 14:14:34,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=625160.0, ans=0.2 2023-12-22 14:14:46,043 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.441e-02 2023-12-22 14:14:47,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=625226.6666666666, ans=0.125 2023-12-22 14:15:03,431 INFO [train.py:886] (0/4) Epoch 20, batch 3250, loss[loss=0.01275, audio_tagging_loss=0.01275, over 21910.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4950980.83 frames. 
], batch size: 107, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:15:10,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=625360.0, ans=0.125 2023-12-22 14:15:12,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=625426.6666666666, ans=0.125 2023-12-22 14:15:12,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=625426.6666666666, ans=0.2 2023-12-22 14:15:12,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625426.6666666666, ans=0.1 2023-12-22 14:15:22,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=625493.3333333334, ans=0.2 2023-12-22 14:15:40,989 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.609e+01 2.857e+01 2.981e+01 3.142e+01 3.421e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 14:15:44,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.73 vs. limit=22.5 2023-12-22 14:15:47,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=625626.6666666666, ans=0.125 2023-12-22 14:15:50,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2023-12-22 14:15:51,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-12-22 14:15:52,999 INFO [train.py:886] (0/4) Epoch 20, batch 3300, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4952564.90 frames. ], batch size: 100, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:15:53,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=625693.3333333334, ans=0.2 2023-12-22 14:15:54,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=625693.3333333334, ans=0.0 2023-12-22 14:16:05,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-12-22 14:16:34,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.46 vs. limit=22.5 2023-12-22 14:16:35,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=625960.0, ans=0.125 2023-12-22 14:16:43,894 INFO [train.py:886] (0/4) Epoch 20, batch 3350, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4958604.73 frames. 
], batch size: 100, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:16:51,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=626026.6666666666, ans=0.07 2023-12-22 14:16:53,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=626093.3333333334, ans=0.1 2023-12-22 14:17:13,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626226.6666666666, ans=0.1 2023-12-22 14:17:15,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0 2023-12-22 14:17:18,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=626226.6666666666, ans=0.125 2023-12-22 14:17:21,211 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 2.948e+01 3.074e+01 3.157e+01 3.734e+01, threshold=6.147e+01, percent-clipped=0.0 2023-12-22 14:17:24,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=626293.3333333334, ans=0.125 2023-12-22 14:17:28,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=626293.3333333334, ans=0.0 2023-12-22 14:17:30,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=626293.3333333334, ans=0.07 2023-12-22 14:17:33,304 INFO [train.py:886] (0/4) Epoch 20, batch 3400, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4961081.92 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:17:38,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=626360.0, ans=0.0 2023-12-22 14:17:40,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=626360.0, ans=0.125 2023-12-22 14:17:47,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=626426.6666666666, ans=0.0 2023-12-22 14:17:47,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=15.0 2023-12-22 14:17:50,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=626426.6666666666, ans=0.0 2023-12-22 14:18:06,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=626560.0, ans=0.125 2023-12-22 14:18:25,378 INFO [train.py:886] (0/4) Epoch 20, batch 3450, loss[loss=0.01461, audio_tagging_loss=0.01461, over 24750.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4955095.82 frames. 
], batch size: 99, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:18:55,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=626826.6666666666, ans=0.1 2023-12-22 14:18:58,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=626893.3333333334, ans=0.0 2023-12-22 14:19:04,164 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.632e+01 2.964e+01 3.104e+01 3.232e+01 3.716e+01, threshold=6.208e+01, percent-clipped=0.0 2023-12-22 14:19:08,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=626960.0, ans=0.0 2023-12-22 14:19:11,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.45 vs. limit=10.0 2023-12-22 14:19:14,905 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:19:16,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=627026.6666666666, ans=0.025 2023-12-22 14:19:17,578 INFO [train.py:886] (0/4) Epoch 20, batch 3500, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4950818.78 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:19:17,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=627026.6666666666, ans=0.125 2023-12-22 14:19:21,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=627026.6666666666, ans=0.125 2023-12-22 14:19:30,413 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=22.5 2023-12-22 14:19:31,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=627093.3333333334, ans=0.125 2023-12-22 14:19:32,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=627093.3333333334, ans=0.125 2023-12-22 14:19:51,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=627226.6666666666, ans=0.0 2023-12-22 14:20:01,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=627293.3333333334, ans=0.125 2023-12-22 14:20:05,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=627293.3333333334, ans=0.1 2023-12-22 14:20:07,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0 2023-12-22 14:20:08,209 INFO [train.py:886] (0/4) Epoch 20, batch 3550, loss[loss=0.01608, audio_tagging_loss=0.01608, over 24750.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4954549.54 frames. 
], batch size: 99, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:20:09,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5 2023-12-22 14:20:31,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=627493.3333333334, ans=0.125 2023-12-22 14:20:32,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=627493.3333333334, ans=0.125 2023-12-22 14:20:39,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0 2023-12-22 14:20:50,038 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.608e+01 2.881e+01 3.030e+01 3.174e+01 3.581e+01, threshold=6.061e+01, percent-clipped=0.0 2023-12-22 14:20:51,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0 2023-12-22 14:20:56,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=627626.6666666666, ans=0.2 2023-12-22 14:21:01,409 INFO [train.py:886] (0/4) Epoch 20, batch 3600, loss[loss=0.01368, audio_tagging_loss=0.01368, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4952572.38 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:21:06,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=627693.3333333334, ans=0.09899494936611666 2023-12-22 14:21:08,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2023-12-22 14:21:27,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.31 vs. limit=22.5 2023-12-22 14:21:29,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=627826.6666666666, ans=0.0 2023-12-22 14:21:30,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2023-12-22 14:21:41,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=627960.0, ans=0.125 2023-12-22 14:21:44,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2023-12-22 14:21:53,546 INFO [train.py:886] (0/4) Epoch 20, batch 3650, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4948521.72 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:22:08,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=628093.3333333334, ans=0.125 2023-12-22 14:22:08,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.97 vs. 
limit=12.0 2023-12-22 14:22:17,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=628160.0, ans=0.125 2023-12-22 14:22:28,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628226.6666666666, ans=0.1 2023-12-22 14:22:29,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=628226.6666666666, ans=0.125 2023-12-22 14:22:32,328 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.914e+01 3.056e+01 3.191e+01 3.710e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 14:22:35,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=628293.3333333334, ans=0.125 2023-12-22 14:22:43,674 INFO [train.py:886] (0/4) Epoch 20, batch 3700, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4944874.76 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:22:45,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=628360.0, ans=0.125 2023-12-22 14:22:50,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=628360.0, ans=0.125 2023-12-22 14:22:52,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=628426.6666666666, ans=0.1 2023-12-22 14:23:35,210 INFO [train.py:886] (0/4) Epoch 20, batch 3750, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24952.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4948123.39 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:23:39,292 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:23:42,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0 2023-12-22 14:24:11,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=628893.3333333334, ans=0.125 2023-12-22 14:24:14,147 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.638e+01 2.921e+01 3.146e+01 3.299e+01 3.816e+01, threshold=6.291e+01, percent-clipped=0.0 2023-12-22 14:24:14,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=628893.3333333334, ans=0.125 2023-12-22 14:24:14,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=628893.3333333334, ans=0.2 2023-12-22 14:24:25,604 INFO [train.py:886] (0/4) Epoch 20, batch 3800, loss[loss=0.01375, audio_tagging_loss=0.01375, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4943986.91 frames. 
], batch size: 99, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:24:36,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=629093.3333333334, ans=0.125 2023-12-22 14:24:41,480 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=12.0 2023-12-22 14:24:42,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=629093.3333333334, ans=0.125 2023-12-22 14:24:43,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=629093.3333333334, ans=0.125 2023-12-22 14:24:47,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=629160.0, ans=0.2 2023-12-22 14:24:55,527 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:25:04,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=629226.6666666666, ans=0.2 2023-12-22 14:25:18,293 INFO [train.py:886] (0/4) Epoch 20, batch 3850, loss[loss=0.01422, audio_tagging_loss=0.01422, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4941634.23 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:25:28,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=629426.6666666666, ans=0.125 2023-12-22 14:25:33,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=629426.6666666666, ans=0.0 2023-12-22 14:25:38,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2023-12-22 14:25:44,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=629493.3333333334, ans=0.125 2023-12-22 14:25:57,051 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.593e+01 2.955e+01 3.052e+01 3.216e+01 3.711e+01, threshold=6.104e+01, percent-clipped=0.0 2023-12-22 14:25:57,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.68 vs. limit=22.5 2023-12-22 14:26:03,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=629626.6666666666, ans=0.2 2023-12-22 14:26:05,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=629626.6666666666, ans=0.125 2023-12-22 14:26:10,654 INFO [train.py:886] (0/4) Epoch 20, batch 3900, loss[loss=0.01523, audio_tagging_loss=0.01523, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4942104.55 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:26:10,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=629693.3333333334, ans=0.05 2023-12-22 14:26:50,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.50 vs. 
limit=22.5 2023-12-22 14:26:54,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=629960.0, ans=0.0 2023-12-22 14:26:55,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.12 vs. limit=10.0 2023-12-22 14:27:01,637 INFO [train.py:886] (0/4) Epoch 20, batch 3950, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4949060.35 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:27:09,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=630026.6666666666, ans=0.125 2023-12-22 14:27:10,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=630026.6666666666, ans=0.125 2023-12-22 14:27:10,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=630026.6666666666, ans=0.0 2023-12-22 14:27:20,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.06 vs. limit=15.0 2023-12-22 14:27:40,561 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.892e+01 3.043e+01 3.159e+01 3.802e+01, threshold=6.086e+01, percent-clipped=0.0 2023-12-22 14:27:45,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-12-22 14:27:53,425 INFO [train.py:886] (0/4) Epoch 20, batch 4000, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4949422.79 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 128.0 2023-12-22 14:28:00,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=630360.0, ans=0.125 2023-12-22 14:28:16,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=630493.3333333334, ans=0.125 2023-12-22 14:28:31,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=630560.0, ans=0.125 2023-12-22 14:28:34,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630626.6666666666, ans=0.1 2023-12-22 14:28:37,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=630626.6666666666, ans=0.125 2023-12-22 14:28:42,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=630626.6666666666, ans=0.07 2023-12-22 14:28:44,050 INFO [train.py:886] (0/4) Epoch 20, batch 4050, loss[loss=0.0141, audio_tagging_loss=0.0141, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4957128.27 frames. ], batch size: 99, lr: 5.38e-03, grad_scale: 128.0 2023-12-22 14:28:50,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.61 vs. 
limit=15.0 2023-12-22 14:28:53,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-12-22 14:29:04,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=630826.6666666666, ans=0.0 2023-12-22 14:29:24,772 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 2.997e+01 3.121e+01 3.224e+01 3.703e+01, threshold=6.243e+01, percent-clipped=0.0 2023-12-22 14:29:35,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=631026.6666666666, ans=0.125 2023-12-22 14:29:36,667 INFO [train.py:886] (0/4) Epoch 20, batch 4100, loss[loss=0.01331, audio_tagging_loss=0.01331, over 24750.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4956067.19 frames. ], batch size: 99, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:29:46,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=631093.3333333334, ans=0.0 2023-12-22 14:29:56,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.35 vs. limit=22.5 2023-12-22 14:30:02,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=631160.0, ans=0.2 2023-12-22 14:30:16,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.19 vs. limit=22.5 2023-12-22 14:30:27,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.69 vs. limit=22.5 2023-12-22 14:30:28,232 INFO [train.py:886] (0/4) Epoch 20, batch 4150, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24750.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4950126.23 frames. ], batch size: 99, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:30:32,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=631360.0, ans=0.2 2023-12-22 14:30:32,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.11 vs. limit=22.5 2023-12-22 14:30:32,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=12.0 2023-12-22 14:30:46,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.59 vs. limit=22.5 2023-12-22 14:30:57,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=631493.3333333334, ans=0.125 2023-12-22 14:30:58,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.58 vs. 
limit=22.5 2023-12-22 14:31:07,721 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.483e+01 2.915e+01 3.076e+01 3.208e+01 3.689e+01, threshold=6.152e+01, percent-clipped=0.0 2023-12-22 14:31:14,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=631626.6666666666, ans=0.125 2023-12-22 14:31:18,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=631693.3333333334, ans=0.125 2023-12-22 14:31:18,938 INFO [train.py:886] (0/4) Epoch 20, batch 4200, loss[loss=0.01301, audio_tagging_loss=0.01301, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4949414.30 frames. ], batch size: 100, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:31:35,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=12.0 2023-12-22 14:31:36,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2023-12-22 14:31:40,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=631826.6666666666, ans=0.2 2023-12-22 14:31:43,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=631826.6666666666, ans=0.1 2023-12-22 14:31:44,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=631826.6666666666, ans=0.125 2023-12-22 14:31:45,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=631826.6666666666, ans=0.0 2023-12-22 14:32:07,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=631960.0, ans=0.0 2023-12-22 14:32:11,372 INFO [train.py:886] (0/4) Epoch 20, batch 4250, loss[loss=0.01362, audio_tagging_loss=0.01362, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4951352.19 frames. 
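Every scaling.py:213 line prints a named ScheduledFloat: a scalar hyper-parameter (dropout probability, skip rate, balancer bound, whitening limit) whose current value `ans` is a function of `batch_count`. These behave like piecewise-linear schedules over batch count; a simplified sketch of that behaviour follows (icefall's scaling.py version carries more machinery than shown here).

```python
class ScheduledFloat:
    """Piecewise-linear scalar schedule over batch_count (sketch)."""

    def __init__(self, *points):
        # points: (batch_count, value) breakpoints.
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                # Interpolate between the bracketing breakpoints.
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint: hold the final value

# e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches:
skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
assert skip_rate(10000.0) == 0.25
```

This is why so many entries here print steady values such as ans=0.125 or ans=0.0: by batch_count ≈ 6.3e5, most schedules have long since reached their final breakpoint.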
], batch size: 100, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:32:13,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=632026.6666666666, ans=0.0 2023-12-22 14:32:32,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=632160.0, ans=0.125 2023-12-22 14:32:37,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=632160.0, ans=0.0 2023-12-22 14:32:45,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=632226.6666666666, ans=0.125 2023-12-22 14:32:46,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=632226.6666666666, ans=0.0 2023-12-22 14:32:51,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=632226.6666666666, ans=0.125 2023-12-22 14:32:52,163 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.605e+01 2.864e+01 3.006e+01 3.128e+01 3.399e+01, threshold=6.011e+01, percent-clipped=0.0 2023-12-22 14:32:56,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=632293.3333333334, ans=0.125 2023-12-22 14:32:57,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632293.3333333334, ans=0.1 2023-12-22 14:33:04,786 INFO [train.py:886] (0/4) Epoch 20, batch 4300, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4952363.75 frames. ], batch size: 99, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:33:06,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-12-22 14:33:36,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=632560.0, ans=0.2 2023-12-22 14:33:40,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=632560.0, ans=0.07 2023-12-22 14:33:45,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=632626.6666666666, ans=0.0 2023-12-22 14:33:54,941 INFO [train.py:886] (0/4) Epoch 20, batch 4350, loss[loss=0.01407, audio_tagging_loss=0.01407, over 22088.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4953085.56 frames. 
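Each train.py:886 record pairs the loss on the current batch (loss[...], with the frame count it was measured over) against a smoothed running figure (tot_loss[...]). Only one criterion is active in this recipe, so loss and audio_tagging_loss are always identical. The fractional frame counts on tot_loss (4953085.56 just above) point to decayed running sums rather than plain totals; a sketch under that assumption, with an illustrative decay constant:

```python
def update_running(sum_loss: float, sum_frames: float,
                   batch_loss: float, batch_frames: float,
                   decay: float = 0.999):
    """Decayed running sums behind 'tot_loss[..., over N frames]'.

    Sketch only: the decay constant is an assumption, but a decayed sum
    is what makes the reported frame counts come out fractional. The
    printed tot_loss is the ratio of the two sums.
    """
    sum_loss = sum_loss * decay + batch_loss * batch_frames
    sum_frames = sum_frames * decay + batch_frames
    return sum_loss, sum_frames, sum_loss / sum_frames
```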
], batch size: 107, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:34:35,355 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+01 2.921e+01 3.074e+01 3.206e+01 3.879e+01, threshold=6.148e+01, percent-clipped=0.0 2023-12-22 14:34:37,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=632960.0, ans=0.07 2023-12-22 14:34:40,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=632960.0, ans=0.125 2023-12-22 14:34:41,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=632960.0, ans=0.125 2023-12-22 14:34:47,185 INFO [train.py:886] (0/4) Epoch 20, batch 4400, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24750.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4947624.56 frames. ], batch size: 99, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:35:17,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633226.6666666666, ans=0.1 2023-12-22 14:35:29,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=633293.3333333334, ans=0.0 2023-12-22 14:35:34,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2023-12-22 14:35:38,542 INFO [train.py:886] (0/4) Epoch 20, batch 4450, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4944779.27 frames. ], batch size: 100, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:35:52,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=633426.6666666666, ans=0.04949747468305833 2023-12-22 14:35:53,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=633426.6666666666, ans=0.2 2023-12-22 14:36:20,616 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+01 2.929e+01 3.037e+01 3.190e+01 3.598e+01, threshold=6.073e+01, percent-clipped=0.0 2023-12-22 14:36:21,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=633626.6666666666, ans=0.07 2023-12-22 14:36:22,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=633626.6666666666, ans=0.125 2023-12-22 14:36:31,010 INFO [train.py:886] (0/4) Epoch 20, batch 4500, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24032.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4939716.32 frames. ], batch size: 100, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:36:36,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-12-22 14:36:40,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=633760.0, ans=0.125 2023-12-22 14:36:49,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.19 vs. 
limit=15.0 2023-12-22 14:36:59,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=633826.6666666666, ans=0.2 2023-12-22 14:37:03,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=633893.3333333334, ans=0.07 2023-12-22 14:37:09,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=633893.3333333334, ans=0.02 2023-12-22 14:37:14,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5 2023-12-22 14:37:24,385 INFO [train.py:886] (0/4) Epoch 20, batch 4550, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4944845.20 frames. ], batch size: 99, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:37:33,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2023-12-22 14:37:50,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=634160.0, ans=0.125 2023-12-22 14:37:56,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2023-12-22 14:38:03,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2023-12-22 14:38:04,249 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.875e+01 3.021e+01 3.198e+01 3.721e+01, threshold=6.043e+01, percent-clipped=0.0 2023-12-22 14:38:04,422 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:38:14,808 INFO [train.py:886] (0/4) Epoch 20, batch 4600, loss[loss=0.01591, audio_tagging_loss=0.01591, over 24750.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4951086.40 frames. ], batch size: 99, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:38:17,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.56 vs. 
limit=5.0 2023-12-22 14:38:18,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=634360.0, ans=0.0 2023-12-22 14:38:29,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=634426.6666666666, ans=0.0 2023-12-22 14:38:31,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=634426.6666666666, ans=0.2 2023-12-22 14:38:34,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=634426.6666666666, ans=0.2 2023-12-22 14:38:36,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=634493.3333333334, ans=0.5 2023-12-22 14:38:37,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=634493.3333333334, ans=0.0 2023-12-22 14:38:37,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=634493.3333333334, ans=0.0 2023-12-22 14:38:42,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=12.0 2023-12-22 14:38:50,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=634560.0, ans=0.1 2023-12-22 14:38:56,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.60 vs. limit=22.5 2023-12-22 14:39:01,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=634626.6666666666, ans=0.125 2023-12-22 14:39:01,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=634626.6666666666, ans=15.0 2023-12-22 14:39:04,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=634626.6666666666, ans=10.0 2023-12-22 14:39:07,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=634693.3333333334, ans=0.04949747468305833 2023-12-22 14:39:08,017 INFO [train.py:886] (0/4) Epoch 20, batch 4650, loss[loss=0.01668, audio_tagging_loss=0.01668, over 24942.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4956108.79 frames. 
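The scaling.py:1022 lines compare a per-module whitening statistic of the activations against a limit (the same limits that surface as ...whitening_limit ScheduledFloat entries); when the metric exceeds the limit, the Whiten module pushes features back toward a decorrelated state. One plausible formulation of such a metric, offered as an assumption rather than the exact scaling.py expression, is shown below; grouped variants (num_groups > 1 in the log) apply the same measure per group of channels.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """How far the channel covariance of x is from a multiple of identity.

    Returns >= 1.0, with equality only for perfectly 'white' features;
    an assumed formulation of the 'metric=X vs. limit=Y' statistic.
    """
    x = x.reshape(-1, x.shape[-1])           # (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]             # channel covariance
    num_channels = cov.shape[0]
    return num_channels * (cov * cov).sum() / cov.trace() ** 2
```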
], batch size: 100, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:39:13,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=634693.3333333334, ans=0.125 2023-12-22 14:39:19,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=634760.0, ans=0.2 2023-12-22 14:39:22,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=634760.0, ans=0.125 2023-12-22 14:39:26,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=634826.6666666666, ans=0.125 2023-12-22 14:39:31,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=634826.6666666666, ans=0.0 2023-12-22 14:39:47,499 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.597e+01 2.896e+01 3.053e+01 3.210e+01 3.598e+01, threshold=6.106e+01, percent-clipped=0.0 2023-12-22 14:39:55,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=634960.0, ans=0.2 2023-12-22 14:39:56,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=635026.6666666666, ans=0.2 2023-12-22 14:39:56,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=635026.6666666666, ans=0.0 2023-12-22 14:39:57,581 INFO [train.py:886] (0/4) Epoch 20, batch 4700, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24750.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4950843.24 frames. ], batch size: 99, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:40:07,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=635093.3333333334, ans=0.0 2023-12-22 14:40:41,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=635293.3333333334, ans=0.125 2023-12-22 14:40:45,760 INFO [train.py:886] (0/4) Epoch 20, batch 4750, loss[loss=0.01398, audio_tagging_loss=0.01398, over 24750.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4948652.45 frames. ], batch size: 99, lr: 5.36e-03, grad_scale: 64.0 2023-12-22 14:40:55,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=635426.6666666666, ans=0.125 2023-12-22 14:40:56,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=635426.6666666666, ans=0.125 2023-12-22 14:40:57,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2023-12-22 14:41:01,064 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-20.pt 2023-12-22 14:41:19,714 INFO [train.py:886] (0/4) Epoch 21, batch 0, loss[loss=0.02889, audio_tagging_loss=0.02889, over 25000.00 frames. ], tot_loss[loss=0.02889, audio_tagging_loss=0.02889, over 25000.00 frames. 
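Two kinds of checkpoints appear in this log: end-of-epoch snapshots (epoch-20.pt, saved just above) and batch-interval snapshots named after the global batch index (checkpoint-96000.pt, saved a little further on). A sketch of that saving policy follows; the helper and its arguments are illustrative, not icefall's checkpoint.py API.

```python
from pathlib import Path
from typing import Optional
import torch

def maybe_save(model, optimizer, exp_dir: Path, batch_idx_train: int,
               save_every_n: int, epoch_done: Optional[int] = None) -> None:
    """Save 'checkpoint-<batch>.pt' every save_every_n batches and
    'epoch-<n>.pt' at epoch boundaries (illustrative policy only)."""
    state = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "batch_idx_train": batch_idx_train,
    }
    if batch_idx_train % save_every_n == 0:
        torch.save(state, exp_dir / f"checkpoint-{batch_idx_train}.pt")
    if epoch_done is not None:
        torch.save(state, exp_dir / f"epoch-{epoch_done}.pt")
```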
], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:41:19,715 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 14:41:40,672 INFO [train.py:917] (0/4) Epoch 21, validation: loss=0.03243, audio_tagging_loss=0.03243, over 3737520.00 frames. 2023-12-22 14:41:40,672 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 14:41:51,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=635533.3333333334, ans=0.0 2023-12-22 14:42:04,965 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.625e+01 2.952e+01 3.172e+01 3.843e+01 8.854e+01, threshold=6.343e+01, percent-clipped=8.0 2023-12-22 14:42:11,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=635666.6666666666, ans=0.0 2023-12-22 14:42:31,014 INFO [train.py:886] (0/4) Epoch 21, batch 50, loss[loss=0.01858, audio_tagging_loss=0.01858, over 25000.00 frames. ], tot_loss[loss=0.0215, audio_tagging_loss=0.0215, over 1114328.26 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:42:37,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=635800.0, ans=0.125 2023-12-22 14:42:39,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=635800.0, ans=0.0 2023-12-22 14:43:14,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=636066.6666666666, ans=0.125 2023-12-22 14:43:21,767 INFO [train.py:886] (0/4) Epoch 21, batch 100, loss[loss=0.01704, audio_tagging_loss=0.01704, over 25000.00 frames. ], tot_loss[loss=0.01875, audio_tagging_loss=0.01875, over 1964862.72 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:43:46,537 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.292e+01 3.512e+01 3.782e+01 4.878e+01, threshold=7.024e+01, percent-clipped=0.0 2023-12-22 14:43:55,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=636333.3333333334, ans=0.0 2023-12-22 14:43:59,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=636333.3333333334, ans=0.0 2023-12-22 14:44:13,136 INFO [train.py:886] (0/4) Epoch 21, batch 150, loss[loss=0.01299, audio_tagging_loss=0.01299, over 24750.00 frames. ], tot_loss[loss=0.01709, audio_tagging_loss=0.01709, over 2632881.84 frames. ], batch size: 99, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:44:16,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=636466.6666666666, ans=0.125 2023-12-22 14:44:34,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2023-12-22 14:44:49,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=636666.6666666666, ans=0.125 2023-12-22 14:44:58,868 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:45:03,388 INFO [train.py:886] (0/4) Epoch 21, batch 200, loss[loss=0.01741, audio_tagging_loss=0.01741, over 25000.00 frames. 
], tot_loss[loss=0.01602, audio_tagging_loss=0.01602, over 3150568.15 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:45:07,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=636800.0, ans=0.125 2023-12-22 14:45:26,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=636933.3333333334, ans=0.2 2023-12-22 14:45:28,984 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+01 2.981e+01 3.099e+01 3.247e+01 3.721e+01, threshold=6.198e+01, percent-clipped=0.0 2023-12-22 14:45:32,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=636933.3333333334, ans=0.125 2023-12-22 14:45:39,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2023-12-22 14:45:51,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=637066.6666666666, ans=0.125 2023-12-22 14:45:52,679 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:45:53,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=637066.6666666666, ans=0.125 2023-12-22 14:45:56,395 INFO [train.py:886] (0/4) Epoch 21, batch 250, loss[loss=0.01529, audio_tagging_loss=0.01529, over 25000.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 3557607.26 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:46:16,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=637266.6666666666, ans=0.0 2023-12-22 14:46:32,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=637333.3333333334, ans=0.125 2023-12-22 14:46:38,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0 2023-12-22 14:46:48,392 INFO [train.py:886] (0/4) Epoch 21, batch 300, loss[loss=0.01841, audio_tagging_loss=0.01841, over 21349.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 3864297.05 frames. ], batch size: 107, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:47:05,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=637533.3333333334, ans=0.0 2023-12-22 14:47:12,439 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.578e+01 2.934e+01 3.059e+01 3.180e+01 3.932e+01, threshold=6.118e+01, percent-clipped=0.0 2023-12-22 14:47:13,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=637600.0, ans=0.0 2023-12-22 14:47:30,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. 
limit=15.0 2023-12-22 14:47:37,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=637733.3333333334, ans=0.0 2023-12-22 14:47:39,919 INFO [train.py:886] (0/4) Epoch 21, batch 350, loss[loss=0.01516, audio_tagging_loss=0.01516, over 25000.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4100551.89 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:47:44,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=637800.0, ans=0.2 2023-12-22 14:48:19,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=638000.0, ans=0.0 2023-12-22 14:48:32,060 INFO [train.py:886] (0/4) Epoch 21, batch 400, loss[loss=0.01491, audio_tagging_loss=0.01491, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4291444.97 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:48:34,946 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:48:37,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=638133.3333333334, ans=0.125 2023-12-22 14:48:40,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=638133.3333333334, ans=0.125 2023-12-22 14:48:45,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2023-12-22 14:48:56,994 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.877e+01 2.996e+01 3.140e+01 3.707e+01, threshold=5.991e+01, percent-clipped=0.0 2023-12-22 14:49:23,553 INFO [train.py:886] (0/4) Epoch 21, batch 450, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4439223.11 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:49:35,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=638533.3333333334, ans=0.125 2023-12-22 14:49:42,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=638533.3333333334, ans=0.125 2023-12-22 14:49:42,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=638533.3333333334, ans=0.0 2023-12-22 14:50:00,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2023-12-22 14:50:02,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=638666.6666666666, ans=6.0 2023-12-22 14:50:07,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=638733.3333333334, ans=0.2 2023-12-22 14:50:10,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2023-12-22 14:50:16,431 INFO [train.py:886] (0/4) Epoch 21, batch 500, loss[loss=0.01039, audio_tagging_loss=0.01039, over 23992.00 frames. 
], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4553555.03 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:50:30,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=638866.6666666666, ans=0.125 2023-12-22 14:50:38,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=638933.3333333334, ans=0.125 2023-12-22 14:50:41,262 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.599e+01 2.918e+01 3.045e+01 3.140e+01 3.683e+01, threshold=6.089e+01, percent-clipped=0.0 2023-12-22 14:50:48,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2023-12-22 14:50:59,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=639066.6666666666, ans=0.125 2023-12-22 14:51:07,963 INFO [train.py:886] (0/4) Epoch 21, batch 550, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4634736.48 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:51:23,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=639200.0, ans=0.125 2023-12-22 14:51:29,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=639266.6666666666, ans=0.0 2023-12-22 14:51:50,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=639400.0, ans=0.0 2023-12-22 14:51:56,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639400.0, ans=0.1 2023-12-22 14:51:59,313 INFO [train.py:886] (0/4) Epoch 21, batch 600, loss[loss=0.01678, audio_tagging_loss=0.01678, over 24750.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4703485.43 frames. ], batch size: 99, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:52:01,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=639466.6666666666, ans=0.0 2023-12-22 14:52:19,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=639600.0, ans=0.0 2023-12-22 14:52:24,051 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.726e+01 2.947e+01 3.052e+01 3.171e+01 3.659e+01, threshold=6.104e+01, percent-clipped=0.0 2023-12-22 14:52:31,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=639666.6666666666, ans=0.09899494936611666 2023-12-22 14:52:47,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2023-12-22 14:52:51,554 INFO [train.py:886] (0/4) Epoch 21, batch 650, loss[loss=0.01714, audio_tagging_loss=0.01714, over 24939.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4750015.39 frames. 
], batch size: 100, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:53:13,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=639933.3333333334, ans=0.125 2023-12-22 14:53:22,065 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-96000.pt 2023-12-22 14:53:34,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=640066.6666666666, ans=0.07 2023-12-22 14:53:36,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=640066.6666666666, ans=0.125 2023-12-22 14:53:41,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=640066.6666666666, ans=0.125 2023-12-22 14:53:45,859 INFO [train.py:886] (0/4) Epoch 21, batch 700, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4787903.58 frames. ], batch size: 100, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:53:49,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=640133.3333333334, ans=0.0 2023-12-22 14:53:53,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=640133.3333333334, ans=0.0 2023-12-22 14:54:00,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=640200.0, ans=0.0 2023-12-22 14:54:03,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=640200.0, ans=0.2 2023-12-22 14:54:05,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=640266.6666666666, ans=0.125 2023-12-22 14:54:09,893 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.593e+01 2.917e+01 3.103e+01 3.196e+01 3.559e+01, threshold=6.207e+01, percent-clipped=0.0 2023-12-22 14:54:33,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.58 vs. limit=22.5 2023-12-22 14:54:37,368 INFO [train.py:886] (0/4) Epoch 21, batch 750, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4823287.97 frames. 
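The lr field decays smoothly within an epoch (5.39e-03 down to 5.36e-03 across the tail of epoch 20) and steps down harder at the epoch boundary (5.23e-03 from the first batch of epoch 21). That shape matches an Eden-style schedule, in which independent batch-based and epoch-based factors multiply the base rate. A sketch assuming the standard Eden formula; any additional rescaling in the recipe is omitted, so absolute values need not reproduce the logged figures.

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    """Eden-style learning rate (sketch of the assumed formula).

    The epoch factor steps at epoch boundaries, which is what produces
    the visible lr drop between the end of one epoch and the start of
    the next; within an epoch the batch factor changes only slowly.
    """
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```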
], batch size: 99, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:54:40,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=640466.6666666666, ans=0.125 2023-12-22 14:54:50,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=640533.3333333334, ans=0.0 2023-12-22 14:54:53,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=640533.3333333334, ans=0.1 2023-12-22 14:55:12,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=640666.6666666666, ans=0.125 2023-12-22 14:55:27,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=640733.3333333334, ans=0.0 2023-12-22 14:55:30,039 INFO [train.py:886] (0/4) Epoch 21, batch 800, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4849162.93 frames. ], batch size: 100, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:55:36,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.26 vs. limit=12.0 2023-12-22 14:55:46,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.00 vs. limit=10.0 2023-12-22 14:55:55,170 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.577e+01 2.887e+01 3.023e+01 3.199e+01 3.565e+01, threshold=6.047e+01, percent-clipped=0.0 2023-12-22 14:56:10,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=641066.6666666666, ans=10.0 2023-12-22 14:56:12,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=641066.6666666666, ans=0.125 2023-12-22 14:56:14,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2023-12-22 14:56:16,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.73 vs. limit=12.0 2023-12-22 14:56:18,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=641066.6666666666, ans=0.07 2023-12-22 14:56:20,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=641133.3333333334, ans=0.0 2023-12-22 14:56:21,514 INFO [train.py:886] (0/4) Epoch 21, batch 850, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4875754.37 frames. 
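Since loss and audio_tagging_loss coincide in every record, the tagging loss is the sole training criterion. AudioSet tagging is multi-label (one clip can carry several event tags at once), for which the standard criterion is per-class binary cross-entropy; a sketch under that assumption, with hypothetical shapes:

```python
import torch
import torch.nn.functional as F

def audio_tagging_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean per-class binary cross-entropy for multi-label tagging.

    Assumed formulation: logits and labels are (batch, num_classes),
    with labels multi-hot in {0, 1}.
    """
    return F.binary_cross_entropy_with_logits(logits, labels.float())

# e.g. a hypothetical batch of 4 clips over 10 event classes:
logits = torch.randn(4, 10)
labels = torch.randint(0, 2, (4, 10))
print(audio_tagging_loss(logits, labels))
```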
], batch size: 100, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:56:36,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=641200.0, ans=0.07 2023-12-22 14:57:02,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=641333.3333333334, ans=15.0 2023-12-22 14:57:06,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=641400.0, ans=0.0 2023-12-22 14:57:10,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=641400.0, ans=0.125 2023-12-22 14:57:13,896 INFO [train.py:886] (0/4) Epoch 21, batch 900, loss[loss=0.01522, audio_tagging_loss=0.01522, over 24750.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4889920.82 frames. ], batch size: 99, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:57:15,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=641466.6666666666, ans=0.125 2023-12-22 14:57:15,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=641466.6666666666, ans=0.125 2023-12-22 14:57:25,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641533.3333333334, ans=0.1 2023-12-22 14:57:39,213 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.708e+01 2.906e+01 3.042e+01 3.221e+01 3.641e+01, threshold=6.083e+01, percent-clipped=0.0 2023-12-22 14:57:39,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=641600.0, ans=0.125 2023-12-22 14:57:50,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=641666.6666666666, ans=0.125 2023-12-22 14:57:52,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.27 vs. limit=22.5 2023-12-22 14:58:03,329 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:58:04,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=641733.3333333334, ans=0.125 2023-12-22 14:58:06,023 INFO [train.py:886] (0/4) Epoch 21, batch 950, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4895013.85 frames. ], batch size: 99, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:58:10,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.05 vs. limit=10.0 2023-12-22 14:58:17,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.37 vs. 
limit=12.0 2023-12-22 14:58:19,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=641866.6666666666, ans=0.0 2023-12-22 14:58:19,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=641866.6666666666, ans=0.0 2023-12-22 14:58:28,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=641933.3333333334, ans=0.125 2023-12-22 14:58:31,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=641933.3333333334, ans=0.2 2023-12-22 14:58:44,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=642000.0, ans=0.09899494936611666 2023-12-22 14:58:56,670 INFO [train.py:886] (0/4) Epoch 21, batch 1000, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4904548.72 frames. ], batch size: 99, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:59:02,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=12.0 2023-12-22 14:59:12,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2023-12-22 14:59:13,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=642200.0, ans=0.125 2023-12-22 14:59:17,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=642266.6666666666, ans=0.125 2023-12-22 14:59:20,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2023-12-22 14:59:21,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=642266.6666666666, ans=0.125 2023-12-22 14:59:21,803 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+01 2.900e+01 3.063e+01 3.236e+01 3.644e+01, threshold=6.126e+01, percent-clipped=0.0 2023-12-22 14:59:48,509 INFO [train.py:886] (0/4) Epoch 21, batch 1050, loss[loss=0.0102, audio_tagging_loss=0.0102, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4914820.09 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 14:59:51,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. 
limit=15.0 2023-12-22 15:00:16,207 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:00:20,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=642666.6666666666, ans=0.2 2023-12-22 15:00:28,743 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.052e-02 2023-12-22 15:00:29,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=642733.3333333334, ans=0.0 2023-12-22 15:00:31,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2023-12-22 15:00:40,270 INFO [train.py:886] (0/4) Epoch 21, batch 1100, loss[loss=0.01483, audio_tagging_loss=0.01483, over 25000.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4923861.46 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:00:44,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=642800.0, ans=0.0 2023-12-22 15:00:52,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=642866.6666666666, ans=0.125 2023-12-22 15:01:04,308 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.630e+01 2.897e+01 3.078e+01 3.244e+01 5.460e+01, threshold=6.156e+01, percent-clipped=0.0 2023-12-22 15:01:14,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=643000.0, ans=0.125 2023-12-22 15:01:15,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=643000.0, ans=0.0 2023-12-22 15:01:19,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=12.0 2023-12-22 15:01:30,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=643066.6666666666, ans=0.125 2023-12-22 15:01:32,030 INFO [train.py:886] (0/4) Epoch 21, batch 1150, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4933148.69 frames. 
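The scaling.py:1118 WithLoss lines track an auxiliary penalty attached to each module's attention weights. Its loss-sum reads 0.000e+00 on almost every batch and is only occasionally nonzero (1.052e-02 just above), meaning the regularizer fires only when the weights violate its constraint. The actual constraint is internal to scaling.py; the sketch below shows the general pattern with a hypothetical one, capping how much mass any row may put on a single key.

```python
import torch

def attn_weights_penalty(attn: torch.Tensor, limit: float = 0.9) -> torch.Tensor:
    """Hypothetical auxiliary penalty on attention weights.

    Zero whenever no row exceeds `limit` mass on one key, mirroring how
    the logged loss-sum is usually 0.000e+00; not the constraint that
    scaling.py actually imposes.
    """
    excess = (attn.max(dim=-1).values - limit).clamp(min=0.0)
    return (excess ** 2).sum()
```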
], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:01:35,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=643133.3333333334, ans=0.2 2023-12-22 15:01:48,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=643200.0, ans=0.125 2023-12-22 15:01:53,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=643266.6666666666, ans=0.0 2023-12-22 15:01:53,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=643266.6666666666, ans=0.0 2023-12-22 15:02:15,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=643400.0, ans=0.125 2023-12-22 15:02:16,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643400.0, ans=0.1 2023-12-22 15:02:23,709 INFO [train.py:886] (0/4) Epoch 21, batch 1200, loss[loss=0.01477, audio_tagging_loss=0.01477, over 25000.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4942266.53 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:02:41,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=643533.3333333334, ans=0.0 2023-12-22 15:02:41,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=12.0 2023-12-22 15:02:44,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643600.0, ans=0.1 2023-12-22 15:02:47,767 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 2.913e+01 3.055e+01 3.244e+01 3.742e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 15:02:56,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643666.6666666666, ans=0.1 2023-12-22 15:03:14,487 INFO [train.py:886] (0/4) Epoch 21, batch 1250, loss[loss=0.01349, audio_tagging_loss=0.01349, over 24750.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4946406.86 frames. ], batch size: 99, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:03:18,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=643800.0, ans=0.125 2023-12-22 15:03:31,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=643866.6666666666, ans=0.125 2023-12-22 15:03:39,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=643933.3333333334, ans=0.125 2023-12-22 15:03:48,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=644000.0, ans=0.0 2023-12-22 15:04:07,510 INFO [train.py:886] (0/4) Epoch 21, batch 1300, loss[loss=0.01209, audio_tagging_loss=0.01209, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4944025.35 frames. 
], batch size: 99, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:04:26,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=644200.0, ans=0.125 2023-12-22 15:04:33,043 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.559e+01 2.961e+01 3.123e+01 3.298e+01 3.701e+01, threshold=6.246e+01, percent-clipped=0.0 2023-12-22 15:04:55,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=644400.0, ans=0.04949747468305833 2023-12-22 15:04:59,027 INFO [train.py:886] (0/4) Epoch 21, batch 1350, loss[loss=0.01361, audio_tagging_loss=0.01361, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4943540.89 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:05:35,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0 2023-12-22 15:05:36,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=644666.6666666666, ans=0.025 2023-12-22 15:05:37,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=12.0 2023-12-22 15:05:47,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=644733.3333333334, ans=0.2 2023-12-22 15:05:49,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.80 vs. limit=22.5 2023-12-22 15:05:50,489 INFO [train.py:886] (0/4) Epoch 21, batch 1400, loss[loss=0.01101, audio_tagging_loss=0.01101, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4944976.91 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:06:00,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-12-22 15:06:12,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=644933.3333333334, ans=0.025 2023-12-22 15:06:12,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=644933.3333333334, ans=0.95 2023-12-22 15:06:13,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=12.0 2023-12-22 15:06:15,741 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.531e+01 2.918e+01 3.034e+01 3.190e+01 3.697e+01, threshold=6.068e+01, percent-clipped=0.0 2023-12-22 15:06:24,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=645000.0, ans=0.125 2023-12-22 15:06:36,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.85 vs. limit=22.5 2023-12-22 15:06:43,070 INFO [train.py:886] (0/4) Epoch 21, batch 1450, loss[loss=0.01144, audio_tagging_loss=0.01144, over 24750.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4949769.86 frames. 
], batch size: 99, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:06:45,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=645133.3333333334, ans=0.0 2023-12-22 15:06:58,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=645200.0, ans=0.125 2023-12-22 15:07:16,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.11 vs. limit=15.0 2023-12-22 15:07:21,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=645333.3333333334, ans=0.2 2023-12-22 15:07:21,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2023-12-22 15:07:24,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2023-12-22 15:07:26,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=645400.0, ans=0.125 2023-12-22 15:07:33,412 INFO [train.py:886] (0/4) Epoch 21, batch 1500, loss[loss=0.01329, audio_tagging_loss=0.01329, over 25000.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4951301.40 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:07:37,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=645466.6666666666, ans=0.125 2023-12-22 15:07:40,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=645466.6666666666, ans=0.0 2023-12-22 15:07:54,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.69 vs. limit=15.0 2023-12-22 15:07:59,016 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.580e+01 2.868e+01 3.009e+01 3.172e+01 3.976e+01, threshold=6.018e+01, percent-clipped=0.0 2023-12-22 15:08:10,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=645666.6666666666, ans=0.125 2023-12-22 15:08:23,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2023-12-22 15:08:25,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=645800.0, ans=0.0 2023-12-22 15:08:26,458 INFO [train.py:886] (0/4) Epoch 21, batch 1550, loss[loss=0.0124, audio_tagging_loss=0.0124, over 24061.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4947802.40 frames. 
], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:08:34,356 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:08:35,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=645866.6666666666, ans=0.125 2023-12-22 15:08:35,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=645866.6666666666, ans=0.125 2023-12-22 15:08:50,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2023-12-22 15:08:51,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=645933.3333333334, ans=0.1 2023-12-22 15:09:04,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=646000.0, ans=0.125 2023-12-22 15:09:09,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=646066.6666666666, ans=0.125 2023-12-22 15:09:12,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=646066.6666666666, ans=0.2 2023-12-22 15:09:14,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=646066.6666666666, ans=0.125 2023-12-22 15:09:16,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646066.6666666666, ans=0.1 2023-12-22 15:09:18,963 INFO [train.py:886] (0/4) Epoch 21, batch 1600, loss[loss=0.01496, audio_tagging_loss=0.01496, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4944914.31 frames. ], batch size: 99, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:09:22,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=646133.3333333334, ans=0.125 2023-12-22 15:09:42,940 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.609e+01 3.012e+01 3.134e+01 3.270e+01 4.139e+01, threshold=6.268e+01, percent-clipped=0.0 2023-12-22 15:09:50,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=646333.3333333334, ans=0.1 2023-12-22 15:09:53,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-12-22 15:09:59,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. limit=6.0 2023-12-22 15:10:08,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2023-12-22 15:10:09,631 INFO [train.py:886] (0/4) Epoch 21, batch 1650, loss[loss=0.01209, audio_tagging_loss=0.01209, over 21489.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4941001.12 frames. 
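The ScheduledFloat entries throughout these records report module hyperparameters (skip rates, dropout probabilities, balancer limits) that are annealed as a function of batch_count rather than held fixed. A minimal sketch of one plausible form, a piecewise-linear schedule over batch count; the class name and breakpoints below are illustrative assumptions, not the scaling.py implementation:

    from bisect import bisect_right

    class PiecewiseLinearFloat:
        """Illustrative stand-in for a batch-count-driven schedule."""
        def __init__(self, points):
            self.points = sorted(points)          # (batch_count, value) pairs

        def value(self, batch_count):
            xs = [x for x, _ in self.points]
            i = bisect_right(xs, batch_count)
            if i == 0:
                return self.points[0][1]          # before the first breakpoint
            if i == len(self.points):
                return self.points[-1][1]         # past the last breakpoint
            (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)             # linear interpolation

    # e.g. a dropout probability annealed from 0.3 down to 0.1:
    dropout_p = PiecewiseLinearFloat([(0, 0.3), (20000, 0.1)])
    print(dropout_p.value(643400))                # 0.1, well past the last breakpoint
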
], batch size: 107, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:10:50,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.40 vs. limit=15.0 2023-12-22 15:11:02,001 INFO [train.py:886] (0/4) Epoch 21, batch 1700, loss[loss=0.01085, audio_tagging_loss=0.01085, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4944431.26 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:11:08,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=646800.0, ans=0.0 2023-12-22 15:11:27,468 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+01 2.927e+01 3.026e+01 3.153e+01 3.833e+01, threshold=6.051e+01, percent-clipped=0.0 2023-12-22 15:11:41,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2023-12-22 15:11:45,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2023-12-22 15:11:54,511 INFO [train.py:886] (0/4) Epoch 21, batch 1750, loss[loss=0.01435, audio_tagging_loss=0.01435, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4941584.85 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:11:58,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=647133.3333333334, ans=0.125 2023-12-22 15:12:10,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=647200.0, ans=0.025 2023-12-22 15:12:18,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.09 vs. limit=22.5 2023-12-22 15:12:27,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=647333.3333333334, ans=0.125 2023-12-22 15:12:46,354 INFO [train.py:886] (0/4) Epoch 21, batch 1800, loss[loss=0.01371, audio_tagging_loss=0.01371, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4949249.44 frames. ], batch size: 100, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:12:49,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647466.6666666666, ans=0.1 2023-12-22 15:12:53,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=647466.6666666666, ans=0.125 2023-12-22 15:13:03,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=647533.3333333334, ans=0.0 2023-12-22 15:13:04,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=647533.3333333334, ans=0.125 2023-12-22 15:13:11,089 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.641e+01 2.946e+01 3.053e+01 3.147e+01 3.589e+01, threshold=6.107e+01, percent-clipped=0.0 2023-12-22 15:13:38,498 INFO [train.py:886] (0/4) Epoch 21, batch 1850, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. 
], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4949093.96 frames. ], batch size: 99, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:13:51,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647866.6666666666, ans=0.1 2023-12-22 15:13:55,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=647866.6666666666, ans=0.125 2023-12-22 15:13:57,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=647933.3333333334, ans=0.07 2023-12-22 15:14:23,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=648066.6666666666, ans=0.2 2023-12-22 15:14:29,463 INFO [train.py:886] (0/4) Epoch 21, batch 1900, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4941054.17 frames. ], batch size: 99, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:14:36,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.02 vs. limit=15.0 2023-12-22 15:14:38,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=648133.3333333334, ans=0.125 2023-12-22 15:14:38,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=648133.3333333334, ans=0.125 2023-12-22 15:14:52,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-22 15:14:54,203 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.641e+01 2.961e+01 3.101e+01 3.299e+01 3.976e+01, threshold=6.202e+01, percent-clipped=0.0 2023-12-22 15:14:59,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=648333.3333333334, ans=0.125 2023-12-22 15:15:16,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=648400.0, ans=0.1 2023-12-22 15:15:21,528 INFO [train.py:886] (0/4) Epoch 21, batch 1950, loss[loss=0.01534, audio_tagging_loss=0.01534, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4943171.02 frames. ], batch size: 99, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:15:23,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=648466.6666666666, ans=0.2 2023-12-22 15:15:33,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=648533.3333333334, ans=0.0 2023-12-22 15:15:53,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.14 vs. limit=22.5 2023-12-22 15:16:13,180 INFO [train.py:886] (0/4) Epoch 21, batch 2000, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4942711.36 frames. 
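tot_loss in these records tracks a frame-weighted running average of the per-batch loss, which is why its accompanying frame count hovers near 4.9 million rather than growing without bound. A sketch of that bookkeeping, assuming exponential forgetting; the decay constant is chosen only so the steady-state frame count lands in the same range as the logged figures:

    class RunningLoss:
        """Frame-weighted running average with exponential forgetting."""
        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0                    # decayed sum of loss * frames
            self.frames = 0.0                      # decayed sum of frames

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames, self.frames

    tracker = RunningLoss()
    for _ in range(2000):
        avg, frames = tracker.update(0.0135, 25000.0)
    # frames approaches 25000 / (1 - decay) = 5e6, the same order as the
    # "over ~4.9e6 frames" figures in the records above; avg stays near 0.0135.
    print(avg, frames)
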
], batch size: 100, lr: 5.18e-03, grad_scale: 64.0 2023-12-22 15:16:15,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=648800.0, ans=0.0 2023-12-22 15:16:37,592 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.627e+01 2.920e+01 3.050e+01 3.228e+01 3.734e+01, threshold=6.100e+01, percent-clipped=0.0 2023-12-22 15:16:49,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=649000.0, ans=0.2 2023-12-22 15:16:59,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649066.6666666666, ans=0.1 2023-12-22 15:17:03,710 INFO [train.py:886] (0/4) Epoch 21, batch 2050, loss[loss=0.01653, audio_tagging_loss=0.01653, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4948862.82 frames. ], batch size: 100, lr: 5.18e-03, grad_scale: 64.0 2023-12-22 15:17:04,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=649133.3333333334, ans=0.0 2023-12-22 15:17:12,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=649133.3333333334, ans=0.125 2023-12-22 15:17:49,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2023-12-22 15:17:56,871 INFO [train.py:886] (0/4) Epoch 21, batch 2100, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4954198.82 frames. ], batch size: 100, lr: 5.18e-03, grad_scale: 64.0 2023-12-22 15:18:01,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=649466.6666666666, ans=0.125 2023-12-22 15:18:07,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=649533.3333333334, ans=0.125 2023-12-22 15:18:10,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.50 vs. limit=22.5 2023-12-22 15:18:21,961 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.928e+01 3.055e+01 3.207e+01 3.778e+01, threshold=6.110e+01, percent-clipped=0.0 2023-12-22 15:18:24,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=649600.0, ans=0.2 2023-12-22 15:18:34,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=649666.6666666666, ans=0.125 2023-12-22 15:18:35,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.32 vs. limit=15.0 2023-12-22 15:18:36,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=649666.6666666666, ans=0.0 2023-12-22 15:18:47,491 INFO [train.py:886] (0/4) Epoch 21, batch 2150, loss[loss=0.02045, audio_tagging_loss=0.02045, over 24951.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4952551.68 frames. 
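The recurring optim.py warnings report quartiles of recent gradient norms together with a clipping threshold roughly 2x their median (Clipping_scale=2.0). A sketch of that style of statistics-based clipping, assuming a simple sliding window of norms; the function name, window size, and exact threshold rule are assumptions, not the optim.py code:

    import torch

    def clip_by_recent_norms(parameters, recent_norms, clipping_scale=2.0):
        params = [p for p in parameters if p.grad is not None]
        if not params:
            return None
        total = torch.norm(torch.stack([p.grad.detach().norm() for p in params]))
        recent_norms.append(float(total))
        del recent_norms[:-128]                    # keep a sliding window
        q = torch.quantile(
            torch.tensor(recent_norms, dtype=torch.float32),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = clipping_scale * float(q[2])   # 2x the median recent norm
        if float(total) > threshold:
            for p in params:
                p.grad.mul_(threshold / float(total))
        return q, threshold                        # quartiles as in the warnings
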
], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:18:49,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=649800.0, ans=0.125 2023-12-22 15:18:54,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=12.0 2023-12-22 15:19:07,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=649933.3333333334, ans=0.125 2023-12-22 15:19:14,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=649933.3333333334, ans=0.125 2023-12-22 15:19:19,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=650000.0, ans=0.125 2023-12-22 15:19:30,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=650066.6666666666, ans=0.125 2023-12-22 15:19:37,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-12-22 15:19:38,968 INFO [train.py:886] (0/4) Epoch 21, batch 2200, loss[loss=0.01404, audio_tagging_loss=0.01404, over 24750.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4947522.44 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:19:40,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.75 vs. limit=22.5 2023-12-22 15:19:55,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0 2023-12-22 15:20:02,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=650266.6666666666, ans=0.125 2023-12-22 15:20:04,331 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+01 2.983e+01 3.116e+01 3.285e+01 3.791e+01, threshold=6.231e+01, percent-clipped=0.0 2023-12-22 15:20:18,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=650400.0, ans=0.0 2023-12-22 15:20:30,692 INFO [train.py:886] (0/4) Epoch 21, batch 2250, loss[loss=0.01416, audio_tagging_loss=0.01416, over 24750.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4942512.30 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:20:43,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650533.3333333334, ans=0.1 2023-12-22 15:20:48,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=650533.3333333334, ans=0.125 2023-12-22 15:20:57,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. 
limit=15.0 2023-12-22 15:20:58,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=650600.0, ans=0.125 2023-12-22 15:21:01,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5 2023-12-22 15:21:22,069 INFO [train.py:886] (0/4) Epoch 21, batch 2300, loss[loss=0.01138, audio_tagging_loss=0.01138, over 24750.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4942146.23 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:21:31,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=650800.0, ans=0.0 2023-12-22 15:21:32,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2023-12-22 15:21:39,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=650866.6666666666, ans=0.0 2023-12-22 15:21:39,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=650866.6666666666, ans=0.04949747468305833 2023-12-22 15:21:43,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=650933.3333333334, ans=0.125 2023-12-22 15:21:46,904 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.617e+01 2.891e+01 3.019e+01 3.134e+01 3.586e+01, threshold=6.037e+01, percent-clipped=0.0 2023-12-22 15:21:48,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2023-12-22 15:21:55,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=651000.0, ans=0.2 2023-12-22 15:22:07,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.03 vs. limit=10.0 2023-12-22 15:22:14,360 INFO [train.py:886] (0/4) Epoch 21, batch 2350, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4943924.71 frames. ], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:22:16,376 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:22:16,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=651133.3333333334, ans=0.0 2023-12-22 15:22:17,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=651133.3333333334, ans=0.125 2023-12-22 15:22:21,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651133.3333333334, ans=0.1 2023-12-22 15:22:32,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.26 vs. 
limit=22.5 2023-12-22 15:22:38,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=651266.6666666666, ans=0.0 2023-12-22 15:22:42,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651266.6666666666, ans=0.1 2023-12-22 15:23:05,388 INFO [train.py:886] (0/4) Epoch 21, batch 2400, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4944617.01 frames. ], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:23:08,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=651466.6666666666, ans=0.0 2023-12-22 15:23:29,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=651600.0, ans=0.2 2023-12-22 15:23:30,198 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.620e+01 2.897e+01 3.019e+01 3.181e+01 3.470e+01, threshold=6.039e+01, percent-clipped=0.0 2023-12-22 15:23:31,460 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.039e+00 2023-12-22 15:23:33,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651600.0, ans=0.1 2023-12-22 15:23:48,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=651733.3333333334, ans=22.5 2023-12-22 15:23:50,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=651733.3333333334, ans=0.0 2023-12-22 15:23:51,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=651733.3333333334, ans=0.0 2023-12-22 15:23:51,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=651733.3333333334, ans=0.2 2023-12-22 15:23:57,949 INFO [train.py:886] (0/4) Epoch 21, batch 2450, loss[loss=0.01414, audio_tagging_loss=0.01414, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4950499.40 frames. ], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:24:08,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=651866.6666666666, ans=0.07 2023-12-22 15:24:25,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=651933.3333333334, ans=0.125 2023-12-22 15:24:39,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=652066.6666666666, ans=0.125 2023-12-22 15:24:44,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=652066.6666666666, ans=0.0 2023-12-22 15:24:50,603 INFO [train.py:886] (0/4) Epoch 21, batch 2500, loss[loss=0.01502, audio_tagging_loss=0.01502, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4948683.83 frames. 
], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:25:13,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=652266.6666666666, ans=0.125 2023-12-22 15:25:14,693 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.708e+01 3.022e+01 3.140e+01 3.250e+01 3.693e+01, threshold=6.280e+01, percent-clipped=0.0 2023-12-22 15:25:22,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.77 vs. limit=15.0 2023-12-22 15:25:27,869 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:25:32,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=652400.0, ans=0.05 2023-12-22 15:25:39,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=652400.0, ans=0.125 2023-12-22 15:25:40,920 INFO [train.py:886] (0/4) Epoch 21, batch 2550, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4940364.97 frames. ], batch size: 99, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:25:52,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=652533.3333333334, ans=0.125 2023-12-22 15:26:05,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.27 vs. limit=22.5 2023-12-22 15:26:06,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.06 vs. limit=10.0 2023-12-22 15:26:19,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=652666.6666666666, ans=0.125 2023-12-22 15:26:28,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2023-12-22 15:26:34,205 INFO [train.py:886] (0/4) Epoch 21, batch 2600, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4945396.53 frames. ], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:26:34,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=652800.0, ans=0.125 2023-12-22 15:26:39,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=652800.0, ans=0.2 2023-12-22 15:26:40,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.83 vs. 
limit=12.0 2023-12-22 15:26:53,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=652933.3333333334, ans=0.07 2023-12-22 15:26:58,940 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.515e+01 2.930e+01 3.065e+01 3.223e+01 3.938e+01, threshold=6.130e+01, percent-clipped=0.0 2023-12-22 15:27:14,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=653066.6666666666, ans=0.07 2023-12-22 15:27:21,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=653066.6666666666, ans=0.125 2023-12-22 15:27:24,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0 2023-12-22 15:27:26,041 INFO [train.py:886] (0/4) Epoch 21, batch 2650, loss[loss=0.01411, audio_tagging_loss=0.01411, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4950530.33 frames. ], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:27:26,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=653133.3333333334, ans=0.125 2023-12-22 15:27:26,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0 2023-12-22 15:27:41,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=653200.0, ans=0.0 2023-12-22 15:27:54,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=653266.6666666666, ans=0.125 2023-12-22 15:28:04,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.62 vs. limit=12.0 2023-12-22 15:28:10,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=653400.0, ans=0.1 2023-12-22 15:28:14,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=653400.0, ans=0.125 2023-12-22 15:28:17,703 INFO [train.py:886] (0/4) Epoch 21, batch 2700, loss[loss=0.01347, audio_tagging_loss=0.01347, over 22057.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4952711.77 frames. ], batch size: 107, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:28:20,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=653466.6666666666, ans=0.0 2023-12-22 15:28:26,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=653466.6666666666, ans=0.125 2023-12-22 15:28:27,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=653533.3333333334, ans=0.07 2023-12-22 15:28:35,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.24 vs. 
limit=22.5 2023-12-22 15:28:38,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=653600.0, ans=0.0 2023-12-22 15:28:43,282 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.932e+01 3.064e+01 3.237e+01 3.661e+01, threshold=6.127e+01, percent-clipped=0.0 2023-12-22 15:28:55,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=653666.6666666666, ans=0.125 2023-12-22 15:29:02,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=653733.3333333334, ans=0.125 2023-12-22 15:29:10,338 INFO [train.py:886] (0/4) Epoch 21, batch 2750, loss[loss=0.01458, audio_tagging_loss=0.01458, over 24920.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4954536.89 frames. ], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:29:20,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=653866.6666666666, ans=0.2 2023-12-22 15:29:24,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2023-12-22 15:29:46,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=654000.0, ans=0.95 2023-12-22 15:29:49,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.91 vs. limit=10.0 2023-12-22 15:29:53,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=654066.6666666666, ans=0.0 2023-12-22 15:29:57,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=654066.6666666666, ans=0.125 2023-12-22 15:29:57,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=654066.6666666666, ans=0.125 2023-12-22 15:29:57,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=654066.6666666666, ans=0.125 2023-12-22 15:30:00,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=654133.3333333334, ans=0.125 2023-12-22 15:30:02,221 INFO [train.py:886] (0/4) Epoch 21, batch 2800, loss[loss=0.01369, audio_tagging_loss=0.01369, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4951657.41 frames. ], batch size: 99, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:30:15,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=654200.0, ans=0.0 2023-12-22 15:30:26,924 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.655e+01 2.987e+01 3.081e+01 3.261e+01 3.744e+01, threshold=6.161e+01, percent-clipped=0.0 2023-12-22 15:30:30,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=654266.6666666666, ans=0.125 2023-12-22 15:30:46,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. 
limit=15.0 2023-12-22 15:30:54,050 INFO [train.py:886] (0/4) Epoch 21, batch 2850, loss[loss=0.01226, audio_tagging_loss=0.01226, over 24750.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4942796.17 frames. ], batch size: 99, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:31:02,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=654533.3333333334, ans=0.125 2023-12-22 15:31:11,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2023-12-22 15:31:19,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=654600.0, ans=0.0 2023-12-22 15:31:28,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=12.0 2023-12-22 15:31:41,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=654733.3333333334, ans=0.125 2023-12-22 15:31:46,812 INFO [train.py:886] (0/4) Epoch 21, batch 2900, loss[loss=0.01455, audio_tagging_loss=0.01455, over 24750.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4940583.76 frames. ], batch size: 99, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:31:56,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=654866.6666666666, ans=0.2 2023-12-22 15:32:09,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.88 vs. limit=12.0 2023-12-22 15:32:11,072 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.657e+01 2.895e+01 3.036e+01 3.201e+01 4.104e+01, threshold=6.072e+01, percent-clipped=0.0 2023-12-22 15:32:37,565 INFO [train.py:886] (0/4) Epoch 21, batch 2950, loss[loss=0.01355, audio_tagging_loss=0.01355, over 21980.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4938437.71 frames. ], batch size: 107, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:33:05,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=655266.6666666666, ans=0.0 2023-12-22 15:33:25,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=655400.0, ans=0.125 2023-12-22 15:33:25,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.10 vs. limit=22.5 2023-12-22 15:33:29,614 INFO [train.py:886] (0/4) Epoch 21, batch 3000, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4941616.01 frames. 
], batch size: 100, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:33:29,616 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 15:33:36,902 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2761, 3.2252, 3.7355, 3.7432], device='cuda:0') 2023-12-22 15:33:38,423 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2488, 3.6833, 3.8165, 3.5531], device='cuda:0') 2023-12-22 15:33:50,881 INFO [train.py:917] (0/4) Epoch 21, validation: loss=0.03274, audio_tagging_loss=0.03274, over 3737520.00 frames. 2023-12-22 15:33:50,882 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 15:33:51,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=655466.6666666666, ans=0.125 2023-12-22 15:34:02,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=655533.3333333334, ans=0.125 2023-12-22 15:34:02,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=655533.3333333334, ans=0.1 2023-12-22 15:34:14,635 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.605e+01 2.876e+01 3.036e+01 3.151e+01 3.734e+01, threshold=6.071e+01, percent-clipped=0.0 2023-12-22 15:34:25,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=655666.6666666666, ans=0.1 2023-12-22 15:34:40,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=655800.0, ans=0.125 2023-12-22 15:34:41,408 INFO [train.py:886] (0/4) Epoch 21, batch 3050, loss[loss=0.01577, audio_tagging_loss=0.01577, over 25000.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4945785.98 frames. ], batch size: 100, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:34:52,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=655866.6666666666, ans=0.125 2023-12-22 15:34:56,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=655866.6666666666, ans=0.125 2023-12-22 15:35:18,035 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.45 vs. limit=10.0 2023-12-22 15:35:20,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.86 vs. limit=22.5 2023-12-22 15:35:28,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=656066.6666666666, ans=0.09899494936611666 2023-12-22 15:35:28,682 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:35:33,732 INFO [train.py:886] (0/4) Epoch 21, batch 3100, loss[loss=0.01342, audio_tagging_loss=0.01342, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4946533.57 frames. 
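audio_tagging_loss, in both the training and validation records, is a multi-label objective over the 527 AudioSet event classes. A minimal sketch, assuming mean pooling over encoder frames, a 384-dimensional encoder output, and a plain linear classifier with binary cross-entropy; none of this is the exact train.py code:

    import torch
    import torch.nn.functional as F

    num_events = 527                               # AudioSet label set size
    classifier = torch.nn.Linear(384, num_events)  # 384: assumed encoder dim

    def audio_tagging_loss(encoder_out, targets):
        # encoder_out: (N, T, 384) encoder frames; targets: (N, 527) multi-hot
        pooled = encoder_out.mean(dim=1)           # average-pool over time
        logits = classifier(pooled)
        return F.binary_cross_entropy_with_logits(logits, targets)

    x = torch.randn(4, 100, 384)
    y = torch.zeros(4, num_events)
    y[:, [0, 137]] = 1.0                           # two active events per clip
    print(audio_tagging_loss(x, y))                # scalar, same kind as the logged loss
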
], batch size: 100, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:35:34,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=656133.3333333334, ans=0.0 2023-12-22 15:35:45,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656200.0, ans=0.1 2023-12-22 15:35:58,324 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.673e+01 2.955e+01 3.066e+01 3.256e+01 3.692e+01, threshold=6.132e+01, percent-clipped=0.0 2023-12-22 15:36:08,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2023-12-22 15:36:19,270 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.60 vs. limit=15.0 2023-12-22 15:36:25,940 INFO [train.py:886] (0/4) Epoch 21, batch 3150, loss[loss=0.01447, audio_tagging_loss=0.01447, over 24750.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4943186.56 frames. ], batch size: 99, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:36:26,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=656466.6666666666, ans=15.0 2023-12-22 15:36:57,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=656666.6666666666, ans=0.125 2023-12-22 15:37:13,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=656733.3333333334, ans=10.0 2023-12-22 15:37:16,942 INFO [train.py:886] (0/4) Epoch 21, batch 3200, loss[loss=0.01392, audio_tagging_loss=0.01392, over 24750.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4942647.50 frames. ], batch size: 99, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:37:21,063 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:37:24,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=656800.0, ans=0.125 2023-12-22 15:37:27,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=656866.6666666666, ans=0.125 2023-12-22 15:37:28,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=656866.6666666666, ans=0.0 2023-12-22 15:37:30,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=22.5 2023-12-22 15:37:39,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. 
limit=6.0 2023-12-22 15:37:42,409 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.707e+01 2.933e+01 3.051e+01 3.239e+01 4.108e+01, threshold=6.103e+01, percent-clipped=0.0 2023-12-22 15:37:55,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=657000.0, ans=0.1 2023-12-22 15:38:09,722 INFO [train.py:886] (0/4) Epoch 21, batch 3250, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4945883.30 frames. ], batch size: 99, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:38:57,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=657400.0, ans=0.125 2023-12-22 15:39:00,452 INFO [train.py:886] (0/4) Epoch 21, batch 3300, loss[loss=0.01216, audio_tagging_loss=0.01216, over 24750.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4947840.21 frames. ], batch size: 99, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:39:14,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=657533.3333333334, ans=0.0 2023-12-22 15:39:24,986 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+01 2.897e+01 3.042e+01 3.162e+01 3.785e+01, threshold=6.083e+01, percent-clipped=0.0 2023-12-22 15:39:50,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=657800.0, ans=0.07 2023-12-22 15:39:51,624 INFO [train.py:886] (0/4) Epoch 21, batch 3350, loss[loss=0.01063, audio_tagging_loss=0.01063, over 21768.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4945325.63 frames. ], batch size: 107, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:39:54,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2023-12-22 15:40:11,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2023-12-22 15:40:29,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=658000.0, ans=0.125 2023-12-22 15:40:43,797 INFO [train.py:886] (0/4) Epoch 21, batch 3400, loss[loss=0.01558, audio_tagging_loss=0.01558, over 24750.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4954036.15 frames. ], batch size: 99, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:41:07,159 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.745e+01 2.968e+01 3.084e+01 3.241e+01 3.911e+01, threshold=6.167e+01, percent-clipped=0.0 2023-12-22 15:41:28,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.00 vs. limit=22.5 2023-12-22 15:41:32,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=658400.0, ans=0.125 2023-12-22 15:41:34,385 INFO [train.py:886] (0/4) Epoch 21, batch 3450, loss[loss=0.01306, audio_tagging_loss=0.01306, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4949963.61 frames. 
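The Whitening lines compare a per-module metric against a scheduled limit ("metric=5.81 vs. limit=6.0"); the metric grows as the activation covariance within a group drifts away from isotropic, and the whiten modules penalize it back below the limit. One plausible form of such a metric, normalized so a perfectly white (identity-covariance) signal scores 1.0; the exact formula in scaling.py may differ:

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (N, C) activations. Returns >= 1.0; equals 1.0 when each group's
        # covariance is proportional to the identity (fully "white").
        N, C = x.shape
        assert C % num_groups == 0
        xg = x.reshape(N, num_groups, C // num_groups)
        metrics = []
        for g in range(num_groups):
            cov = xg[:, g, :].T @ xg[:, g, :] / N
            d = cov.shape[0]
            # trace(C^2) * d / trace(C)^2 = d * sum(l_i^2) / (sum l_i)^2
            metrics.append((cov @ cov).trace() * d / cov.trace() ** 2)
        return torch.stack(metrics).mean()

    # Near 1.0 for isotropic noise (slightly above, from sampling error):
    print(whitening_metric(torch.randn(1000, 512), num_groups=1))
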
], batch size: 99, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:41:59,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=658600.0, ans=0.0 2023-12-22 15:42:25,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=658733.3333333334, ans=0.0 2023-12-22 15:42:27,473 INFO [train.py:886] (0/4) Epoch 21, batch 3500, loss[loss=0.01433, audio_tagging_loss=0.01433, over 24750.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4945395.31 frames. ], batch size: 99, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:42:33,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=658800.0, ans=0.125 2023-12-22 15:42:52,327 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 2.914e+01 3.083e+01 3.218e+01 3.665e+01, threshold=6.166e+01, percent-clipped=0.0 2023-12-22 15:42:53,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=658933.3333333334, ans=0.125 2023-12-22 15:43:18,618 INFO [train.py:886] (0/4) Epoch 21, batch 3550, loss[loss=0.01255, audio_tagging_loss=0.01255, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4948543.10 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:43:32,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=659200.0, ans=0.1 2023-12-22 15:44:10,680 INFO [train.py:886] (0/4) Epoch 21, batch 3600, loss[loss=0.01368, audio_tagging_loss=0.01368, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4949439.95 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:44:24,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.05 vs. limit=15.0 2023-12-22 15:44:36,330 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+01 2.979e+01 3.108e+01 3.250e+01 3.657e+01, threshold=6.215e+01, percent-clipped=0.0 2023-12-22 15:44:56,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=659733.3333333334, ans=0.0 2023-12-22 15:45:02,658 INFO [train.py:886] (0/4) Epoch 21, batch 3650, loss[loss=0.0136, audio_tagging_loss=0.0136, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4955576.92 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:45:06,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659800.0, ans=0.1 2023-12-22 15:45:10,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=659800.0, ans=0.05 2023-12-22 15:45:23,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=659933.3333333334, ans=15.0 2023-12-22 15:45:26,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659933.3333333334, ans=0.1 2023-12-22 15:45:29,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.59 vs. 
limit=15.0 2023-12-22 15:45:54,400 INFO [train.py:886] (0/4) Epoch 21, batch 3700, loss[loss=0.01114, audio_tagging_loss=0.01114, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4954678.59 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:46:05,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=660200.0, ans=0.04949747468305833 2023-12-22 15:46:10,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.09 vs. limit=22.5 2023-12-22 15:46:19,996 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+01 2.928e+01 3.055e+01 3.227e+01 3.842e+01, threshold=6.110e+01, percent-clipped=0.0 2023-12-22 15:46:47,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=22.5 2023-12-22 15:46:47,450 INFO [train.py:886] (0/4) Epoch 21, batch 3750, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4955374.05 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:46:55,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660466.6666666666, ans=0.1 2023-12-22 15:47:10,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0 2023-12-22 15:47:23,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=660666.6666666666, ans=0.0 2023-12-22 15:47:38,458 INFO [train.py:886] (0/4) Epoch 21, batch 3800, loss[loss=0.01113, audio_tagging_loss=0.01113, over 24750.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4953851.77 frames. ], batch size: 99, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:47:45,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=660800.0, ans=0.0 2023-12-22 15:47:49,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=660866.6666666666, ans=0.125 2023-12-22 15:47:53,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=660866.6666666666, ans=0.2 2023-12-22 15:48:03,536 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.617e+01 3.001e+01 3.115e+01 3.242e+01 4.083e+01, threshold=6.229e+01, percent-clipped=0.0 2023-12-22 15:48:12,537 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:48:15,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. limit=10.0 2023-12-22 15:48:26,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=661066.6666666666, ans=0.0 2023-12-22 15:48:30,948 INFO [train.py:886] (0/4) Epoch 21, batch 3850, loss[loss=0.01077, audio_tagging_loss=0.01077, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4946589.40 frames. 
], batch size: 99, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:48:34,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=661133.3333333334, ans=0.125 2023-12-22 15:48:36,826 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:48:44,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=661200.0, ans=0.125 2023-12-22 15:49:01,859 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:49:09,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=661333.3333333334, ans=0.0 2023-12-22 15:49:12,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2023-12-22 15:49:22,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=661400.0, ans=0.0 2023-12-22 15:49:23,640 INFO [train.py:886] (0/4) Epoch 21, batch 3900, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4950242.85 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:49:25,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.61 vs. limit=15.0 2023-12-22 15:49:40,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.09 vs. limit=10.0 2023-12-22 15:49:47,898 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+01 2.909e+01 3.084e+01 3.230e+01 3.604e+01, threshold=6.168e+01, percent-clipped=0.0 2023-12-22 15:50:08,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=661733.3333333334, ans=0.125 2023-12-22 15:50:14,912 INFO [train.py:886] (0/4) Epoch 21, batch 3950, loss[loss=0.01301, audio_tagging_loss=0.01301, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4956407.96 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:50:17,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=661800.0, ans=0.0 2023-12-22 15:50:18,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2023-12-22 15:50:19,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.94 vs. 
limit=15.0 2023-12-22 15:50:22,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=661800.0, ans=0.125 2023-12-22 15:50:33,149 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:50:34,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=661866.6666666666, ans=0.125 2023-12-22 15:50:41,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=661933.3333333334, ans=0.125 2023-12-22 15:51:03,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=662066.6666666666, ans=0.2 2023-12-22 15:51:07,222 INFO [train.py:886] (0/4) Epoch 21, batch 4000, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4963002.63 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 128.0 2023-12-22 15:51:23,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662200.0, ans=0.1 2023-12-22 15:51:28,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=662266.6666666666, ans=0.125 2023-12-22 15:51:33,825 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.648e+01 2.959e+01 3.062e+01 3.233e+01 3.752e+01, threshold=6.123e+01, percent-clipped=0.0 2023-12-22 15:51:35,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=662266.6666666666, ans=0.125 2023-12-22 15:51:36,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=662266.6666666666, ans=0.0 2023-12-22 15:51:37,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662333.3333333334, ans=0.1 2023-12-22 15:51:48,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=12.0 2023-12-22 15:51:59,475 INFO [train.py:886] (0/4) Epoch 21, batch 4050, loss[loss=0.01539, audio_tagging_loss=0.01539, over 24750.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4962253.92 frames. 
], batch size: 99, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:52:06,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=662466.6666666666, ans=0.125 2023-12-22 15:52:16,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=662533.3333333334, ans=0.125 2023-12-22 15:52:30,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=662666.6666666666, ans=0.125 2023-12-22 15:52:30,811 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.362e-02 2023-12-22 15:52:44,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=662733.3333333334, ans=0.125 2023-12-22 15:52:51,220 INFO [train.py:886] (0/4) Epoch 21, batch 4100, loss[loss=0.01308, audio_tagging_loss=0.01308, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4956727.35 frames. ], batch size: 99, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:53:01,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=12.0 2023-12-22 15:53:01,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=15.0 2023-12-22 15:53:06,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=662866.6666666666, ans=0.125 2023-12-22 15:53:17,033 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 2.959e+01 3.122e+01 3.290e+01 3.671e+01, threshold=6.244e+01, percent-clipped=0.0 2023-12-22 15:53:22,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=663000.0, ans=0.1 2023-12-22 15:53:43,117 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:53:43,768 INFO [train.py:886] (0/4) Epoch 21, batch 4150, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24093.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4944168.86 frames. ], batch size: 100, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:53:48,653 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:53:48,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=663133.3333333334, ans=0.125 2023-12-22 15:53:49,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=663133.3333333334, ans=0.125 2023-12-22 15:54:03,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663266.6666666666, ans=0.1 2023-12-22 15:54:09,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=663266.6666666666, ans=0.2 2023-12-22 15:54:16,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.36 vs. limit=22.5
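
The Whitening records (scaling.py:1022) each compare a measured statistic of a module's activations against a scheduled limit, e.g. metric=11.54 vs. limit=12.0 and metric=21.36 vs. limit=22.5 just above; the penalty that keeps activations "white" only engages when the metric exceeds the limit. A plausible reading of the printed metric, sketched below under the assumption that it is the eigenvalue-dispersion ratio of the per-group feature covariance (trace(C^2)/d divided by (trace(C)/d)^2), is that it equals 1.0 for perfectly white features and grows as variance concentrates in a few directions; names and normalization here are illustrative, not a copy of the real scaling.py code.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels) activations seen by a Whiten module.
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, d)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n                        # per-group covariance, (g, d, d)
        d = cov.shape[-1]
        mean_eig = cov.diagonal(dim1=-2, dim2=-1).sum(-1) / d  # trace(C)/d
        mean_sq_eig = (cov * cov).sum(dim=(-2, -1)) / d        # trace(C^2)/d
        # Ratio >= 1 by Cauchy-Schwarz; == 1 iff all eigenvalues are equal.
        return (mean_sq_eig / mean_eig.clamp(min=1e-20) ** 2).mean()

On this reading, the self_attn whiteners printing metrics around 20 against limit=22.5 are close to triggering, while whiteners printing single-digit metrics against limits of 12 or 15 are comfortably white.
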
2023-12-22 15:54:17,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=663333.3333333334, ans=0.125 2023-12-22 15:54:35,388 INFO [train.py:886] (0/4) Epoch 21, batch 4200, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4946919.96 frames. ], batch size: 99, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:54:38,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=663466.6666666666, ans=12.0 2023-12-22 15:54:43,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=663466.6666666666, ans=0.1 2023-12-22 15:54:57,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=663600.0, ans=0.2 2023-12-22 15:55:00,453 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.534e+01 2.926e+01 3.050e+01 3.273e+01 3.755e+01, threshold=6.101e+01, percent-clipped=0.0 2023-12-22 15:55:03,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=663600.0, ans=0.125 2023-12-22 15:55:14,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=663666.6666666666, ans=0.125 2023-12-22 15:55:27,270 INFO [train.py:886] (0/4) Epoch 21, batch 4250, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4953882.14 frames. ], batch size: 100, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:55:35,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=663800.0, ans=0.0 2023-12-22 15:55:38,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2023-12-22 15:56:01,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=664000.0, ans=0.2 2023-12-22 15:56:02,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=664000.0, ans=0.125 2023-12-22 15:56:20,058 INFO [train.py:886] (0/4) Epoch 21, batch 4300, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4960053.63 frames.
], batch size: 100, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:56:21,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=664133.3333333334, ans=0.125 2023-12-22 15:56:45,829 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+01 2.943e+01 3.122e+01 3.224e+01 4.020e+01, threshold=6.245e+01, percent-clipped=0.0 2023-12-22 15:56:52,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=664333.3333333334, ans=0.125 2023-12-22 15:56:55,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=664333.3333333334, ans=0.125 2023-12-22 15:56:56,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=664333.3333333334, ans=0.125 2023-12-22 15:56:58,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=664333.3333333334, ans=0.025 2023-12-22 15:57:10,830 INFO [train.py:886] (0/4) Epoch 21, batch 4350, loss[loss=0.01353, audio_tagging_loss=0.01353, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4955536.40 frames. ], batch size: 100, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:57:12,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=664466.6666666666, ans=0.0 2023-12-22 15:57:22,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=664533.3333333334, ans=0.0 2023-12-22 15:57:29,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=664533.3333333334, ans=0.0 2023-12-22 15:57:34,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2023-12-22 15:57:36,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=664600.0, ans=0.125 2023-12-22 15:57:46,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=664666.6666666666, ans=0.5 2023-12-22 15:57:49,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664666.6666666666, ans=0.1 2023-12-22 15:57:51,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=664733.3333333334, ans=0.125 2023-12-22 15:58:03,337 INFO [train.py:886] (0/4) Epoch 21, batch 4400, loss[loss=0.01753, audio_tagging_loss=0.01753, over 22323.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4949033.83 frames. ], batch size: 107, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:58:04,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=664800.0, ans=0.125 2023-12-22 15:58:09,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=664800.0, ans=0.0 2023-12-22 15:58:26,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=12.0
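
Each WARNING from optim.py:484 summarizes the distribution of recently observed gradient norms as five quantiles (min, 25%, median, 75%, max) next to the active clipping threshold, and throughout this log the threshold equals Clipping_scale times the printed median: in the WARNING record that follows, 2.0 x 3.154e+01 = 6.308e+01 exactly. A minimal sketch of such a median-relative clipping rule, assuming a plain global-norm clipper over a bounded history (icefall's ScaledAdam folds this into the optimizer itself, with per-parameter details omitted here):

    import torch
    from collections import deque

    class MedianGradClipper:
        # Clip the global grad norm at clipping_scale x the median of
        # recently observed norms, rather than at a fixed constant.
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)

        def __call__(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()   # 2.0 x median
            if norm > threshold:
                for p in params:            # scale all grads down together
                    p.grad.mul_(threshold / norm)
            return norm

percent-clipped=0.0 in these epoch-21 records is consistent with the quartiles shown: the largest norms (around 4.0e+01) stay well under twice the median (around 6.2e+01).
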
2023-12-22 15:58:29,328 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.790e+01 3.057e+01 3.154e+01 3.271e+01 4.005e+01, threshold=6.308e+01, percent-clipped=0.0 2023-12-22 15:58:35,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=665000.0, ans=0.125 2023-12-22 15:58:38,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=665000.0, ans=0.125 2023-12-22 15:58:38,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=665000.0, ans=0.0 2023-12-22 15:58:45,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=665066.6666666666, ans=0.05 2023-12-22 15:58:55,003 INFO [train.py:886] (0/4) Epoch 21, batch 4450, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4947527.19 frames. ], batch size: 100, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:58:57,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=665133.3333333334, ans=0.2 2023-12-22 15:59:06,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665200.0, ans=0.1 2023-12-22 15:59:06,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=665200.0, ans=0.0 2023-12-22 15:59:16,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2023-12-22 15:59:16,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=665266.6666666666, ans=0.95 2023-12-22 15:59:23,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=665266.6666666666, ans=0.0 2023-12-22 15:59:35,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2023-12-22 15:59:46,934 INFO [train.py:886] (0/4) Epoch 21, batch 4500, loss[loss=0.01419, audio_tagging_loss=0.01419, over 24750.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4947848.93 frames.
], batch size: 99, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 16:00:08,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=665600.0, ans=0.0 2023-12-22 16:00:12,609 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.625e+01 2.928e+01 3.056e+01 3.221e+01 3.659e+01, threshold=6.113e+01, percent-clipped=0.0 2023-12-22 16:00:20,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=665666.6666666666, ans=0.04949747468305833 2023-12-22 16:00:21,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=665666.6666666666, ans=0.125 2023-12-22 16:00:26,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=665733.3333333334, ans=0.0 2023-12-22 16:00:27,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=12.0 2023-12-22 16:00:30,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=665733.3333333334, ans=0.125 2023-12-22 16:00:35,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.54 vs. limit=10.0 2023-12-22 16:00:39,019 INFO [train.py:886] (0/4) Epoch 21, batch 4550, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4952780.09 frames. ], batch size: 100, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 16:00:41,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=665800.0, ans=0.0 2023-12-22 16:00:45,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.94 vs. limit=6.0 2023-12-22 16:00:54,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=665866.6666666666, ans=0.09899494936611666 2023-12-22 16:01:10,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-12-22 16:01:11,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=666000.0, ans=0.125 2023-12-22 16:01:19,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=666066.6666666666, ans=0.125 2023-12-22 16:01:28,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=666133.3333333334, ans=0.125 2023-12-22 16:01:29,219 INFO [train.py:886] (0/4) Epoch 21, batch 4600, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4953152.98 frames. 
], batch size: 100, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 16:01:31,273 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:01:36,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=666133.3333333334, ans=0.125 2023-12-22 16:01:37,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=666133.3333333334, ans=0.125 2023-12-22 16:01:38,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=666133.3333333334, ans=0.95 2023-12-22 16:01:47,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666200.0, ans=0.1 2023-12-22 16:01:49,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=666200.0, ans=0.125 2023-12-22 16:01:55,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=666266.6666666666, ans=0.0 2023-12-22 16:01:55,668 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.599e+01 2.941e+01 3.039e+01 3.146e+01 3.835e+01, threshold=6.079e+01, percent-clipped=0.0 2023-12-22 16:01:57,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=666266.6666666666, ans=0.025 2023-12-22 16:02:03,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2023-12-22 16:02:21,769 INFO [train.py:886] (0/4) Epoch 21, batch 4650, loss[loss=0.01529, audio_tagging_loss=0.01529, over 25000.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4957484.75 frames. ], batch size: 100, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 16:02:21,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=666466.6666666666, ans=0.125 2023-12-22 16:02:25,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666466.6666666666, ans=0.1 2023-12-22 16:02:31,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=666533.3333333334, ans=0.125 2023-12-22 16:02:38,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666533.3333333334, ans=0.125 2023-12-22 16:02:52,506 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-100000.pt 2023-12-22 16:02:58,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=666666.6666666666, ans=0.125 2023-12-22 16:03:13,782 INFO [train.py:886] (0/4) Epoch 21, batch 4700, loss[loss=0.01501, audio_tagging_loss=0.01501, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4955364.37 frames. ], batch size: 100, lr: 5.11e-03, grad_scale: 64.0
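
checkpoint.py:75 shows the two save triggers at work: a batch-count save (checkpoint-100000.pt, a few records above) and, shortly below, the end-of-epoch save epoch-21.pt. At roughly 4,750 batches per epoch, 21 finished epochs is consistent with the 100000 in the filename being a global batch counter. A rough sketch of that dual trigger follows; the helper name and argument list are invented for illustration, and the real icefall checkpoint code saves considerably more state (sampler, grad scaler, averaged model) than shown here.

    from pathlib import Path
    import torch

    def maybe_save(model, optimizer, exp_dir: Path, batch_idx_train: int,
                   epoch: int, save_every_n: int, epoch_done: bool) -> None:
        # Hypothetical wrapper around torch.save for the two triggers
        # visible in the log.
        state = {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
            "epoch": epoch,
        }
        if batch_idx_train % save_every_n == 0:    # e.g. checkpoint-100000.pt
            torch.save(state, exp_dir / f"checkpoint-{batch_idx_train}.pt")
        if epoch_done:                             # e.g. epoch-21.pt
            torch.save(state, exp_dir / f"epoch-{epoch}.pt")
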
2023-12-22 16:03:17,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=666800.0, ans=0.1 2023-12-22 16:03:35,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=666933.3333333334, ans=0.0 2023-12-22 16:03:37,514 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+01 3.014e+01 3.141e+01 3.308e+01 3.967e+01, threshold=6.283e+01, percent-clipped=0.0 2023-12-22 16:03:44,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667000.0, ans=0.1 2023-12-22 16:03:49,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=667000.0, ans=0.125 2023-12-22 16:04:01,476 INFO [train.py:886] (0/4) Epoch 21, batch 4750, loss[loss=0.01331, audio_tagging_loss=0.01331, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4950137.47 frames. ], batch size: 99, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 16:04:07,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=667133.3333333334, ans=0.1 2023-12-22 16:04:10,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=667200.0, ans=0.0 2023-12-22 16:04:16,687 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-21.pt 2023-12-22 16:04:35,441 INFO [train.py:886] (0/4) Epoch 22, batch 0, loss[loss=0.03402, audio_tagging_loss=0.03402, over 19950.00 frames. ], tot_loss[loss=0.03402, audio_tagging_loss=0.03402, over 19950.00 frames. ], batch size: 107, lr: 4.99e-03, grad_scale: 64.0 2023-12-22 16:04:35,442 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 16:04:49,055 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2447, 3.1808, 3.6645, 3.6578], device='cuda:0') 2023-12-22 16:04:55,983 INFO [train.py:917] (0/4) Epoch 22, validation: loss=0.03204, audio_tagging_loss=0.03204, over 3737520.00 frames. 2023-12-22 16:04:55,983 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 16:05:01,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=667240.0, ans=0.0 2023-12-22 16:05:02,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.01 vs. limit=6.0 2023-12-22 16:05:03,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=667240.0, ans=0.125 2023-12-22 16:05:28,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667440.0, ans=0.1 2023-12-22 16:05:29,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=667440.0, ans=0.125 2023-12-22 16:05:41,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=667506.6666666666, ans=0.2 2023-12-22 16:05:47,414 INFO [train.py:886] (0/4) Epoch 22, batch 50, loss[loss=0.01825, audio_tagging_loss=0.01825, over 25000.00 frames. ], tot_loss[loss=0.02155, audio_tagging_loss=0.02155, over 1114410.14 frames. ], batch size: 100, lr: 4.99e-03, grad_scale: 32.0
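
Each train.py:886 record prints two figures: loss[...] for the current batch and tot_loss[...] for a decaying, frame-weighted running average. The epoch boundary above makes the averaging visible: at Epoch 22, batch 0 the average restarts at the batch value (both print 0.03402), then its effective window re-grows (over 1114410.14 frames at batch 50, 1970452.09 at batch 100 below) and saturates near 5e6 frames, matching an exponential average whose sums decay by roughly 0.995 per batch at about 25,000 frames per batch. A sketch of bookkeeping that reproduces those frame counts (the decay constant is inferred from the printed numbers, not read from the code):

    class RunningLoss:
        # Exponentially decaying, frame-weighted average, as suggested by
        # the "tot_loss[..., over N frames]" records.
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed frame count (the "over N frames")

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames

    avg = RunningLoss()
    for _ in range(50):
        avg.update(0.02, 25000.0)
    print(round(avg.frames))   # ~1.1e6, cf. "over 1114410.14 frames" at batch 50
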
2023-12-22 16:05:56,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=667573.3333333334, ans=0.2 2023-12-22 16:05:58,091 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.665e+01 3.143e+01 3.722e+01 4.421e+01 9.512e+01, threshold=7.444e+01, percent-clipped=8.0 2023-12-22 16:06:13,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=667706.6666666666, ans=0.0 2023-12-22 16:06:22,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667773.3333333334, ans=0.1 2023-12-22 16:06:38,671 INFO [train.py:886] (0/4) Epoch 22, batch 100, loss[loss=0.01342, audio_tagging_loss=0.01342, over 25000.00 frames. ], tot_loss[loss=0.01863, audio_tagging_loss=0.01863, over 1970452.09 frames. ], batch size: 100, lr: 4.99e-03, grad_scale: 32.0 2023-12-22 16:06:39,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.83 vs. limit=15.0 2023-12-22 16:06:50,496 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:06:50,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=667973.3333333334, ans=0.0 2023-12-22 16:06:51,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=667973.3333333334, ans=0.0 2023-12-22 16:07:11,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=15.0 2023-12-22 16:07:22,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668173.3333333334, ans=0.1 2023-12-22 16:07:30,448 INFO [train.py:886] (0/4) Epoch 22, batch 150, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01694, audio_tagging_loss=0.01694, over 2626524.31 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:07:41,255 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.747e+01 3.091e+01 3.297e+01 3.433e+01 3.866e+01, threshold=6.595e+01, percent-clipped=0.0 2023-12-22 16:07:49,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.91 vs. limit=10.0 2023-12-22 16:08:21,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0 2023-12-22 16:08:22,588 INFO [train.py:886] (0/4) Epoch 22, batch 200, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24011.00 frames. ], tot_loss[loss=0.01583, audio_tagging_loss=0.01583, over 3144961.95 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:08:37,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.75 vs.
limit=15.0 2023-12-22 16:08:45,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=668706.6666666666, ans=0.125 2023-12-22 16:08:48,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2023-12-22 16:08:49,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=668706.6666666666, ans=0.125 2023-12-22 16:09:04,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.30 vs. limit=22.5 2023-12-22 16:09:05,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=668840.0, ans=0.5 2023-12-22 16:09:06,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=12.0 2023-12-22 16:09:12,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=668840.0, ans=0.0 2023-12-22 16:09:12,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.79 vs. limit=15.0 2023-12-22 16:09:14,312 INFO [train.py:886] (0/4) Epoch 22, batch 250, loss[loss=0.0127, audio_tagging_loss=0.0127, over 25000.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 3545879.99 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:09:16,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=668906.6666666666, ans=0.125 2023-12-22 16:09:24,426 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.578e+01 2.955e+01 3.079e+01 3.215e+01 4.174e+01, threshold=6.159e+01, percent-clipped=0.0 2023-12-22 16:09:29,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=668973.3333333334, ans=0.125 2023-12-22 16:10:02,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=669173.3333333334, ans=0.0 2023-12-22 16:10:06,723 INFO [train.py:886] (0/4) Epoch 22, batch 300, loss[loss=0.01427, audio_tagging_loss=0.01427, over 24750.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 3860186.16 frames. ], batch size: 99, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:10:15,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=669240.0, ans=0.125 2023-12-22 16:10:25,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=669306.6666666666, ans=0.2 2023-12-22 16:10:33,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=669373.3333333334, ans=0.2 2023-12-22 16:10:49,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=22.5 2023-12-22 16:10:58,099 INFO [train.py:886] (0/4) Epoch 22, batch 350, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4097068.82 frames. 
], batch size: 99, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:10:59,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=669573.3333333334, ans=0.07 2023-12-22 16:10:59,283 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.540e-02 2023-12-22 16:11:08,936 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.576e+01 2.952e+01 3.101e+01 3.215e+01 3.819e+01, threshold=6.201e+01, percent-clipped=0.0 2023-12-22 16:11:09,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=669640.0, ans=0.125 2023-12-22 16:11:10,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=669640.0, ans=0.125 2023-12-22 16:11:22,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=669706.6666666666, ans=0.125 2023-12-22 16:11:28,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=669773.3333333334, ans=0.0 2023-12-22 16:11:36,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=669773.3333333334, ans=0.1 2023-12-22 16:11:50,141 INFO [train.py:886] (0/4) Epoch 22, batch 400, loss[loss=0.01075, audio_tagging_loss=0.01075, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4283792.08 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:11:53,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2023-12-22 16:12:02,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=669973.3333333334, ans=0.125 2023-12-22 16:12:08,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=669973.3333333334, ans=0.125 2023-12-22 16:12:12,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=670040.0, ans=0.125 2023-12-22 16:12:17,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.28 vs. limit=22.5 2023-12-22 16:12:20,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=670106.6666666666, ans=0.0 2023-12-22 16:12:30,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2023-12-22 16:12:41,793 INFO [train.py:886] (0/4) Epoch 22, batch 450, loss[loss=0.01255, audio_tagging_loss=0.01255, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4434653.56 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:12:52,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.77 vs. 
limit=10.0 2023-12-22 16:12:52,652 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.604e+01 2.929e+01 3.055e+01 3.182e+01 3.732e+01, threshold=6.110e+01, percent-clipped=0.0 2023-12-22 16:12:59,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=670306.6666666666, ans=0.0 2023-12-22 16:13:01,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=670373.3333333334, ans=0.125 2023-12-22 16:13:14,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670440.0, ans=0.1 2023-12-22 16:13:31,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-12-22 16:13:32,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=670573.3333333334, ans=0.125 2023-12-22 16:13:33,374 INFO [train.py:886] (0/4) Epoch 22, batch 500, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4544188.60 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:13:33,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=670573.3333333334, ans=0.1 2023-12-22 16:13:39,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=15.0 2023-12-22 16:13:43,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=670640.0, ans=0.5 2023-12-22 16:13:59,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=670706.6666666666, ans=0.1 2023-12-22 16:14:03,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670706.6666666666, ans=0.1 2023-12-22 16:14:05,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=670773.3333333334, ans=0.125 2023-12-22 16:14:25,945 INFO [train.py:886] (0/4) Epoch 22, batch 550, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4632444.14 frames. 
], batch size: 99, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:14:36,068 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.670e+01 2.918e+01 3.052e+01 3.206e+01 3.698e+01, threshold=6.105e+01, percent-clipped=0.0 2023-12-22 16:14:36,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670973.3333333334, ans=0.0 2023-12-22 16:14:44,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=671040.0, ans=0.1 2023-12-22 16:14:46,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=671040.0, ans=0.125 2023-12-22 16:14:52,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-12-22 16:14:59,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=671106.6666666666, ans=0.125 2023-12-22 16:15:16,931 INFO [train.py:886] (0/4) Epoch 22, batch 600, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4701383.41 frames. ], batch size: 99, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:16:02,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=671506.6666666666, ans=0.0 2023-12-22 16:16:08,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=671506.6666666666, ans=0.0 2023-12-22 16:16:09,861 INFO [train.py:886] (0/4) Epoch 22, batch 650, loss[loss=0.01622, audio_tagging_loss=0.01622, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4751733.57 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:16:12,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=671573.3333333334, ans=0.125 2023-12-22 16:16:15,841 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:16:15,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=671573.3333333334, ans=0.125 2023-12-22 16:16:19,417 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+01 2.962e+01 3.088e+01 3.252e+01 3.665e+01, threshold=6.175e+01, percent-clipped=0.0 2023-12-22 16:16:21,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=671640.0, ans=0.2 2023-12-22 16:16:38,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2023-12-22 16:16:45,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2023-12-22 16:16:48,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-12-22 16:16:51,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. 
limit=22.5 2023-12-22 16:16:59,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-12-22 16:17:01,203 INFO [train.py:886] (0/4) Epoch 22, batch 700, loss[loss=0.01474, audio_tagging_loss=0.01474, over 25000.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4799377.85 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:17:15,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=671973.3333333334, ans=0.125 2023-12-22 16:17:27,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=672040.0, ans=0.1 2023-12-22 16:17:28,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2023-12-22 16:17:47,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=672173.3333333334, ans=0.0 2023-12-22 16:17:52,035 INFO [train.py:886] (0/4) Epoch 22, batch 750, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4834496.87 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:17:52,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=672240.0, ans=0.125 2023-12-22 16:17:55,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=672240.0, ans=0.04949747468305833 2023-12-22 16:18:02,883 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+01 3.002e+01 3.128e+01 3.298e+01 3.708e+01, threshold=6.256e+01, percent-clipped=0.0 2023-12-22 16:18:21,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=12.0 2023-12-22 16:18:24,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2023-12-22 16:18:25,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=672440.0, ans=0.1 2023-12-22 16:18:45,118 INFO [train.py:886] (0/4) Epoch 22, batch 800, loss[loss=0.01175, audio_tagging_loss=0.01175, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4861549.56 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:18:47,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=672573.3333333334, ans=0.0 2023-12-22 16:18:50,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2023-12-22 16:18:50,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=15.0 2023-12-22 16:18:51,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. 
limit=15.0 2023-12-22 16:19:05,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=672706.6666666666, ans=0.125 2023-12-22 16:19:06,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=672706.6666666666, ans=0.125 2023-12-22 16:19:13,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=672706.6666666666, ans=0.125 2023-12-22 16:19:21,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=672773.3333333334, ans=0.125 2023-12-22 16:19:36,119 INFO [train.py:886] (0/4) Epoch 22, batch 850, loss[loss=0.01459, audio_tagging_loss=0.01459, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4886932.84 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:19:47,079 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.653e+01 2.942e+01 3.056e+01 3.166e+01 3.620e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 16:19:58,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=673040.0, ans=0.125 2023-12-22 16:20:02,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0 2023-12-22 16:20:28,690 INFO [train.py:886] (0/4) Epoch 22, batch 900, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4899220.65 frames. ], batch size: 99, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:20:43,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=673306.6666666666, ans=0.125 2023-12-22 16:20:44,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673306.6666666666, ans=0.1 2023-12-22 16:20:50,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=673373.3333333334, ans=0.015 2023-12-22 16:21:20,368 INFO [train.py:886] (0/4) Epoch 22, batch 950, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24945.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4904345.02 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:21:30,888 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 2.985e+01 3.099e+01 3.290e+01 3.638e+01, threshold=6.198e+01, percent-clipped=0.0 2023-12-22 16:21:31,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2023-12-22 16:21:34,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.71 vs. 
limit=15.0 2023-12-22 16:21:47,013 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:22:04,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=673840.0, ans=0.0 2023-12-22 16:22:10,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.02 vs. limit=6.0 2023-12-22 16:22:11,947 INFO [train.py:886] (0/4) Epoch 22, batch 1000, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4906138.28 frames. ], batch size: 99, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:22:19,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=673906.6666666666, ans=0.125 2023-12-22 16:22:31,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0 2023-12-22 16:22:33,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.37 vs. limit=10.0 2023-12-22 16:22:35,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=674040.0, ans=0.125 2023-12-22 16:22:39,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=674040.0, ans=0.125 2023-12-22 16:22:46,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2023-12-22 16:22:47,368 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:23:03,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=674173.3333333334, ans=0.0 2023-12-22 16:23:05,200 INFO [train.py:886] (0/4) Epoch 22, batch 1050, loss[loss=0.01356, audio_tagging_loss=0.01356, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4912970.96 frames. ], batch size: 99, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:23:06,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=674240.0, ans=0.125 2023-12-22 16:23:14,727 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.557e+01 2.919e+01 3.070e+01 3.239e+01 4.205e+01, threshold=6.141e+01, percent-clipped=0.0 2023-12-22 16:23:36,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=674440.0, ans=0.0 2023-12-22 16:23:39,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=674440.0, ans=0.125 2023-12-22 16:23:41,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=674440.0, ans=0.1 2023-12-22 16:23:56,077 INFO [train.py:886] (0/4) Epoch 22, batch 1100, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4918237.69 frames. 
], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:24:06,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=674640.0, ans=0.0 2023-12-22 16:24:06,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674640.0, ans=0.1 2023-12-22 16:24:46,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.38 vs. limit=22.5 2023-12-22 16:24:49,291 INFO [train.py:886] (0/4) Epoch 22, batch 1150, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4926923.60 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:24:59,415 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.584e+01 2.884e+01 2.992e+01 3.161e+01 3.623e+01, threshold=5.985e+01, percent-clipped=0.0 2023-12-22 16:25:02,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=674973.3333333334, ans=0.2 2023-12-22 16:25:18,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=675040.0, ans=0.0 2023-12-22 16:25:40,893 INFO [train.py:886] (0/4) Epoch 22, batch 1200, loss[loss=0.0175, audio_tagging_loss=0.0175, over 24942.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4939268.77 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:25:51,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=675306.6666666666, ans=0.09899494936611666 2023-12-22 16:25:55,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=675306.6666666666, ans=0.07 2023-12-22 16:26:01,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.00 vs. limit=10.0 2023-12-22 16:26:29,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-12-22 16:26:29,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=675506.6666666666, ans=0.125 2023-12-22 16:26:32,300 INFO [train.py:886] (0/4) Epoch 22, batch 1250, loss[loss=0.01556, audio_tagging_loss=0.01556, over 24750.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4935772.87 frames. ], batch size: 99, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:26:36,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=675573.3333333334, ans=0.125 2023-12-22 16:26:42,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.77 vs. limit=10.0 2023-12-22 16:26:43,011 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.720e+01 3.009e+01 3.140e+01 3.242e+01 3.734e+01, threshold=6.280e+01, percent-clipped=0.0 2023-12-22 16:27:24,924 INFO [train.py:886] (0/4) Epoch 22, batch 1300, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. 
], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4928915.77 frames. ], batch size: 99, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:27:43,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=675973.3333333334, ans=0.125 2023-12-22 16:27:46,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=676040.0, ans=0.0 2023-12-22 16:28:08,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0 2023-12-22 16:28:17,147 INFO [train.py:886] (0/4) Epoch 22, batch 1350, loss[loss=0.01255, audio_tagging_loss=0.01255, over 25000.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4932291.29 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:28:17,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=676240.0, ans=0.125 2023-12-22 16:28:27,391 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 2.919e+01 3.091e+01 3.263e+01 3.767e+01, threshold=6.183e+01, percent-clipped=0.0 2023-12-22 16:28:29,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=676306.6666666666, ans=0.125 2023-12-22 16:28:59,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-12-22 16:29:08,763 INFO [train.py:886] (0/4) Epoch 22, batch 1400, loss[loss=0.01386, audio_tagging_loss=0.01386, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4938006.69 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:29:09,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=676573.3333333334, ans=0.0 2023-12-22 16:29:15,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=676573.3333333334, ans=0.125 2023-12-22 16:29:41,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-12-22 16:29:49,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.81 vs. limit=15.0 2023-12-22 16:29:51,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=676840.0, ans=0.125 2023-12-22 16:30:00,718 INFO [train.py:886] (0/4) Epoch 22, batch 1450, loss[loss=0.0109, audio_tagging_loss=0.0109, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4948433.88 frames. 
2023-12-22 16:30:10,223 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.636e+01 2.926e+01 3.092e+01 3.201e+01 4.336e+01, threshold=6.185e+01, percent-clipped=0.0
2023-12-22 16:30:18,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=676973.3333333334, ans=0.2
2023-12-22 16:30:26,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=677040.0, ans=0.0
2023-12-22 16:30:46,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0
2023-12-22 16:30:51,772 INFO [train.py:886] (0/4) Epoch 22, batch 1500, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4955161.49 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0
2023-12-22 16:30:58,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=677240.0, ans=0.125
2023-12-22 16:31:10,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=677306.6666666666, ans=0.0
2023-12-22 16:31:17,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=677373.3333333334, ans=0.125
2023-12-22 16:31:19,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=677373.3333333334, ans=0.0
2023-12-22 16:31:32,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=677440.0, ans=0.0
2023-12-22 16:31:41,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=677506.6666666666, ans=0.0
2023-12-22 16:31:44,047 INFO [train.py:886] (0/4) Epoch 22, batch 1550, loss[loss=0.01294, audio_tagging_loss=0.01294, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4946112.04 frames. ], batch size: 99, lr: 4.95e-03, grad_scale: 32.0
2023-12-22 16:31:49,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=677573.3333333334, ans=0.125
2023-12-22 16:31:54,080 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.056e+01 3.155e+01 3.308e+01 3.901e+01, threshold=6.310e+01, percent-clipped=0.0
2023-12-22 16:32:05,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=12.0
2023-12-22 16:32:28,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=677840.0, ans=0.0
2023-12-22 16:32:35,994 INFO [train.py:886] (0/4) Epoch 22, batch 1600, loss[loss=0.01185, audio_tagging_loss=0.01185, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4939327.42 frames. ], batch size: 99, lr: 4.95e-03, grad_scale: 32.0
2023-12-22 16:32:39,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0
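The Whitening lines ("metric=13.55 vs. limit=15.0") report how far a module's activations are from having a white, identity-proportional covariance; the Whiten modules in scaling.py only inject a correcting gradient when the metric exceeds its (often scheduled) limit. A rough sketch of such a metric, the ratio between the mean squared eigenvalue and the squared mean eigenvalue of the per-group covariance, which is 1.0 when the features are perfectly white; the actual implementation may differ in detail:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); larger metric means energy is
        # concentrated in a few directions instead of spread evenly.
        num_frames, num_channels = x.shape
        g = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, g).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        covar = torch.matmul(x.transpose(1, 2), x) / num_frames  # (groups, g, g)
        mean_eig = covar.diagonal(dim1=1, dim2=2).mean()         # trace/g
        mean_sq_eig = (covar ** 2).sum(dim=(1, 2)).mean() / g    # trace(C^2)/g
        return float(mean_sq_eig / mean_eig ** 2)

    x = torch.randn(200, 256)
    print(whitening_metric(x))  # near its minimum for white noise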
2023-12-22 16:32:42,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=677906.6666666666, ans=0.125
2023-12-22 16:32:49,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=677973.3333333334, ans=0.0
2023-12-22 16:32:53,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=677973.3333333334, ans=0.1
2023-12-22 16:32:59,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=678040.0, ans=0.125
2023-12-22 16:33:08,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=678106.6666666666, ans=0.125
2023-12-22 16:33:08,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=678106.6666666666, ans=0.125
2023-12-22 16:33:08,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=678106.6666666666, ans=10.0
2023-12-22 16:33:10,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=678106.6666666666, ans=0.125
2023-12-22 16:33:10,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0
2023-12-22 16:33:15,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=678106.6666666666, ans=0.1
2023-12-22 16:33:15,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0
2023-12-22 16:33:17,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=678173.3333333334, ans=0.5
2023-12-22 16:33:26,343 INFO [train.py:886] (0/4) Epoch 22, batch 1650, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4941504.61 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0
2023-12-22 16:33:27,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=678240.0, ans=0.125
2023-12-22 16:33:30,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=678240.0, ans=0.125
2023-12-22 16:33:37,917 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.694e+01 2.963e+01 3.106e+01 3.211e+01 3.845e+01, threshold=6.212e+01, percent-clipped=0.0
2023-12-22 16:33:46,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0
2023-12-22 16:33:55,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=678373.3333333334, ans=0.125
2023-12-22 16:34:19,398 INFO [train.py:886] (0/4) Epoch 22, batch 1700, loss[loss=0.01423, audio_tagging_loss=0.01423, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4939163.71 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0
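Many of the ScheduledFloat entries above (e.g. "balancer1.prob ... ans=0.125") are the probability with which a Balancer actually runs on a given step: its activation-statistics constraints cost compute, so they are applied stochastically. A sketch of that gating pattern only; the real Balancer in scaling.py shapes gradients to keep the fraction of positive activations and their magnitudes within bounds, which is elided here:

    import random
    import torch

    class RandomlyGated(torch.nn.Module):
        """Run `fn` only with probability `prob` during training; otherwise
        act as the identity (as Balancer does at inference)."""
        def __init__(self, fn, prob: float):
            super().__init__()
            self.fn = fn
            self.prob = prob

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training and random.random() < self.prob:
                return self.fn(x)
            return x

    # Placeholder constraint standing in for the real gradient-shaping logic:
    gate = RandomlyGated(lambda x: x.clamp(min=-10.0, max=10.0), prob=0.125)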
2023-12-22 16:34:24,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=678573.3333333334, ans=0.125
2023-12-22 16:34:31,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=678640.0, ans=0.125
2023-12-22 16:34:43,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=678706.6666666666, ans=0.125
2023-12-22 16:34:48,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0
2023-12-22 16:34:55,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=678773.3333333334, ans=0.125
2023-12-22 16:34:55,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0
2023-12-22 16:34:56,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=678773.3333333334, ans=0.125
2023-12-22 16:34:58,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=12.0
2023-12-22 16:35:06,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=678840.0, ans=0.125
2023-12-22 16:35:11,972 INFO [train.py:886] (0/4) Epoch 22, batch 1750, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4944573.61 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 32.0
2023-12-22 16:35:12,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=678906.6666666666, ans=0.125
2023-12-22 16:35:13,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=678906.6666666666, ans=0.125
2023-12-22 16:35:19,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0
2023-12-22 16:35:22,231 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+01 2.914e+01 2.997e+01 3.169e+01 3.655e+01, threshold=5.994e+01, percent-clipped=0.0
2023-12-22 16:35:25,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=678973.3333333334, ans=0.125
2023-12-22 16:35:40,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=679040.0, ans=0.0
2023-12-22 16:35:49,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=679106.6666666666, ans=0.2
2023-12-22 16:36:03,019 INFO [train.py:886] (0/4) Epoch 22, batch 1800, loss[loss=0.01258, audio_tagging_loss=0.01258, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4944473.48 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 32.0
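In the train.py:886 lines, "loss[...]" is the current batch while "tot_loss[...]" is a smoothed aggregate; the fractional frame counts (e.g. 4944473.48) suggest an exponentially-decayed running sum rather than a plain window. A sketch under that assumption; with a decay of 1/200 (cf. the recipe's reset_interval of 200) and roughly 25000 frames per batch, the steady-state frame mass is about 5e6, which matches the numbers above:

    class RunningLoss:
        """Exponentially-decayed running sums of loss and frame count; the
        icefall MetricsTracker may differ in detail."""
        def __init__(self, decay: float = 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float):
            keep = 1.0 - self.decay
            self.loss_sum = self.loss_sum * keep + batch_loss_sum
            self.frames = self.frames * keep + batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / self.frames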
2023-12-22 16:36:03,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=679240.0, ans=0.1
2023-12-22 16:36:05,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=679240.0, ans=0.125
2023-12-22 16:36:07,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=679240.0, ans=0.0
2023-12-22 16:36:23,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679373.3333333334, ans=0.1
2023-12-22 16:36:55,376 INFO [train.py:886] (0/4) Epoch 22, batch 1850, loss[loss=0.01335, audio_tagging_loss=0.01335, over 21280.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4941381.09 frames. ], batch size: 107, lr: 4.94e-03, grad_scale: 32.0
2023-12-22 16:36:59,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0
2023-12-22 16:37:05,643 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.729e+01 2.979e+01 3.098e+01 3.249e+01 3.883e+01, threshold=6.197e+01, percent-clipped=0.0
2023-12-22 16:37:10,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.41 vs. limit=15.0
2023-12-22 16:37:34,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=679773.3333333334, ans=0.125
2023-12-22 16:37:46,025 INFO [train.py:886] (0/4) Epoch 22, batch 1900, loss[loss=0.01439, audio_tagging_loss=0.01439, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4939462.41 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 32.0
2023-12-22 16:37:56,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=679973.3333333334, ans=0.2
2023-12-22 16:37:58,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=679973.3333333334, ans=0.05
2023-12-22 16:38:08,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=680040.0, ans=0.125
2023-12-22 16:38:11,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=680040.0, ans=0.125
2023-12-22 16:38:13,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=680040.0, ans=0.0
2023-12-22 16:38:38,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=680240.0, ans=0.09899494936611666
2023-12-22 16:38:39,054 INFO [train.py:886] (0/4) Epoch 22, batch 1950, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4941774.61 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 32.0
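The slowly decaying learning rate across this stretch (4.96e-03 down to 4.89e-03) is consistent with the Eden schedule used by Zipformer recipes, which discounts the base LR by both the global batch index and the (fractional) epoch. A sketch of the commonly published Eden formula; treat the exact step and epoch accounting (e.g. whether epochs are zero-based) as an assumption:

    def eden_lr(base_lr: float, step: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Sanity check against the log: the checkpoint saved later in this section
    # pins the global batch index near 104000 during epoch 22.
    print(eden_lr(0.045, step=102000, epoch=21))  # ~4.94e-03, as logged above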
2023-12-22 16:38:48,507 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.589e+01 3.054e+01 3.166e+01 3.335e+01 3.897e+01, threshold=6.333e+01, percent-clipped=0.0
2023-12-22 16:39:01,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=680373.3333333334, ans=0.0
2023-12-22 16:39:01,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.06 vs. limit=15.0
2023-12-22 16:39:03,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=680373.3333333334, ans=0.125
2023-12-22 16:39:30,781 INFO [train.py:886] (0/4) Epoch 22, batch 2000, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4944839.65 frames. ], batch size: 99, lr: 4.94e-03, grad_scale: 32.0
2023-12-22 16:39:50,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=680706.6666666666, ans=0.035
2023-12-22 16:40:03,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=680773.3333333334, ans=0.0
2023-12-22 16:40:09,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=680773.3333333334, ans=0.0
2023-12-22 16:40:10,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=680773.3333333334, ans=0.125
2023-12-22 16:40:11,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=680840.0, ans=0.125
2023-12-22 16:40:12,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. limit=10.0
2023-12-22 16:40:19,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0
2023-12-22 16:40:22,161 INFO [train.py:886] (0/4) Epoch 22, batch 2050, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4944819.00 frames. ], batch size: 99, lr: 4.94e-03, grad_scale: 64.0
2023-12-22 16:40:25,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=680906.6666666666, ans=0.0
2023-12-22 16:40:33,015 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.587e+01 2.841e+01 3.013e+01 3.146e+01 3.558e+01, threshold=6.025e+01, percent-clipped=0.0
2023-12-22 16:40:34,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=680973.3333333334, ans=0.125
2023-12-22 16:40:39,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=680973.3333333334, ans=0.0
2023-12-22 16:40:51,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681040.0, ans=0.1
2023-12-22 16:40:55,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. limit=6.0
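Note that grad_scale jumps from 32.0 to 64.0 at batch 2050 above. With use_fp16 training this is the AMP loss scale, which torch's GradScaler doubles after a run of overflow-free steps and halves when an inf/nan gradient is seen. A minimal sketch of the mechanism (assumes a CUDA device; the recipe's exact growth settings are an assumption, though 2000 is GradScaler's default growth_interval):

    import torch

    dev = "cuda"  # AMP grad scaling requires a CUDA device here
    model = torch.nn.Linear(8, 1).to(dev)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)
    x = torch.randn(4, 8, device=dev)
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # gradients carry the current scale
    scaler.step(opt)               # unscales; skips the step on inf/nan
    scaler.update()                # grows the scale after enough clean steps,
                                   # halves it when an overflow is detected
    print(scaler.get_scale())      # e.g. 32.0 -> 64.0, as at batch 2050 above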
2023-12-22 16:41:01,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=681106.6666666666, ans=0.2
2023-12-22 16:41:09,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=681173.3333333334, ans=0.125
2023-12-22 16:41:13,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=681240.0, ans=0.0
2023-12-22 16:41:13,762 INFO [train.py:886] (0/4) Epoch 22, batch 2100, loss[loss=0.01541, audio_tagging_loss=0.01541, over 25000.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4951483.15 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 64.0
2023-12-22 16:41:16,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=681240.0, ans=0.125
2023-12-22 16:41:44,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=681440.0, ans=0.0
2023-12-22 16:42:05,494 INFO [train.py:886] (0/4) Epoch 22, batch 2150, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24750.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4949869.94 frames. ], batch size: 99, lr: 4.94e-03, grad_scale: 64.0
2023-12-22 16:42:05,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681573.3333333334, ans=0.1
2023-12-22 16:42:12,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=681573.3333333334, ans=0.1
2023-12-22 16:42:15,653 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.651e+01 3.015e+01 3.093e+01 3.215e+01 3.763e+01, threshold=6.186e+01, percent-clipped=0.0
2023-12-22 16:42:21,126 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.303e-02
2023-12-22 16:42:29,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=15.0
2023-12-22 16:42:35,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0
2023-12-22 16:42:47,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=681840.0, ans=0.125
2023-12-22 16:42:47,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=681840.0, ans=0.0
2023-12-22 16:42:51,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=681840.0, ans=0.125
2023-12-22 16:42:53,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681840.0, ans=0.1
2023-12-22 16:42:57,807 INFO [train.py:886] (0/4) Epoch 22, batch 2200, loss[loss=0.0159, audio_tagging_loss=0.0159, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4937917.21 frames. ], batch size: 99, lr: 4.93e-03, grad_scale: 64.0
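The scaling.py:1118 "WithLoss: ... loss-sum=..." lines track auxiliary penalties attached to intermediate tensors (here attention weights). One way to implement the pattern, sketched below, is an autograd function that is the identity in forward but routes a unit gradient into the auxiliary term, so the penalty is effectively added to the training loss without changing the forward output; icefall's actual implementation may differ:

    import torch

    class WithLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, aux_loss):
            ctx.aux_shape = aux_loss.shape
            return x  # identity on the main path

        @staticmethod
        def backward(ctx, grad_out):
            # d(total)/d(aux_loss) = 1: the auxiliary term behaves as if it
            # had been added to the final loss.
            ones = torch.ones(ctx.aux_shape, dtype=grad_out.dtype,
                              device=grad_out.device)
            return grad_out, ones

    x = torch.randn(4, 8, requires_grad=True)
    aux = x.pow(2).mean()          # stand-in penalty on an activation
    y = WithLoss.apply(x, aux)
    y.sum().backward()             # x.grad now includes the penalty's gradient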
2023-12-22 16:43:03,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=681906.6666666666, ans=0.09899494936611666
2023-12-22 16:43:06,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=681906.6666666666, ans=0.0
2023-12-22 16:43:09,367 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 16:43:19,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=682040.0, ans=0.125
2023-12-22 16:43:26,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=682040.0, ans=0.125
2023-12-22 16:43:28,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=682106.6666666666, ans=0.125
2023-12-22 16:43:49,669 INFO [train.py:886] (0/4) Epoch 22, batch 2250, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4942313.82 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0
2023-12-22 16:44:00,831 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 2.973e+01 3.104e+01 3.289e+01 3.674e+01, threshold=6.209e+01, percent-clipped=0.0
2023-12-22 16:44:02,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=682306.6666666666, ans=0.125
2023-12-22 16:44:02,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0
2023-12-22 16:44:07,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0
2023-12-22 16:44:07,715 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 16:44:11,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.84 vs. limit=22.5
2023-12-22 16:44:13,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=682373.3333333334, ans=0.07
2023-12-22 16:44:15,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682373.3333333334, ans=0.1
2023-12-22 16:44:27,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=682440.0, ans=0.125
2023-12-22 16:44:34,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0
2023-12-22 16:44:38,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=682506.6666666666, ans=0.125
2023-12-22 16:44:42,454 INFO [train.py:886] (0/4) Epoch 22, batch 2300, loss[loss=0.01379, audio_tagging_loss=0.01379, over 24750.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4944826.21 frames. ], batch size: 99, lr: 4.93e-03, grad_scale: 64.0
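Throughout this log "loss" and "audio_tagging_loss" are identical because the recipe optimizes a single multi-label tagging objective: binary cross-entropy over the AudioSet event classes, normalized by frame count. The batch arithmetic is consistent with that: 100 clips at roughly 250 subsampled frames each gives the 25000.00-frame batches above. A sketch, with the exact reduction and normalization being assumptions:

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits: torch.Tensor,
                           targets: torch.Tensor) -> torch.Tensor:
        # Multi-label BCE over event classes; summed here and normalized
        # externally by the frame count, yielding per-frame values in the
        # 0.013 range like the log lines above.
        return F.binary_cross_entropy_with_logits(logits, targets,
                                                  reduction="sum")

    logits = torch.randn(100, 527)   # one pooled logit vector per clip
    targets = torch.zeros(100, 527)  # multi-hot event labels
    targets[torch.arange(100), torch.randint(0, 527, (100,))] = 1.0
    print(audio_tagging_loss(logits, targets) / 25000.0)  # per-frame loss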
2023-12-22 16:45:09,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=682706.6666666666, ans=0.125
2023-12-22 16:45:22,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682773.3333333334, ans=0.1
2023-12-22 16:45:33,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=682906.6666666666, ans=0.2
2023-12-22 16:45:34,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=682906.6666666666, ans=0.2
2023-12-22 16:45:34,702 INFO [train.py:886] (0/4) Epoch 22, batch 2350, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4945987.89 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0
2023-12-22 16:45:45,017 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+01 2.951e+01 3.052e+01 3.214e+01 3.845e+01, threshold=6.104e+01, percent-clipped=0.0
2023-12-22 16:45:45,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=682973.3333333334, ans=0.125
2023-12-22 16:45:57,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=15.0
2023-12-22 16:46:15,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0
2023-12-22 16:46:27,131 INFO [train.py:886] (0/4) Epoch 22, batch 2400, loss[loss=0.01455, audio_tagging_loss=0.01455, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4952378.66 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0
2023-12-22 16:46:38,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=683306.6666666666, ans=0.125
2023-12-22 16:46:39,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=683306.6666666666, ans=0.0
2023-12-22 16:46:42,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=683306.6666666666, ans=0.2
2023-12-22 16:46:45,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. limit=6.0
2023-12-22 16:46:50,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=683373.3333333334, ans=0.0
2023-12-22 16:46:56,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=683373.3333333334, ans=0.0
2023-12-22 16:47:00,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0
2023-12-22 16:47:15,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=683506.6666666666, ans=0.05
2023-12-22 16:47:18,260 INFO [train.py:886] (0/4) Epoch 22, batch 2450, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4954658.19 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0
2023-12-22 16:47:29,198 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.984e+01 3.077e+01 3.217e+01 3.781e+01, threshold=6.155e+01, percent-clipped=0.0
2023-12-22 16:47:35,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=683640.0, ans=0.125
2023-12-22 16:47:36,714 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 16:48:00,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=683840.0, ans=0.0
2023-12-22 16:48:03,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=683840.0, ans=0.125
2023-12-22 16:48:04,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=683840.0, ans=0.5
2023-12-22 16:48:08,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=683840.0, ans=0.125
2023-12-22 16:48:08,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=683840.0, ans=0.125
2023-12-22 16:48:10,594 INFO [train.py:886] (0/4) Epoch 22, batch 2500, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4948971.85 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0
2023-12-22 16:48:14,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=683906.6666666666, ans=0.125
2023-12-22 16:48:26,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=683973.3333333334, ans=0.09899494936611666
2023-12-22 16:48:28,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=12.0
2023-12-22 16:49:03,076 INFO [train.py:886] (0/4) Epoch 22, batch 2550, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4947865.36 frames. ], batch size: 99, lr: 4.93e-03, grad_scale: 64.0
2023-12-22 16:49:03,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=684240.0, ans=0.125
2023-12-22 16:49:13,125 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.756e+01 2.972e+01 3.101e+01 3.260e+01 3.822e+01, threshold=6.203e+01, percent-clipped=0.0
2023-12-22 16:49:44,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=684506.6666666666, ans=0.125
2023-12-22 16:49:53,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=684573.3333333334, ans=0.125
2023-12-22 16:49:54,513 INFO [train.py:886] (0/4) Epoch 22, batch 2600, loss[loss=0.01305, audio_tagging_loss=0.01305, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4949528.26 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0
2023-12-22 16:50:05,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0
2023-12-22 16:50:14,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=684706.6666666666, ans=0.0
2023-12-22 16:50:23,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=684706.6666666666, ans=0.125
2023-12-22 16:50:38,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=684840.0, ans=0.125
2023-12-22 16:50:43,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=684840.0, ans=0.125
2023-12-22 16:50:46,542 INFO [train.py:886] (0/4) Epoch 22, batch 2650, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4952054.03 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0
2023-12-22 16:50:56,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=684973.3333333334, ans=0.125
2023-12-22 16:50:56,719 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.462e+01 2.949e+01 3.109e+01 3.258e+01 4.396e+01, threshold=6.219e+01, percent-clipped=0.0
2023-12-22 16:51:00,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0
2023-12-22 16:51:07,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=685040.0, ans=0.0
2023-12-22 16:51:09,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=685040.0, ans=0.1
2023-12-22 16:51:12,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=685040.0, ans=0.2
2023-12-22 16:51:33,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=685173.3333333334, ans=0.0
2023-12-22 16:51:38,257 INFO [train.py:886] (0/4) Epoch 22, batch 2700, loss[loss=0.01383, audio_tagging_loss=0.01383, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4953207.02 frames. ], batch size: 99, lr: 4.92e-03, grad_scale: 64.0
2023-12-22 16:51:42,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=685240.0, ans=0.0
2023-12-22 16:51:43,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=685240.0, ans=0.1
2023-12-22 16:51:56,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=685306.6666666666, ans=0.1
2023-12-22 16:52:00,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685373.3333333334, ans=0.1
2023-12-22 16:52:02,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=685373.3333333334, ans=0.0
2023-12-22 16:52:29,510 INFO [train.py:886] (0/4) Epoch 22, batch 2750, loss[loss=0.0139, audio_tagging_loss=0.0139, over 24750.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4955264.89 frames. ], batch size: 99, lr: 4.92e-03, grad_scale: 64.0
2023-12-22 16:52:39,642 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.604e+01 2.941e+01 3.076e+01 3.293e+01 3.896e+01, threshold=6.152e+01, percent-clipped=0.0
2023-12-22 16:52:45,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=685640.0, ans=0.125
2023-12-22 16:52:53,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=685706.6666666666, ans=0.125
2023-12-22 16:53:02,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685773.3333333334, ans=0.1
2023-12-22 16:53:09,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0
2023-12-22 16:53:22,640 INFO [train.py:886] (0/4) Epoch 22, batch 2800, loss[loss=0.01705, audio_tagging_loss=0.01705, over 24950.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4955875.29 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0
2023-12-22 16:53:34,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=685973.3333333334, ans=0.0
2023-12-22 16:54:11,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=686173.3333333334, ans=0.0
2023-12-22 16:54:13,874 INFO [train.py:886] (0/4) Epoch 22, batch 2850, loss[loss=0.01398, audio_tagging_loss=0.01398, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4950694.32 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0
2023-12-22 16:54:15,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=686240.0, ans=0.125
2023-12-22 16:54:21,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=686240.0, ans=0.125
2023-12-22 16:54:24,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=686306.6666666666, ans=0.125
2023-12-22 16:54:24,834 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.661e+01 2.963e+01 3.132e+01 3.265e+01 3.712e+01, threshold=6.264e+01, percent-clipped=0.0
2023-12-22 16:54:27,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=686306.6666666666, ans=0.125
2023-12-22 16:54:40,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=686373.3333333334, ans=0.1
2023-12-22 16:54:59,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686506.6666666666, ans=0.1
2023-12-22 16:55:00,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=686506.6666666666, ans=0.0
2023-12-22 16:55:01,480 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 16:55:06,041 INFO [train.py:886] (0/4) Epoch 22, batch 2900, loss[loss=0.01371, audio_tagging_loss=0.01371, over 24750.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4944253.53 frames. ], batch size: 99, lr: 4.92e-03, grad_scale: 64.0
2023-12-22 16:55:27,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0
2023-12-22 16:55:36,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=22.5
2023-12-22 16:55:40,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0
2023-12-22 16:55:53,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=686840.0, ans=0.125
2023-12-22 16:55:58,681 INFO [train.py:886] (0/4) Epoch 22, batch 2950, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4951010.45 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0
2023-12-22 16:56:08,727 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+01 2.913e+01 3.029e+01 3.205e+01 3.789e+01, threshold=6.058e+01, percent-clipped=0.0
2023-12-22 16:56:10,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=686973.3333333334, ans=0.05
2023-12-22 16:56:12,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=686973.3333333334, ans=0.0
2023-12-22 16:56:13,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.90 vs. limit=12.0
2023-12-22 16:56:14,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=686973.3333333334, ans=0.2
2023-12-22 16:56:30,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=687106.6666666666, ans=0.125
2023-12-22 16:56:33,115 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.11 vs. limit=15.0
2023-12-22 16:56:50,251 INFO [train.py:886] (0/4) Epoch 22, batch 3000, loss[loss=0.01711, audio_tagging_loss=0.01711, over 23961.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4948462.86 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0
2023-12-22 16:56:50,253 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 16:57:11,829 INFO [train.py:917] (0/4) Epoch 22, validation: loss=0.03274, audio_tagging_loss=0.03274, over 3737520.00 frames.
2023-12-22 16:57:11,830 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-22 16:57:12,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=687240.0, ans=0.2
2023-12-22 16:57:17,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=687240.0, ans=0.1
2023-12-22 16:57:28,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=687306.6666666666, ans=0.0
2023-12-22 16:57:41,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=687440.0, ans=0.035
2023-12-22 16:58:03,728 INFO [train.py:886] (0/4) Epoch 22, batch 3050, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4956791.33 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0
2023-12-22 16:58:06,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=687573.3333333334, ans=0.2
2023-12-22 16:58:08,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=687573.3333333334, ans=0.05
2023-12-22 16:58:10,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=12.0
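The train.py:909/917/918 triplet above is the periodic validation pass: training pauses every valid_interval batches, the tagging loss is averaged over the full dev set (hence the constant 3737520.00-frame count in every validation line), and the peak CUDA allocation is reported via torch's allocator statistics. A sketch; the batch layout and helper names are assumptions:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, dev_loader, loss_fn, device="cuda:0"):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for feats, targets, num_frames in dev_loader:  # assumed batch layout
            loss = loss_fn(model(feats.to(device)), targets.to(device))
            tot_loss += float(loss)
            tot_frames += float(num_frames)
        model.train()
        # torch.cuda.max_memory_allocated tracks the allocator's high-water mark
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4g}; "
              f"Maximum memory allocated so far is {peak_mb}MB")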
2023-12-22 16:58:13,865 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.728e+01 2.983e+01 3.097e+01 3.226e+01 3.702e+01, threshold=6.194e+01, percent-clipped=0.0
2023-12-22 16:58:16,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=687640.0, ans=0.125
2023-12-22 16:58:30,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.45 vs. limit=10.0
2023-12-22 16:58:45,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0
2023-12-22 16:58:56,224 INFO [train.py:886] (0/4) Epoch 22, batch 3100, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4958211.32 frames. ], batch size: 99, lr: 4.91e-03, grad_scale: 64.0
2023-12-22 16:58:57,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=687906.6666666666, ans=0.2
2023-12-22 16:59:05,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=687973.3333333334, ans=0.0
2023-12-22 16:59:20,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=688040.0, ans=0.125
2023-12-22 16:59:28,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0
2023-12-22 16:59:45,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=688173.3333333334, ans=0.2
2023-12-22 16:59:48,290 INFO [train.py:886] (0/4) Epoch 22, batch 3150, loss[loss=0.0122, audio_tagging_loss=0.0122, over 24750.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4951936.54 frames. ], batch size: 99, lr: 4.91e-03, grad_scale: 64.0
2023-12-22 16:59:59,267 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.678e+01 2.993e+01 3.103e+01 3.261e+01 3.891e+01, threshold=6.205e+01, percent-clipped=0.0
2023-12-22 17:00:07,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=688306.6666666666, ans=0.125
2023-12-22 17:00:19,989 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.642e-03
2023-12-22 17:00:23,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=688440.0, ans=0.125
2023-12-22 17:00:26,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0
2023-12-22 17:00:40,705 INFO [train.py:886] (0/4) Epoch 22, batch 3200, loss[loss=0.01349, audio_tagging_loss=0.01349, over 24750.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4947561.52 frames. ], batch size: 99, lr: 4.91e-03, grad_scale: 64.0
2023-12-22 17:00:43,035 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5
2023-12-22 17:00:54,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=688640.0, ans=0.0
2023-12-22 17:00:55,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=688640.0, ans=0.125
2023-12-22 17:00:55,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0
2023-12-22 17:01:04,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=688706.6666666666, ans=0.1
2023-12-22 17:01:31,996 INFO [train.py:886] (0/4) Epoch 22, batch 3250, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4946006.95 frames. ], batch size: 99, lr: 4.91e-03, grad_scale: 64.0
2023-12-22 17:01:39,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=688906.6666666666, ans=0.125
2023-12-22 17:01:42,252 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 2.942e+01 3.078e+01 3.201e+01 3.535e+01, threshold=6.156e+01, percent-clipped=0.0
2023-12-22 17:01:50,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=688973.3333333334, ans=0.125
2023-12-22 17:01:54,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=689040.0, ans=0.125
2023-12-22 17:02:12,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=689173.3333333334, ans=0.125
2023-12-22 17:02:13,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=689173.3333333334, ans=0.0
2023-12-22 17:02:24,450 INFO [train.py:886] (0/4) Epoch 22, batch 3300, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4948801.95 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0
2023-12-22 17:02:29,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=689240.0, ans=0.0
2023-12-22 17:02:43,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=689306.6666666666, ans=0.0
2023-12-22 17:02:44,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=689373.3333333334, ans=0.125
2023-12-22 17:02:47,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=689373.3333333334, ans=10.0
2023-12-22 17:03:00,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=689440.0, ans=0.0
2023-12-22 17:03:16,373 INFO [train.py:886] (0/4) Epoch 22, batch 3350, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4944355.62 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0
2023-12-22 17:03:22,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=689573.3333333334, ans=0.125
2023-12-22 17:03:27,219 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.659e+01 2.983e+01 3.128e+01 3.276e+01 3.724e+01, threshold=6.256e+01, percent-clipped=0.0
2023-12-22 17:03:32,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=689640.0, ans=0.0
2023-12-22 17:03:33,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=689640.0, ans=0.04949747468305833
2023-12-22 17:03:35,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=689640.0, ans=0.125
2023-12-22 17:03:39,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=689706.6666666666, ans=0.125
2023-12-22 17:03:53,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=22.5
2023-12-22 17:04:01,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=689840.0, ans=0.0
2023-12-22 17:04:08,156 INFO [train.py:886] (0/4) Epoch 22, batch 3400, loss[loss=0.0153, audio_tagging_loss=0.0153, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4948377.55 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0
2023-12-22 17:04:22,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=689973.3333333334, ans=0.125
2023-12-22 17:04:33,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. limit=6.0
2023-12-22 17:04:33,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=690040.0, ans=0.025
2023-12-22 17:04:34,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5
2023-12-22 17:05:00,568 INFO [train.py:886] (0/4) Epoch 22, batch 3450, loss[loss=0.01523, audio_tagging_loss=0.01523, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4950990.44 frames. ], batch size: 99, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:05:04,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0
2023-12-22 17:05:10,805 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 3.072e+01 3.167e+01 3.266e+01 3.818e+01, threshold=6.334e+01, percent-clipped=0.0
2023-12-22 17:05:12,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=22.5
2023-12-22 17:05:44,129 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 17:05:45,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=690506.6666666666, ans=0.125
2023-12-22 17:05:52,068 INFO [train.py:886] (0/4) Epoch 22, batch 3500, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4944036.83 frames. ], batch size: 99, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:06:27,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=690773.3333333334, ans=0.125
2023-12-22 17:06:40,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=690840.0, ans=0.95
2023-12-22 17:06:44,596 INFO [train.py:886] (0/4) Epoch 22, batch 3550, loss[loss=0.01097, audio_tagging_loss=0.01097, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4947926.19 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:06:50,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=15.0
2023-12-22 17:06:54,157 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 2.964e+01 3.144e+01 3.307e+01 3.937e+01, threshold=6.289e+01, percent-clipped=0.0
2023-12-22 17:06:54,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=690973.3333333334, ans=0.0
2023-12-22 17:06:59,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=690973.3333333334, ans=0.0
2023-12-22 17:07:07,115 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.25 vs. limit=10.0
2023-12-22 17:07:28,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=691173.3333333334, ans=0.125
2023-12-22 17:07:35,782 INFO [train.py:886] (0/4) Epoch 22, batch 3600, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4952463.92 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:07:37,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=691240.0, ans=0.125
2023-12-22 17:07:38,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.92 vs. limit=15.0
2023-12-22 17:07:57,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=691373.3333333334, ans=0.1
2023-12-22 17:07:58,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.55 vs. limit=15.0
2023-12-22 17:08:01,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=691373.3333333334, ans=0.125
2023-12-22 17:08:16,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=691440.0, ans=0.1
2023-12-22 17:08:20,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=691506.6666666666, ans=0.0
2023-12-22 17:08:26,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=691506.6666666666, ans=0.5
2023-12-22 17:08:28,092 INFO [train.py:886] (0/4) Epoch 22, batch 3650, loss[loss=0.01451, audio_tagging_loss=0.01451, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4957522.80 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:08:38,309 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.534e+01 2.894e+01 3.035e+01 3.158e+01 3.520e+01, threshold=6.071e+01, percent-clipped=0.0
2023-12-22 17:09:00,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=691773.3333333334, ans=0.125
2023-12-22 17:09:07,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0
2023-12-22 17:09:08,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=691840.0, ans=0.0
2023-12-22 17:09:09,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=691840.0, ans=0.2
2023-12-22 17:09:16,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=691840.0, ans=0.125
2023-12-22 17:09:18,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=691840.0, ans=0.125
2023-12-22 17:09:19,858 INFO [train.py:886] (0/4) Epoch 22, batch 3700, loss[loss=0.01688, audio_tagging_loss=0.01688, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4960958.08 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:09:22,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=691906.6666666666, ans=0.0
2023-12-22 17:09:37,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=691973.3333333334, ans=0.125
2023-12-22 17:09:40,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0
limit=15.0 2023-12-22 17:09:46,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=692040.0, ans=0.125 2023-12-22 17:10:08,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=692173.3333333334, ans=0.125 2023-12-22 17:10:10,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=692173.3333333334, ans=0.2 2023-12-22 17:10:12,268 INFO [train.py:886] (0/4) Epoch 22, batch 3750, loss[loss=0.01572, audio_tagging_loss=0.01572, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4962474.41 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0 2023-12-22 17:10:17,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=692240.0, ans=0.1 2023-12-22 17:10:22,354 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.766e+01 3.031e+01 3.113e+01 3.271e+01 3.807e+01, threshold=6.227e+01, percent-clipped=0.0 2023-12-22 17:10:23,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=692306.6666666666, ans=0.09899494936611666 2023-12-22 17:10:27,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=692306.6666666666, ans=0.0 2023-12-22 17:10:27,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.79 vs. limit=22.5 2023-12-22 17:10:29,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=692306.6666666666, ans=0.125 2023-12-22 17:10:39,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=692373.3333333334, ans=0.0 2023-12-22 17:10:46,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2023-12-22 17:10:51,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=692440.0, ans=0.125 2023-12-22 17:11:04,400 INFO [train.py:886] (0/4) Epoch 22, batch 3800, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4951738.08 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0 2023-12-22 17:11:50,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=692840.0, ans=0.0 2023-12-22 17:11:55,869 INFO [train.py:886] (0/4) Epoch 22, batch 3850, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4949462.53 frames. 
], batch size: 99, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:12:06,652 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.634e+01 3.023e+01 3.138e+01 3.280e+01 3.905e+01, threshold=6.276e+01, percent-clipped=0.0 2023-12-22 17:12:12,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=692973.3333333334, ans=0.125 2023-12-22 17:12:18,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2023-12-22 17:12:22,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=693040.0, ans=0.0 2023-12-22 17:12:28,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.53 vs. limit=22.5 2023-12-22 17:12:34,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693106.6666666666, ans=0.1 2023-12-22 17:12:47,260 INFO [train.py:886] (0/4) Epoch 22, batch 3900, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4948655.32 frames. ], batch size: 99, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:12:52,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=693240.0, ans=0.125 2023-12-22 17:12:58,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=12.0 2023-12-22 17:13:01,158 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-104000.pt 2023-12-22 17:13:35,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. limit=10.0 2023-12-22 17:13:40,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0 2023-12-22 17:13:41,586 INFO [train.py:886] (0/4) Epoch 22, batch 3950, loss[loss=0.01376, audio_tagging_loss=0.01376, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4949046.56 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:13:51,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=693640.0, ans=0.125 2023-12-22 17:13:51,725 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.685e+01 2.965e+01 3.079e+01 3.254e+01 4.090e+01, threshold=6.157e+01, percent-clipped=0.0 2023-12-22 17:14:06,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2023-12-22 17:14:06,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=693706.6666666666, ans=0.95 2023-12-22 17:14:08,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.54 vs. 
limit=15.0 2023-12-22 17:14:12,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-12-22 17:14:15,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=693773.3333333334, ans=0.0 2023-12-22 17:14:15,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=693773.3333333334, ans=0.0 2023-12-22 17:14:21,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=693773.3333333334, ans=0.125 2023-12-22 17:14:33,363 INFO [train.py:886] (0/4) Epoch 22, batch 4000, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4952497.92 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:14:47,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=693973.3333333334, ans=0.2 2023-12-22 17:14:47,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=15.0 2023-12-22 17:14:49,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=693973.3333333334, ans=0.0 2023-12-22 17:14:53,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=694040.0, ans=0.2 2023-12-22 17:14:54,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=694040.0, ans=0.0 2023-12-22 17:14:58,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=694040.0, ans=0.0 2023-12-22 17:15:02,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=694040.0, ans=0.0 2023-12-22 17:15:10,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2023-12-22 17:15:13,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=694106.6666666666, ans=0.125 2023-12-22 17:15:20,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=694173.3333333334, ans=0.1 2023-12-22 17:15:23,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.63 vs. limit=22.5 2023-12-22 17:15:25,317 INFO [train.py:886] (0/4) Epoch 22, batch 4050, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4952876.71 frames. 
], batch size: 99, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:15:36,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=694306.6666666666, ans=0.0 2023-12-22 17:15:37,179 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.720e+01 3.013e+01 3.150e+01 3.347e+01 3.751e+01, threshold=6.299e+01, percent-clipped=0.0 2023-12-22 17:15:50,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=694373.3333333334, ans=0.125 2023-12-22 17:15:53,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.71 vs. limit=10.0 2023-12-22 17:16:03,011 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:16:12,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=694506.6666666666, ans=0.025 2023-12-22 17:16:12,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=694506.6666666666, ans=0.07 2023-12-22 17:16:14,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=694506.6666666666, ans=0.2 2023-12-22 17:16:17,442 INFO [train.py:886] (0/4) Epoch 22, batch 4100, loss[loss=0.01428, audio_tagging_loss=0.01428, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4946525.94 frames. ], batch size: 99, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:16:21,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=694573.3333333334, ans=0.125 2023-12-22 17:16:24,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=694573.3333333334, ans=0.2 2023-12-22 17:16:25,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=694573.3333333334, ans=0.0 2023-12-22 17:16:25,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=694573.3333333334, ans=0.125 2023-12-22 17:16:27,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=694640.0, ans=0.0 2023-12-22 17:16:36,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=694640.0, ans=0.0 2023-12-22 17:16:43,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=694706.6666666666, ans=0.125 2023-12-22 17:16:45,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=694706.6666666666, ans=0.125 2023-12-22 17:16:50,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=694773.3333333334, ans=0.0 2023-12-22 17:16:54,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=694773.3333333334, ans=0.2 2023-12-22 17:16:56,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, 
batch_count=694773.3333333334, ans=0.1 2023-12-22 17:16:58,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=15.0 2023-12-22 17:16:59,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=694840.0, ans=0.125 2023-12-22 17:17:06,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.00 vs. limit=10.0 2023-12-22 17:17:10,082 INFO [train.py:886] (0/4) Epoch 22, batch 4150, loss[loss=0.01352, audio_tagging_loss=0.01352, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4933512.00 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:17:10,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=694906.6666666666, ans=10.0 2023-12-22 17:17:21,196 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.699e+01 2.978e+01 3.136e+01 3.267e+01 3.850e+01, threshold=6.272e+01, percent-clipped=0.0 2023-12-22 17:17:25,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=694973.3333333334, ans=0.0 2023-12-22 17:17:27,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=12.0 2023-12-22 17:17:27,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=694973.3333333334, ans=0.125 2023-12-22 17:17:41,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=695106.6666666666, ans=0.125 2023-12-22 17:18:00,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.67 vs. limit=8.0 2023-12-22 17:18:01,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=695240.0, ans=0.1 2023-12-22 17:18:02,010 INFO [train.py:886] (0/4) Epoch 22, batch 4200, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4940220.84 frames. 
], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:18:07,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=695240.0, ans=0.125 2023-12-22 17:18:14,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=695306.6666666666, ans=0.09899494936611666 2023-12-22 17:18:16,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695306.6666666666, ans=0.1 2023-12-22 17:18:20,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=695306.6666666666, ans=0.125 2023-12-22 17:18:30,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=695373.3333333334, ans=0.2 2023-12-22 17:18:39,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=695440.0, ans=0.125 2023-12-22 17:18:54,182 INFO [train.py:886] (0/4) Epoch 22, batch 4250, loss[loss=0.01682, audio_tagging_loss=0.01682, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4943974.38 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:19:02,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695573.3333333334, ans=0.1 2023-12-22 17:19:05,255 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.700e+01 2.977e+01 3.097e+01 3.231e+01 4.216e+01, threshold=6.193e+01, percent-clipped=0.0 2023-12-22 17:19:05,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2023-12-22 17:19:29,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=695773.3333333334, ans=0.0 2023-12-22 17:19:35,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695840.0, ans=0.1 2023-12-22 17:19:44,876 INFO [train.py:886] (0/4) Epoch 22, batch 4300, loss[loss=0.0127, audio_tagging_loss=0.0127, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4945831.01 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:19:54,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=695906.6666666666, ans=22.5 2023-12-22 17:20:00,256 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.58 vs. limit=22.5 2023-12-22 17:20:14,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.95 vs. limit=22.5 2023-12-22 17:20:23,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=696106.6666666666, ans=0.1 2023-12-22 17:20:37,671 INFO [train.py:886] (0/4) Epoch 22, batch 4350, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4948184.90 frames. 
], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:20:48,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.54 vs. limit=15.0 2023-12-22 17:20:48,810 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.645e+01 3.006e+01 3.144e+01 3.307e+01 3.616e+01, threshold=6.289e+01, percent-clipped=0.0 2023-12-22 17:20:50,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=696306.6666666666, ans=0.125 2023-12-22 17:21:09,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696440.0, ans=0.1 2023-12-22 17:21:16,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.80 vs. limit=22.5 2023-12-22 17:21:19,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=696506.6666666666, ans=0.125 2023-12-22 17:21:29,648 INFO [train.py:886] (0/4) Epoch 22, batch 4400, loss[loss=0.01483, audio_tagging_loss=0.01483, over 24750.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4945512.65 frames. ], batch size: 99, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:21:37,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=696573.3333333334, ans=0.125 2023-12-22 17:21:47,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=696640.0, ans=0.0 2023-12-22 17:21:59,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=696706.6666666666, ans=0.0 2023-12-22 17:22:10,181 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:22:17,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2023-12-22 17:22:22,008 INFO [train.py:886] (0/4) Epoch 22, batch 4450, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4946726.49 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:22:32,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=696973.3333333334, ans=0.125 2023-12-22 17:22:33,125 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.622e+01 3.004e+01 3.128e+01 3.261e+01 3.806e+01, threshold=6.255e+01, percent-clipped=0.0 2023-12-22 17:22:39,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=15.0 2023-12-22 17:22:47,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-12-22 17:22:59,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.08 vs. 
limit=15.0 2023-12-22 17:23:04,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=697173.3333333334, ans=0.125 2023-12-22 17:23:07,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=697173.3333333334, ans=0.2 2023-12-22 17:23:10,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=697173.3333333334, ans=0.0 2023-12-22 17:23:12,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2023-12-22 17:23:13,874 INFO [train.py:886] (0/4) Epoch 22, batch 4500, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4948966.82 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:23:16,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=697240.0, ans=0.125 2023-12-22 17:23:30,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=697306.6666666666, ans=0.125 2023-12-22 17:23:35,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=697373.3333333334, ans=0.125 2023-12-22 17:23:52,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=697440.0, ans=0.05 2023-12-22 17:24:04,990 INFO [train.py:886] (0/4) Epoch 22, batch 4550, loss[loss=0.01617, audio_tagging_loss=0.01617, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4950932.93 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:24:17,626 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.664e+01 2.927e+01 3.030e+01 3.234e+01 3.642e+01, threshold=6.060e+01, percent-clipped=0.0 2023-12-22 17:24:18,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=15.0 2023-12-22 17:24:25,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=12.0 2023-12-22 17:24:35,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=697773.3333333334, ans=0.0 2023-12-22 17:24:42,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0 2023-12-22 17:24:53,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=697840.0, ans=0.0 2023-12-22 17:24:54,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=697840.0, ans=0.0 2023-12-22 17:24:54,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2023-12-22 17:24:58,052 INFO [train.py:886] (0/4) Epoch 22, batch 4600, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. 
], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4955470.38 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:25:06,131 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.28 vs. limit=10.0 2023-12-22 17:25:16,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=698040.0, ans=0.125 2023-12-22 17:25:17,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=698040.0, ans=0.125 2023-12-22 17:25:23,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=12.0 2023-12-22 17:25:26,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=698040.0, ans=0.0 2023-12-22 17:25:31,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=698106.6666666666, ans=10.0 2023-12-22 17:25:39,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=698173.3333333334, ans=6.0 2023-12-22 17:25:49,212 INFO [train.py:886] (0/4) Epoch 22, batch 4650, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4954247.16 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 32.0 2023-12-22 17:25:54,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=698240.0, ans=0.2 2023-12-22 17:26:00,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=698306.6666666666, ans=0.125 2023-12-22 17:26:02,029 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.620e+01 2.976e+01 3.135e+01 3.289e+01 3.676e+01, threshold=6.270e+01, percent-clipped=0.0 2023-12-22 17:26:14,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=698373.3333333334, ans=0.2 2023-12-22 17:26:36,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=698506.6666666666, ans=0.1 2023-12-22 17:26:39,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=698506.6666666666, ans=0.1 2023-12-22 17:26:41,300 INFO [train.py:886] (0/4) Epoch 22, batch 4700, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4946657.67 frames. ], batch size: 99, lr: 4.88e-03, grad_scale: 32.0 2023-12-22 17:26:45,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=698573.3333333334, ans=0.2 2023-12-22 17:26:45,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.89 vs. 
limit=22.5 2023-12-22 17:26:53,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698640.0, ans=0.1 2023-12-22 17:26:54,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=698640.0, ans=0.2 2023-12-22 17:27:11,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=698773.3333333334, ans=0.125 2023-12-22 17:27:20,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=698840.0, ans=0.125 2023-12-22 17:27:28,044 INFO [train.py:886] (0/4) Epoch 22, batch 4750, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24750.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4943051.93 frames. ], batch size: 99, lr: 4.87e-03, grad_scale: 32.0 2023-12-22 17:27:31,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=698906.6666666666, ans=0.1 2023-12-22 17:27:39,153 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.691e+01 3.016e+01 3.140e+01 3.268e+01 3.852e+01, threshold=6.281e+01, percent-clipped=0.0 2023-12-22 17:27:43,641 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-22.pt 2023-12-22 17:28:02,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2023-12-22 17:28:02,424 INFO [train.py:886] (0/4) Epoch 23, batch 0, loss[loss=0.03401, audio_tagging_loss=0.03401, over 20986.00 frames. ], tot_loss[loss=0.03401, audio_tagging_loss=0.03401, over 20986.00 frames. ], batch size: 107, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:28:02,425 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 17:28:23,548 INFO [train.py:917] (0/4) Epoch 23, validation: loss=0.03207, audio_tagging_loss=0.03207, over 3737520.00 frames. 2023-12-22 17:28:23,549 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 17:28:25,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699013.3333333334, ans=0.1 2023-12-22 17:28:49,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2023-12-22 17:28:51,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=699146.6666666666, ans=0.1 2023-12-22 17:28:58,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2023-12-22 17:29:11,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0 2023-12-22 17:29:14,251 INFO [train.py:886] (0/4) Epoch 23, batch 50, loss[loss=0.01849, audio_tagging_loss=0.01849, over 25000.00 frames. ], tot_loss[loss=0.02122, audio_tagging_loss=0.02122, over 1122475.01 frames. 
], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:29:16,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2023-12-22 17:29:22,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=699346.6666666666, ans=0.125 2023-12-22 17:29:30,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=699413.3333333334, ans=0.1 2023-12-22 17:29:37,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=699480.0, ans=0.0 2023-12-22 17:29:56,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=699613.3333333334, ans=0.125 2023-12-22 17:30:02,797 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.567e+01 3.829e+01 4.360e+01 9.695e+01, threshold=7.658e+01, percent-clipped=7.0 2023-12-22 17:30:07,314 INFO [train.py:886] (0/4) Epoch 23, batch 100, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.01847, audio_tagging_loss=0.01847, over 1969239.21 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:30:25,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=22.5 2023-12-22 17:30:54,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=699946.6666666666, ans=0.1 2023-12-22 17:30:57,477 INFO [train.py:886] (0/4) Epoch 23, batch 150, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01682, audio_tagging_loss=0.01682, over 2635796.72 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:31:05,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=700013.3333333334, ans=0.2 2023-12-22 17:31:34,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=700213.3333333334, ans=0.035 2023-12-22 17:31:35,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=700213.3333333334, ans=0.125 2023-12-22 17:31:45,967 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.729e+01 3.066e+01 3.193e+01 3.321e+01 3.839e+01, threshold=6.387e+01, percent-clipped=0.0 2023-12-22 17:31:50,601 INFO [train.py:886] (0/4) Epoch 23, batch 200, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.01588, audio_tagging_loss=0.01588, over 3149537.01 frames. 
], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:31:55,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=700346.6666666666, ans=0.0 2023-12-22 17:32:17,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=700480.0, ans=0.2 2023-12-22 17:32:18,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=700480.0, ans=0.2 2023-12-22 17:32:24,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=700546.6666666666, ans=0.125 2023-12-22 17:32:26,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=700546.6666666666, ans=0.125 2023-12-22 17:32:29,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.90 vs. limit=12.0 2023-12-22 17:32:39,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=700613.3333333334, ans=0.1 2023-12-22 17:32:41,857 INFO [train.py:886] (0/4) Epoch 23, batch 250, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 3552814.45 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:32:49,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=700680.0, ans=0.125 2023-12-22 17:33:00,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=700746.6666666666, ans=0.125 2023-12-22 17:33:01,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700746.6666666666, ans=0.1 2023-12-22 17:33:14,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=700880.0, ans=0.2 2023-12-22 17:33:14,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.25 vs. limit=15.0 2023-12-22 17:33:30,256 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.814e+01 3.034e+01 3.166e+01 3.369e+01 3.955e+01, threshold=6.333e+01, percent-clipped=0.0 2023-12-22 17:33:34,007 INFO [train.py:886] (0/4) Epoch 23, batch 300, loss[loss=0.01532, audio_tagging_loss=0.01532, over 24750.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 3863099.79 frames. ], batch size: 99, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:33:44,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=701080.0, ans=0.0 2023-12-22 17:33:56,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=701146.6666666666, ans=0.125 2023-12-22 17:34:07,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=701213.3333333334, ans=0.035 2023-12-22 17:34:08,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.68 vs. 
limit=15.0 2023-12-22 17:34:09,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=701213.3333333334, ans=0.0 2023-12-22 17:34:25,915 INFO [train.py:886] (0/4) Epoch 23, batch 350, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4096322.35 frames. ], batch size: 99, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:34:39,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=701413.3333333334, ans=0.0 2023-12-22 17:34:51,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=701480.0, ans=0.125 2023-12-22 17:35:02,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=701546.6666666666, ans=0.0 2023-12-22 17:35:03,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=701546.6666666666, ans=0.125 2023-12-22 17:35:13,064 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.681e+01 2.991e+01 3.090e+01 3.268e+01 3.987e+01, threshold=6.180e+01, percent-clipped=0.0 2023-12-22 17:35:16,899 INFO [train.py:886] (0/4) Epoch 23, batch 400, loss[loss=0.01441, audio_tagging_loss=0.01441, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4287625.26 frames. ], batch size: 99, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:35:26,075 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2023-12-22 17:35:34,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.34 vs. limit=22.5 2023-12-22 17:35:37,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=701813.3333333334, ans=0.0 2023-12-22 17:35:39,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=701813.3333333334, ans=0.125 2023-12-22 17:35:39,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=701813.3333333334, ans=0.0 2023-12-22 17:35:59,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.08 vs. limit=22.5 2023-12-22 17:36:08,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=702013.3333333334, ans=0.125 2023-12-22 17:36:08,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=702013.3333333334, ans=0.0 2023-12-22 17:36:09,370 INFO [train.py:886] (0/4) Epoch 23, batch 450, loss[loss=0.01513, audio_tagging_loss=0.01513, over 21554.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4428982.78 frames. ], batch size: 107, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:36:12,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.68 vs. 
limit=22.5 2023-12-22 17:36:19,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=15.0 2023-12-22 17:36:30,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=702146.6666666666, ans=0.0 2023-12-22 17:36:32,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=702146.6666666666, ans=0.125 2023-12-22 17:36:40,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.40 vs. limit=22.5 2023-12-22 17:36:43,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=702213.3333333334, ans=0.1 2023-12-22 17:36:43,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=702213.3333333334, ans=0.0 2023-12-22 17:36:47,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=12.0 2023-12-22 17:36:51,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=702280.0, ans=0.125 2023-12-22 17:36:52,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.01 vs. limit=15.0 2023-12-22 17:36:57,226 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.676e+01 2.894e+01 3.036e+01 3.209e+01 3.784e+01, threshold=6.073e+01, percent-clipped=0.0 2023-12-22 17:36:57,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702280.0, ans=0.1 2023-12-22 17:37:02,429 INFO [train.py:886] (0/4) Epoch 23, batch 500, loss[loss=0.01435, audio_tagging_loss=0.01435, over 21427.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4536447.90 frames. ], batch size: 107, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:37:03,509 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:37:19,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=702413.3333333334, ans=0.125 2023-12-22 17:37:40,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.26 vs. limit=10.0 2023-12-22 17:37:53,963 INFO [train.py:886] (0/4) Epoch 23, batch 550, loss[loss=0.01717, audio_tagging_loss=0.01717, over 25000.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4630920.04 frames. 
], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:38:14,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=702813.3333333334, ans=0.0 2023-12-22 17:38:16,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=702813.3333333334, ans=0.125 2023-12-22 17:38:20,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=702813.3333333334, ans=0.2 2023-12-22 17:38:20,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=702813.3333333334, ans=0.2 2023-12-22 17:38:23,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=702880.0, ans=0.125 2023-12-22 17:38:30,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=702880.0, ans=0.0 2023-12-22 17:38:30,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=702880.0, ans=0.125 2023-12-22 17:38:32,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=702880.0, ans=0.125 2023-12-22 17:38:42,268 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.710e+01 2.974e+01 3.097e+01 3.242e+01 4.856e+01, threshold=6.195e+01, percent-clipped=0.0 2023-12-22 17:38:46,277 INFO [train.py:886] (0/4) Epoch 23, batch 600, loss[loss=0.0124, audio_tagging_loss=0.0124, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4699146.93 frames. ], batch size: 99, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:38:47,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=703013.3333333334, ans=0.125 2023-12-22 17:38:49,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=703013.3333333334, ans=0.125 2023-12-22 17:38:51,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2023-12-22 17:38:52,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=703013.3333333334, ans=0.0 2023-12-22 17:39:00,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=703080.0, ans=0.125 2023-12-22 17:39:33,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=703280.0, ans=0.125 2023-12-22 17:39:38,010 INFO [train.py:886] (0/4) Epoch 23, batch 650, loss[loss=0.01372, audio_tagging_loss=0.01372, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4748193.68 frames. 
], batch size: 99, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:39:47,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=703346.6666666666, ans=0.1 2023-12-22 17:39:57,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=703480.0, ans=0.125 2023-12-22 17:40:01,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.99 vs. limit=15.0 2023-12-22 17:40:19,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703613.3333333334, ans=0.125 2023-12-22 17:40:25,969 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.701e+01 3.073e+01 3.203e+01 3.360e+01 3.704e+01, threshold=6.407e+01, percent-clipped=0.0 2023-12-22 17:40:29,830 INFO [train.py:886] (0/4) Epoch 23, batch 700, loss[loss=0.01076, audio_tagging_loss=0.01076, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4785125.49 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:40:38,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=703680.0, ans=0.2 2023-12-22 17:40:39,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=703746.6666666666, ans=0.1 2023-12-22 17:40:55,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=703813.3333333334, ans=0.2 2023-12-22 17:41:23,269 INFO [train.py:886] (0/4) Epoch 23, batch 750, loss[loss=0.01148, audio_tagging_loss=0.01148, over 25000.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4820258.53 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:41:45,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704146.6666666666, ans=0.1 2023-12-22 17:41:45,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2023-12-22 17:41:49,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=704146.6666666666, ans=0.125 2023-12-22 17:42:10,097 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+01 2.982e+01 3.107e+01 3.204e+01 3.694e+01, threshold=6.214e+01, percent-clipped=0.0 2023-12-22 17:42:13,948 INFO [train.py:886] (0/4) Epoch 23, batch 800, loss[loss=0.01232, audio_tagging_loss=0.01232, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4850360.22 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:42:27,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.96 vs. 
limit=22.5 2023-12-22 17:42:29,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704413.3333333334, ans=0.1 2023-12-22 17:42:29,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704413.3333333334, ans=0.1 2023-12-22 17:42:59,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2023-12-22 17:43:00,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2023-12-22 17:43:05,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704613.3333333334, ans=0.1 2023-12-22 17:43:06,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=704680.0, ans=0.125 2023-12-22 17:43:06,732 INFO [train.py:886] (0/4) Epoch 23, batch 850, loss[loss=0.01335, audio_tagging_loss=0.01335, over 25000.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4873642.55 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:43:09,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704680.0, ans=0.1 2023-12-22 17:43:12,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0 2023-12-22 17:43:12,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.83 vs. limit=15.0 2023-12-22 17:43:15,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=704746.6666666666, ans=0.1 2023-12-22 17:43:18,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=704746.6666666666, ans=0.125 2023-12-22 17:43:19,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=704746.6666666666, ans=0.125 2023-12-22 17:43:23,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=704746.6666666666, ans=0.0 2023-12-22 17:43:47,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704946.6666666666, ans=0.1 2023-12-22 17:43:52,802 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+01 2.999e+01 3.165e+01 3.314e+01 4.054e+01, threshold=6.329e+01, percent-clipped=0.0 2023-12-22 17:43:58,119 INFO [train.py:886] (0/4) Epoch 23, batch 900, loss[loss=0.0139, audio_tagging_loss=0.0139, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4892198.16 frames. 
], batch size: 100, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:44:10,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=705080.0, ans=0.125 2023-12-22 17:44:12,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=705080.0, ans=0.125 2023-12-22 17:44:34,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2023-12-22 17:44:40,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=705280.0, ans=0.2 2023-12-22 17:44:44,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=705280.0, ans=0.0 2023-12-22 17:44:45,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2023-12-22 17:44:49,881 INFO [train.py:886] (0/4) Epoch 23, batch 950, loss[loss=0.01453, audio_tagging_loss=0.01453, over 24750.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4897840.86 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:45:04,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=705413.3333333334, ans=0.0 2023-12-22 17:45:38,084 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.739e+01 3.002e+01 3.153e+01 3.252e+01 3.769e+01, threshold=6.307e+01, percent-clipped=0.0 2023-12-22 17:45:40,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705613.3333333334, ans=0.1 2023-12-22 17:45:41,970 INFO [train.py:886] (0/4) Epoch 23, batch 1000, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4908208.88 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:45:42,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=705680.0, ans=0.05 2023-12-22 17:45:49,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-12-22 17:46:22,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=705946.6666666666, ans=0.125 2023-12-22 17:46:24,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2023-12-22 17:46:24,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=705946.6666666666, ans=0.2 2023-12-22 17:46:26,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=705946.6666666666, ans=0.1 2023-12-22 17:46:27,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=705946.6666666666, ans=0.125 2023-12-22 17:46:32,416 INFO [train.py:886] (0/4) Epoch 23, batch 1050, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. 
], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4913186.10 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:46:50,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=706080.0, ans=0.2 2023-12-22 17:47:05,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=706213.3333333334, ans=0.125 2023-12-22 17:47:19,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=706280.0, ans=0.05 2023-12-22 17:47:22,003 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.686e+01 2.936e+01 3.111e+01 3.242e+01 3.902e+01, threshold=6.222e+01, percent-clipped=0.0 2023-12-22 17:47:24,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2023-12-22 17:47:25,855 INFO [train.py:886] (0/4) Epoch 23, batch 1100, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4918705.90 frames. ], batch size: 100, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:47:35,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2023-12-22 17:47:36,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=706413.3333333334, ans=0.2 2023-12-22 17:47:44,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=706413.3333333334, ans=0.0 2023-12-22 17:47:45,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=706413.3333333334, ans=22.5 2023-12-22 17:47:46,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706480.0, ans=0.1 2023-12-22 17:47:53,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=706480.0, ans=0.0 2023-12-22 17:47:54,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=706480.0, ans=0.2 2023-12-22 17:47:59,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=706546.6666666666, ans=0.125 2023-12-22 17:48:08,018 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2023-12-22 17:48:14,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=706613.3333333334, ans=0.0 2023-12-22 17:48:15,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=706613.3333333334, ans=0.125 2023-12-22 17:48:18,185 INFO [train.py:886] (0/4) Epoch 23, batch 1150, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4929957.55 frames. 
], batch size: 100, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:48:27,608 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:48:33,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=706746.6666666666, ans=0.125 2023-12-22 17:48:49,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=706880.0, ans=0.125 2023-12-22 17:48:55,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=706880.0, ans=0.125 2023-12-22 17:49:05,123 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 2.995e+01 3.117e+01 3.266e+01 4.017e+01, threshold=6.234e+01, percent-clipped=0.0 2023-12-22 17:49:08,949 INFO [train.py:886] (0/4) Epoch 23, batch 1200, loss[loss=0.01357, audio_tagging_loss=0.01357, over 24750.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4938675.23 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:49:13,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=707013.3333333334, ans=0.0 2023-12-22 17:49:27,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=707080.0, ans=0.0 2023-12-22 17:49:43,410 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:50:01,235 INFO [train.py:886] (0/4) Epoch 23, batch 1250, loss[loss=0.01305, audio_tagging_loss=0.01305, over 24750.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4942137.19 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:50:07,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=707346.6666666666, ans=0.125 2023-12-22 17:50:33,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=707546.6666666666, ans=0.125 2023-12-22 17:50:39,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=707546.6666666666, ans=0.0 2023-12-22 17:50:45,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.30 vs. limit=22.5 2023-12-22 17:50:46,990 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.599e+01 3.099e+01 3.181e+01 3.380e+01 4.641e+01, threshold=6.362e+01, percent-clipped=0.0 2023-12-22 17:50:47,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=707613.3333333334, ans=0.125 2023-12-22 17:50:50,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=707680.0, ans=0.0 2023-12-22 17:50:51,541 INFO [train.py:886] (0/4) Epoch 23, batch 1300, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4938429.72 frames. 
], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:50:53,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=707680.0, ans=0.0 2023-12-22 17:50:55,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=707680.0, ans=0.125 2023-12-22 17:51:06,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=12.0 2023-12-22 17:51:16,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=707813.3333333334, ans=0.2 2023-12-22 17:51:20,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2023-12-22 17:51:39,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0 2023-12-22 17:51:43,548 INFO [train.py:886] (0/4) Epoch 23, batch 1350, loss[loss=0.0136, audio_tagging_loss=0.0136, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4940153.30 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:51:52,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=708080.0, ans=0.125 2023-12-22 17:52:14,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=22.5 2023-12-22 17:52:16,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=708213.3333333334, ans=0.2 2023-12-22 17:52:17,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.22 vs. limit=22.5 2023-12-22 17:52:23,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=708280.0, ans=0.05 2023-12-22 17:52:31,179 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+01 2.956e+01 3.060e+01 3.186e+01 3.861e+01, threshold=6.119e+01, percent-clipped=0.0 2023-12-22 17:52:34,910 INFO [train.py:886] (0/4) Epoch 23, batch 1400, loss[loss=0.01416, audio_tagging_loss=0.01416, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4947229.81 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:52:38,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=708346.6666666666, ans=0.2 2023-12-22 17:52:47,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=708413.3333333334, ans=0.2 2023-12-22 17:52:50,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=708413.3333333334, ans=0.0 2023-12-22 17:53:09,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=708546.6666666666, ans=0.125 2023-12-22 17:53:25,965 INFO [train.py:886] (0/4) Epoch 23, batch 1450, loss[loss=0.01598, audio_tagging_loss=0.01598, over 25000.00 frames. 
], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4941553.84 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:53:28,051 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:53:29,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=708680.0, ans=0.2 2023-12-22 17:53:43,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=708746.6666666666, ans=0.0 2023-12-22 17:54:00,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=708880.0, ans=0.2 2023-12-22 17:54:02,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708880.0, ans=0.1 2023-12-22 17:54:04,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=708880.0, ans=0.0 2023-12-22 17:54:09,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=708946.6666666666, ans=0.05 2023-12-22 17:54:12,998 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+01 3.011e+01 3.145e+01 3.302e+01 3.909e+01, threshold=6.290e+01, percent-clipped=0.0 2023-12-22 17:54:16,875 INFO [train.py:886] (0/4) Epoch 23, batch 1500, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4945576.21 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:54:20,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=709013.3333333334, ans=0.125 2023-12-22 17:54:20,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=709013.3333333334, ans=0.125 2023-12-22 17:54:29,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709080.0, ans=0.1 2023-12-22 17:54:34,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.65 vs. 
limit=10.0 2023-12-22 17:54:40,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=709146.6666666666, ans=0.0 2023-12-22 17:54:44,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=709146.6666666666, ans=0.125 2023-12-22 17:54:54,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=709213.3333333334, ans=0.2 2023-12-22 17:54:57,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=709280.0, ans=0.2 2023-12-22 17:55:04,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=709280.0, ans=0.0 2023-12-22 17:55:07,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=709346.6666666666, ans=0.09899494936611666 2023-12-22 17:55:08,680 INFO [train.py:886] (0/4) Epoch 23, batch 1550, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4946393.44 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:55:29,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=709480.0, ans=0.125 2023-12-22 17:55:29,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.48 vs. limit=22.5 2023-12-22 17:55:39,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=709546.6666666666, ans=0.0 2023-12-22 17:55:55,131 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.654e+01 3.064e+01 3.162e+01 3.305e+01 3.702e+01, threshold=6.324e+01, percent-clipped=0.0 2023-12-22 17:55:59,692 INFO [train.py:886] (0/4) Epoch 23, batch 1600, loss[loss=0.01395, audio_tagging_loss=0.01395, over 22302.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4934279.12 frames. ], batch size: 107, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:56:21,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=709813.3333333334, ans=0.0 2023-12-22 17:56:29,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=709880.0, ans=0.125 2023-12-22 17:56:51,513 INFO [train.py:886] (0/4) Epoch 23, batch 1650, loss[loss=0.01374, audio_tagging_loss=0.01374, over 24750.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4935108.06 frames. ], batch size: 99, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:56:55,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=710013.3333333334, ans=0.1 2023-12-22 17:56:58,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2023-12-22 17:57:05,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.83 vs. 
limit=15.0 2023-12-22 17:57:15,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=710146.6666666666, ans=0.125 2023-12-22 17:57:16,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.92 vs. limit=22.5 2023-12-22 17:57:32,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710280.0, ans=0.1 2023-12-22 17:57:38,707 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.643e+01 2.998e+01 3.094e+01 3.262e+01 4.064e+01, threshold=6.189e+01, percent-clipped=0.0 2023-12-22 17:57:43,148 INFO [train.py:886] (0/4) Epoch 23, batch 1700, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4941735.00 frames. ], batch size: 99, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:57:47,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-12-22 17:58:02,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.67 vs. limit=5.0 2023-12-22 17:58:05,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.99 vs. limit=15.0 2023-12-22 17:58:17,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.14 vs. limit=15.0 2023-12-22 17:58:20,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=710546.6666666666, ans=0.05 2023-12-22 17:58:28,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=710613.3333333334, ans=0.125 2023-12-22 17:58:28,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=710613.3333333334, ans=0.2 2023-12-22 17:58:33,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710613.3333333334, ans=0.1 2023-12-22 17:58:34,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=710680.0, ans=0.0 2023-12-22 17:58:35,034 INFO [train.py:886] (0/4) Epoch 23, batch 1750, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4945846.69 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:58:42,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.71 vs. 
limit=10.0 2023-12-22 17:58:47,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=710746.6666666666, ans=0.125 2023-12-22 17:58:54,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=710746.6666666666, ans=0.125 2023-12-22 17:59:01,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2023-12-22 17:59:03,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. limit=6.0 2023-12-22 17:59:04,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710813.3333333334, ans=0.1 2023-12-22 17:59:22,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=22.5 2023-12-22 17:59:23,061 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+01 3.006e+01 3.113e+01 3.269e+01 3.526e+01, threshold=6.226e+01, percent-clipped=0.0 2023-12-22 17:59:28,206 INFO [train.py:886] (0/4) Epoch 23, batch 1800, loss[loss=0.01265, audio_tagging_loss=0.01265, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4949354.07 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 32.0 2023-12-22 17:59:29,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=711013.3333333334, ans=0.025 2023-12-22 17:59:54,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-12-22 18:00:04,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=711213.3333333334, ans=0.125 2023-12-22 18:00:06,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711213.3333333334, ans=0.1 2023-12-22 18:00:18,267 INFO [train.py:886] (0/4) Epoch 23, batch 1850, loss[loss=0.01148, audio_tagging_loss=0.01148, over 24750.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4946376.35 frames. 
], batch size: 99, lr: 4.72e-03, grad_scale: 32.0 2023-12-22 18:00:38,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=711480.0, ans=0.0 2023-12-22 18:00:45,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711480.0, ans=0.1 2023-12-22 18:00:57,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=711546.6666666666, ans=0.1 2023-12-22 18:01:00,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=711613.3333333334, ans=0.125 2023-12-22 18:01:04,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=711613.3333333334, ans=0.2 2023-12-22 18:01:05,839 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.716e+01 3.028e+01 3.200e+01 3.336e+01 4.130e+01, threshold=6.400e+01, percent-clipped=0.0 2023-12-22 18:01:09,750 INFO [train.py:886] (0/4) Epoch 23, batch 1900, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4943851.23 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:01:12,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=711680.0, ans=0.0 2023-12-22 18:01:20,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711746.6666666666, ans=0.1 2023-12-22 18:01:40,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2023-12-22 18:02:00,778 INFO [train.py:886] (0/4) Epoch 23, batch 1950, loss[loss=0.01445, audio_tagging_loss=0.01445, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4940840.56 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:02:03,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=712013.3333333334, ans=0.0 2023-12-22 18:02:20,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=712146.6666666666, ans=0.125 2023-12-22 18:02:27,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=712146.6666666666, ans=0.1 2023-12-22 18:02:33,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.16 vs. 
limit=12.0 2023-12-22 18:02:39,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=712280.0, ans=0.125 2023-12-22 18:02:40,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712280.0, ans=0.1 2023-12-22 18:02:45,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=712280.0, ans=0.125 2023-12-22 18:02:45,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=712280.0, ans=0.125 2023-12-22 18:02:45,745 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.588e+01 2.985e+01 3.120e+01 3.302e+01 3.747e+01, threshold=6.240e+01, percent-clipped=0.0 2023-12-22 18:02:47,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=712280.0, ans=0.2 2023-12-22 18:02:49,592 INFO [train.py:886] (0/4) Epoch 23, batch 2000, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4941751.91 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:02:56,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=712346.6666666666, ans=0.0 2023-12-22 18:02:59,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=712413.3333333334, ans=0.04949747468305833 2023-12-22 18:03:08,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=712413.3333333334, ans=0.1 2023-12-22 18:03:15,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=712480.0, ans=0.025 2023-12-22 18:03:41,262 INFO [train.py:886] (0/4) Epoch 23, batch 2050, loss[loss=0.0131, audio_tagging_loss=0.0131, over 21188.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4943654.52 frames. ], batch size: 107, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:04:09,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=712813.3333333334, ans=0.125 2023-12-22 18:04:17,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=712880.0, ans=0.035 2023-12-22 18:04:20,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=712880.0, ans=0.125 2023-12-22 18:04:27,756 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.661e+01 2.995e+01 3.150e+01 3.287e+01 3.794e+01, threshold=6.300e+01, percent-clipped=0.0 2023-12-22 18:04:28,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=712946.6666666666, ans=0.025 2023-12-22 18:04:31,583 INFO [train.py:886] (0/4) Epoch 23, batch 2100, loss[loss=0.01118, audio_tagging_loss=0.01118, over 23978.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4950990.03 frames. 
], batch size: 100, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:04:40,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=713013.3333333334, ans=0.2 2023-12-22 18:04:50,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=713080.0, ans=0.015 2023-12-22 18:05:13,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=713280.0, ans=0.125 2023-12-22 18:05:13,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=713280.0, ans=0.07 2023-12-22 18:05:19,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=713280.0, ans=0.0 2023-12-22 18:05:22,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=713280.0, ans=0.0 2023-12-22 18:05:24,613 INFO [train.py:886] (0/4) Epoch 23, batch 2150, loss[loss=0.0127, audio_tagging_loss=0.0127, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4954985.34 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:05:29,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=713346.6666666666, ans=0.0 2023-12-22 18:05:52,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=713480.0, ans=0.0 2023-12-22 18:05:59,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=713546.6666666666, ans=0.2 2023-12-22 18:06:11,598 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.649e+01 3.004e+01 3.147e+01 3.263e+01 3.799e+01, threshold=6.294e+01, percent-clipped=0.0 2023-12-22 18:06:11,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=713613.3333333334, ans=0.125 2023-12-22 18:06:16,135 INFO [train.py:886] (0/4) Epoch 23, batch 2200, loss[loss=0.01264, audio_tagging_loss=0.01264, over 24750.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4946463.57 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:06:51,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713880.0, ans=0.1 2023-12-22 18:07:06,813 INFO [train.py:886] (0/4) Epoch 23, batch 2250, loss[loss=0.01012, audio_tagging_loss=0.01012, over 25000.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4944443.19 frames. 
], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:07:29,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=714146.6666666666, ans=0.125 2023-12-22 18:07:35,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=714146.6666666666, ans=0.035 2023-12-22 18:07:43,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=714213.3333333334, ans=0.0 2023-12-22 18:07:55,153 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.733e+01 2.950e+01 3.106e+01 3.281e+01 3.764e+01, threshold=6.212e+01, percent-clipped=0.0 2023-12-22 18:07:55,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=714280.0, ans=0.0 2023-12-22 18:07:57,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=714280.0, ans=0.2 2023-12-22 18:07:58,975 INFO [train.py:886] (0/4) Epoch 23, batch 2300, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4947328.50 frames. ], batch size: 99, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:08:13,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=714413.3333333334, ans=0.2 2023-12-22 18:08:51,195 INFO [train.py:886] (0/4) Epoch 23, batch 2350, loss[loss=0.01262, audio_tagging_loss=0.01262, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4944952.18 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:08:59,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=714680.0, ans=0.125 2023-12-22 18:09:03,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=714746.6666666666, ans=15.0 2023-12-22 18:09:08,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-12-22 18:09:16,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=714813.3333333334, ans=0.125 2023-12-22 18:09:37,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714946.6666666666, ans=0.1 2023-12-22 18:09:38,597 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 2.969e+01 3.079e+01 3.242e+01 3.705e+01, threshold=6.159e+01, percent-clipped=0.0 2023-12-22 18:09:42,413 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:09:43,076 INFO [train.py:886] (0/4) Epoch 23, batch 2400, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4945945.60 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:10:03,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. 
limit=15.0 2023-12-22 18:10:35,284 INFO [train.py:886] (0/4) Epoch 23, batch 2450, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4953614.01 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:10:42,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=715346.6666666666, ans=0.035 2023-12-22 18:10:43,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0 2023-12-22 18:10:43,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=715413.3333333334, ans=0.0 2023-12-22 18:11:01,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=715480.0, ans=0.125 2023-12-22 18:11:01,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2023-12-22 18:11:08,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=715546.6666666666, ans=0.125 2023-12-22 18:11:22,430 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.810e+01 2.984e+01 3.127e+01 3.304e+01 3.935e+01, threshold=6.253e+01, percent-clipped=0.0 2023-12-22 18:11:26,261 INFO [train.py:886] (0/4) Epoch 23, batch 2500, loss[loss=0.01368, audio_tagging_loss=0.01368, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4946902.29 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:11:31,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=715680.0, ans=10.0 2023-12-22 18:11:37,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=715746.6666666666, ans=0.0 2023-12-22 18:11:53,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2023-12-22 18:12:06,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=715880.0, ans=0.0 2023-12-22 18:12:07,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=715880.0, ans=0.1 2023-12-22 18:12:09,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=715946.6666666666, ans=0.0 2023-12-22 18:12:10,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.97 vs. limit=22.5 2023-12-22 18:12:13,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=715946.6666666666, ans=0.1 2023-12-22 18:12:18,520 INFO [train.py:886] (0/4) Epoch 23, batch 2550, loss[loss=0.01475, audio_tagging_loss=0.01475, over 22738.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4935134.74 frames. 
], batch size: 107, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:12:43,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=716146.6666666666, ans=0.0 2023-12-22 18:13:05,418 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 2.995e+01 3.127e+01 3.270e+01 4.281e+01, threshold=6.254e+01, percent-clipped=0.0 2023-12-22 18:13:10,604 INFO [train.py:886] (0/4) Epoch 23, batch 2600, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4940243.61 frames. ], batch size: 99, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:13:14,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=716346.6666666666, ans=0.125 2023-12-22 18:13:17,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=716346.6666666666, ans=0.125 2023-12-22 18:13:17,548 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.625e-03 2023-12-22 18:13:28,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=716413.3333333334, ans=0.2 2023-12-22 18:13:29,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=716480.0, ans=0.125 2023-12-22 18:13:32,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=716480.0, ans=0.125 2023-12-22 18:13:38,069 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:13:42,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=716546.6666666666, ans=0.125 2023-12-22 18:13:46,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=716546.6666666666, ans=0.125 2023-12-22 18:13:46,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.02 vs. limit=6.0 2023-12-22 18:14:00,211 INFO [train.py:886] (0/4) Epoch 23, batch 2650, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4935471.25 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:14:03,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=716680.0, ans=0.2 2023-12-22 18:14:05,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.38 vs. limit=22.5 2023-12-22 18:14:32,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=716880.0, ans=0.125 2023-12-22 18:14:44,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.63 vs. 
limit=15.0 2023-12-22 18:14:48,295 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.721e+01 3.007e+01 3.105e+01 3.259e+01 4.059e+01, threshold=6.210e+01, percent-clipped=0.0 2023-12-22 18:14:52,118 INFO [train.py:886] (0/4) Epoch 23, batch 2700, loss[loss=0.01421, audio_tagging_loss=0.01421, over 25000.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4941182.78 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0 2023-12-22 18:15:03,637 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:15:07,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717080.0, ans=0.1 2023-12-22 18:15:29,057 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.91 vs. limit=15.0 2023-12-22 18:15:42,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=717346.6666666666, ans=0.0 2023-12-22 18:15:42,643 INFO [train.py:886] (0/4) Epoch 23, batch 2750, loss[loss=0.01495, audio_tagging_loss=0.01495, over 25000.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4948451.53 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0 2023-12-22 18:15:50,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=717346.6666666666, ans=0.0 2023-12-22 18:16:11,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2023-12-22 18:16:23,628 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.580e-02 2023-12-22 18:16:30,041 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.651e+01 3.005e+01 3.171e+01 3.321e+01 3.773e+01, threshold=6.342e+01, percent-clipped=0.0 2023-12-22 18:16:31,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=717613.3333333334, ans=0.2 2023-12-22 18:16:33,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=717680.0, ans=0.125 2023-12-22 18:16:33,801 INFO [train.py:886] (0/4) Epoch 23, batch 2800, loss[loss=0.009568, audio_tagging_loss=0.009568, over 24063.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4951602.28 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0 2023-12-22 18:16:44,291 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:17:05,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=717880.0, ans=0.125 2023-12-22 18:17:12,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=717880.0, ans=0.1 2023-12-22 18:17:25,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5 2023-12-22 18:17:25,536 INFO [train.py:886] (0/4) Epoch 23, batch 2850, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. 
], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4950267.10 frames. ], batch size: 99, lr: 4.70e-03, grad_scale: 64.0 2023-12-22 18:17:40,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.69 vs. limit=22.5 2023-12-22 18:17:41,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=718080.0, ans=0.125 2023-12-22 18:17:44,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=718146.6666666666, ans=0.2 2023-12-22 18:17:55,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=718213.3333333334, ans=0.125 2023-12-22 18:18:03,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2023-12-22 18:18:10,878 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.718e+01 3.010e+01 3.133e+01 3.291e+01 3.864e+01, threshold=6.266e+01, percent-clipped=0.0 2023-12-22 18:18:14,649 INFO [train.py:886] (0/4) Epoch 23, batch 2900, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4942837.38 frames. ], batch size: 99, lr: 4.70e-03, grad_scale: 64.0 2023-12-22 18:18:19,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=718346.6666666666, ans=0.125 2023-12-22 18:18:29,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=15.0 2023-12-22 18:18:31,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=718413.3333333334, ans=0.0 2023-12-22 18:18:32,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.34 vs. limit=15.0 2023-12-22 18:18:42,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=718480.0, ans=0.0 2023-12-22 18:18:48,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=718546.6666666666, ans=0.125 2023-12-22 18:18:48,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=718546.6666666666, ans=0.025 2023-12-22 18:18:57,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=718613.3333333334, ans=0.0 2023-12-22 18:19:00,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=718613.3333333334, ans=0.0 2023-12-22 18:19:05,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=718680.0, ans=0.125 2023-12-22 18:19:06,771 INFO [train.py:886] (0/4) Epoch 23, batch 2950, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4944669.58 frames. 
], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:19:06,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=718680.0, ans=0.025
2023-12-22 18:19:15,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0
2023-12-22 18:19:18,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0
2023-12-22 18:19:30,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0
2023-12-22 18:19:35,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=718813.3333333334, ans=0.125
2023-12-22 18:19:40,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=718880.0, ans=0.125
2023-12-22 18:19:40,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.83 vs. limit=22.5
2023-12-22 18:19:43,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0
2023-12-22 18:19:44,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0
2023-12-22 18:19:47,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=718946.6666666666, ans=0.2
2023-12-22 18:19:50,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=718946.6666666666, ans=0.1
2023-12-22 18:19:52,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=718946.6666666666, ans=0.2
2023-12-22 18:19:52,810 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 2.910e+01 3.043e+01 3.179e+01 4.091e+01, threshold=6.086e+01, percent-clipped=0.0
2023-12-22 18:19:58,626 INFO [train.py:886] (0/4) Epoch 23, batch 3000, loss[loss=0.01367, audio_tagging_loss=0.01367, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4951482.74 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:19:58,628 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 18:20:10,293 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3363, 3.4674, 4.2231, 3.8090], device='cuda:0')
2023-12-22 18:20:19,164 INFO [train.py:917] (0/4) Epoch 23, validation: loss=0.03349, audio_tagging_loss=0.03349, over 3737520.00 frames.
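
The ScheduledFloat entries throughout this log record hyperparameters (dropout probabilities, balancer limits, bypass scales) that follow a schedule over batch_count rather than staying fixed; "ans" is the value read back at that batch count. Below is a minimal sketch of the underlying idea, piecewise-linear interpolation between (batch_count, value) breakpoints. It is illustrative only: the real ScheduledFloat in icefall's scaling.py carries more machinery, and the breakpoints shown are invented.

```python
# Hypothetical stand-in for icefall's ScheduledFloat: a float that is
# interpolated piecewise-linearly between (batch_count, value) breakpoints.
class PiecewiseLinearFloat:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:   # before the first breakpoint
            return pts[0][1]
        if batch_count >= pts[-1][0]:  # past the last breakpoint
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Invented breakpoints: a dropout_p decaying from 0.3 to 0.1 over the first
# 20k batches would read back as ans=0.1 at the batch counts logged above.
dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(718946.6666666666) == 0.1
```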
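The recurring optim.py warnings report a five-number summary (min, 25%, median, 75%, max) of recent gradient norms plus the active clipping threshold; note that the threshold equals Clipping_scale times the median (2.0 x 3.043e+01 = 6.086e+01 in the warning just above). A hedged sketch of that bookkeeping follows; clip_by_median and recent_norms are illustrative names, and the actual ScaledAdam logic in icefall's optim.py differs in detail.

```python
import torch

def clip_by_median(params, recent_norms, clipping_scale=2.0):
    # Five-number summary of recent grad norms, like the quartiles logged above.
    q = torch.quantile(torch.tensor(recent_norms),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # e.g. 2.0 * 3.043e+01 = 6.086e+01
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    clipped = total_norm.item() > threshold
    if clipped:
        for g in grads:  # rescale so the global norm lands on the threshold
            g.mul_(threshold / total_norm)
    return q, threshold, clipped

# Usage with a norm history resembling the quartiles in the warning above:
p = torch.nn.Parameter(torch.ones(3))
p.grad = torch.full((3,), 40.0)
print(clip_by_median([p], [26.8, 29.1, 30.4, 31.8, 40.9]))
```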
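The zipformer.py diagnostic above prints one attention-entropy value per head for a self_attn_weights module during validation; values near log(seq_len) mean a head attends almost uniformly, while lower values mean sharper attention. A small illustrative re-implementation is sketched below, under an assumed (num_heads, seq_len, seq_len) layout; it is not the repo's code.

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, seq_len, seq_len); each row is a softmax distribution.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (num_heads, seq_len)
    return ent.mean(dim=-1)  # average over query positions: one value per head

attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)
print(attn_weights_entropy(attn))  # four per-head entropies, as in the log line
```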
2023-12-22 18:20:19,165 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 18:20:27,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=719013.3333333334, ans=0.0 2023-12-22 18:20:30,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719080.0, ans=0.1 2023-12-22 18:20:32,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=719080.0, ans=0.0 2023-12-22 18:20:36,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=719080.0, ans=0.125 2023-12-22 18:20:41,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=719146.6666666666, ans=0.125 2023-12-22 18:20:46,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=719146.6666666666, ans=0.0 2023-12-22 18:20:48,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. limit=10.0 2023-12-22 18:20:51,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719213.3333333334, ans=0.1 2023-12-22 18:20:54,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.95 vs. limit=10.0 2023-12-22 18:21:05,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=719280.0, ans=0.1 2023-12-22 18:21:11,888 INFO [train.py:886] (0/4) Epoch 23, batch 3050, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4960432.91 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0 2023-12-22 18:21:58,933 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.000e+01 3.103e+01 3.222e+01 3.770e+01, threshold=6.206e+01, percent-clipped=0.0 2023-12-22 18:22:04,179 INFO [train.py:886] (0/4) Epoch 23, batch 3100, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4963113.52 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0 2023-12-22 18:22:06,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=719680.0, ans=0.125 2023-12-22 18:22:17,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=719746.6666666666, ans=0.2 2023-12-22 18:22:42,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=719880.0, ans=0.05 2023-12-22 18:22:42,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=719880.0, ans=0.0 2023-12-22 18:22:51,962 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-108000.pt 2023-12-22 18:22:56,621 INFO [train.py:886] (0/4) Epoch 23, batch 3150, loss[loss=0.01463, audio_tagging_loss=0.01463, over 24750.00 frames. 
], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4954327.60 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:22:58,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=720013.3333333334, ans=0.125 2023-12-22 18:23:02,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=720013.3333333334, ans=0.125 2023-12-22 18:23:07,657 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:23:09,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=720080.0, ans=0.125 2023-12-22 18:23:22,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=720146.6666666666, ans=0.2 2023-12-22 18:23:44,193 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.698e+01 3.010e+01 3.145e+01 3.312e+01 3.855e+01, threshold=6.290e+01, percent-clipped=0.0 2023-12-22 18:23:48,788 INFO [train.py:886] (0/4) Epoch 23, batch 3200, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4949983.41 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:24:01,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=720413.3333333334, ans=0.0 2023-12-22 18:24:01,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-12-22 18:24:12,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2023-12-22 18:24:36,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=720613.3333333334, ans=0.125 2023-12-22 18:24:40,416 INFO [train.py:886] (0/4) Epoch 23, batch 3250, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4954700.85 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:24:45,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=720680.0, ans=0.125 2023-12-22 18:24:56,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=720746.6666666666, ans=0.0 2023-12-22 18:25:02,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.34 vs. 
limit=22.5 2023-12-22 18:25:02,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=720813.3333333334, ans=0.125 2023-12-22 18:25:28,112 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.689e+01 2.895e+01 3.079e+01 3.188e+01 4.978e+01, threshold=6.159e+01, percent-clipped=0.0 2023-12-22 18:25:30,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720946.6666666666, ans=0.1 2023-12-22 18:25:32,122 INFO [train.py:886] (0/4) Epoch 23, batch 3300, loss[loss=0.01913, audio_tagging_loss=0.01913, over 24750.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4960979.22 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:25:41,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=15.0 2023-12-22 18:25:46,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=721080.0, ans=0.035 2023-12-22 18:25:55,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2023-12-22 18:26:17,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=721280.0, ans=0.2 2023-12-22 18:26:25,013 INFO [train.py:886] (0/4) Epoch 23, batch 3350, loss[loss=0.01334, audio_tagging_loss=0.01334, over 25000.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4964270.51 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:27:07,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=721613.3333333334, ans=0.125 2023-12-22 18:27:07,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2023-12-22 18:27:11,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=721613.3333333334, ans=0.2 2023-12-22 18:27:12,422 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.691e+01 3.001e+01 3.141e+01 3.291e+01 3.725e+01, threshold=6.283e+01, percent-clipped=0.0 2023-12-22 18:27:13,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=721613.3333333334, ans=0.125 2023-12-22 18:27:16,199 INFO [train.py:886] (0/4) Epoch 23, batch 3400, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4967068.21 frames. 
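
Most of the scaling.py:213 traffic above is ScheduledFloat values: quantities such as dropout_p, skip rates and balancer probabilities are not fixed hyperparameters but functions of batch_count, and each line prints the current value (ans). The logged values behave like piecewise-linear schedules over the batch index, long since settled at their final constants this deep into training (for example attention_skip_rate, ans=0.0 at batch_count=719146.67 above). A toy re-implementation under that assumption; the real class in icefall's scaling.py carries more machinery than this:

    class PiecewiseLinearSchedule:
        """Toy stand-in for a ScheduledFloat: the value is piecewise-linear
        in batch_count and clamps at the end points."""

        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # e.g. a skip rate that decays from 0.5 to 0.0 over the first 20k batches
    # (the breakpoints here are illustrative, not this run's actual schedule)
    skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (20000.0, 0.0))
    print(skip_rate(719146.67))  # -> 0.0, long past the end of the schedule
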
], batch size: 100, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:27:28,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=721746.6666666666, ans=0.125 2023-12-22 18:27:28,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=721746.6666666666, ans=0.0 2023-12-22 18:27:30,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=721746.6666666666, ans=0.125 2023-12-22 18:27:31,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=721746.6666666666, ans=0.0 2023-12-22 18:27:42,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=721813.3333333334, ans=0.125 2023-12-22 18:27:43,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721813.3333333334, ans=0.1 2023-12-22 18:28:06,897 INFO [train.py:886] (0/4) Epoch 23, batch 3450, loss[loss=0.01489, audio_tagging_loss=0.01489, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4963222.74 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:28:19,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.47 vs. limit=22.5 2023-12-22 18:28:29,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=722146.6666666666, ans=0.125 2023-12-22 18:28:31,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=722146.6666666666, ans=0.125 2023-12-22 18:28:44,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=722213.3333333334, ans=0.125 2023-12-22 18:28:53,917 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 3.061e+01 3.208e+01 3.330e+01 3.695e+01, threshold=6.416e+01, percent-clipped=0.0 2023-12-22 18:28:58,436 INFO [train.py:886] (0/4) Epoch 23, batch 3500, loss[loss=0.01452, audio_tagging_loss=0.01452, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4957560.80 frames. 
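
In each train.py:886 progress line, loss[...] is the statistic of the current batch while tot_loss[...] is a decayed running aggregate. The giveaway is the frame count: single batches carry roughly 25000 frames, yet tot_loss sits near 5 million frames, which is what an update of the form tot = tot * (1 - 1/200) + batch reaches at steady state (200 * 25000 = 5e6, matching the logged ~4.96e6 up to varying batch sizes). A sketch of such a tracker, with illustrative field names rather than icefall's MetricsTracker internals:

    from dataclasses import dataclass

    @dataclass
    class RunningLoss:
        loss_sum: float = 0.0
        frames: float = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float,
                   reset_interval: int = 200) -> None:
            decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = self.loss_sum * decay + batch_loss_sum
            self.frames = self.frames * decay + batch_frames

        @property
        def loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    for _ in range(2000):
        tracker.update(batch_loss_sum=0.0134 * 25000, batch_frames=25000)
    # frames converges to ~200 * 25000 = 5e6, loss to the per-frame average
    print(round(tracker.frames), round(tracker.loss, 4))
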
], batch size: 99, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:29:03,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=722346.6666666666, ans=0.125 2023-12-22 18:29:05,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=722346.6666666666, ans=0.125 2023-12-22 18:29:09,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=722413.3333333334, ans=0.0 2023-12-22 18:29:12,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=722413.3333333334, ans=0.125 2023-12-22 18:29:23,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=722480.0, ans=0.125 2023-12-22 18:29:25,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=722480.0, ans=0.125 2023-12-22 18:29:42,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=722613.3333333334, ans=0.125 2023-12-22 18:29:48,879 INFO [train.py:886] (0/4) Epoch 23, batch 3550, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4956009.68 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:29:49,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=722680.0, ans=0.125 2023-12-22 18:29:54,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=722680.0, ans=0.125 2023-12-22 18:30:02,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.62 vs. limit=8.0 2023-12-22 18:30:06,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722746.6666666666, ans=0.1 2023-12-22 18:30:08,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=722813.3333333334, ans=0.0 2023-12-22 18:30:16,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.42 vs. limit=22.5 2023-12-22 18:30:36,388 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.627e+01 2.997e+01 3.123e+01 3.295e+01 3.646e+01, threshold=6.246e+01, percent-clipped=0.0 2023-12-22 18:30:40,169 INFO [train.py:886] (0/4) Epoch 23, batch 3600, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4951581.44 frames. 
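
The scaling.py:1022 lines fire when a Whiten module's metric approaches or exceeds its limit (the limits 8.0, 15.0 and 22.5 above are themselves scheduled values; whitening_limit entries appear later in the log). The metric behaves like a measure of how far the channel covariance of an activation is from a multiple of the identity; the natural quantity with that behaviour is mean(eig^2) / mean(eig)^2 over the covariance eigenvalues, which equals 1.0 for perfectly white features and grows as a few directions dominate. A toy version of that diagnostic, assumed rather than copied from icefall:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels) activations.
        Returns mean(eig^2) / mean(eig)^2 of the covariance, computed via
        traces so no eigendecomposition is needed: trace(cov^2)/C is the
        mean squared eigenvalue, trace(cov)/C the mean eigenvalue."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        mean_eig = torch.diagonal(cov).mean()
        mean_eig_sq = torch.diagonal(cov @ cov).mean()
        return mean_eig_sq / (mean_eig ** 2 + 1e-20)

    white = torch.randn(10000, 384)
    spiky = white.clone()
    spiky[:, 0] *= 30.0  # one direction carries most of the energy
    print(whitening_metric(white))  # ~1.0, far below a limit like 15.0
    print(whitening_metric(spiky))  # ~190, the regime these warnings flag
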
], batch size: 100, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:30:54,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=723080.0, ans=0.125 2023-12-22 18:31:09,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=723146.6666666666, ans=0.125 2023-12-22 18:31:29,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=723280.0, ans=0.0 2023-12-22 18:31:31,388 INFO [train.py:886] (0/4) Epoch 23, batch 3650, loss[loss=0.009521, audio_tagging_loss=0.009521, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4958372.65 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:31:35,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=723346.6666666666, ans=0.0 2023-12-22 18:31:44,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=723413.3333333334, ans=0.0 2023-12-22 18:31:49,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-12-22 18:32:08,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. limit=6.0 2023-12-22 18:32:12,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=723546.6666666666, ans=0.2 2023-12-22 18:32:15,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=723613.3333333334, ans=0.125 2023-12-22 18:32:20,213 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.539e+01 2.967e+01 3.120e+01 3.208e+01 3.587e+01, threshold=6.240e+01, percent-clipped=0.0 2023-12-22 18:32:20,456 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:32:24,012 INFO [train.py:886] (0/4) Epoch 23, batch 3700, loss[loss=0.0125, audio_tagging_loss=0.0125, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4962976.09 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:32:28,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=723680.0, ans=0.2 2023-12-22 18:32:34,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=723746.6666666666, ans=0.125 2023-12-22 18:32:49,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=723813.3333333334, ans=0.125 2023-12-22 18:33:03,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=723946.6666666666, ans=0.125 2023-12-22 18:33:12,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=12.0 2023-12-22 18:33:15,979 INFO [train.py:886] (0/4) Epoch 23, batch 3750, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. 
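
The learning rate ticks down slowly through the epoch (4.70e-03 at batch 3050 above, 4.68e-03 by batch 3650), and the logged values are consistent with the Eden schedule used in Zipformer recipes: two inverse-fourth-root decay factors, one in optimizer steps and one in epochs, applied to this run's base_lr = 0.045 with lr_batches = 7500 and lr_epochs = 3.5. Whether the epoch term sees the completed-epoch count or the 1-based epoch is an assumption here, chosen because completed epochs reproduces the logged values:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden learning-rate schedule, as in Zipformer recipes."""
        batch_factor = ((batch ** 2 + lr_batches ** 2)
                        / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2)
                        / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Near checkpoint-108000 above, in epoch 23 (22 epochs completed):
    print(eden_lr(0.045, batch=108_000, epoch=22))  # ~4.69e-03, as logged
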
], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4959507.67 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:33:23,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=724013.3333333334, ans=0.2 2023-12-22 18:33:29,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=724080.0, ans=0.125 2023-12-22 18:33:31,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2023-12-22 18:33:33,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724080.0, ans=0.1 2023-12-22 18:33:59,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=724280.0, ans=0.0 2023-12-22 18:34:01,962 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.057e+01 3.159e+01 3.344e+01 3.975e+01, threshold=6.319e+01, percent-clipped=0.0 2023-12-22 18:34:05,795 INFO [train.py:886] (0/4) Epoch 23, batch 3800, loss[loss=0.01085, audio_tagging_loss=0.01085, over 24042.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4957175.62 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:34:06,951 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:34:32,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=724480.0, ans=0.125 2023-12-22 18:34:36,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=724546.6666666666, ans=0.125 2023-12-22 18:34:43,911 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.16 vs. limit=22.5 2023-12-22 18:34:46,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.93 vs. limit=15.0 2023-12-22 18:34:48,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=724613.3333333334, ans=0.125 2023-12-22 18:34:50,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=724613.3333333334, ans=0.0 2023-12-22 18:34:57,456 INFO [train.py:886] (0/4) Epoch 23, batch 3850, loss[loss=0.01428, audio_tagging_loss=0.01428, over 24750.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4955557.36 frames. 
], batch size: 99, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:35:07,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724746.6666666666, ans=0.1 2023-12-22 18:35:32,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=724880.0, ans=0.125 2023-12-22 18:35:43,188 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.009e+01 3.160e+01 3.363e+01 3.949e+01, threshold=6.320e+01, percent-clipped=0.0 2023-12-22 18:35:43,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.16 vs. limit=10.0 2023-12-22 18:35:45,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724946.6666666666, ans=0.1 2023-12-22 18:35:49,170 INFO [train.py:886] (0/4) Epoch 23, batch 3900, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4954742.53 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 128.0 2023-12-22 18:36:00,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=725080.0, ans=0.1 2023-12-22 18:36:05,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=725080.0, ans=0.125 2023-12-22 18:36:15,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=725146.6666666666, ans=0.125 2023-12-22 18:36:28,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2023-12-22 18:36:29,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=725280.0, ans=0.2 2023-12-22 18:36:30,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=725280.0, ans=0.125 2023-12-22 18:36:31,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=725280.0, ans=0.1 2023-12-22 18:36:40,540 INFO [train.py:886] (0/4) Epoch 23, batch 3950, loss[loss=0.0124, audio_tagging_loss=0.0124, over 25000.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4959596.64 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 128.0 2023-12-22 18:36:49,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=725413.3333333334, ans=0.2 2023-12-22 18:36:52,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=725413.3333333334, ans=0.1 2023-12-22 18:36:59,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=725413.3333333334, ans=0.125 2023-12-22 18:37:01,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.26 vs. 
limit=22.5 2023-12-22 18:37:03,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=725480.0, ans=0.125 2023-12-22 18:37:03,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=725480.0, ans=0.125 2023-12-22 18:37:30,101 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+01 2.967e+01 3.138e+01 3.271e+01 3.711e+01, threshold=6.275e+01, percent-clipped=0.0 2023-12-22 18:37:30,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=725613.3333333334, ans=0.0 2023-12-22 18:37:33,021 INFO [train.py:886] (0/4) Epoch 23, batch 4000, loss[loss=0.01433, audio_tagging_loss=0.01433, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4958952.66 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:37:45,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=12.0 2023-12-22 18:37:51,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=725813.3333333334, ans=0.125 2023-12-22 18:37:57,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=725813.3333333334, ans=0.0 2023-12-22 18:37:58,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.30 vs. limit=10.0 2023-12-22 18:38:19,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2023-12-22 18:38:23,321 INFO [train.py:886] (0/4) Epoch 23, batch 4050, loss[loss=0.0135, audio_tagging_loss=0.0135, over 24750.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4962148.11 frames. ], batch size: 99, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:38:50,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=726146.6666666666, ans=0.0 2023-12-22 18:39:07,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=726280.0, ans=0.125 2023-12-22 18:39:09,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726280.0, ans=0.1 2023-12-22 18:39:13,485 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.032e+01 3.162e+01 3.330e+01 3.866e+01, threshold=6.323e+01, percent-clipped=0.0 2023-12-22 18:39:15,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=726346.6666666666, ans=0.125 2023-12-22 18:39:15,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726346.6666666666, ans=0.1 2023-12-22 18:39:16,368 INFO [train.py:886] (0/4) Epoch 23, batch 4100, loss[loss=0.01621, audio_tagging_loss=0.01621, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4952473.21 frames. 
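
Note grad_scale in the progress lines above: it doubles from 64.0 to 128.0 at batch 3900 and is back to 64.0 by batch 4000. That is the signature of dynamic fp16 loss scaling, where the scaler grows the scale after a long run of finite gradients and halves it as soon as an inf/nan gradient appears, so the occasional halving here is harmless. A generic torch.cuda.amp step showing where that state lives (a sketch, not the recipe's actual training loop):

    import torch
    import torch.nn.functional as F

    model = torch.nn.Linear(80, 527).cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()  # owns the logged grad_scale

    def train_step(feats: torch.Tensor, targets: torch.Tensor):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():  # forward in fp16 where safe
            loss = F.binary_cross_entropy_with_logits(model(feats), targets)
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on overflow
        scaler.update()                # grows or halves grad_scale
        return loss
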
], batch size: 99, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:39:23,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=726346.6666666666, ans=0.0 2023-12-22 18:39:26,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=726413.3333333334, ans=0.1 2023-12-22 18:39:34,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=726413.3333333334, ans=0.125 2023-12-22 18:39:47,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=726546.6666666666, ans=0.0 2023-12-22 18:39:57,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=726613.3333333334, ans=0.0 2023-12-22 18:40:08,622 INFO [train.py:886] (0/4) Epoch 23, batch 4150, loss[loss=0.0124, audio_tagging_loss=0.0124, over 24750.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4948747.88 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:40:34,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=726813.3333333334, ans=0.125 2023-12-22 18:40:39,867 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:40:50,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=726946.6666666666, ans=0.2 2023-12-22 18:40:56,280 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.726e+01 3.004e+01 3.122e+01 3.283e+01 3.797e+01, threshold=6.243e+01, percent-clipped=0.0 2023-12-22 18:40:59,165 INFO [train.py:886] (0/4) Epoch 23, batch 4200, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4945645.19 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:41:12,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=727080.0, ans=0.0 2023-12-22 18:41:47,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=727280.0, ans=0.125 2023-12-22 18:41:52,124 INFO [train.py:886] (0/4) Epoch 23, batch 4250, loss[loss=0.01149, audio_tagging_loss=0.01149, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4951733.43 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:41:57,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=727346.6666666666, ans=15.0 2023-12-22 18:41:59,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=727346.6666666666, ans=0.125 2023-12-22 18:42:02,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.49 vs. 
limit=15.0 2023-12-22 18:42:13,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=727480.0, ans=0.0 2023-12-22 18:42:13,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.50 vs. limit=15.0 2023-12-22 18:42:14,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2023-12-22 18:42:27,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=727546.6666666666, ans=0.0 2023-12-22 18:42:35,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=727613.3333333334, ans=0.125 2023-12-22 18:42:37,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=727613.3333333334, ans=0.2 2023-12-22 18:42:37,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=727613.3333333334, ans=0.125 2023-12-22 18:42:39,986 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.744e+01 3.028e+01 3.146e+01 3.307e+01 3.959e+01, threshold=6.292e+01, percent-clipped=0.0 2023-12-22 18:42:43,659 INFO [train.py:886] (0/4) Epoch 23, batch 4300, loss[loss=0.01458, audio_tagging_loss=0.01458, over 25000.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4960225.39 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:42:48,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=727680.0, ans=0.0 2023-12-22 18:42:50,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=727680.0, ans=0.2 2023-12-22 18:42:52,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=727680.0, ans=0.125 2023-12-22 18:43:13,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=727880.0, ans=0.0 2023-12-22 18:43:16,837 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:43:24,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=727946.6666666666, ans=10.0 2023-12-22 18:43:28,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=22.5 2023-12-22 18:43:28,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=727946.6666666666, ans=0.2 2023-12-22 18:43:35,867 INFO [train.py:886] (0/4) Epoch 23, batch 4350, loss[loss=0.01114, audio_tagging_loss=0.01114, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4957984.19 frames. 
], batch size: 99, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:43:44,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=728080.0, ans=0.0 2023-12-22 18:43:54,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=728080.0, ans=0.125 2023-12-22 18:44:07,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=728213.3333333334, ans=0.125 2023-12-22 18:44:17,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.90 vs. limit=15.0 2023-12-22 18:44:24,888 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.796e+01 3.084e+01 3.223e+01 3.361e+01 3.747e+01, threshold=6.446e+01, percent-clipped=0.0 2023-12-22 18:44:27,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.44 vs. limit=15.0 2023-12-22 18:44:27,823 INFO [train.py:886] (0/4) Epoch 23, batch 4400, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4947335.89 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:44:41,904 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:44:43,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=22.5 2023-12-22 18:45:11,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0 2023-12-22 18:45:17,122 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.824e-03 2023-12-22 18:45:18,737 INFO [train.py:886] (0/4) Epoch 23, batch 4450, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4942492.53 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:45:24,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=728680.0, ans=0.05 2023-12-22 18:45:32,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=728746.6666666666, ans=0.125 2023-12-22 18:45:35,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=728746.6666666666, ans=0.2 2023-12-22 18:45:41,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.44 vs. limit=15.0 2023-12-22 18:46:08,133 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.762e+01 3.007e+01 3.166e+01 3.319e+01 4.167e+01, threshold=6.332e+01, percent-clipped=0.0 2023-12-22 18:46:10,955 INFO [train.py:886] (0/4) Epoch 23, batch 4500, loss[loss=0.01201, audio_tagging_loss=0.01201, over 21972.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4940582.64 frames. 
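
The scaling.py:1118 WithLoss lines track small auxiliary penalties attached to the self-attention weights; loss-sum is 0.000e+00 almost everywhere and only occasionally nonzero (7.824e-03 above), i.e. the penalty wakes up only when the weights drift out of the intended regime. How icefall wires this in is not shown in the log; one way to attach such a penalty without changing a module's output is an identity autograd Function that injects the penalty's gradient in backward, sketched here purely as an assumption:

    import torch

    class IdentityWithAuxGrad(torch.autograd.Function):
        """Forward is the identity on x; backward adds a precomputed
        auxiliary gradient, so the penalty shapes x without altering any
        activation downstream."""

        @staticmethod
        def forward(ctx, x, aux_grad):
            ctx.save_for_backward(aux_grad)
            return x

        @staticmethod
        def backward(ctx, grad_output):
            (aux_grad,) = ctx.saved_tensors
            return grad_output + aux_grad, None

    def with_hinge_penalty(attn_weights: torch.Tensor, limit: float = 0.9):
        """Penalize attention weights above `limit`; usually contributes
        nothing, matching the loss-sum=0.000e+00 lines."""
        w = attn_weights.detach().requires_grad_(True)
        penalty = torch.relu(w - limit).sum()
        (aux_grad,) = torch.autograd.grad(penalty, w)
        print(f"loss-sum={penalty.item():.3e}")
        return IdentityWithAuxGrad.apply(attn_weights, aux_grad)
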
], batch size: 107, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:46:17,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2023-12-22 18:46:19,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0 2023-12-22 18:46:23,281 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:46:31,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=729146.6666666666, ans=0.07 2023-12-22 18:46:42,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729213.3333333334, ans=0.1 2023-12-22 18:46:53,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=729280.0, ans=0.125 2023-12-22 18:47:03,351 INFO [train.py:886] (0/4) Epoch 23, batch 4550, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4945502.64 frames. ], batch size: 99, lr: 4.66e-03, grad_scale: 64.0 2023-12-22 18:47:03,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=729346.6666666666, ans=0.0 2023-12-22 18:47:05,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.73 vs. limit=10.0 2023-12-22 18:47:15,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=729413.3333333334, ans=0.125 2023-12-22 18:47:38,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=729546.6666666666, ans=0.125 2023-12-22 18:47:51,141 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.681e+01 2.974e+01 3.118e+01 3.247e+01 3.951e+01, threshold=6.237e+01, percent-clipped=0.0 2023-12-22 18:47:54,733 INFO [train.py:886] (0/4) Epoch 23, batch 4600, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4946914.23 frames. ], batch size: 99, lr: 4.66e-03, grad_scale: 64.0 2023-12-22 18:47:54,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=729680.0, ans=0.5 2023-12-22 18:48:19,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=729813.3333333334, ans=0.125 2023-12-22 18:48:32,790 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=15.0 2023-12-22 18:48:44,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729946.6666666666, ans=0.1 2023-12-22 18:48:44,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.07 vs. limit=8.0 2023-12-22 18:48:46,816 INFO [train.py:886] (0/4) Epoch 23, batch 4650, loss[loss=0.01523, audio_tagging_loss=0.01523, over 24941.00 frames. 
], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4942905.68 frames. ], batch size: 100, lr: 4.66e-03, grad_scale: 64.0 2023-12-22 18:48:47,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=730013.3333333334, ans=0.04949747468305833 2023-12-22 18:48:57,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.71 vs. limit=22.5 2023-12-22 18:49:11,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=730146.6666666666, ans=0.0 2023-12-22 18:49:13,773 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:49:17,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=730213.3333333334, ans=0.0 2023-12-22 18:49:20,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=730213.3333333334, ans=0.0 2023-12-22 18:49:30,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=730280.0, ans=0.125 2023-12-22 18:49:31,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=730280.0, ans=0.125 2023-12-22 18:49:33,715 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.719e+01 3.050e+01 3.178e+01 3.300e+01 3.735e+01, threshold=6.356e+01, percent-clipped=0.0 2023-12-22 18:49:33,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=730280.0, ans=0.125 2023-12-22 18:49:35,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=730346.6666666666, ans=0.1 2023-12-22 18:49:36,508 INFO [train.py:886] (0/4) Epoch 23, batch 4700, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4943617.54 frames. ], batch size: 99, lr: 4.66e-03, grad_scale: 64.0 2023-12-22 18:49:40,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=730346.6666666666, ans=0.125 2023-12-22 18:49:47,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=730413.3333333334, ans=0.0 2023-12-22 18:50:10,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=730546.6666666666, ans=0.125 2023-12-22 18:50:16,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0 2023-12-22 18:50:24,710 INFO [train.py:886] (0/4) Epoch 23, batch 4750, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4932828.16 frames. 
], batch size: 99, lr: 4.66e-03, grad_scale: 64.0 2023-12-22 18:50:26,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=730680.0, ans=0.1 2023-12-22 18:50:32,247 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:50:34,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=730746.6666666666, ans=0.125 2023-12-22 18:50:37,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=730746.6666666666, ans=0.125 2023-12-22 18:50:39,904 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-23.pt 2023-12-22 18:50:58,842 INFO [train.py:886] (0/4) Epoch 24, batch 0, loss[loss=0.02868, audio_tagging_loss=0.02868, over 24104.00 frames. ], tot_loss[loss=0.02868, audio_tagging_loss=0.02868, over 24104.00 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0 2023-12-22 18:50:58,844 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 18:51:19,324 INFO [train.py:917] (0/4) Epoch 24, validation: loss=0.03237, audio_tagging_loss=0.03237, over 3737520.00 frames. 2023-12-22 18:51:19,325 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 18:51:35,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=730853.3333333334, ans=0.125 2023-12-22 18:51:35,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=730853.3333333334, ans=0.1 2023-12-22 18:51:47,823 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.70 vs. limit=22.5 2023-12-22 18:51:51,196 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.764e+01 3.132e+01 3.325e+01 4.669e+01 9.691e+01, threshold=6.651e+01, percent-clipped=7.0 2023-12-22 18:52:00,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=731053.3333333334, ans=0.0 2023-12-22 18:52:10,593 INFO [train.py:886] (0/4) Epoch 24, batch 50, loss[loss=0.01904, audio_tagging_loss=0.01904, over 25000.00 frames. ], tot_loss[loss=0.02101, audio_tagging_loss=0.02101, over 1117228.76 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0 2023-12-22 18:52:25,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=731186.6666666666, ans=0.0 2023-12-22 18:52:25,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2023-12-22 18:52:27,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2023-12-22 18:52:48,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=731320.0, ans=0.0 2023-12-22 18:53:01,991 INFO [train.py:886] (0/4) Epoch 24, batch 100, loss[loss=0.0145, audio_tagging_loss=0.0145, over 25000.00 frames. ], tot_loss[loss=0.01815, audio_tagging_loss=0.01815, over 1970798.97 frames. 
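
The epoch boundary above is worth reading closely. The run writes epoch-23.pt, computes the epoch-24 validation loss, and the first tot_loss values of epoch 24 look alarming (0.02868 at batch 0, 0.02101 at batch 50) only because the running tracker starts empty each epoch and the early aggregates are dominated by a handful of batches; by batch 100 it is already near 0.018 and falling. The only nonzero percent-clipped in this stretch (7.0, in the warning just after the boundary) likewise shows the clipper absorbing a transient gradient spike. The checkpoint itself has to capture more than weights for resumption to be exact; a sketch of the kind of state involved (icefall's checkpoint helpers take more arguments than this):

    import torch
    from pathlib import Path

    def save_epoch_checkpoint(exp_dir: Path, epoch: int, model,
                              optimizer, scheduler, scaler) -> Path:
        """Persist everything needed to resume training from this point."""
        ckpt = {
            "epoch": epoch,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),   # ScaledAdam statistics
            "scheduler": scheduler.state_dict(),   # Eden batch/epoch counters
            "grad_scaler": scaler.state_dict(),    # current fp16 grad_scale
        }
        path = exp_dir / f"epoch-{epoch}.pt"
        torch.save(ckpt, path)
        return path

    # e.g. save_epoch_checkpoint(Path("zipformer/exp_at_as_full"), 23, ...)
    # would produce the epoch-23.pt written above
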
], batch size: 100, lr: 4.56e-03, grad_scale: 64.0 2023-12-22 18:53:07,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=731453.3333333334, ans=0.0 2023-12-22 18:53:15,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=731520.0, ans=0.2 2023-12-22 18:53:19,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=731520.0, ans=0.125 2023-12-22 18:53:29,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2023-12-22 18:53:34,163 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.856e+01 3.361e+01 3.546e+01 3.763e+01 4.764e+01, threshold=7.093e+01, percent-clipped=0.0 2023-12-22 18:53:35,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=731653.3333333334, ans=0.125 2023-12-22 18:53:40,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=731653.3333333334, ans=0.125 2023-12-22 18:53:53,542 INFO [train.py:886] (0/4) Epoch 24, batch 150, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 2634453.82 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0 2023-12-22 18:53:53,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731786.6666666666, ans=0.1 2023-12-22 18:54:22,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731920.0, ans=0.1 2023-12-22 18:54:28,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=731986.6666666666, ans=0.125 2023-12-22 18:54:35,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2023-12-22 18:54:41,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=732053.3333333334, ans=0.125 2023-12-22 18:54:43,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732053.3333333334, ans=0.1 2023-12-22 18:54:45,352 INFO [train.py:886] (0/4) Epoch 24, batch 200, loss[loss=0.0116, audio_tagging_loss=0.0116, over 24750.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 3153034.19 frames. ], batch size: 99, lr: 4.56e-03, grad_scale: 64.0 2023-12-22 18:54:49,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=732120.0, ans=0.125 2023-12-22 18:54:57,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. 
limit=6.0 2023-12-22 18:55:17,474 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.730e+01 3.030e+01 3.161e+01 3.286e+01 3.848e+01, threshold=6.323e+01, percent-clipped=0.0 2023-12-22 18:55:22,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=732320.0, ans=0.0 2023-12-22 18:55:38,557 INFO [train.py:886] (0/4) Epoch 24, batch 250, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 3558913.47 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 18:55:55,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=732520.0, ans=22.5 2023-12-22 18:56:17,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=732653.3333333334, ans=0.0 2023-12-22 18:56:30,535 INFO [train.py:886] (0/4) Epoch 24, batch 300, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 3863113.01 frames. ], batch size: 99, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 18:56:35,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-12-22 18:56:42,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732853.3333333334, ans=0.1 2023-12-22 18:56:44,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2023-12-22 18:56:56,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=732920.0, ans=0.125 2023-12-22 18:56:57,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=732920.0, ans=0.2 2023-12-22 18:57:02,323 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.786e+01 3.035e+01 3.194e+01 3.337e+01 3.823e+01, threshold=6.389e+01, percent-clipped=0.0 2023-12-22 18:57:09,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=732986.6666666666, ans=0.0 2023-12-22 18:57:14,457 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:57:21,964 INFO [train.py:886] (0/4) Epoch 24, batch 350, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4097354.81 frames. ], batch size: 99, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 18:57:23,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=733120.0, ans=0.2 2023-12-22 18:57:43,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733253.3333333334, ans=0.1 2023-12-22 18:57:47,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. 
limit=6.0 2023-12-22 18:57:55,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=733320.0, ans=0.125 2023-12-22 18:58:10,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=12.0 2023-12-22 18:58:15,025 INFO [train.py:886] (0/4) Epoch 24, batch 400, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4281491.11 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 18:58:21,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=733453.3333333334, ans=0.0 2023-12-22 18:58:23,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=733520.0, ans=0.125 2023-12-22 18:58:34,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.80 vs. limit=22.5 2023-12-22 18:58:47,568 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+01 2.966e+01 3.157e+01 3.316e+01 3.738e+01, threshold=6.314e+01, percent-clipped=0.0 2023-12-22 18:58:49,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=733653.3333333334, ans=0.2 2023-12-22 18:59:05,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733786.6666666666, ans=0.1 2023-12-22 18:59:05,910 INFO [train.py:886] (0/4) Epoch 24, batch 450, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4435740.03 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 18:59:22,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=733853.3333333334, ans=0.0 2023-12-22 18:59:40,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2023-12-22 18:59:45,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.62 vs. limit=15.0 2023-12-22 18:59:58,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=734120.0, ans=0.125 2023-12-22 18:59:59,013 INFO [train.py:886] (0/4) Epoch 24, batch 500, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4551545.14 frames. 
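
A note on the objective itself: every progress line has loss equal to audio_tagging_loss because this recipe trains a single criterion, multi-label classification over the 527-event AudioSet ontology this run is configured for, and the standard choice for that is an independent binary cross-entropy per event. A sketch under that assumption (the exact reduction and per-frame normalization behind the logged magnitudes are a guess here):

    import torch
    import torch.nn.functional as F

    NUM_EVENTS = 527  # AudioSet ontology size used by this run

    def audio_tagging_loss(logits: torch.Tensor,
                           labels: torch.Tensor) -> torch.Tensor:
        """logits: (batch, NUM_EVENTS) pooled encoder outputs;
        labels: (batch, NUM_EVENTS) multi-hot targets, since one clip can
        carry many tags. Each event is an independent binary decision."""
        return F.binary_cross_entropy_with_logits(
            logits, labels, reduction="sum") / labels.numel()

    logits = torch.randn(100, NUM_EVENTS)
    labels = (torch.rand(100, NUM_EVENTS) < 0.02).float()  # sparse tags
    print(audio_tagging_loss(logits, labels))
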
], batch size: 100, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 19:00:10,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=734186.6666666666, ans=0.0 2023-12-22 19:00:18,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=734253.3333333334, ans=0.125 2023-12-22 19:00:24,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=734253.3333333334, ans=0.125 2023-12-22 19:00:31,463 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+01 2.996e+01 3.110e+01 3.243e+01 4.536e+01, threshold=6.220e+01, percent-clipped=0.0 2023-12-22 19:00:41,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=734386.6666666666, ans=12.0 2023-12-22 19:00:42,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=734386.6666666666, ans=0.125 2023-12-22 19:00:50,178 INFO [train.py:886] (0/4) Epoch 24, batch 550, loss[loss=0.01115, audio_tagging_loss=0.01115, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4647783.13 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 19:00:53,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=734453.3333333334, ans=0.0 2023-12-22 19:01:03,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=734520.0, ans=0.0 2023-12-22 19:01:06,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2023-12-22 19:01:14,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2023-12-22 19:01:30,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=734653.3333333334, ans=0.125 2023-12-22 19:01:41,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=734786.6666666666, ans=0.125 2023-12-22 19:01:42,433 INFO [train.py:886] (0/4) Epoch 24, batch 600, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4720344.13 frames. 
], batch size: 99, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 19:01:47,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=734786.6666666666, ans=0.125 2023-12-22 19:01:50,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734786.6666666666, ans=0.1 2023-12-22 19:02:11,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=734920.0, ans=0.125 2023-12-22 19:02:13,847 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.711e+01 3.028e+01 3.193e+01 3.328e+01 3.819e+01, threshold=6.386e+01, percent-clipped=0.0 2023-12-22 19:02:19,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=734986.6666666666, ans=0.125 2023-12-22 19:02:22,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=735053.3333333334, ans=0.0 2023-12-22 19:02:33,926 INFO [train.py:886] (0/4) Epoch 24, batch 650, loss[loss=0.01403, audio_tagging_loss=0.01403, over 22855.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4767060.16 frames. ], batch size: 107, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 19:02:35,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2023-12-22 19:02:46,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=735186.6666666666, ans=0.125 2023-12-22 19:02:48,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=735186.6666666666, ans=0.125 2023-12-22 19:03:10,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=735320.0, ans=0.1 2023-12-22 19:03:24,385 INFO [train.py:886] (0/4) Epoch 24, batch 700, loss[loss=0.0126, audio_tagging_loss=0.0126, over 22220.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4795948.19 frames. ], batch size: 107, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 19:03:40,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=735520.0, ans=0.2 2023-12-22 19:03:51,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=735586.6666666666, ans=0.2 2023-12-22 19:03:55,823 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.626e+01 3.024e+01 3.156e+01 3.315e+01 3.612e+01, threshold=6.313e+01, percent-clipped=0.0 2023-12-22 19:04:01,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=735653.3333333334, ans=0.125 2023-12-22 19:04:11,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=735720.0, ans=0.2 2023-12-22 19:04:15,532 INFO [train.py:886] (0/4) Epoch 24, batch 750, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4833461.15 frames. 
], batch size: 100, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:04:23,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=735786.6666666666, ans=0.2 2023-12-22 19:04:32,520 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:04:41,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=735920.0, ans=0.2 2023-12-22 19:04:42,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=15.0 2023-12-22 19:04:43,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=735920.0, ans=0.0 2023-12-22 19:04:47,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=735986.6666666666, ans=0.2 2023-12-22 19:05:01,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=736053.3333333334, ans=0.125 2023-12-22 19:05:05,911 INFO [train.py:886] (0/4) Epoch 24, batch 800, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24750.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4862604.63 frames. ], batch size: 99, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:05:07,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=736120.0, ans=0.2 2023-12-22 19:05:13,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2023-12-22 19:05:16,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=736120.0, ans=0.1 2023-12-22 19:05:19,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=736186.6666666666, ans=0.2 2023-12-22 19:05:24,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=736186.6666666666, ans=0.125 2023-12-22 19:05:24,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=736186.6666666666, ans=0.125 2023-12-22 19:05:38,410 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+01 2.955e+01 3.119e+01 3.236e+01 3.787e+01, threshold=6.239e+01, percent-clipped=0.0 2023-12-22 19:05:39,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=736320.0, ans=0.125 2023-12-22 19:05:39,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=736320.0, ans=0.125 2023-12-22 19:05:42,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=736320.0, ans=0.5 2023-12-22 19:05:57,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=736453.3333333334, ans=0.05 2023-12-22 19:05:58,616 INFO [train.py:886] (0/4) Epoch 24, batch 850, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. 
], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4890703.30 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:06:07,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=736453.3333333334, ans=0.125 2023-12-22 19:06:08,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0 2023-12-22 19:06:11,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=736520.0, ans=0.0 2023-12-22 19:06:13,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=736520.0, ans=0.125 2023-12-22 19:06:17,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=736520.0, ans=0.125 2023-12-22 19:06:18,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=736586.6666666666, ans=0.125 2023-12-22 19:06:25,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2023-12-22 19:06:38,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=736653.3333333334, ans=0.0 2023-12-22 19:06:50,805 INFO [train.py:886] (0/4) Epoch 24, batch 900, loss[loss=0.01231, audio_tagging_loss=0.01231, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4910947.88 frames. ], batch size: 99, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:06:53,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=736786.6666666666, ans=0.125 2023-12-22 19:06:56,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2023-12-22 19:06:58,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736786.6666666666, ans=0.1 2023-12-22 19:07:00,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=736853.3333333334, ans=0.0 2023-12-22 19:07:10,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736920.0, ans=0.1 2023-12-22 19:07:22,869 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.749e+01 3.064e+01 3.189e+01 3.308e+01 4.129e+01, threshold=6.377e+01, percent-clipped=0.0 2023-12-22 19:07:24,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=736986.6666666666, ans=0.0 2023-12-22 19:07:42,177 INFO [train.py:886] (0/4) Epoch 24, batch 950, loss[loss=0.01259, audio_tagging_loss=0.01259, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4910075.29 frames. 
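The `Whitening: name=..., metric=M vs. limit=L` lines are sampled diagnostics from the Whiten modules in scaling.py: the metric is 1.0 when a group's feature covariance is proportional to the identity (fully "white") and grows as the covariance becomes lopsided, and a gradient penalty is applied only when it exceeds the limit (the limits are themselves scheduled, cf. the `whiten.whitening_limit ... ans=12.0` entry earlier in this log). A sketch of the metric, with the exact normalization treated as an assumption:

    import torch

    # Whitening metric in the spirit of scaling.py: sum of squared covariance
    # entries per channel over the squared mean diagonal. Equals 1.0 exactly
    # when the per-group covariance is a multiple of the identity.
    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        num_frames, num_channels = x.shape
        c = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, c).transpose(0, 1)  # (g, n, c)
        covar = torch.matmul(x.transpose(1, 2), x)                # (g, c, c)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_sq = (covar ** 2).sum() / (num_groups * c)
        return mean_sq / (mean_diag ** 2 + 1e-20)

    x = torch.randn(1000, 512)  # white features
    # ~1 + 512/1000 ~= 1.5 from finite-sample noise; -> 1.0 as frames grow
    print(whitening_metric(x, num_groups=1))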
], batch size: 99, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:07:48,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=737120.0, ans=0.125 2023-12-22 19:08:08,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2023-12-22 19:08:09,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=737253.3333333334, ans=0.125 2023-12-22 19:08:23,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=737386.6666666666, ans=0.2 2023-12-22 19:08:26,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=737386.6666666666, ans=0.0 2023-12-22 19:08:27,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=737386.6666666666, ans=0.125 2023-12-22 19:08:34,717 INFO [train.py:886] (0/4) Epoch 24, batch 1000, loss[loss=0.01439, audio_tagging_loss=0.01439, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4918555.14 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:09:06,969 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.839e+01 3.078e+01 3.149e+01 3.357e+01 3.682e+01, threshold=6.299e+01, percent-clipped=0.0 2023-12-22 19:09:15,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.99 vs. limit=15.0 2023-12-22 19:09:20,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-12-22 19:09:27,337 INFO [train.py:886] (0/4) Epoch 24, batch 1050, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4927506.31 frames. 
], batch size: 100, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:09:30,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=737786.6666666666, ans=0.125 2023-12-22 19:09:34,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=737786.6666666666, ans=0.125 2023-12-22 19:09:37,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=737853.3333333334, ans=0.125 2023-12-22 19:09:45,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737853.3333333334, ans=0.1 2023-12-22 19:09:46,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=737920.0, ans=0.1 2023-12-22 19:09:53,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=737920.0, ans=0.0 2023-12-22 19:09:54,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=737920.0, ans=0.2 2023-12-22 19:10:18,277 INFO [train.py:886] (0/4) Epoch 24, batch 1100, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4932586.14 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:10:30,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=738186.6666666666, ans=0.125 2023-12-22 19:10:31,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=738186.6666666666, ans=0.125 2023-12-22 19:10:33,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=738186.6666666666, ans=0.2 2023-12-22 19:10:50,309 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.735e+01 2.982e+01 3.129e+01 3.253e+01 3.544e+01, threshold=6.257e+01, percent-clipped=0.0 2023-12-22 19:11:10,320 INFO [train.py:886] (0/4) Epoch 24, batch 1150, loss[loss=0.0131, audio_tagging_loss=0.0131, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4944060.66 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:11:14,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=738453.3333333334, ans=0.09899494936611666 2023-12-22 19:11:17,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.92 vs. limit=15.0 2023-12-22 19:11:18,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=738453.3333333334, ans=0.125 2023-12-22 19:11:49,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.78 vs. limit=22.5 2023-12-22 19:12:02,046 INFO [train.py:886] (0/4) Epoch 24, batch 1200, loss[loss=0.0114, audio_tagging_loss=0.0114, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4952208.34 frames. 
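The ubiquitous `ScheduledFloat: name=..., batch_count=..., ans=...` lines print the current value of hyperparameters (dropout rates, skip rates, balancer bounds, whitening limits) that are not constants but piecewise-linear functions of `batch_count`; by batch_count ~7.3e5 most have long since reached their final values, which is why so many read `ans=0.0` or `ans=0.125`. A sketch of such a schedule (the breakpoints below are illustrative, not the recipe's actual settings):

    # ScheduledFloat-style value: piecewise-linear in batch_count between
    # (batch_count, value) breakpoints, clamped outside the range.
    # Breakpoints here are illustrative only.
    def scheduled_float(batch_count: float, points) -> float:
        points = sorted(points)
        if batch_count <= points[0][0]:
            return points[0][1]
        if batch_count >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

    # A skip-rate annealed from 0.5 to 0.0 over the first 20k batches reads
    # ans=0.0 at this point in training, as in the conv_skip_rate lines above:
    print(scheduled_float(734520.0, [(0.0, 0.5), (20_000.0, 0.0)]))  # 0.0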
], batch size: 100, lr: 4.54e-03, grad_scale: 128.0 2023-12-22 19:12:04,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=738786.6666666666, ans=0.0 2023-12-22 19:12:20,627 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:12:28,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.18 vs. limit=10.0 2023-12-22 19:12:29,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=738920.0, ans=0.0 2023-12-22 19:12:34,317 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.042e+01 3.211e+01 3.340e+01 3.947e+01, threshold=6.423e+01, percent-clipped=0.0 2023-12-22 19:12:37,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=738986.6666666666, ans=0.0 2023-12-22 19:12:44,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739053.3333333334, ans=0.1 2023-12-22 19:12:49,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=739053.3333333334, ans=0.125 2023-12-22 19:12:54,868 INFO [train.py:886] (0/4) Epoch 24, batch 1250, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4945428.74 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 128.0 2023-12-22 19:12:59,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=739120.0, ans=0.5 2023-12-22 19:13:06,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2023-12-22 19:13:47,180 INFO [train.py:886] (0/4) Epoch 24, batch 1300, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4946012.31 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:13:49,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.16 vs. limit=15.0 2023-12-22 19:14:00,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=739520.0, ans=0.0 2023-12-22 19:14:20,276 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+01 3.066e+01 3.219e+01 3.358e+01 3.789e+01, threshold=6.439e+01, percent-clipped=0.0 2023-12-22 19:14:29,305 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2023-12-22 19:14:33,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.64 vs. limit=10.0 2023-12-22 19:14:36,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=739720.0, ans=0.04949747468305833 2023-12-22 19:14:38,043 INFO [train.py:886] (0/4) Epoch 24, batch 1350, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames. 
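`grad_scale` is the dynamic loss-scaling factor of mixed-precision (fp16) training: gradients are computed on a scaled loss, and the scale is doubled after a long enough run of finite steps and halved whenever an overflow is detected. That is why it jumps from 64.0 to 128.0 here and falls back to 64.0 (and later 32.0) within a few hundred batches. A sketch of the standard rule, with the growth interval and factors as assumed defaults:

    # Dynamic fp16 loss scaling in the spirit of the grad_scale field:
    # double after a streak of finite steps, halve on overflow. The growth
    # interval and factors are common defaults, assumed here.
    class DynamicGradScaler:
        def __init__(self, scale: float = 64.0, growth_interval: int = 1000):
            self.scale = scale
            self.growth_interval = growth_interval
            self._streak = 0  # consecutive steps without inf/nan

        def update(self, found_inf: bool) -> float:
            if found_inf:
                self.scale *= 0.5  # e.g. 128.0 -> 64.0 -> 32.0
                self._streak = 0
            else:
                self._streak += 1
                if self._streak % self.growth_interval == 0:
                    self.scale *= 2.0  # e.g. 64.0 -> 128.0
            return self.scale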
], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4949125.24 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:14:42,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739786.6666666666, ans=0.1 2023-12-22 19:15:02,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=739920.0, ans=0.125 2023-12-22 19:15:03,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=739920.0, ans=0.1 2023-12-22 19:15:29,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2023-12-22 19:15:30,559 INFO [train.py:886] (0/4) Epoch 24, batch 1400, loss[loss=0.01482, audio_tagging_loss=0.01482, over 24750.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4949013.10 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:16:03,680 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.772e+01 2.993e+01 3.155e+01 3.299e+01 3.866e+01, threshold=6.310e+01, percent-clipped=0.0 2023-12-22 19:16:22,179 INFO [train.py:886] (0/4) Epoch 24, batch 1450, loss[loss=0.01422, audio_tagging_loss=0.01422, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4947586.66 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:16:46,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=740586.6666666666, ans=0.0 2023-12-22 19:16:49,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=740586.6666666666, ans=0.0 2023-12-22 19:16:52,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2023-12-22 19:16:52,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=740653.3333333334, ans=0.0 2023-12-22 19:17:03,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=740720.0, ans=0.04949747468305833 2023-12-22 19:17:04,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=740720.0, ans=0.0 2023-12-22 19:17:10,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=740720.0, ans=0.0 2023-12-22 19:17:13,100 INFO [train.py:886] (0/4) Epoch 24, batch 1500, loss[loss=0.01236, audio_tagging_loss=0.01236, over 24059.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4947782.72 frames. 
], batch size: 100, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:17:18,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=740786.6666666666, ans=0.125 2023-12-22 19:17:23,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=740853.3333333334, ans=0.2 2023-12-22 19:17:30,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=740853.3333333334, ans=0.125 2023-12-22 19:17:46,189 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.671e+01 3.063e+01 3.189e+01 3.318e+01 3.904e+01, threshold=6.378e+01, percent-clipped=0.0 2023-12-22 19:18:03,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=741053.3333333334, ans=0.0 2023-12-22 19:18:03,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2023-12-22 19:18:04,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741120.0, ans=0.1 2023-12-22 19:18:05,231 INFO [train.py:886] (0/4) Epoch 24, batch 1550, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4949219.16 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:18:10,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=741120.0, ans=0.0 2023-12-22 19:18:23,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741186.6666666666, ans=0.1 2023-12-22 19:18:27,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2023-12-22 19:18:27,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=741253.3333333334, ans=0.2 2023-12-22 19:18:28,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=741253.3333333334, ans=0.2 2023-12-22 19:18:34,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=741253.3333333334, ans=0.125 2023-12-22 19:18:54,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=741386.6666666666, ans=0.0 2023-12-22 19:18:56,365 INFO [train.py:886] (0/4) Epoch 24, batch 1600, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4943377.41 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:19:02,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.36 vs. 
limit=15.0 2023-12-22 19:19:28,584 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.712e+01 3.060e+01 3.214e+01 3.363e+01 3.973e+01, threshold=6.429e+01, percent-clipped=0.0 2023-12-22 19:19:40,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0 2023-12-22 19:19:44,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=741720.0, ans=0.04949747468305833 2023-12-22 19:19:46,330 INFO [train.py:886] (0/4) Epoch 24, batch 1650, loss[loss=0.01297, audio_tagging_loss=0.01297, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4942708.30 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:20:00,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=741853.3333333334, ans=0.2 2023-12-22 19:20:39,649 INFO [train.py:886] (0/4) Epoch 24, batch 1700, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4941238.41 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:20:41,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742120.0, ans=0.1 2023-12-22 19:20:43,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.91 vs. limit=22.5 2023-12-22 19:21:12,329 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.769e+01 2.988e+01 3.129e+01 3.264e+01 3.998e+01, threshold=6.258e+01, percent-clipped=0.0 2023-12-22 19:21:21,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742386.6666666666, ans=0.1 2023-12-22 19:21:27,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=742386.6666666666, ans=0.125 2023-12-22 19:21:29,580 INFO [train.py:886] (0/4) Epoch 24, batch 1750, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4944000.90 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:21:29,741 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:21:32,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=742453.3333333334, ans=0.125 2023-12-22 19:21:32,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=742453.3333333334, ans=0.0 2023-12-22 19:21:36,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.72 vs. 
limit=10.0 2023-12-22 19:21:44,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=742520.0, ans=0.0 2023-12-22 19:21:59,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=742653.3333333334, ans=0.0 2023-12-22 19:22:09,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=742653.3333333334, ans=0.0 2023-12-22 19:22:19,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.51 vs. limit=22.5 2023-12-22 19:22:22,569 INFO [train.py:886] (0/4) Epoch 24, batch 1800, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4944246.69 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:22:38,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2023-12-22 19:22:45,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=742920.0, ans=0.125 2023-12-22 19:22:46,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.98 vs. limit=10.0 2023-12-22 19:22:46,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=742920.0, ans=0.125 2023-12-22 19:22:55,018 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.809e+01 3.072e+01 3.188e+01 3.328e+01 3.715e+01, threshold=6.376e+01, percent-clipped=0.0 2023-12-22 19:23:01,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5 2023-12-22 19:23:02,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=743053.3333333334, ans=0.5 2023-12-22 19:23:12,661 INFO [train.py:886] (0/4) Epoch 24, batch 1850, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24750.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4950032.37 frames. ], batch size: 99, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:23:13,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=743120.0, ans=0.1 2023-12-22 19:23:22,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=743120.0, ans=0.025 2023-12-22 19:23:43,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=743320.0, ans=0.125 2023-12-22 19:24:03,129 INFO [train.py:886] (0/4) Epoch 24, batch 1900, loss[loss=0.01626, audio_tagging_loss=0.01626, over 24750.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4946221.18 frames. 
], batch size: 99, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:24:28,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743586.6666666666, ans=0.1 2023-12-22 19:24:35,409 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 3.079e+01 3.221e+01 3.350e+01 3.840e+01, threshold=6.442e+01, percent-clipped=0.0 2023-12-22 19:24:37,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=743653.3333333334, ans=0.125 2023-12-22 19:24:51,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=743720.0, ans=0.125 2023-12-22 19:24:55,069 INFO [train.py:886] (0/4) Epoch 24, batch 1950, loss[loss=0.01389, audio_tagging_loss=0.01389, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4944757.17 frames. ], batch size: 99, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:24:58,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=743786.6666666666, ans=0.0 2023-12-22 19:25:19,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=743920.0, ans=0.2 2023-12-22 19:25:29,224 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.531e-03 2023-12-22 19:25:30,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=743986.6666666666, ans=0.1 2023-12-22 19:25:31,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0 2023-12-22 19:25:38,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=744053.3333333334, ans=0.125 2023-12-22 19:25:39,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=744053.3333333334, ans=0.125 2023-12-22 19:25:41,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=744053.3333333334, ans=0.1 2023-12-22 19:25:45,863 INFO [train.py:886] (0/4) Epoch 24, batch 2000, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4943550.50 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:25:46,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=744120.0, ans=0.0 2023-12-22 19:26:18,953 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.621e+01 2.994e+01 3.122e+01 3.277e+01 4.126e+01, threshold=6.244e+01, percent-clipped=0.0 2023-12-22 19:26:31,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=744386.6666666666, ans=0.1 2023-12-22 19:26:34,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=744386.6666666666, ans=0.1 2023-12-22 19:26:38,096 INFO [train.py:886] (0/4) Epoch 24, batch 2050, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. 
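The `WithLoss: name=..., loss-sum=...` lines come from an identity wrapper in scaling.py that attaches a small auxiliary penalty (here, on self-attention weights) to an intermediate tensor: the forward value is unchanged, but a gradient of ones is injected for the penalty tensor in backward, so its sum is effectively added to the training loss; `loss-sum=0.000e+00` means the penalty is currently inactive. A sketch of the mechanism (modeled on scaling.py in spirit; treat the details as an approximation):

    import random
    import torch

    # Identity op that adds aux.sum() to the loss via backward, logging the
    # sum on a small random fraction of calls, like the WithLoss lines.
    class WithLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, aux, name):
            ctx.aux_shape = aux.shape
            if random.random() < 0.002:
                print(f"WithLoss: name={name}, loss-sum={aux.sum().item():.3e}")
            return x

        @staticmethod
        def backward(ctx, x_grad):
            # d(total loss)/d(aux) = 1 everywhere, i.e. aux.sum() joins the loss.
            ones = torch.ones(ctx.aux_shape, dtype=x_grad.dtype, device=x_grad.device)
            return x_grad, ones, None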
], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4947978.41 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:26:42,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=744453.3333333334, ans=0.125 2023-12-22 19:26:44,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=744453.3333333334, ans=0.2 2023-12-22 19:26:59,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=744586.6666666666, ans=0.125 2023-12-22 19:27:17,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744653.3333333334, ans=0.125 2023-12-22 19:27:19,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=744720.0, ans=0.125 2023-12-22 19:27:27,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=744720.0, ans=0.1 2023-12-22 19:27:28,901 INFO [train.py:886] (0/4) Epoch 24, batch 2100, loss[loss=0.00943, audio_tagging_loss=0.00943, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4952966.34 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:27:50,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2023-12-22 19:27:57,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=744920.0, ans=0.2 2023-12-22 19:28:02,085 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.817e+01 3.004e+01 3.137e+01 3.300e+01 3.808e+01, threshold=6.274e+01, percent-clipped=0.0 2023-12-22 19:28:04,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=744986.6666666666, ans=0.5 2023-12-22 19:28:12,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=745053.3333333334, ans=0.5 2023-12-22 19:28:14,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=745053.3333333334, ans=0.0 2023-12-22 19:28:19,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-12-22 19:28:21,334 INFO [train.py:886] (0/4) Epoch 24, batch 2150, loss[loss=0.0117, audio_tagging_loss=0.0117, over 21570.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4951606.35 frames. ], batch size: 107, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:28:47,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=745253.3333333334, ans=0.125 2023-12-22 19:28:56,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=745320.0, ans=0.0 2023-12-22 19:29:01,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.55 vs. 
limit=12.0 2023-12-22 19:29:13,597 INFO [train.py:886] (0/4) Epoch 24, batch 2200, loss[loss=0.01479, audio_tagging_loss=0.01479, over 24750.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4947350.17 frames. ], batch size: 99, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:29:20,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=745453.3333333334, ans=0.5 2023-12-22 19:29:39,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=745586.6666666666, ans=0.125 2023-12-22 19:29:45,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=745653.3333333334, ans=0.2 2023-12-22 19:29:46,459 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.589e+01 3.066e+01 3.183e+01 3.365e+01 3.931e+01, threshold=6.367e+01, percent-clipped=0.0 2023-12-22 19:30:04,879 INFO [train.py:886] (0/4) Epoch 24, batch 2250, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4944234.78 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:30:08,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=745786.6666666666, ans=0.2 2023-12-22 19:30:25,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.91 vs. limit=6.0 2023-12-22 19:30:38,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2023-12-22 19:30:39,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=745986.6666666666, ans=0.0 2023-12-22 19:30:53,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=746053.3333333334, ans=0.0 2023-12-22 19:30:56,858 INFO [train.py:886] (0/4) Epoch 24, batch 2300, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4948810.44 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:31:12,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=746186.6666666666, ans=0.125 2023-12-22 19:31:18,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=746253.3333333334, ans=0.125 2023-12-22 19:31:29,843 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.760e+01 2.999e+01 3.171e+01 3.296e+01 4.791e+01, threshold=6.342e+01, percent-clipped=0.0 2023-12-22 19:31:31,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=746320.0, ans=0.0 2023-12-22 19:31:32,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.59 vs. 
limit=12.0 2023-12-22 19:31:44,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=746386.6666666666, ans=0.2 2023-12-22 19:31:48,237 INFO [train.py:886] (0/4) Epoch 24, batch 2350, loss[loss=0.0143, audio_tagging_loss=0.0143, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4951289.90 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:32:14,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.21 vs. limit=15.0 2023-12-22 19:32:16,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=746586.6666666666, ans=0.0 2023-12-22 19:32:20,475 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-112000.pt 2023-12-22 19:32:29,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=746653.3333333334, ans=0.0 2023-12-22 19:32:41,817 INFO [train.py:886] (0/4) Epoch 24, batch 2400, loss[loss=0.01544, audio_tagging_loss=0.01544, over 25000.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4954834.73 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:32:51,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=746853.3333333334, ans=0.1 2023-12-22 19:32:55,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.59 vs. limit=22.5 2023-12-22 19:33:05,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=746920.0, ans=0.0 2023-12-22 19:33:07,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2023-12-22 19:33:14,831 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.682e+01 2.999e+01 3.142e+01 3.302e+01 4.831e+01, threshold=6.284e+01, percent-clipped=0.0 2023-12-22 19:33:15,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=746986.6666666666, ans=0.125 2023-12-22 19:33:20,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=746986.6666666666, ans=0.035 2023-12-22 19:33:23,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2023-12-22 19:33:34,060 INFO [train.py:886] (0/4) Epoch 24, batch 2450, loss[loss=0.01458, audio_tagging_loss=0.01458, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4955435.40 frames. 
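Alongside the per-epoch checkpoints, the trainer periodically writes batch-level checkpoints named after the global optimizer-step count, which is what `Saving checkpoint to zipformer/exp_at_as_full/checkpoint-112000.pt` above records; these allow resuming mid-epoch. A minimal sketch of that cadence (the saved fields and schema are illustrative assumptions, not icefall's exact code):

    import torch

    # Periodic batch-level checkpointing producing checkpoint-<step>.pt
    # files; the cadence and the saved fields here are assumptions.
    def maybe_save_checkpoint(model, optimizer, scheduler,
                              batch_idx_train: int, exp_dir: str,
                              save_every_n: int = 4000) -> None:
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "batch_idx_train": batch_idx_train,
            },
            f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
        )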
], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:33:35,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=747120.0, ans=0.0 2023-12-22 19:33:35,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=747120.0, ans=15.0 2023-12-22 19:33:52,018 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2023-12-22 19:34:00,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2023-12-22 19:34:12,140 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.746e-02 2023-12-22 19:34:25,706 INFO [train.py:886] (0/4) Epoch 24, batch 2500, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4951009.39 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:34:53,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=747586.6666666666, ans=0.0 2023-12-22 19:34:58,730 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 3.071e+01 3.224e+01 3.383e+01 4.585e+01, threshold=6.447e+01, percent-clipped=0.0 2023-12-22 19:35:07,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=747720.0, ans=0.125 2023-12-22 19:35:09,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=747720.0, ans=0.125 2023-12-22 19:35:10,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=747720.0, ans=0.0 2023-12-22 19:35:17,085 INFO [train.py:886] (0/4) Epoch 24, batch 2550, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4951364.13 frames. ], batch size: 99, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:35:18,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=747786.6666666666, ans=0.125 2023-12-22 19:35:31,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.28 vs. limit=15.0 2023-12-22 19:35:36,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=747853.3333333334, ans=0.0 2023-12-22 19:35:47,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=747986.6666666666, ans=0.0 2023-12-22 19:36:09,581 INFO [train.py:886] (0/4) Epoch 24, batch 2600, loss[loss=0.01243, audio_tagging_loss=0.01243, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4953835.06 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:36:11,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. 
limit=15.0 2023-12-22 19:36:19,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=748186.6666666666, ans=0.125 2023-12-22 19:36:21,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=748186.6666666666, ans=0.125 2023-12-22 19:36:23,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748186.6666666666, ans=0.1 2023-12-22 19:36:27,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748186.6666666666, ans=0.125 2023-12-22 19:36:33,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=748253.3333333334, ans=0.04949747468305833 2023-12-22 19:36:39,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2023-12-22 19:36:42,263 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.710e+01 3.007e+01 3.151e+01 3.323e+01 3.960e+01, threshold=6.303e+01, percent-clipped=0.0 2023-12-22 19:36:51,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=748386.6666666666, ans=0.0 2023-12-22 19:37:00,801 INFO [train.py:886] (0/4) Epoch 24, batch 2650, loss[loss=0.01126, audio_tagging_loss=0.01126, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4952252.99 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:37:16,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.50 vs. limit=10.0 2023-12-22 19:37:20,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=748586.6666666666, ans=0.125 2023-12-22 19:37:24,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=748586.6666666666, ans=0.125 2023-12-22 19:37:25,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=748586.6666666666, ans=0.1 2023-12-22 19:37:25,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=748586.6666666666, ans=0.0 2023-12-22 19:37:36,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=748653.3333333334, ans=0.1 2023-12-22 19:37:51,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=748786.6666666666, ans=0.0 2023-12-22 19:37:52,422 INFO [train.py:886] (0/4) Epoch 24, batch 2700, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4951234.65 frames. 
], batch size: 99, lr: 4.51e-03, grad_scale: 32.0 2023-12-22 19:37:58,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748786.6666666666, ans=0.1 2023-12-22 19:38:10,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=748853.3333333334, ans=0.125 2023-12-22 19:38:25,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=748986.6666666666, ans=0.125 2023-12-22 19:38:26,499 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 3.070e+01 3.195e+01 3.314e+01 3.793e+01, threshold=6.390e+01, percent-clipped=0.0 2023-12-22 19:38:31,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=748986.6666666666, ans=0.2 2023-12-22 19:38:36,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=749053.3333333334, ans=0.125 2023-12-22 19:38:37,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=749053.3333333334, ans=0.0 2023-12-22 19:38:44,134 INFO [train.py:886] (0/4) Epoch 24, batch 2750, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4953177.67 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:39:11,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=749253.3333333334, ans=0.2 2023-12-22 19:39:12,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-12-22 19:39:25,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=749386.6666666666, ans=0.0 2023-12-22 19:39:35,695 INFO [train.py:886] (0/4) Epoch 24, batch 2800, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24750.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4957055.46 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:39:36,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=749453.3333333334, ans=0.0 2023-12-22 19:39:38,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=749453.3333333334, ans=0.0 2023-12-22 19:40:09,522 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 3.066e+01 3.201e+01 3.379e+01 3.899e+01, threshold=6.402e+01, percent-clipped=0.0 2023-12-22 19:40:28,438 INFO [train.py:886] (0/4) Epoch 24, batch 2850, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4950650.48 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:40:32,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. 
limit=6.0 2023-12-22 19:40:49,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=749920.0, ans=15.0 2023-12-22 19:40:53,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=15.0 2023-12-22 19:40:55,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=749920.0, ans=0.0 2023-12-22 19:40:58,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=749920.0, ans=0.125 2023-12-22 19:41:04,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=749986.6666666666, ans=0.0 2023-12-22 19:41:17,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=750053.3333333334, ans=0.0 2023-12-22 19:41:19,644 INFO [train.py:886] (0/4) Epoch 24, batch 2900, loss[loss=0.01545, audio_tagging_loss=0.01545, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4944458.03 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:41:32,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=750186.6666666666, ans=0.0 2023-12-22 19:41:37,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=15.0 2023-12-22 19:41:43,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=750253.3333333334, ans=0.125 2023-12-22 19:41:43,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.30 vs. limit=22.5 2023-12-22 19:41:53,959 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.002e+01 3.182e+01 3.351e+01 5.287e+01, threshold=6.364e+01, percent-clipped=0.0 2023-12-22 19:42:03,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0 2023-12-22 19:42:12,216 INFO [train.py:886] (0/4) Epoch 24, batch 2950, loss[loss=0.01603, audio_tagging_loss=0.01603, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4947979.65 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:42:14,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.39 vs. limit=15.0 2023-12-22 19:42:44,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=750653.3333333334, ans=0.0 2023-12-22 19:42:55,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=750720.0, ans=0.125 2023-12-22 19:43:03,931 INFO [train.py:886] (0/4) Epoch 24, batch 3000, loss[loss=0.01313, audio_tagging_loss=0.01313, over 23964.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4944529.10 frames. 
], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:43:03,933 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 19:43:12,714 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6786, 2.8826, 3.5231, 3.5353], device='cuda:0') 2023-12-22 19:43:24,945 INFO [train.py:917] (0/4) Epoch 24, validation: loss=0.03301, audio_tagging_loss=0.03301, over 3737520.00 frames. 2023-12-22 19:43:24,946 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 19:43:25,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=750786.6666666666, ans=0.125 2023-12-22 19:43:37,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2023-12-22 19:43:42,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=750853.3333333334, ans=0.125 2023-12-22 19:43:59,580 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.765e+01 2.995e+01 3.133e+01 3.253e+01 3.784e+01, threshold=6.265e+01, percent-clipped=0.0 2023-12-22 19:44:05,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=750986.6666666666, ans=0.125 2023-12-22 19:44:06,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=751053.3333333334, ans=0.125 2023-12-22 19:44:06,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=751053.3333333334, ans=0.0 2023-12-22 19:44:07,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.36 vs. limit=10.0 2023-12-22 19:44:08,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2023-12-22 19:44:15,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0 2023-12-22 19:44:17,112 INFO [train.py:886] (0/4) Epoch 24, batch 3050, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4940139.80 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:44:24,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=751120.0, ans=0.0 2023-12-22 19:44:38,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=751253.3333333334, ans=0.125 2023-12-22 19:44:43,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=751253.3333333334, ans=0.125 2023-12-22 19:44:48,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=751320.0, ans=0.07 2023-12-22 19:45:06,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. 
limit=6.0 2023-12-22 19:45:08,381 INFO [train.py:886] (0/4) Epoch 24, batch 3100, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4949080.76 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:45:21,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2023-12-22 19:45:22,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=751520.0, ans=0.125 2023-12-22 19:45:35,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=751586.6666666666, ans=0.2 2023-12-22 19:45:43,073 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.717e+01 3.050e+01 3.194e+01 3.363e+01 4.178e+01, threshold=6.387e+01, percent-clipped=0.0 2023-12-22 19:45:57,192 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:45:59,788 INFO [train.py:886] (0/4) Epoch 24, batch 3150, loss[loss=0.0113, audio_tagging_loss=0.0113, over 24750.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4949526.45 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:46:12,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=751853.3333333334, ans=0.125 2023-12-22 19:46:12,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-12-22 19:46:37,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=751986.6666666666, ans=0.0 2023-12-22 19:46:41,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=752053.3333333334, ans=0.125 2023-12-22 19:46:44,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=752053.3333333334, ans=0.0 2023-12-22 19:46:52,402 INFO [train.py:886] (0/4) Epoch 24, batch 3200, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24750.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4945203.69 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:46:58,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=752120.0, ans=0.0 2023-12-22 19:47:10,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.74 vs. limit=15.0 2023-12-22 19:47:26,254 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.746e+01 3.034e+01 3.141e+01 3.276e+01 3.703e+01, threshold=6.281e+01, percent-clipped=0.0 2023-12-22 19:47:42,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=752453.3333333334, ans=0.04949747468305833 2023-12-22 19:47:43,743 INFO [train.py:886] (0/4) Epoch 24, batch 3250, loss[loss=0.01089, audio_tagging_loss=0.01089, over 25000.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4942564.80 frames. 
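During the validation pass above, a per-head attention-entropy diagnostic was logged (`attn_weights_entropy = tensor([4.6786, 2.8826, 3.5231, 3.5353], device='cuda:0')`). Below is a hedged sketch of how such a statistic can be computed; the tensor layout and the mean-over-queries reduction are assumptions, chosen only so the output is one entropy value per head like the logged tensor.

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len); each row along the last dim is
    # assumed to be a softmax distribution over keys.
    ent = -(attn.clamp(min=1e-20).log() * attn).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)  # one average entropy per head

attn = torch.softmax(torch.randn(4, 10, 10), dim=-1)
print(attn_weights_entropy(attn))  # 4 values, like the logged tensor
```

Low entropy means a head attends sharply to few positions; high entropy means it spreads attention broadly, which makes this a cheap health check on the self-attention layers.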
], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:47:51,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752453.3333333334, ans=0.1 2023-12-22 19:48:00,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2023-12-22 19:48:04,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=752586.6666666666, ans=0.0 2023-12-22 19:48:05,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=752586.6666666666, ans=0.125 2023-12-22 19:48:13,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=752586.6666666666, ans=0.2 2023-12-22 19:48:24,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=752720.0, ans=0.0 2023-12-22 19:48:29,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=752720.0, ans=0.2 2023-12-22 19:48:32,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=752720.0, ans=0.2 2023-12-22 19:48:35,254 INFO [train.py:886] (0/4) Epoch 24, batch 3300, loss[loss=0.01392, audio_tagging_loss=0.01392, over 24750.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4950810.54 frames. ], batch size: 99, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:48:42,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=752786.6666666666, ans=0.0 2023-12-22 19:48:57,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=752920.0, ans=0.125 2023-12-22 19:49:04,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=752920.0, ans=0.125 2023-12-22 19:49:09,401 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 3.005e+01 3.126e+01 3.283e+01 3.763e+01, threshold=6.252e+01, percent-clipped=0.0 2023-12-22 19:49:18,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=753053.3333333334, ans=0.0 2023-12-22 19:49:27,685 INFO [train.py:886] (0/4) Epoch 24, batch 3350, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4955578.11 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:50:10,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=753386.6666666666, ans=0.125 2023-12-22 19:50:14,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=753386.6666666666, ans=0.125 2023-12-22 19:50:19,757 INFO [train.py:886] (0/4) Epoch 24, batch 3400, loss[loss=0.01626, audio_tagging_loss=0.01626, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4954030.88 frames. 
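The recurring `Clipping_scale=2.0, grad-norm quartiles ... threshold=...` warnings report five order statistics (min, Q1, median, Q3, max) of recent global gradient norms. In every such warning above, the threshold equals 2.0 times the logged median up to rounding (e.g. 2 × 3.126e+01 = 6.252e+01), so the sketch below derives the clipping threshold that way. The window size and the `percent-clipped` bookkeeping are assumptions; this is not `optim.py`'s actual code.

```python
from collections import deque
import torch

class GradNormClipper:
    """Track recent global grad norms; clip when the current norm exceeds
    clipping_scale times the running median, and report quartiles."""

    def __init__(self, window: int = 128, clipping_scale: float = 2.0):
        self.norms = deque(maxlen=window)
        self.clipping_scale = clipping_scale
        self.num_steps = 0
        self.num_clipped = 0

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        self.num_steps += 1
        quartiles = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * quartiles[2].item()  # 2.0 * median
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)  # rescale in place
        percent_clipped = 100.0 * self.num_clipped / self.num_steps
        return quartiles, threshold, percent_clipped
```

With the quartiles as tight as they are here (roughly 2.7e+01 to 4e+01) and the threshold at twice the median, `percent-clipped=0.0` is the expected steady-state reading.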
], batch size: 99, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:50:29,194 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:50:31,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=12.0 2023-12-22 19:50:34,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2023-12-22 19:50:43,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=753586.6666666666, ans=0.125 2023-12-22 19:50:54,454 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+01 3.015e+01 3.148e+01 3.318e+01 3.787e+01, threshold=6.295e+01, percent-clipped=0.0 2023-12-22 19:51:02,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=753720.0, ans=0.125 2023-12-22 19:51:04,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=753720.0, ans=0.0 2023-12-22 19:51:06,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=753720.0, ans=0.2 2023-12-22 19:51:11,216 INFO [train.py:886] (0/4) Epoch 24, batch 3450, loss[loss=0.0159, audio_tagging_loss=0.0159, over 24750.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4951763.20 frames. ], batch size: 99, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:51:30,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=753853.3333333334, ans=0.125 2023-12-22 19:51:31,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-12-22 19:51:41,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=753986.6666666666, ans=0.0 2023-12-22 19:51:58,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2023-12-22 19:52:01,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=754053.3333333334, ans=0.125 2023-12-22 19:52:03,980 INFO [train.py:886] (0/4) Epoch 24, batch 3500, loss[loss=0.01335, audio_tagging_loss=0.01335, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4948412.94 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:52:15,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=754186.6666666666, ans=0.125 2023-12-22 19:52:23,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2023-12-22 19:52:28,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.05 vs. 
limit=22.5 2023-12-22 19:52:32,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=754253.3333333334, ans=12.0 2023-12-22 19:52:38,069 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.772e+01 3.028e+01 3.212e+01 3.392e+01 3.862e+01, threshold=6.425e+01, percent-clipped=0.0 2023-12-22 19:52:39,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2023-12-22 19:52:54,930 INFO [train.py:886] (0/4) Epoch 24, batch 3550, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4936882.43 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:53:01,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=754453.3333333334, ans=0.125 2023-12-22 19:53:25,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2023-12-22 19:53:27,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=754653.3333333334, ans=0.0 2023-12-22 19:53:47,384 INFO [train.py:886] (0/4) Epoch 24, batch 3600, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24750.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4940878.54 frames. ], batch size: 99, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:53:47,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=754786.6666666666, ans=0.2 2023-12-22 19:53:52,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=754786.6666666666, ans=0.2 2023-12-22 19:54:08,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=15.0 2023-12-22 19:54:20,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=754986.6666666666, ans=0.1 2023-12-22 19:54:20,696 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+01 2.974e+01 3.085e+01 3.211e+01 3.722e+01, threshold=6.169e+01, percent-clipped=0.0 2023-12-22 19:54:27,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=15.0 2023-12-22 19:54:27,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2023-12-22 19:54:38,361 INFO [train.py:886] (0/4) Epoch 24, batch 3650, loss[loss=0.0142, audio_tagging_loss=0.0142, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4949117.08 frames. 
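The `Whitening: name=..., metric=M vs. limit=L` entries compare a measure of how far a layer's activations are from being "white" (isotropic across channels) against a scheduled limit; a reading like `metric=14.95 vs. limit=15.0` above means the constraint is nearly active. One plausible metric, used in the sketch below purely as an assumption rather than `scaling.py`'s exact formula, is the ratio of the mean squared eigenvalue of the channel covariance to the squared mean eigenvalue: it equals 1.0 for perfectly whitened features and grows as variance concentrates in a few directions.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    # >= 1.0 by Cauchy-Schwarz; 1.0 iff all eigenvalues are equal (isotropic).
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 512)                     # near-isotropic -> close to 1
print(whitening_metric(x))
print(whitening_metric(x * torch.linspace(0.1, 3.0, 512)))  # anisotropic -> larger
```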
], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:54:47,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=755186.6666666666, ans=0.125 2023-12-22 19:55:00,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755253.3333333334, ans=0.1 2023-12-22 19:55:29,371 INFO [train.py:886] (0/4) Epoch 24, batch 3700, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4952994.37 frames. ], batch size: 99, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:55:53,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=755586.6666666666, ans=0.125 2023-12-22 19:56:00,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=755653.3333333334, ans=0.2 2023-12-22 19:56:01,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=755653.3333333334, ans=0.125 2023-12-22 19:56:03,430 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.064e+01 3.214e+01 3.340e+01 3.763e+01, threshold=6.428e+01, percent-clipped=0.0 2023-12-22 19:56:20,934 INFO [train.py:886] (0/4) Epoch 24, batch 3750, loss[loss=0.01378, audio_tagging_loss=0.01378, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4952136.67 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 19:57:00,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2023-12-22 19:57:12,851 INFO [train.py:886] (0/4) Epoch 24, batch 3800, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24750.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4947725.52 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 19:57:29,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=756186.6666666666, ans=0.5 2023-12-22 19:57:32,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=756253.3333333334, ans=0.125 2023-12-22 19:57:45,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2023-12-22 19:57:45,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0 2023-12-22 19:57:47,016 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.557e+01 3.075e+01 3.225e+01 3.353e+01 3.871e+01, threshold=6.449e+01, percent-clipped=0.0 2023-12-22 19:57:48,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2023-12-22 19:57:51,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=756320.0, ans=0.0 2023-12-22 19:58:04,511 INFO [train.py:886] (0/4) Epoch 24, batch 3850, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24750.00 frames. 
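Each train line reports two losses: `loss[...]` for the current batch and `tot_loss[...]` over roughly 4.9 million frames. The fractional frame counts (e.g. `over 4936882.43 frames`) suggest a decayed, frame-weighted running average rather than an exact sum; the sketch below implements that reading, with the decay constant as a placeholder.

```python
class RunningLoss:
    """Decayed, frame-weighted average, one reading of the tot_loss[...] field."""

    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float, decay: float = 0.999):
        # The decay keeps the effective window at a few million frames,
        # matching the roughly constant "over 49xxxxx.xx frames" totals.
        self.loss_sum = self.loss_sum * decay + loss * num_frames
        self.frames = self.frames * decay + num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```

This also explains the pattern visible at the start of epoch 25 further down: a freshly initialized average equals the first batch's loss and then settles toward the steady-state value over a few hundred batches.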
], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4942573.43 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 19:58:04,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=756453.3333333334, ans=0.125 2023-12-22 19:58:06,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756453.3333333334, ans=0.1 2023-12-22 19:58:19,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0 2023-12-22 19:58:34,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2023-12-22 19:58:44,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=756720.0, ans=0.0 2023-12-22 19:58:51,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=756720.0, ans=0.2 2023-12-22 19:58:56,719 INFO [train.py:886] (0/4) Epoch 24, batch 3900, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4941563.07 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 19:58:56,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=756786.6666666666, ans=0.0 2023-12-22 19:59:12,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=756853.3333333334, ans=0.125 2023-12-22 19:59:14,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.05 vs. limit=6.0 2023-12-22 19:59:25,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=756986.6666666666, ans=0.5 2023-12-22 19:59:29,993 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.786e+01 3.006e+01 3.171e+01 3.357e+01 4.139e+01, threshold=6.342e+01, percent-clipped=0.0 2023-12-22 19:59:38,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.43 vs. limit=22.5 2023-12-22 19:59:46,972 INFO [train.py:886] (0/4) Epoch 24, batch 3950, loss[loss=0.01194, audio_tagging_loss=0.01194, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4945196.94 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 19:59:47,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=757120.0, ans=0.0 2023-12-22 19:59:49,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=12.0 2023-12-22 19:59:54,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=757120.0, ans=0.125 2023-12-22 19:59:55,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. 
limit=15.0 2023-12-22 19:59:58,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=757186.6666666666, ans=0.0 2023-12-22 20:00:04,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2023-12-22 20:00:21,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.28 vs. limit=15.0 2023-12-22 20:00:33,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=15.0 2023-12-22 20:00:39,000 INFO [train.py:886] (0/4) Epoch 24, batch 4000, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4949395.74 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 20:00:41,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2023-12-22 20:00:44,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=757453.3333333334, ans=0.0 2023-12-22 20:00:56,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2023-12-22 20:01:11,812 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.071e+01 3.185e+01 3.335e+01 3.976e+01, threshold=6.370e+01, percent-clipped=0.0 2023-12-22 20:01:21,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=757720.0, ans=0.1 2023-12-22 20:01:24,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=757720.0, ans=0.2 2023-12-22 20:01:29,358 INFO [train.py:886] (0/4) Epoch 24, batch 4050, loss[loss=0.01383, audio_tagging_loss=0.01383, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4941680.71 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 20:01:33,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=757786.6666666666, ans=0.125 2023-12-22 20:02:04,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2023-12-22 20:02:19,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=758120.0, ans=0.125 2023-12-22 20:02:20,031 INFO [train.py:886] (0/4) Epoch 24, batch 4100, loss[loss=0.01346, audio_tagging_loss=0.01346, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4932497.00 frames. 
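The `grad_scale: 32.0` field in the train lines is the dynamic loss scale used for fp16 mixed-precision training (it doubles to 64.0 later in this epoch). The sketch below shows the standard `torch.cuda.amp` pattern that produces such a scale on a CUDA device; `model`, `criterion`, and the batch tensors are placeholders.

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)  # matches grad_scale: 32.0

def train_step(model, optimizer, features, targets, criterion):
    optimizer.zero_grad()
    # Forward pass runs in fp16 under autocast.
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = criterion(model(features), targets)
    # Backward on the scaled loss; step() unscales first, and update() grows
    # the scale (e.g. 32.0 -> 64.0) after enough overflow-free steps.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```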
], batch size: 99, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 20:02:23,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=758120.0, ans=0.125 2023-12-22 20:02:33,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=758186.6666666666, ans=0.0 2023-12-22 20:02:36,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=758186.6666666666, ans=0.0 2023-12-22 20:02:40,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=758253.3333333334, ans=0.125 2023-12-22 20:02:50,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-12-22 20:02:54,133 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.829e+01 3.126e+01 3.267e+01 3.430e+01 4.193e+01, threshold=6.535e+01, percent-clipped=0.0 2023-12-22 20:03:11,636 INFO [train.py:886] (0/4) Epoch 24, batch 4150, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4926554.37 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 20:03:19,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758453.3333333334, ans=0.1 2023-12-22 20:03:35,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=758586.6666666666, ans=0.0 2023-12-22 20:03:36,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=758586.6666666666, ans=0.125 2023-12-22 20:03:43,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2023-12-22 20:03:58,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=758720.0, ans=0.125 2023-12-22 20:03:59,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=758720.0, ans=0.125 2023-12-22 20:04:03,476 INFO [train.py:886] (0/4) Epoch 24, batch 4200, loss[loss=0.01374, audio_tagging_loss=0.01374, over 24750.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4931655.46 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 20:04:06,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=758786.6666666666, ans=0.0 2023-12-22 20:04:21,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=758853.3333333334, ans=0.125 2023-12-22 20:04:21,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=758853.3333333334, ans=0.125 2023-12-22 20:04:38,258 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.640e+01 3.033e+01 3.184e+01 3.387e+01 4.147e+01, threshold=6.368e+01, percent-clipped=0.0 2023-12-22 20:04:55,889 INFO [train.py:886] (0/4) Epoch 24, batch 4250, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. 
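The learning rate in these lines decays very slowly with training progress (4.50e-03 a few hundred batches earlier, 4.48e-03 here, 4.47e-03 below, then a visible drop at the epoch boundary). That shape is consistent with a schedule polynomial in both batch and epoch counts; the function below is only a hedged sketch of such a shape, and both the functional form and the constants in the example call are placeholders, not the recipe's actual scheduler.

```python
def decayed_lr(base_lr: float, batch: int, epoch: int,
               lr_batches: float, lr_epochs: float) -> float:
    # ~base_lr early on; power-law decay once batch >> lr_batches and
    # epoch >> lr_epochs. Stepping the epoch causes a small discrete drop.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Hypothetical constants, for illustration only:
print(decayed_lr(base_lr=0.05, batch=750_000, epoch=24,
                 lr_batches=5000.0, lr_epochs=4.0))
```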
], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4940408.26 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:04:56,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=759120.0, ans=0.0 2023-12-22 20:04:59,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=759120.0, ans=15.0 2023-12-22 20:05:02,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=759120.0, ans=0.0 2023-12-22 20:05:04,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=759120.0, ans=0.125 2023-12-22 20:05:15,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2023-12-22 20:05:39,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=12.0 2023-12-22 20:05:47,498 INFO [train.py:886] (0/4) Epoch 24, batch 4300, loss[loss=0.01449, audio_tagging_loss=0.01449, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4947650.27 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:05:53,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=759453.3333333334, ans=0.0 2023-12-22 20:05:59,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=759520.0, ans=0.95 2023-12-22 20:06:01,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=759520.0, ans=0.1 2023-12-22 20:06:10,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=759586.6666666666, ans=0.0 2023-12-22 20:06:14,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0 2023-12-22 20:06:21,638 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.739e+01 3.083e+01 3.257e+01 3.408e+01 3.899e+01, threshold=6.514e+01, percent-clipped=0.0 2023-12-22 20:06:22,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=759653.3333333334, ans=0.0 2023-12-22 20:06:26,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=759653.3333333334, ans=0.0 2023-12-22 20:06:31,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=759720.0, ans=0.125 2023-12-22 20:06:37,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2023-12-22 20:06:39,217 INFO [train.py:886] (0/4) Epoch 24, batch 4350, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4954520.15 frames. 
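Entries like `...bypass.skip_rate, batch_count=760053.3333333334, ans=0.07` above track a scheduled probability of bypassing a sub-module during training. Below is a hedged sketch of that idea as a hard skip; the real mechanism may blend a module's output with its input rather than skip outright, and the names here are illustrative.

```python
import torch

def maybe_bypass(module, x, skip_rate: float, training: bool):
    # With probability skip_rate, pass the input through unchanged (skip the
    # module); otherwise apply it. At eval time the module always runs.
    if training and torch.rand(()) < skip_rate:
        return x
    return module(x)
```

Scheduling `skip_rate` downward over training (cf. the ScheduledFloat sketch earlier) lets early training regularize aggressively while late training uses the full network.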
], batch size: 99, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:06:39,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759786.6666666666, ans=0.1 2023-12-22 20:07:00,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=759920.0, ans=0.1 2023-12-22 20:07:13,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.21 vs. limit=6.0 2023-12-22 20:07:21,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=760053.3333333334, ans=0.07 2023-12-22 20:07:29,309 INFO [train.py:886] (0/4) Epoch 24, batch 4400, loss[loss=0.01636, audio_tagging_loss=0.01636, over 24949.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4951102.19 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:07:34,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=760120.0, ans=0.125 2023-12-22 20:07:45,001 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:07:51,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=760253.3333333334, ans=0.0 2023-12-22 20:07:54,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=760253.3333333334, ans=0.0 2023-12-22 20:08:03,220 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.725e+01 3.094e+01 3.285e+01 3.414e+01 4.025e+01, threshold=6.569e+01, percent-clipped=0.0 2023-12-22 20:08:04,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760320.0, ans=0.1 2023-12-22 20:08:16,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=760386.6666666666, ans=0.0 2023-12-22 20:08:17,521 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:08:20,884 INFO [train.py:886] (0/4) Epoch 24, batch 4450, loss[loss=0.01216, audio_tagging_loss=0.01216, over 25000.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4945586.24 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:08:25,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=760453.3333333334, ans=0.125 2023-12-22 20:08:39,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=760586.6666666666, ans=0.125 2023-12-22 20:08:41,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=760586.6666666666, ans=0.125 2023-12-22 20:08:42,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.68 vs. 
limit=15.0 2023-12-22 20:09:06,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=760720.0, ans=0.125 2023-12-22 20:09:09,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=760720.0, ans=0.125 2023-12-22 20:09:11,065 INFO [train.py:886] (0/4) Epoch 24, batch 4500, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4946723.58 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:09:11,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=760786.6666666666, ans=0.0 2023-12-22 20:09:23,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=760853.3333333334, ans=0.125 2023-12-22 20:09:29,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=760853.3333333334, ans=0.125 2023-12-22 20:09:44,440 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.800e+01 3.026e+01 3.195e+01 3.356e+01 3.785e+01, threshold=6.390e+01, percent-clipped=0.0 2023-12-22 20:09:45,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5 2023-12-22 20:10:01,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.56 vs. limit=10.0 2023-12-22 20:10:02,165 INFO [train.py:886] (0/4) Epoch 24, batch 4550, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4950508.77 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:10:12,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=761186.6666666666, ans=0.0 2023-12-22 20:10:13,544 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:10:38,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=12.0 2023-12-22 20:10:52,869 INFO [train.py:886] (0/4) Epoch 24, batch 4600, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4956178.57 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:11:04,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=761520.0, ans=0.2 2023-12-22 20:11:22,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=761653.3333333334, ans=0.0 2023-12-22 20:11:26,352 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.049e+01 3.188e+01 3.293e+01 3.934e+01, threshold=6.377e+01, percent-clipped=0.0 2023-12-22 20:11:29,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=761653.3333333334, ans=0.0 2023-12-22 20:11:43,932 INFO [train.py:886] (0/4) Epoch 24, batch 4650, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. 
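The numerous `balancer...prob, ans=0.125` entries schedule how often an activation "balancer" intervenes, and companion fields such as `min_positive`, `max_positive`, `min_abs`, and `max_abs` bound per-channel activation statistics. The sketch below computes one such statistic, the fraction of positive activations per channel; how the balancer enforces the bounds (presumably by modifying gradients) is omitted, and everything here is an assumption for illustration.

```python
import torch

def positive_fraction(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). A balancer would compare this per-channel
    # fraction against bounds like min_positive/max_positive and nudge the
    # gradients of out-of-range channels (enforcement not shown here).
    return (x > 0).float().mean(dim=0)

x = torch.randn(1000, 256)
frac = positive_fraction(x)
print(frac.min().item(), frac.max().item())  # near 0.5 for symmetric inputs
```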
], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4955912.19 frames. ], batch size: 99, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:11:54,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761853.3333333334, ans=0.1 2023-12-22 20:11:58,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=761853.3333333334, ans=0.125 2023-12-22 20:12:10,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=761920.0, ans=0.125 2023-12-22 20:12:10,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=761920.0, ans=0.0 2023-12-22 20:12:22,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761986.6666666666, ans=0.1 2023-12-22 20:12:28,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=22.5 2023-12-22 20:12:32,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=762053.3333333334, ans=0.125 2023-12-22 20:12:34,696 INFO [train.py:886] (0/4) Epoch 24, batch 4700, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4954707.30 frames. ], batch size: 99, lr: 4.47e-03, grad_scale: 64.0 2023-12-22 20:12:44,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=762186.6666666666, ans=0.125 2023-12-22 20:13:00,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762253.3333333334, ans=0.1 2023-12-22 20:13:00,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=762253.3333333334, ans=0.125 2023-12-22 20:13:06,267 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.692e+01 3.057e+01 3.259e+01 3.422e+01 3.879e+01, threshold=6.518e+01, percent-clipped=0.0 2023-12-22 20:13:22,081 INFO [train.py:886] (0/4) Epoch 24, batch 4750, loss[loss=0.01262, audio_tagging_loss=0.01262, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4954231.92 frames. ], batch size: 99, lr: 4.46e-03, grad_scale: 64.0 2023-12-22 20:13:33,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.55 vs. limit=22.5 2023-12-22 20:13:37,095 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-24.pt 2023-12-22 20:13:57,231 INFO [train.py:886] (0/4) Epoch 25, batch 0, loss[loss=0.02897, audio_tagging_loss=0.02897, over 25000.00 frames. ], tot_loss[loss=0.02897, audio_tagging_loss=0.02897, over 25000.00 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:13:57,233 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 20:14:18,056 INFO [train.py:917] (0/4) Epoch 25, validation: loss=0.03205, audio_tagging_loss=0.03205, over 3737520.00 frames. 
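The epoch boundary just above follows a fixed sequence: save `epoch-24.pt`, begin epoch 25, and compute validation loss over the dev set (with peak GPU memory reported right after). Below is a hedged sketch of that bookkeeping; `compute_loss`, the checkpoint dictionary keys, and the frame-weighted reduction are placeholders rather than the exact train.py/checkpoint.py behavior.

```python
import torch

def end_of_epoch(model, optimizer, dev_loader, compute_loss, epoch, exp_dir):
    # 1) Save an epoch checkpoint, as in "Saving checkpoint to
    #    zipformer/exp_at_as_full/epoch-24.pt" (keys are assumptions).
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "epoch": epoch},
        f"{exp_dir}/epoch-{epoch}.pt",
    )
    # 2) Run a validation pass, mirroring "Computing validation loss" /
    #    "Epoch N, validation: loss=...".
    model.eval()
    tot, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, n = compute_loss(model, batch)  # assumed helper
            tot, frames = tot + loss.item() * n, frames + n
    model.train()
    # 3) Report peak memory, as in "Maximum memory allocated so far is ...MB".
    mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    return tot / max(frames, 1.0), mem_mb
```

Note also that `grad_scale` restarts at 32.0 for epoch 25 and the grad-norm quartiles spike (`percent-clipped=6.0` in the first warning of the new epoch), both typical right after an epoch boundary perturbs optimizer state.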
2023-12-22 20:14:18,057 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 20:14:18,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2023-12-22 20:14:20,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.80 vs. limit=15.0 2023-12-22 20:14:49,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=762760.0, ans=0.07 2023-12-22 20:14:49,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2023-12-22 20:14:57,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=762826.6666666666, ans=0.125 2023-12-22 20:15:01,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=762826.6666666666, ans=0.0 2023-12-22 20:15:09,643 INFO [train.py:886] (0/4) Epoch 25, batch 50, loss[loss=0.01789, audio_tagging_loss=0.01789, over 25000.00 frames. ], tot_loss[loss=0.02085, audio_tagging_loss=0.02085, over 1116348.14 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:15:13,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=762893.3333333334, ans=0.125 2023-12-22 20:15:16,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=762893.3333333334, ans=0.1 2023-12-22 20:15:27,233 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.949e+01 3.385e+01 3.837e+01 4.351e+01 9.829e+01, threshold=7.674e+01, percent-clipped=6.0 2023-12-22 20:16:00,577 INFO [train.py:886] (0/4) Epoch 25, batch 100, loss[loss=0.01646, audio_tagging_loss=0.01646, over 25000.00 frames. ], tot_loss[loss=0.01813, audio_tagging_loss=0.01813, over 1974692.23 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:16:07,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=763226.6666666666, ans=0.2 2023-12-22 20:16:10,600 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:16:52,786 INFO [train.py:886] (0/4) Epoch 25, batch 150, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24002.00 frames. ], tot_loss[loss=0.01656, audio_tagging_loss=0.01656, over 2642505.36 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:17:05,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. 
limit=15.0 2023-12-22 20:17:10,526 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.911e+01 3.174e+01 3.367e+01 3.565e+01 4.203e+01, threshold=6.734e+01, percent-clipped=0.0 2023-12-22 20:17:14,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=763693.3333333334, ans=0.1 2023-12-22 20:17:15,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=763693.3333333334, ans=0.125 2023-12-22 20:17:23,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763760.0, ans=0.1 2023-12-22 20:17:29,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=763760.0, ans=0.125 2023-12-22 20:17:44,277 INFO [train.py:886] (0/4) Epoch 25, batch 200, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 3157827.61 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:18:12,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=764026.6666666666, ans=0.2 2023-12-22 20:18:12,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=764026.6666666666, ans=0.025 2023-12-22 20:18:14,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=764093.3333333334, ans=0.0 2023-12-22 20:18:24,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=764093.3333333334, ans=0.02 2023-12-22 20:18:34,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=764160.0, ans=0.0 2023-12-22 20:18:37,151 INFO [train.py:886] (0/4) Epoch 25, batch 250, loss[loss=0.01467, audio_tagging_loss=0.01467, over 24750.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 3558467.54 frames. 
], batch size: 99, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:18:43,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=764226.6666666666, ans=0.125 2023-12-22 20:18:46,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=764293.3333333334, ans=0.125 2023-12-22 20:18:55,656 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.850e+01 3.053e+01 3.207e+01 3.358e+01 3.968e+01, threshold=6.413e+01, percent-clipped=0.0 2023-12-22 20:18:59,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=764360.0, ans=0.2 2023-12-22 20:19:03,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=764360.0, ans=0.125 2023-12-22 20:19:06,199 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.358e-03 2023-12-22 20:19:08,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=764426.6666666666, ans=0.125 2023-12-22 20:19:10,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=22.5 2023-12-22 20:19:19,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764493.3333333334, ans=0.1 2023-12-22 20:19:23,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764493.3333333334, ans=0.1 2023-12-22 20:19:28,795 INFO [train.py:886] (0/4) Epoch 25, batch 300, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 3861975.82 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:19:29,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=764560.0, ans=0.0 2023-12-22 20:19:29,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=764560.0, ans=0.125 2023-12-22 20:20:20,618 INFO [train.py:886] (0/4) Epoch 25, batch 350, loss[loss=0.01494, audio_tagging_loss=0.01494, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4102301.76 frames. ], batch size: 99, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:20:35,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=764960.0, ans=0.0 2023-12-22 20:20:40,130 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.757e+01 3.026e+01 3.207e+01 3.330e+01 3.805e+01, threshold=6.415e+01, percent-clipped=0.0 2023-12-22 20:20:50,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=765026.6666666666, ans=0.2 2023-12-22 20:21:12,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=765226.6666666666, ans=0.125 2023-12-22 20:21:13,459 INFO [train.py:886] (0/4) Epoch 25, batch 400, loss[loss=0.01447, audio_tagging_loss=0.01447, over 24750.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4290474.91 frames. 
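Most `WithLoss: ... loss-sum=0.000e+00` entries report an auxiliary penalty attached to attention weights that is exactly zero, while the entry above for `encoder.encoders.5.encoder.layers.1.self_attn_weights` shows a small positive `loss-sum=5.358e-03`. One way to get that behavior, sketched below under assumption (this is not `scaling.py`'s code), is a hinge-style penalty that stays at zero while a monitored statistic remains within its limit:

```python
import torch

def attn_weights_penalty(attn_weights: torch.Tensor, limit: float = 1.0):
    # Penalize only the excess of the weights' second moment over `limit`,
    # so the term is exactly zero when the statistic is in range -- matching
    # the many loss-sum=0.000e+00 entries with an occasional small positive
    # value. Add the returned scalar to the main loss; log it detached.
    return (attn_weights.pow(2).mean() - limit).clamp(min=0.0)
```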
], batch size: 99, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:21:20,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=765226.6666666666, ans=0.125 2023-12-22 20:21:24,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=15.0 2023-12-22 20:21:31,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=765293.3333333334, ans=0.0 2023-12-22 20:21:50,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=765426.6666666666, ans=0.125 2023-12-22 20:21:53,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2023-12-22 20:21:54,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=765493.3333333334, ans=0.2 2023-12-22 20:21:54,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=765493.3333333334, ans=0.125 2023-12-22 20:21:57,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765493.3333333334, ans=0.1 2023-12-22 20:21:57,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=765493.3333333334, ans=0.07 2023-12-22 20:22:04,091 INFO [train.py:886] (0/4) Epoch 25, batch 450, loss[loss=0.01532, audio_tagging_loss=0.01532, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4440350.53 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:22:05,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=765560.0, ans=0.125 2023-12-22 20:22:07,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=765560.0, ans=0.2 2023-12-22 20:22:14,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=15.0 2023-12-22 20:22:23,127 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.697e+01 3.006e+01 3.168e+01 3.345e+01 4.036e+01, threshold=6.336e+01, percent-clipped=0.0 2023-12-22 20:22:23,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=765626.6666666666, ans=0.5 2023-12-22 20:22:28,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=765693.3333333334, ans=0.125 2023-12-22 20:22:56,175 INFO [train.py:886] (0/4) Epoch 25, batch 500, loss[loss=0.01201, audio_tagging_loss=0.01201, over 24750.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4557901.40 frames. 
], batch size: 99, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:22:57,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=765893.3333333334, ans=0.2 2023-12-22 20:22:58,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=765893.3333333334, ans=0.125 2023-12-22 20:23:09,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=765960.0, ans=0.0 2023-12-22 20:23:17,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. limit=10.0 2023-12-22 20:23:38,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=766160.0, ans=0.1 2023-12-22 20:23:43,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=766160.0, ans=0.95 2023-12-22 20:23:45,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766160.0, ans=0.1 2023-12-22 20:23:47,578 INFO [train.py:886] (0/4) Epoch 25, batch 550, loss[loss=0.01422, audio_tagging_loss=0.01422, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4645081.90 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:24:05,984 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.053e+01 3.176e+01 3.328e+01 4.174e+01, threshold=6.352e+01, percent-clipped=0.0 2023-12-22 20:24:08,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766360.0, ans=0.1 2023-12-22 20:24:08,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.02 vs. limit=22.5 2023-12-22 20:24:39,255 INFO [train.py:886] (0/4) Epoch 25, batch 600, loss[loss=0.01209, audio_tagging_loss=0.01209, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4713430.06 frames. ], batch size: 99, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:24:43,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=766560.0, ans=0.125 2023-12-22 20:24:50,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2023-12-22 20:24:52,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=766626.6666666666, ans=0.0 2023-12-22 20:25:05,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2023-12-22 20:25:21,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=766826.6666666666, ans=0.0 2023-12-22 20:25:24,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=766826.6666666666, ans=0.125 2023-12-22 20:25:31,418 INFO [train.py:886] (0/4) Epoch 25, batch 650, loss[loss=0.01468, audio_tagging_loss=0.01468, over 24750.00 frames. 
], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4760569.52 frames. ], batch size: 99, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:25:31,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=766893.3333333334, ans=0.125 2023-12-22 20:25:40,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=766960.0, ans=0.125 2023-12-22 20:25:41,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=12.0 2023-12-22 20:25:49,884 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.858e+01 3.083e+01 3.219e+01 3.359e+01 3.843e+01, threshold=6.437e+01, percent-clipped=0.0 2023-12-22 20:25:50,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=766960.0, ans=0.0 2023-12-22 20:26:04,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=767093.3333333334, ans=0.2 2023-12-22 20:26:11,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=767093.3333333334, ans=15.0 2023-12-22 20:26:21,178 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:26:23,594 INFO [train.py:886] (0/4) Epoch 25, batch 700, loss[loss=0.01544, audio_tagging_loss=0.01544, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4802421.51 frames. ], batch size: 99, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:26:25,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=767226.6666666666, ans=0.125 2023-12-22 20:26:25,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=767226.6666666666, ans=0.0 2023-12-22 20:26:27,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2023-12-22 20:26:45,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=767360.0, ans=0.1 2023-12-22 20:26:47,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=767360.0, ans=0.0 2023-12-22 20:26:53,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2023-12-22 20:27:15,233 INFO [train.py:886] (0/4) Epoch 25, batch 750, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24750.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4832182.03 frames. ], batch size: 99, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:27:28,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767626.6666666666, ans=0.1 2023-12-22 20:27:32,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. 
limit=15.0 2023-12-22 20:27:33,651 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.775e+01 3.045e+01 3.178e+01 3.302e+01 3.824e+01, threshold=6.355e+01, percent-clipped=0.0 2023-12-22 20:27:33,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=767626.6666666666, ans=0.0 2023-12-22 20:27:34,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=767693.3333333334, ans=0.125 2023-12-22 20:27:44,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=767693.3333333334, ans=0.125 2023-12-22 20:27:54,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=767760.0, ans=0.0 2023-12-22 20:27:55,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=767826.6666666666, ans=0.1 2023-12-22 20:28:06,854 INFO [train.py:886] (0/4) Epoch 25, batch 800, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4860487.48 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:28:07,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=767893.3333333334, ans=0.125 2023-12-22 20:28:10,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767893.3333333334, ans=0.1 2023-12-22 20:28:19,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=767960.0, ans=0.0 2023-12-22 20:28:20,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=15.0 2023-12-22 20:28:26,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.27 vs. limit=15.0 2023-12-22 20:28:30,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=768026.6666666666, ans=0.0 2023-12-22 20:28:54,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=768160.0, ans=0.2 2023-12-22 20:28:58,431 INFO [train.py:886] (0/4) Epoch 25, batch 850, loss[loss=0.01496, audio_tagging_loss=0.01496, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4883737.30 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:29:06,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=768226.6666666666, ans=0.125 2023-12-22 20:29:08,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.72 vs. 
limit=12.0 2023-12-22 20:29:17,718 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.700e+01 3.008e+01 3.165e+01 3.343e+01 3.656e+01, threshold=6.329e+01, percent-clipped=0.0 2023-12-22 20:29:39,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=768426.6666666666, ans=0.125 2023-12-22 20:29:43,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=768493.3333333334, ans=0.125 2023-12-22 20:29:49,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5 2023-12-22 20:29:50,954 INFO [train.py:886] (0/4) Epoch 25, batch 900, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4899411.85 frames. ], batch size: 99, lr: 4.36e-03, grad_scale: 32.0 2023-12-22 20:29:53,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=768560.0, ans=0.125 2023-12-22 20:29:55,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=768560.0, ans=0.2 2023-12-22 20:30:16,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=768693.3333333334, ans=0.0 2023-12-22 20:30:41,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768826.6666666666, ans=0.1 2023-12-22 20:30:43,134 INFO [train.py:886] (0/4) Epoch 25, batch 950, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4906670.15 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:30:47,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=768893.3333333334, ans=0.125 2023-12-22 20:30:57,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=768960.0, ans=0.125 2023-12-22 20:31:00,901 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.092e+01 3.233e+01 3.411e+01 4.030e+01, threshold=6.467e+01, percent-clipped=0.0 2023-12-22 20:31:10,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=769026.6666666666, ans=0.125 2023-12-22 20:31:28,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=12.0 2023-12-22 20:31:29,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=769160.0, ans=0.125 2023-12-22 20:31:34,090 INFO [train.py:886] (0/4) Epoch 25, batch 1000, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4911814.73 frames. 
], batch size: 99, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:32:02,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=769360.0, ans=0.0 2023-12-22 20:32:05,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0 2023-12-22 20:32:11,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=769426.6666666666, ans=0.0 2023-12-22 20:32:12,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=769426.6666666666, ans=0.2 2023-12-22 20:32:14,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769426.6666666666, ans=0.1 2023-12-22 20:32:17,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=769493.3333333334, ans=0.0 2023-12-22 20:32:26,425 INFO [train.py:886] (0/4) Epoch 25, batch 1050, loss[loss=0.01194, audio_tagging_loss=0.01194, over 23961.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4917488.59 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:32:34,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=769560.0, ans=0.125 2023-12-22 20:32:35,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2023-12-22 20:32:44,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=769626.6666666666, ans=0.125 2023-12-22 20:32:44,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=769626.6666666666, ans=0.0 2023-12-22 20:32:44,770 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.666e+01 3.041e+01 3.195e+01 3.319e+01 3.773e+01, threshold=6.390e+01, percent-clipped=0.0 2023-12-22 20:32:59,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=769760.0, ans=0.2 2023-12-22 20:33:18,080 INFO [train.py:886] (0/4) Epoch 25, batch 1100, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4929384.55 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:33:47,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=770026.6666666666, ans=0.125 2023-12-22 20:33:49,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=770093.3333333334, ans=0.125 2023-12-22 20:34:07,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=770160.0, ans=0.0 2023-12-22 20:34:09,552 INFO [train.py:886] (0/4) Epoch 25, batch 1150, loss[loss=0.0164, audio_tagging_loss=0.0164, over 25000.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4937075.63 frames. 
], batch size: 100, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:34:12,676 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.569e-03 2023-12-22 20:34:23,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=770293.3333333334, ans=0.07 2023-12-22 20:34:28,872 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.746e+01 3.047e+01 3.195e+01 3.340e+01 6.361e+01, threshold=6.391e+01, percent-clipped=0.0 2023-12-22 20:34:34,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=770360.0, ans=0.125 2023-12-22 20:35:02,099 INFO [train.py:886] (0/4) Epoch 25, batch 1200, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4936946.73 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:35:05,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.45 vs. limit=10.0 2023-12-22 20:35:32,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=770760.0, ans=0.125 2023-12-22 20:35:47,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=770826.6666666666, ans=0.1 2023-12-22 20:35:48,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=770826.6666666666, ans=0.125 2023-12-22 20:35:50,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=770826.6666666666, ans=6.0 2023-12-22 20:35:52,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=770826.6666666666, ans=0.125 2023-12-22 20:35:53,919 INFO [train.py:886] (0/4) Epoch 25, batch 1250, loss[loss=0.01562, audio_tagging_loss=0.01562, over 24750.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4929042.19 frames. 
], batch size: 99, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:36:07,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770960.0, ans=0.1 2023-12-22 20:36:09,396 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:36:12,983 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.822e+01 3.141e+01 3.235e+01 3.393e+01 3.874e+01, threshold=6.470e+01, percent-clipped=0.0 2023-12-22 20:36:14,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771026.6666666666, ans=0.1 2023-12-22 20:36:17,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=771026.6666666666, ans=0.125 2023-12-22 20:36:23,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=771026.6666666666, ans=0.2 2023-12-22 20:36:28,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=771093.3333333334, ans=0.2 2023-12-22 20:36:46,649 INFO [train.py:886] (0/4) Epoch 25, batch 1300, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4931641.73 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:36:47,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771226.6666666666, ans=0.1 2023-12-22 20:36:48,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=771226.6666666666, ans=0.125 2023-12-22 20:37:04,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=12.0 2023-12-22 20:37:07,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=771360.0, ans=0.125 2023-12-22 20:37:16,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=22.5 2023-12-22 20:37:30,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=771493.3333333334, ans=0.125 2023-12-22 20:37:30,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771493.3333333334, ans=0.1 2023-12-22 20:37:38,471 INFO [train.py:886] (0/4) Epoch 25, batch 1350, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4930699.45 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:37:56,908 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.618e+01 3.059e+01 3.211e+01 3.317e+01 3.906e+01, threshold=6.423e+01, percent-clipped=0.0 2023-12-22 20:38:05,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=771693.3333333334, ans=22.5 2023-12-22 20:38:29,828 INFO [train.py:886] (0/4) Epoch 25, batch 1400, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. 
], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4940608.60 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:38:31,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=771893.3333333334, ans=0.125 2023-12-22 20:38:52,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=772026.6666666666, ans=0.125 2023-12-22 20:39:02,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=772093.3333333334, ans=0.09899494936611666 2023-12-22 20:39:04,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.03 vs. limit=22.5 2023-12-22 20:39:05,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=772093.3333333334, ans=0.125 2023-12-22 20:39:12,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=772160.0, ans=0.125 2023-12-22 20:39:21,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.91 vs. limit=10.0 2023-12-22 20:39:22,108 INFO [train.py:886] (0/4) Epoch 25, batch 1450, loss[loss=0.01263, audio_tagging_loss=0.01263, over 21551.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4943231.19 frames. ], batch size: 107, lr: 4.35e-03, grad_scale: 32.0 2023-12-22 20:39:25,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=772226.6666666666, ans=0.2 2023-12-22 20:39:34,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=772293.3333333334, ans=0.0 2023-12-22 20:39:40,579 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.735e+01 3.046e+01 3.154e+01 3.328e+01 3.789e+01, threshold=6.308e+01, percent-clipped=0.0 2023-12-22 20:40:14,144 INFO [train.py:886] (0/4) Epoch 25, batch 1500, loss[loss=0.0144, audio_tagging_loss=0.0144, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4950824.72 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0 2023-12-22 20:40:19,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772560.0, ans=0.1 2023-12-22 20:40:36,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772693.3333333334, ans=0.1 2023-12-22 20:41:05,302 INFO [train.py:886] (0/4) Epoch 25, batch 1550, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4948814.18 frames. 
], batch size: 99, lr: 4.34e-03, grad_scale: 32.0 2023-12-22 20:41:11,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=772893.3333333334, ans=0.0 2023-12-22 20:41:12,297 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:41:21,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=772960.0, ans=0.125 2023-12-22 20:41:24,018 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.734e+01 3.051e+01 3.220e+01 3.373e+01 4.348e+01, threshold=6.439e+01, percent-clipped=0.0 2023-12-22 20:41:56,980 INFO [train.py:886] (0/4) Epoch 25, batch 1600, loss[loss=0.0139, audio_tagging_loss=0.0139, over 24750.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4947724.12 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0 2023-12-22 20:41:59,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=773226.6666666666, ans=0.125 2023-12-22 20:42:01,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=773226.6666666666, ans=0.125 2023-12-22 20:42:11,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=773293.3333333334, ans=0.0 2023-12-22 20:42:12,945 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-116000.pt 2023-12-22 20:42:17,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=773293.3333333334, ans=0.125 2023-12-22 20:42:45,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=773493.3333333334, ans=0.125 2023-12-22 20:42:46,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=773493.3333333334, ans=0.125 2023-12-22 20:42:50,553 INFO [train.py:886] (0/4) Epoch 25, batch 1650, loss[loss=0.0132, audio_tagging_loss=0.0132, over 22160.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4939650.02 frames. 
], batch size: 107, lr: 4.34e-03, grad_scale: 32.0 2023-12-22 20:42:51,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=773560.0, ans=0.07 2023-12-22 20:42:59,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=773560.0, ans=0.125 2023-12-22 20:43:03,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=773626.6666666666, ans=0.125 2023-12-22 20:43:06,391 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:43:08,967 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.090e+01 3.219e+01 3.390e+01 4.071e+01, threshold=6.439e+01, percent-clipped=0.0 2023-12-22 20:43:10,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=773693.3333333334, ans=0.125 2023-12-22 20:43:12,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=773693.3333333334, ans=0.125 2023-12-22 20:43:39,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2023-12-22 20:43:40,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=773826.6666666666, ans=0.125 2023-12-22 20:43:42,120 INFO [train.py:886] (0/4) Epoch 25, batch 1700, loss[loss=0.01355, audio_tagging_loss=0.01355, over 21932.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4935445.23 frames. ], batch size: 107, lr: 4.34e-03, grad_scale: 32.0 2023-12-22 20:43:45,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=773893.3333333334, ans=0.1 2023-12-22 20:43:48,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773893.3333333334, ans=0.1 2023-12-22 20:43:49,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=773893.3333333334, ans=0.0 2023-12-22 20:43:53,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.40 vs. 
limit=22.5 2023-12-22 20:44:20,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=774093.3333333334, ans=0.0 2023-12-22 20:44:22,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=774093.3333333334, ans=0.0 2023-12-22 20:44:23,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=774160.0, ans=0.125 2023-12-22 20:44:24,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=774160.0, ans=0.2 2023-12-22 20:44:26,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=774160.0, ans=0.1 2023-12-22 20:44:29,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=774160.0, ans=0.2 2023-12-22 20:44:34,480 INFO [train.py:886] (0/4) Epoch 25, batch 1750, loss[loss=0.0104, audio_tagging_loss=0.0104, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4942230.21 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0 2023-12-22 20:44:36,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=774226.6666666666, ans=0.125 2023-12-22 20:44:40,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=774226.6666666666, ans=0.125 2023-12-22 20:44:44,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=774293.3333333334, ans=0.0 2023-12-22 20:44:52,983 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.733e+01 3.014e+01 3.131e+01 3.293e+01 4.047e+01, threshold=6.262e+01, percent-clipped=0.0 2023-12-22 20:44:54,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=774360.0, ans=0.2 2023-12-22 20:44:57,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774360.0, ans=0.1 2023-12-22 20:45:12,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=774426.6666666666, ans=0.2 2023-12-22 20:45:15,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774493.3333333334, ans=0.1 2023-12-22 20:45:19,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=774493.3333333334, ans=0.125 2023-12-22 20:45:26,032 INFO [train.py:886] (0/4) Epoch 25, batch 1800, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4945807.58 frames. 
], batch size: 100, lr: 4.34e-03, grad_scale: 32.0 2023-12-22 20:45:32,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=774560.0, ans=0.0 2023-12-22 20:45:43,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=774626.6666666666, ans=0.0 2023-12-22 20:45:56,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=774693.3333333334, ans=0.07 2023-12-22 20:46:14,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=774826.6666666666, ans=0.125 2023-12-22 20:46:18,652 INFO [train.py:886] (0/4) Epoch 25, batch 1850, loss[loss=0.0164, audio_tagging_loss=0.0164, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4948213.90 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0 2023-12-22 20:46:31,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=774960.0, ans=0.0 2023-12-22 20:46:32,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774960.0, ans=0.1 2023-12-22 20:46:37,075 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.812e+01 3.072e+01 3.202e+01 3.378e+01 4.183e+01, threshold=6.404e+01, percent-clipped=0.0 2023-12-22 20:46:37,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=774960.0, ans=0.2 2023-12-22 20:47:10,324 INFO [train.py:886] (0/4) Epoch 25, batch 1900, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4939449.13 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0 2023-12-22 20:47:11,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775226.6666666666, ans=0.1 2023-12-22 20:47:56,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=775493.3333333334, ans=15.0 2023-12-22 20:48:02,431 INFO [train.py:886] (0/4) Epoch 25, batch 1950, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4942101.16 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0 2023-12-22 20:48:03,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=775560.0, ans=0.125 2023-12-22 20:48:13,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=775626.6666666666, ans=0.125 2023-12-22 20:48:21,576 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 3.045e+01 3.163e+01 3.260e+01 3.752e+01, threshold=6.326e+01, percent-clipped=0.0 2023-12-22 20:48:40,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=775760.0, ans=0.125 2023-12-22 20:48:47,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=775826.6666666666, ans=0.07 2023-12-22 20:48:54,593 INFO [train.py:886] (0/4) Epoch 25, batch 2000, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. 
], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4945097.62 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:49:29,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2023-12-22 20:49:45,785 INFO [train.py:886] (0/4) Epoch 25, batch 2050, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4948705.00 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:50:04,274 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.672e+01 3.006e+01 3.133e+01 3.319e+01 3.992e+01, threshold=6.266e+01, percent-clipped=0.0 2023-12-22 20:50:10,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=776360.0, ans=0.125 2023-12-22 20:50:20,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=776426.6666666666, ans=0.2 2023-12-22 20:50:22,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=776426.6666666666, ans=0.035 2023-12-22 20:50:25,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0 2023-12-22 20:50:27,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=776493.3333333334, ans=0.035 2023-12-22 20:50:30,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.70 vs. limit=15.0 2023-12-22 20:50:37,284 INFO [train.py:886] (0/4) Epoch 25, batch 2100, loss[loss=0.01646, audio_tagging_loss=0.01646, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4952906.05 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:50:45,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=776560.0, ans=0.125 2023-12-22 20:50:49,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=776626.6666666666, ans=0.125 2023-12-22 20:50:50,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2023-12-22 20:50:50,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=776626.6666666666, ans=0.125 2023-12-22 20:51:08,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2023-12-22 20:51:19,128 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:51:28,929 INFO [train.py:886] (0/4) Epoch 25, batch 2150, loss[loss=0.01614, audio_tagging_loss=0.01614, over 24750.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4953592.21 frames. 
], batch size: 99, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:51:32,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=776893.3333333334, ans=0.125 2023-12-22 20:51:47,384 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.715e+01 3.098e+01 3.255e+01 3.441e+01 3.883e+01, threshold=6.510e+01, percent-clipped=0.0 2023-12-22 20:51:52,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777026.6666666666, ans=0.1 2023-12-22 20:52:12,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=777160.0, ans=0.125 2023-12-22 20:52:14,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=777160.0, ans=0.125 2023-12-22 20:52:15,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=777160.0, ans=0.125 2023-12-22 20:52:21,124 INFO [train.py:886] (0/4) Epoch 25, batch 2200, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24946.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4946111.19 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:52:24,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=777226.6666666666, ans=0.125 2023-12-22 20:52:27,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=777226.6666666666, ans=0.09899494936611666 2023-12-22 20:52:28,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=777226.6666666666, ans=0.125 2023-12-22 20:52:41,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=777360.0, ans=0.5 2023-12-22 20:52:47,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=777360.0, ans=0.125 2023-12-22 20:52:49,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=777360.0, ans=0.125 2023-12-22 20:53:05,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=777493.3333333334, ans=0.02 2023-12-22 20:53:05,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777493.3333333334, ans=0.1 2023-12-22 20:53:13,065 INFO [train.py:886] (0/4) Epoch 25, batch 2250, loss[loss=0.009848, audio_tagging_loss=0.009848, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4945924.01 frames. 
], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:53:20,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=777560.0, ans=0.2 2023-12-22 20:53:30,762 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.795e+01 3.106e+01 3.234e+01 3.414e+01 3.931e+01, threshold=6.468e+01, percent-clipped=0.0 2023-12-22 20:53:38,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777693.3333333334, ans=0.1 2023-12-22 20:53:42,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=777693.3333333334, ans=0.0 2023-12-22 20:53:49,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=777760.0, ans=0.125 2023-12-22 20:54:03,753 INFO [train.py:886] (0/4) Epoch 25, batch 2300, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4949244.83 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:54:14,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=777960.0, ans=0.125 2023-12-22 20:54:15,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=777960.0, ans=0.015 2023-12-22 20:54:38,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=778093.3333333334, ans=0.5 2023-12-22 20:54:46,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5 2023-12-22 20:54:49,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=778160.0, ans=0.2 2023-12-22 20:54:55,983 INFO [train.py:886] (0/4) Epoch 25, batch 2350, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4950244.13 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:55:14,435 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.697e+01 3.039e+01 3.160e+01 3.337e+01 4.563e+01, threshold=6.321e+01, percent-clipped=0.0 2023-12-22 20:55:19,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=778360.0, ans=0.0 2023-12-22 20:55:27,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=778426.6666666666, ans=0.125 2023-12-22 20:55:27,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=778426.6666666666, ans=0.125 2023-12-22 20:55:31,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=778426.6666666666, ans=0.0 2023-12-22 20:55:43,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=778493.3333333334, ans=0.125 2023-12-22 20:55:44,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.93 vs. 
limit=10.0 2023-12-22 20:55:47,501 INFO [train.py:886] (0/4) Epoch 25, batch 2400, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4954840.80 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:55:47,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=778560.0, ans=0.04949747468305833 2023-12-22 20:55:51,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=778560.0, ans=0.0 2023-12-22 20:55:54,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=778560.0, ans=0.2 2023-12-22 20:56:09,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=778693.3333333334, ans=0.125 2023-12-22 20:56:11,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=778693.3333333334, ans=0.1 2023-12-22 20:56:11,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=778693.3333333334, ans=0.125 2023-12-22 20:56:39,827 INFO [train.py:886] (0/4) Epoch 25, batch 2450, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4959733.81 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:56:46,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.59 vs. limit=22.5 2023-12-22 20:56:53,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778960.0, ans=0.1 2023-12-22 20:56:58,315 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.730e+01 3.055e+01 3.201e+01 3.364e+01 3.978e+01, threshold=6.403e+01, percent-clipped=0.0 2023-12-22 20:57:02,014 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:57:31,410 INFO [train.py:886] (0/4) Epoch 25, batch 2500, loss[loss=0.01538, audio_tagging_loss=0.01538, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4956497.97 frames. 
], batch size: 99, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:57:45,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779293.3333333334, ans=0.1 2023-12-22 20:57:46,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=779293.3333333334, ans=0.04949747468305833 2023-12-22 20:58:10,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=779426.6666666666, ans=0.125 2023-12-22 20:58:15,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=779493.3333333334, ans=0.0 2023-12-22 20:58:18,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=779493.3333333334, ans=0.125 2023-12-22 20:58:21,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=779493.3333333334, ans=0.125 2023-12-22 20:58:22,686 INFO [train.py:886] (0/4) Epoch 25, batch 2550, loss[loss=0.01343, audio_tagging_loss=0.01343, over 21314.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4947788.70 frames. ], batch size: 107, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 20:58:30,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=779560.0, ans=0.1 2023-12-22 20:58:32,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=779626.6666666666, ans=0.0 2023-12-22 20:58:41,773 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.771e+01 3.116e+01 3.261e+01 3.402e+01 3.994e+01, threshold=6.522e+01, percent-clipped=0.0 2023-12-22 20:58:49,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779693.3333333334, ans=0.1 2023-12-22 20:59:15,178 INFO [train.py:886] (0/4) Epoch 25, batch 2600, loss[loss=0.01138, audio_tagging_loss=0.01138, over 22143.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4944626.71 frames. ], batch size: 107, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 20:59:36,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=780026.6666666666, ans=0.0 2023-12-22 20:59:43,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=780026.6666666666, ans=0.125 2023-12-22 21:00:08,045 INFO [train.py:886] (0/4) Epoch 25, batch 2650, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4947572.51 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:00:14,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.21 vs. 
limit=22.5 2023-12-22 21:00:26,449 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.757e+01 3.067e+01 3.194e+01 3.330e+01 3.753e+01, threshold=6.389e+01, percent-clipped=0.0 2023-12-22 21:00:29,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=780360.0, ans=0.125 2023-12-22 21:00:33,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2023-12-22 21:00:55,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=780493.3333333334, ans=0.125 2023-12-22 21:00:57,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780493.3333333334, ans=0.1 2023-12-22 21:00:59,654 INFO [train.py:886] (0/4) Epoch 25, batch 2700, loss[loss=0.01092, audio_tagging_loss=0.01092, over 21167.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4941338.72 frames. ], batch size: 107, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:01:03,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=780560.0, ans=0.5 2023-12-22 21:01:13,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=780626.6666666666, ans=0.125 2023-12-22 21:01:16,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=780626.6666666666, ans=0.0 2023-12-22 21:01:21,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=780693.3333333334, ans=0.2 2023-12-22 21:01:22,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=780693.3333333334, ans=0.0 2023-12-22 21:01:32,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.59 vs. limit=10.0 2023-12-22 21:01:40,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=780826.6666666666, ans=0.125 2023-12-22 21:01:46,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=780826.6666666666, ans=0.125 2023-12-22 21:01:51,169 INFO [train.py:886] (0/4) Epoch 25, batch 2750, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4945957.55 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:01:51,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=780893.3333333334, ans=0.125 2023-12-22 21:01:51,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.50 vs. 
limit=12.0 2023-12-22 21:01:52,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=780893.3333333334, ans=0.0 2023-12-22 21:02:04,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=780960.0, ans=0.125 2023-12-22 21:02:05,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.13 vs. limit=6.0 2023-12-22 21:02:09,601 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.725e+01 3.037e+01 3.216e+01 3.345e+01 3.821e+01, threshold=6.432e+01, percent-clipped=0.0 2023-12-22 21:02:42,964 INFO [train.py:886] (0/4) Epoch 25, batch 2800, loss[loss=0.01415, audio_tagging_loss=0.01415, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4947229.03 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:03:01,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=781293.3333333334, ans=0.1 2023-12-22 21:03:13,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.38 vs. limit=22.5 2023-12-22 21:03:16,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=781426.6666666666, ans=0.125 2023-12-22 21:03:36,268 INFO [train.py:886] (0/4) Epoch 25, batch 2850, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4947896.26 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:03:39,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=781560.0, ans=0.125 2023-12-22 21:03:51,130 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:03:51,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=781626.6666666666, ans=0.95 2023-12-22 21:03:54,792 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.736e+01 3.071e+01 3.202e+01 3.384e+01 4.019e+01, threshold=6.405e+01, percent-clipped=0.0 2023-12-22 21:03:59,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=781693.3333333334, ans=0.2 2023-12-22 21:04:02,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2023-12-22 21:04:02,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.24 vs. 
limit=22.5 2023-12-22 21:04:06,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=781760.0, ans=0.125 2023-12-22 21:04:11,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781760.0, ans=0.1 2023-12-22 21:04:11,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=781760.0, ans=0.0 2023-12-22 21:04:28,007 INFO [train.py:886] (0/4) Epoch 25, batch 2900, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4939877.05 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:04:29,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=781893.3333333334, ans=0.0 2023-12-22 21:05:00,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=782093.3333333334, ans=0.2 2023-12-22 21:05:01,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=782093.3333333334, ans=0.125 2023-12-22 21:05:18,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=782226.6666666666, ans=0.125 2023-12-22 21:05:18,745 INFO [train.py:886] (0/4) Epoch 25, batch 2950, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4935288.86 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:05:26,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=782226.6666666666, ans=0.0 2023-12-22 21:05:32,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=782293.3333333334, ans=0.0 2023-12-22 21:05:37,121 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.031e+01 3.168e+01 3.321e+01 3.698e+01, threshold=6.337e+01, percent-clipped=0.0 2023-12-22 21:05:53,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=782426.6666666666, ans=0.125 2023-12-22 21:05:57,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=782426.6666666666, ans=0.0 2023-12-22 21:06:10,412 INFO [train.py:886] (0/4) Epoch 25, batch 3000, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4936043.56 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:06:10,414 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 21:06:18,377 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6922, 2.9394, 3.6876, 3.5556], device='cuda:0') 2023-12-22 21:06:31,865 INFO [train.py:917] (0/4) Epoch 25, validation: loss=0.0331, audio_tagging_loss=0.0331, over 3737520.00 frames. 
2023-12-22 21:06:31,866 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 21:06:34,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2023-12-22 21:06:42,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=782626.6666666666, ans=0.2 2023-12-22 21:06:49,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.38 vs. limit=5.0 2023-12-22 21:06:49,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=782626.6666666666, ans=0.125 2023-12-22 21:07:01,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782693.3333333334, ans=0.1 2023-12-22 21:07:15,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.85 vs. limit=22.5 2023-12-22 21:07:23,406 INFO [train.py:886] (0/4) Epoch 25, batch 3050, loss[loss=0.01568, audio_tagging_loss=0.01568, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4941041.56 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:07:24,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=782893.3333333334, ans=0.0 2023-12-22 21:07:36,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=782960.0, ans=0.0 2023-12-22 21:07:37,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=782960.0, ans=0.125 2023-12-22 21:07:41,967 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.821e+01 3.063e+01 3.171e+01 3.346e+01 3.819e+01, threshold=6.341e+01, percent-clipped=0.0 2023-12-22 21:07:43,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783026.6666666666, ans=0.1 2023-12-22 21:07:55,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-12-22 21:07:57,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=783093.3333333334, ans=0.0 2023-12-22 21:08:04,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=783160.0, ans=0.0 2023-12-22 21:08:07,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=783160.0, ans=0.125 2023-12-22 21:08:07,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=783160.0, ans=0.125 2023-12-22 21:08:09,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=783160.0, ans=0.2 2023-12-22 21:08:10,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. 
2023-12-22 21:08:10,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0
2023-12-22 21:08:15,768 INFO [train.py:886] (0/4) Epoch 25, batch 3100, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4946839.64 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:08:16,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=783226.6666666666, ans=0.02
2023-12-22 21:08:32,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=783293.3333333334, ans=0.125
2023-12-22 21:08:37,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0
2023-12-22 21:08:46,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=15.0
2023-12-22 21:09:08,007 INFO [train.py:886] (0/4) Epoch 25, batch 3150, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4942571.39 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:09:21,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=783626.6666666666, ans=0.1
2023-12-22 21:09:26,509 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.818e+01 3.103e+01 3.282e+01 3.438e+01 3.839e+01, threshold=6.565e+01, percent-clipped=0.0
2023-12-22 21:09:43,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. limit=6.0
2023-12-22 21:09:59,521 INFO [train.py:886] (0/4) Epoch 25, batch 3200, loss[loss=0.01418, audio_tagging_loss=0.01418, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4941571.21 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:10:02,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.55 vs. limit=22.5
2023-12-22 21:10:12,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=783960.0, ans=0.125
2023-12-22 21:10:19,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=784026.6666666666, ans=0.125
2023-12-22 21:10:20,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=784026.6666666666, ans=0.125
2023-12-22 21:10:49,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=784160.0, ans=0.1
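The `ScheduledFloat` entries above track module hyper-parameters (dropout rates, balancer and skip probabilities) whose current value `ans` is a function of `batch_count`. A hedged sketch of one plausible mechanism — a piecewise-linear schedule over (batch_count, value) breakpoints, clamped at the ends; the `Schedule` class and its breakpoints are illustrative, not the library's implementation:

```python
from bisect import bisect_right

class Schedule:
    """Piecewise-linear value of batch_count, clamped at the end points."""

    def __init__(self, *points: tuple[float, float]):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches:
dropout_p = Schedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(784160.0))  # 0.1 — past the last breakpoint, value is clamped
```

This would explain why, so deep into training, most logged `ans` values sit at fixed end-point values such as 0.0, 0.1, 0.125 or 0.2.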
2023-12-22 21:10:51,689 INFO [train.py:886] (0/4) Epoch 25, batch 3250, loss[loss=0.01101, audio_tagging_loss=0.01101, over 21976.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4940743.65 frames. ], batch size: 107, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:11:10,154 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.040e+01 3.210e+01 3.374e+01 3.789e+01, threshold=6.419e+01, percent-clipped=0.0
2023-12-22 21:11:27,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0
2023-12-22 21:11:28,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=784426.6666666666, ans=0.0
2023-12-22 21:11:41,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=784493.3333333334, ans=0.0
2023-12-22 21:11:42,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0
2023-12-22 21:11:44,419 INFO [train.py:886] (0/4) Epoch 25, batch 3300, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4944199.64 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:11:47,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=784560.0, ans=0.2
2023-12-22 21:11:54,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=784626.6666666666, ans=0.02
2023-12-22 21:11:56,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=784626.6666666666, ans=0.1
2023-12-22 21:12:09,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=784693.3333333334, ans=0.125
2023-12-22 21:12:24,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=784760.0, ans=0.0
2023-12-22 21:12:36,217 INFO [train.py:886] (0/4) Epoch 25, batch 3350, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4947576.37 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:12:37,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=784893.3333333334, ans=0.07
2023-12-22 21:12:41,149 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 21:12:46,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=784960.0, ans=0.125
2023-12-22 21:12:54,043 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.707e+01 3.036e+01 3.188e+01 3.323e+01 3.889e+01, threshold=6.376e+01, percent-clipped=0.0
2023-12-22 21:12:59,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=785026.6666666666, ans=0.125
2023-12-22 21:13:16,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785160.0, ans=0.1
2023-12-22 21:13:17,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=785160.0, ans=0.125
2023-12-22 21:13:24,638 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 21:13:27,262 INFO [train.py:886] (0/4) Epoch 25, batch 3400, loss[loss=0.01455, audio_tagging_loss=0.01455, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4951208.97 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:13:37,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.60 vs. limit=12.0
2023-12-22 21:14:16,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=15.0
2023-12-22 21:14:19,125 INFO [train.py:886] (0/4) Epoch 25, batch 3450, loss[loss=0.01185, audio_tagging_loss=0.01185, over 21760.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4945490.18 frames. ], batch size: 107, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:14:35,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0
2023-12-22 21:14:38,181 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.865e+01 3.134e+01 3.253e+01 3.389e+01 3.963e+01, threshold=6.506e+01, percent-clipped=0.0
2023-12-22 21:14:41,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0
2023-12-22 21:14:50,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=785760.0, ans=0.125
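The `Whitening` entries compare a statistic of a module's output covariance (`metric`) against a `limit`; the constraint only kicks in when the metric exceeds the limit. As a rough illustration of the kind of statistic involved (an assumed proxy, not the exact icefall formula): the ratio of the mean squared covariance entry to its value for perfectly white features, which is about 1 for decorrelated, equal-power channels and grows as channels become correlated:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Rough whiteness proxy: ratio of the mean squared covariance entry
    to its value for perfectly white (decorrelated, equal-power) features.

    x: (num_frames, num_channels)
    """
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]              # (C, C) covariance estimate
    c = cov.shape[0]
    white = cov.diagonal().mean() ** 2 / c    # expected mean(cov**2) if white
    return (cov**2).mean() / white

feats = torch.randn(10_000, 256)              # near-white input
print(float(whitening_metric(feats)))         # close to 1.0
```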
2023-12-22 21:15:11,231 INFO [train.py:886] (0/4) Epoch 25, batch 3500, loss[loss=0.0125, audio_tagging_loss=0.0125, over 22237.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4942110.20 frames. ], batch size: 107, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:15:11,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=785893.3333333334, ans=0.0
2023-12-22 21:15:13,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=785893.3333333334, ans=0.125
2023-12-22 21:15:18,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=12.0
2023-12-22 21:15:26,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=785960.0, ans=0.2
2023-12-22 21:15:31,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0
2023-12-22 21:15:37,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=786026.6666666666, ans=0.0
2023-12-22 21:15:38,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.88 vs. limit=22.5
2023-12-22 21:16:00,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0
2023-12-22 21:16:02,918 INFO [train.py:886] (0/4) Epoch 25, batch 3550, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4938691.94 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:16:12,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=786226.6666666666, ans=0.0
2023-12-22 21:16:22,265 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.633e+01 3.021e+01 3.171e+01 3.367e+01 3.812e+01, threshold=6.343e+01, percent-clipped=0.0
2023-12-22 21:16:27,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=786360.0, ans=0.125
2023-12-22 21:16:28,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=786360.0, ans=0.1
2023-12-22 21:16:44,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0
2023-12-22 21:16:45,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=786493.3333333334, ans=0.125
2023-12-22 21:16:54,956 INFO [train.py:886] (0/4) Epoch 25, batch 3600, loss[loss=0.01453, audio_tagging_loss=0.01453, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4942190.91 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:17:17,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=786693.3333333334, ans=0.0
2023-12-22 21:17:18,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=12.0
2023-12-22 21:17:32,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=786760.0, ans=0.125
2023-12-22 21:17:33,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=786760.0, ans=0.125
2023-12-22 21:17:47,386 INFO [train.py:886] (0/4) Epoch 25, batch 3650, loss[loss=0.01522, audio_tagging_loss=0.01522, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4945372.44 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0
2023-12-22 21:17:52,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=786893.3333333334, ans=0.125
2023-12-22 21:17:54,129 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 21:17:55,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=786893.3333333334, ans=0.0
2023-12-22 21:18:02,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=786960.0, ans=0.035
2023-12-22 21:18:05,036 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.785e+01 2.980e+01 3.189e+01 3.343e+01 3.889e+01, threshold=6.377e+01, percent-clipped=0.0
2023-12-22 21:18:08,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=12.0
2023-12-22 21:18:11,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=787026.6666666666, ans=0.0
2023-12-22 21:18:15,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787026.6666666666, ans=0.1
2023-12-22 21:18:15,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=15.0
2023-12-22 21:18:25,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0
2023-12-22 21:18:38,508 INFO [train.py:886] (0/4) Epoch 25, batch 3700, loss[loss=0.01638, audio_tagging_loss=0.01638, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4945394.28 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0
2023-12-22 21:18:58,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.31 vs. limit=22.5
2023-12-22 21:18:58,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=787360.0, ans=0.0
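Each `optim.py` warning summarizes recent gradient norms as five quantiles (min, 25%, median, 75%, max) plus the clipping threshold. The logged numbers are consistent with the threshold being `Clipping_scale` times the recent median (e.g. 2 × 3.189e+01 ≈ 6.377e+01 just above); a sketch of that bookkeeping, with the exact windowing being an assumption:

```python
import torch

class GradNormTracker:
    """Track recent gradient norms; clip at clipping_scale * running median."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.scale = clipping_scale
        self.window = window
        self.norms: list[float] = []

    def update(self, grad_norm: float) -> float:
        """Record one gradient norm; return the current clipping threshold."""
        self.norms.append(grad_norm)
        self.norms = self.norms[-self.window:]
        return self.scale * float(torch.tensor(self.norms).median())

    def quartiles(self) -> list[float]:
        t = torch.tensor(self.norms)
        return [float(t.quantile(q)) for q in (0.0, 0.25, 0.5, 0.75, 1.0)]

tracker = GradNormTracker(clipping_scale=2.0)
for g in (27.85, 29.80, 31.89, 33.43, 38.89):
    threshold = tracker.update(g)
print(tracker.quartiles(), threshold)  # threshold == 2 * median
```

`percent-clipped` would then be the fraction of recent steps whose norm exceeded that threshold, which stays at 0.0 through most of this stretch.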
2023-12-22 21:19:30,715 INFO [train.py:886] (0/4) Epoch 25, batch 3750, loss[loss=0.01085, audio_tagging_loss=0.01085, over 24750.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4949223.55 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0
2023-12-22 21:19:49,013 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+01 3.099e+01 3.227e+01 3.364e+01 3.864e+01, threshold=6.453e+01, percent-clipped=0.0
2023-12-22 21:20:21,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=787893.3333333334, ans=0.04949747468305833
2023-12-22 21:20:22,322 INFO [train.py:886] (0/4) Epoch 25, batch 3800, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4936246.56 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0
2023-12-22 21:20:35,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=787960.0, ans=0.125
2023-12-22 21:20:40,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=787960.0, ans=0.125
2023-12-22 21:21:06,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=788160.0, ans=0.1
2023-12-22 21:21:14,608 INFO [train.py:886] (0/4) Epoch 25, batch 3850, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4943127.46 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0
2023-12-22 21:21:15,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=788226.6666666666, ans=0.0
2023-12-22 21:21:24,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=788293.3333333334, ans=22.5
2023-12-22 21:21:33,018 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.703e+01 3.115e+01 3.236e+01 3.422e+01 4.867e+01, threshold=6.472e+01, percent-clipped=0.0
2023-12-22 21:21:49,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=788426.6666666666, ans=0.0
2023-12-22 21:21:50,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=788426.6666666666, ans=0.125
2023-12-22 21:21:55,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=788493.3333333334, ans=0.125
2023-12-22 21:21:57,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0
2023-12-22 21:21:58,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.38 vs. limit=10.0
2023-12-22 21:22:06,292 INFO [train.py:886] (0/4) Epoch 25, batch 3900, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4944564.82 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0
2023-12-22 21:22:12,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=788560.0, ans=0.0
2023-12-22 21:22:18,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=788626.6666666666, ans=0.125
2023-12-22 21:22:20,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=788626.6666666666, ans=0.125
2023-12-22 21:22:37,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=788760.0, ans=0.125
2023-12-22 21:22:37,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=788760.0, ans=0.125
2023-12-22 21:22:49,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=788826.6666666666, ans=0.1
2023-12-22 21:22:52,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=788826.6666666666, ans=0.1
2023-12-22 21:22:56,931 INFO [train.py:886] (0/4) Epoch 25, batch 3950, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4951549.70 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0
2023-12-22 21:23:16,156 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.672e+01 3.014e+01 3.191e+01 3.353e+01 3.763e+01, threshold=6.383e+01, percent-clipped=0.0
2023-12-22 21:23:23,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=789026.6666666666, ans=0.125
2023-12-22 21:23:23,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.75 vs. limit=10.0
2023-12-22 21:23:49,438 INFO [train.py:886] (0/4) Epoch 25, batch 4000, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4953626.19 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 128.0
2023-12-22 21:24:01,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=789293.3333333334, ans=0.0
2023-12-22 21:24:06,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0
2023-12-22 21:24:40,420 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 21:24:41,226 INFO [train.py:886] (0/4) Epoch 25, batch 4050, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4957308.87 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0
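In these records, `loss` is the current batch alone, whereas `tot_loss` is a slowly-moving, frame-weighted average over recent batches — hence the roughly 5M-frame count printed next to it while each batch contributes only ~25k frames. A small running average in that spirit (the decay constant is illustrative, not the value train.py uses):

```python
class RunningLoss:
    """Frame-weighted running average of per-batch losses with slow decay."""

    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed sum of frames

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # reported as tot_loss

avg = RunningLoss()
for _ in range(500):
    tot_loss = avg.update(0.013, 25000.0)
print(round(tot_loss, 5), round(avg.frames))  # ~0.013 over ~4.6M frames
```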
2023-12-22 21:24:53,685 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0
2023-12-22 21:25:01,299 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.755e+01 3.123e+01 3.229e+01 3.371e+01 4.451e+01, threshold=6.458e+01, percent-clipped=0.0
2023-12-22 21:25:10,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=789693.3333333334, ans=0.125
2023-12-22 21:25:12,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.14 vs. limit=22.5
2023-12-22 21:25:28,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=789826.6666666666, ans=0.125
2023-12-22 21:25:33,450 INFO [train.py:886] (0/4) Epoch 25, batch 4100, loss[loss=0.01093, audio_tagging_loss=0.01093, over 24750.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4948958.21 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0
2023-12-22 21:25:34,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0
2023-12-22 21:25:39,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=789893.3333333334, ans=0.0
2023-12-22 21:25:51,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.67 vs. limit=22.5
2023-12-22 21:25:52,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=789960.0, ans=0.125
2023-12-22 21:26:03,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=790026.6666666666, ans=0.125
2023-12-22 21:26:16,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=790160.0, ans=0.125
2023-12-22 21:26:24,978 INFO [train.py:886] (0/4) Epoch 25, batch 4150, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4943491.44 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0
2023-12-22 21:26:29,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=790226.6666666666, ans=0.1
2023-12-22 21:26:30,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0
2023-12-22 21:26:37,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=790293.3333333334, ans=0.125
2023-12-22 21:26:38,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.95 vs. limit=22.5
2023-12-22 21:26:44,958 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.625e+01 3.070e+01 3.176e+01 3.307e+01 3.814e+01, threshold=6.351e+01, percent-clipped=0.0
2023-12-22 21:26:49,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=790360.0, ans=0.0
2023-12-22 21:26:58,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0
2023-12-22 21:27:02,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=790426.6666666666, ans=0.2
2023-12-22 21:27:09,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=790493.3333333334, ans=0.2
2023-12-22 21:27:17,228 INFO [train.py:886] (0/4) Epoch 25, batch 4200, loss[loss=0.01432, audio_tagging_loss=0.01432, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4939120.50 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:27:25,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=790560.0, ans=0.125
2023-12-22 21:27:30,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=790626.6666666666, ans=0.125
2023-12-22 21:27:37,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=790693.3333333334, ans=0.0
2023-12-22 21:27:57,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=790760.0, ans=0.125
2023-12-22 21:28:08,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0
2023-12-22 21:28:09,321 INFO [train.py:886] (0/4) Epoch 25, batch 4250, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4948092.23 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:28:19,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=790960.0, ans=0.2
2023-12-22 21:28:29,340 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.694e+01 3.059e+01 3.182e+01 3.332e+01 3.915e+01, threshold=6.364e+01, percent-clipped=0.0
2023-12-22 21:28:47,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=791093.3333333334, ans=0.125
2023-12-22 21:28:48,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=791093.3333333334, ans=0.0
2023-12-22 21:29:01,502 INFO [train.py:886] (0/4) Epoch 25, batch 4300, loss[loss=0.01701, audio_tagging_loss=0.01701, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4954223.31 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:29:03,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=791226.6666666666, ans=0.125
2023-12-22 21:29:20,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=791293.3333333334, ans=0.125
2023-12-22 21:29:21,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=791360.0, ans=0.0
2023-12-22 21:29:38,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.30 vs. limit=22.5
2023-12-22 21:29:49,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=791493.3333333334, ans=0.2
2023-12-22 21:29:49,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.70 vs. limit=15.0
2023-12-22 21:29:52,949 INFO [train.py:886] (0/4) Epoch 25, batch 4350, loss[loss=0.01365, audio_tagging_loss=0.01365, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4957220.57 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:30:12,260 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.736e+01 3.142e+01 3.246e+01 3.434e+01 4.775e+01, threshold=6.491e+01, percent-clipped=0.0
2023-12-22 21:30:14,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=791693.3333333334, ans=0.125
2023-12-22 21:30:31,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=791760.0, ans=0.09899494936611666
2023-12-22 21:30:44,405 INFO [train.py:886] (0/4) Epoch 25, batch 4400, loss[loss=0.01264, audio_tagging_loss=0.01264, over 24750.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4947346.85 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:31:02,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=791960.0, ans=0.0
2023-12-22 21:31:20,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=792093.3333333334, ans=0.125
2023-12-22 21:31:29,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=792160.0, ans=0.2
2023-12-22 21:31:35,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=792160.0, ans=0.05
2023-12-22 21:31:36,746 INFO [train.py:886] (0/4) Epoch 25, batch 4450, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4943894.76 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:31:38,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=792226.6666666666, ans=0.125
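The learning rate printed with each batch decays smoothly with both total batch count and epoch (4.32e-03 early in epoch 25, 4.29e-03 here, 4.20e-03 once epoch 26 begins). A sketch in the style of icefall's Eden schedule, using the `base_lr=0.045`, `lr_batches=7500`, `lr_epochs=3.5` settings from the run header; the -0.25 exponents reproduce the logged values but the exact formula should be treated as an assumption:

```python
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style schedule: decay as a -0.25 power in both batch and epoch."""
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Roughly reproduces the logged values around this point in training
# (~119k total batches, counting completed epochs):
print(f"{eden_lr(0.045, batch=119_000, epoch=24):.2e}")  # ~4.29e-03 (epoch 25)
print(f"{eden_lr(0.045, batch=119_000, epoch=25):.2e}")  # ~4.20e-03 (epoch 26)
```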
2023-12-22 21:31:41,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=22.5
2023-12-22 21:31:41,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0
2023-12-22 21:31:55,987 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.817e+01 3.096e+01 3.289e+01 3.454e+01 4.109e+01, threshold=6.578e+01, percent-clipped=0.0
2023-12-22 21:32:02,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=792360.0, ans=10.0
2023-12-22 21:32:12,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=12.0
2023-12-22 21:32:25,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=792493.3333333334, ans=0.125
2023-12-22 21:32:28,142 INFO [train.py:886] (0/4) Epoch 25, batch 4500, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4947106.78 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:32:41,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=792626.6666666666, ans=0.0
2023-12-22 21:32:48,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=792693.3333333334, ans=0.125
2023-12-22 21:32:50,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=22.5
2023-12-22 21:32:52,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=792693.3333333334, ans=10.0
2023-12-22 21:33:11,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0
2023-12-22 21:33:14,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=792826.6666666666, ans=0.125
2023-12-22 21:33:19,562 INFO [train.py:886] (0/4) Epoch 25, batch 4550, loss[loss=0.01512, audio_tagging_loss=0.01512, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4953231.47 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:33:19,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=12.0
2023-12-22 21:33:28,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=792960.0, ans=0.1
2023-12-22 21:33:39,621 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.637e+01 3.079e+01 3.195e+01 3.326e+01 3.977e+01, threshold=6.390e+01, percent-clipped=0.0
2023-12-22 21:33:48,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=793026.6666666666, ans=0.125
2023-12-22 21:33:50,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=793093.3333333334, ans=0.0
2023-12-22 21:34:03,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0
2023-12-22 21:34:08,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=793160.0, ans=0.125
2023-12-22 21:34:10,992 INFO [train.py:886] (0/4) Epoch 25, batch 4600, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4958633.59 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:34:12,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=793226.6666666666, ans=0.1
2023-12-22 21:34:15,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0
2023-12-22 21:34:22,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=793293.3333333334, ans=0.2
2023-12-22 21:34:33,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=793360.0, ans=0.07
2023-12-22 21:34:46,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=793426.6666666666, ans=0.125
2023-12-22 21:34:58,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0
2023-12-22 21:35:02,779 INFO [train.py:886] (0/4) Epoch 25, batch 4650, loss[loss=0.01203, audio_tagging_loss=0.01203, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4958828.70 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:35:10,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=793560.0, ans=0.125
2023-12-22 21:35:15,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=793626.6666666666, ans=0.125
2023-12-22 21:35:22,746 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+01 3.053e+01 3.192e+01 3.319e+01 4.042e+01, threshold=6.384e+01, percent-clipped=0.0
2023-12-22 21:35:30,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=793693.3333333334, ans=0.125
2023-12-22 21:35:35,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=793760.0, ans=0.125
2023-12-22 21:35:38,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=793760.0, ans=0.0
2023-12-22 21:35:43,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=793826.6666666666, ans=0.125
2023-12-22 21:35:53,205 INFO [train.py:886] (0/4) Epoch 25, batch 4700, loss[loss=0.01222, audio_tagging_loss=0.01222, over 22147.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4952382.26 frames. ], batch size: 107, lr: 4.29e-03, grad_scale: 64.0
2023-12-22 21:36:13,685 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0
2023-12-22 21:36:25,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=794093.3333333334, ans=0.125
2023-12-22 21:36:26,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=794093.3333333334, ans=0.2
2023-12-22 21:36:30,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0
2023-12-22 21:36:34,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=794160.0, ans=0.0
2023-12-22 21:36:40,989 INFO [train.py:886] (0/4) Epoch 25, batch 4750, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4946038.04 frames. ], batch size: 99, lr: 4.28e-03, grad_scale: 64.0
2023-12-22 21:36:46,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=794226.6666666666, ans=0.125
2023-12-22 21:36:56,206 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-25.pt
2023-12-22 21:37:15,784 INFO [train.py:886] (0/4) Epoch 26, batch 0, loss[loss=0.03058, audio_tagging_loss=0.03058, over 23990.00 frames. ], tot_loss[loss=0.03058, audio_tagging_loss=0.03058, over 23990.00 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0
2023-12-22 21:37:15,785 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 21:37:37,119 INFO [train.py:917] (0/4) Epoch 26, validation: loss=0.03272, audio_tagging_loss=0.03272, over 3737520.00 frames.
2023-12-22 21:37:37,120 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-22 21:37:38,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=15.0
2023-12-22 21:37:39,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=15.0
2023-12-22 21:37:41,534 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.775e+01 3.157e+01 3.286e+01 3.436e+01 9.011e+01, threshold=6.571e+01, percent-clipped=3.0
2023-12-22 21:37:44,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=794333.3333333334, ans=0.125
2023-12-22 21:37:48,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=794400.0, ans=0.2
2023-12-22 21:37:53,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=794400.0, ans=0.0
2023-12-22 21:37:56,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=794400.0, ans=0.1
2023-12-22 21:38:00,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.71 vs. limit=10.0
2023-12-22 21:38:18,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=794600.0, ans=0.0
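At the epoch boundary above, an `epoch-25.pt` checkpoint is saved and `grad_scale` stands at 32.0, down from its peak of 128.0 reached at batch 4000 (`use_fp16: True` in the header, so this is the fp16 loss scale, which is typically halved when gradients overflow — note `percent-clipped=3.0` right after — and raised again after a run of clean steps). The standard PyTorch AMP pattern exhibits exactly this behaviour; the model and optimizer below are placeholders, not the run's actual modules:

```python
import torch

model = torch.nn.Linear(80, 527).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=4.2e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=2000)

for step in range(3):
    x = torch.randn(8, 80, device="cuda")
    y = (torch.rand(8, 527, device="cuda") > 0.95).float()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), y)
    optimizer.zero_grad()
    scaler.scale(loss).backward()  # scale up so fp16 grads don't underflow
    scaler.step(optimizer)         # skips the step if grads overflowed
    scaler.update()                # halves the scale on overflow, doubles it
                                   # after growth_interval clean steps
    print(scaler.get_scale())      # the quantity logged as grad_scale
```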
2023-12-22 21:38:19,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.94 vs. limit=10.0
2023-12-22 21:38:28,765 INFO [train.py:886] (0/4) Epoch 26, batch 50, loss[loss=0.01721, audio_tagging_loss=0.01721, over 25000.00 frames. ], tot_loss[loss=0.0206, audio_tagging_loss=0.0206, over 1120839.48 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0
2023-12-22 21:38:29,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=794666.6666666666, ans=0.125
2023-12-22 21:38:34,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=794666.6666666666, ans=0.2
2023-12-22 21:38:35,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794666.6666666666, ans=0.1
2023-12-22 21:38:35,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=794666.6666666666, ans=0.04949747468305833
2023-12-22 21:38:40,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=794733.3333333334, ans=0.0
2023-12-22 21:38:43,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.19 vs. limit=22.5
2023-12-22 21:39:02,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=794866.6666666666, ans=0.2
2023-12-22 21:39:12,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=794933.3333333334, ans=0.125
2023-12-22 21:39:15,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=794933.3333333334, ans=0.0
2023-12-22 21:39:17,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=794933.3333333334, ans=0.125
2023-12-22 21:39:20,282 INFO [train.py:886] (0/4) Epoch 26, batch 100, loss[loss=0.01672, audio_tagging_loss=0.01672, over 25000.00 frames. ], tot_loss[loss=0.01776, audio_tagging_loss=0.01776, over 1977229.39 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0
2023-12-22 21:39:24,049 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.050e+01 3.579e+01 3.859e+01 4.416e+01 7.347e+01, threshold=7.717e+01, percent-clipped=4.0
2023-12-22 21:39:36,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=795066.6666666666, ans=0.95
2023-12-22 21:39:39,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=795066.6666666666, ans=0.2
2023-12-22 21:39:44,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=795133.3333333334, ans=0.125
2023-12-22 21:39:59,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=795200.0, ans=0.0
2023-12-22 21:40:11,822 INFO [train.py:886] (0/4) Epoch 26, batch 150, loss[loss=0.01449, audio_tagging_loss=0.01449, over 25000.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 2641534.14 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0
2023-12-22 21:40:33,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=795466.6666666666, ans=0.125
2023-12-22 21:40:43,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=795533.3333333334, ans=0.1
2023-12-22 21:40:54,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=795600.0, ans=0.0
2023-12-22 21:40:55,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=795600.0, ans=0.125
2023-12-22 21:40:57,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=795600.0, ans=0.125
2023-12-22 21:41:02,953 INFO [train.py:886] (0/4) Epoch 26, batch 200, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 3159323.94 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0
2023-12-22 21:41:04,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=795666.6666666666, ans=0.125
2023-12-22 21:41:07,377 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.808e+01 3.182e+01 3.315e+01 3.522e+01 3.900e+01, threshold=6.631e+01, percent-clipped=0.0
2023-12-22 21:41:10,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=12.0
2023-12-22 21:41:17,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.73 vs. limit=15.0
2023-12-22 21:41:18,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=795733.3333333334, ans=0.125
2023-12-22 21:41:19,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=15.0
2023-12-22 21:41:32,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=795800.0, ans=0.125
2023-12-22 21:41:34,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=795866.6666666666, ans=0.015
2023-12-22 21:41:38,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=795866.6666666666, ans=0.0
2023-12-22 21:41:55,396 INFO [train.py:886] (0/4) Epoch 26, batch 250, loss[loss=0.01156, audio_tagging_loss=0.01156, over 22394.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 3557810.20 frames. ], batch size: 107, lr: 4.20e-03, grad_scale: 32.0
2023-12-22 21:42:12,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0
2023-12-22 21:42:27,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796200.0, ans=0.1
2023-12-22 21:42:27,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=796200.0, ans=0.125
2023-12-22 21:42:45,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=796266.6666666666, ans=0.125
2023-12-22 21:42:47,248 INFO [train.py:886] (0/4) Epoch 26, batch 300, loss[loss=0.01383, audio_tagging_loss=0.01383, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 3863041.79 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0
2023-12-22 21:42:50,984 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+01 3.083e+01 3.251e+01 3.397e+01 4.143e+01, threshold=6.503e+01, percent-clipped=0.0
2023-12-22 21:43:27,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=796600.0, ans=0.125
2023-12-22 21:43:30,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=796600.0, ans=0.09899494936611666
2023-12-22 21:43:35,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=796600.0, ans=0.125
2023-12-22 21:43:39,414 INFO [train.py:886] (0/4) Epoch 26, batch 350, loss[loss=0.01237, audio_tagging_loss=0.01237, over 24750.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4103368.97 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0
2023-12-22 21:43:43,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=796666.6666666666, ans=0.0
2023-12-22 21:43:50,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=796733.3333333334, ans=0.125
2023-12-22 21:44:04,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=796800.0, ans=0.0
2023-12-22 21:44:11,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.64 vs. limit=22.5
2023-12-22 21:44:31,552 INFO [train.py:886] (0/4) Epoch 26, batch 400, loss[loss=0.01287, audio_tagging_loss=0.01287, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4291602.90 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0
2023-12-22 21:44:33,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=797000.0, ans=0.125
2023-12-22 21:44:36,033 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.737e+01 3.039e+01 3.196e+01 3.377e+01 3.798e+01, threshold=6.392e+01, percent-clipped=0.0
2023-12-22 21:44:46,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=797066.6666666666, ans=0.125
2023-12-22 21:44:49,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=15.0
2023-12-22 21:44:51,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=797133.3333333334, ans=0.1
2023-12-22 21:45:01,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=797200.0, ans=0.125
2023-12-22 21:45:14,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=797266.6666666666, ans=0.0
2023-12-22 21:45:21,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=797266.6666666666, ans=0.125
2023-12-22 21:45:22,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=797333.3333333334, ans=0.0
2023-12-22 21:45:23,763 INFO [train.py:886] (0/4) Epoch 26, batch 450, loss[loss=0.0124, audio_tagging_loss=0.0124, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4437684.48 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0
2023-12-22 21:45:39,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0
2023-12-22 21:46:00,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0
2023-12-22 21:46:04,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=797600.0, ans=0.0
2023-12-22 21:46:14,592 INFO [train.py:886] (0/4) Epoch 26, batch 500, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4556911.68 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0
2023-12-22 21:46:16,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.38 vs. limit=22.5
2023-12-22 21:46:19,022 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.814e+01 3.074e+01 3.193e+01 3.348e+01 4.490e+01, threshold=6.386e+01, percent-clipped=0.0
2023-12-22 21:46:32,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.71 vs. limit=8.0
2023-12-22 21:46:34,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=797800.0, ans=0.125
2023-12-22 21:46:36,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=797800.0, ans=0.0
2023-12-22 21:46:45,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=797866.6666666666, ans=0.125
2023-12-22 21:47:05,453 INFO [train.py:886] (0/4) Epoch 26, batch 550, loss[loss=0.01635, audio_tagging_loss=0.01635, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4646665.77 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0
2023-12-22 21:47:17,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0
2023-12-22 21:47:34,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=12.0
2023-12-22 21:47:57,528 INFO [train.py:886] (0/4) Epoch 26, batch 600, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4718133.37 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0
2023-12-22 21:48:01,263 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.107e+01 3.234e+01 3.355e+01 4.218e+01, threshold=6.468e+01, percent-clipped=0.0
2023-12-22 21:48:14,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=798400.0, ans=0.125
2023-12-22 21:48:31,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0
2023-12-22 21:48:48,526 INFO [train.py:886] (0/4) Epoch 26, batch 650, loss[loss=0.01527, audio_tagging_loss=0.01527, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4763187.86 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0
2023-12-22 21:48:50,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=22.5
2023-12-22 21:48:59,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798733.3333333334, ans=0.1
2023-12-22 21:49:14,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.44 vs. limit=15.0
2023-12-22 21:49:40,192 INFO [train.py:886] (0/4) Epoch 26, batch 700, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4805960.03 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0
2023-12-22 21:49:43,946 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.720e+01 3.098e+01 3.257e+01 3.432e+01 3.751e+01, threshold=6.513e+01, percent-clipped=0.0
2023-12-22 21:49:47,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=799000.0, ans=0.125
2023-12-22 21:49:56,957 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 21:50:24,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=799266.6666666666, ans=0.07
2023-12-22 21:50:25,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=799266.6666666666, ans=0.0
2023-12-22 21:50:25,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=799266.6666666666, ans=0.125
], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:50:46,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=799400.0, ans=0.0 2023-12-22 21:50:46,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=799400.0, ans=0.125 2023-12-22 21:50:59,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.59 vs. limit=12.0 2023-12-22 21:51:09,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=799533.3333333334, ans=0.125 2023-12-22 21:51:11,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=799600.0, ans=0.125 2023-12-22 21:51:21,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.78 vs. limit=22.5 2023-12-22 21:51:22,661 INFO [train.py:886] (0/4) Epoch 26, batch 800, loss[loss=0.01103, audio_tagging_loss=0.01103, over 24003.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4860691.28 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:51:26,416 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 3.081e+01 3.211e+01 3.380e+01 3.889e+01, threshold=6.421e+01, percent-clipped=0.0 2023-12-22 21:51:44,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=799800.0, ans=0.2 2023-12-22 21:51:45,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=799800.0, ans=0.125 2023-12-22 21:51:59,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=15.0 2023-12-22 21:52:04,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2023-12-22 21:52:10,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2023-12-22 21:52:14,985 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-120000.pt 2023-12-22 21:52:17,745 INFO [train.py:886] (0/4) Epoch 26, batch 850, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4880881.19 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:52:19,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=800000.0, ans=0.125 2023-12-22 21:52:33,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=800066.6666666666, ans=0.125 2023-12-22 21:52:53,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=800200.0, ans=0.125 2023-12-22 21:53:08,748 INFO [train.py:886] (0/4) Epoch 26, batch 900, loss[loss=0.01512, audio_tagging_loss=0.01512, over 24750.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4896215.87 frames. 
], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:53:13,187 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.047e+01 3.202e+01 3.314e+01 4.091e+01, threshold=6.405e+01, percent-clipped=0.0 2023-12-22 21:53:45,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=800533.3333333334, ans=0.2 2023-12-22 21:53:45,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=800533.3333333334, ans=0.125 2023-12-22 21:54:01,653 INFO [train.py:886] (0/4) Epoch 26, batch 950, loss[loss=0.01452, audio_tagging_loss=0.01452, over 24750.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4886848.91 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:54:53,024 INFO [train.py:886] (0/4) Epoch 26, batch 1000, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4903279.24 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:54:56,818 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.794e+01 3.055e+01 3.210e+01 3.341e+01 4.117e+01, threshold=6.420e+01, percent-clipped=0.0 2023-12-22 21:54:59,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=801000.0, ans=0.125 2023-12-22 21:55:18,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2023-12-22 21:55:24,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=801200.0, ans=0.0 2023-12-22 21:55:27,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=801200.0, ans=0.1 2023-12-22 21:55:27,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.99 vs. limit=10.0 2023-12-22 21:55:33,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=801266.6666666666, ans=0.0 2023-12-22 21:55:36,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=801266.6666666666, ans=0.125 2023-12-22 21:55:43,844 INFO [train.py:886] (0/4) Epoch 26, batch 1050, loss[loss=0.01056, audio_tagging_loss=0.01056, over 24035.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4913835.40 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:55:53,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=801333.3333333334, ans=0.125 2023-12-22 21:55:54,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=801400.0, ans=0.125 2023-12-22 21:55:55,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801400.0, ans=0.1 2023-12-22 21:55:57,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.78 vs. 
limit=15.0 2023-12-22 21:56:10,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=801466.6666666666, ans=0.125 2023-12-22 21:56:14,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=801533.3333333334, ans=0.125 2023-12-22 21:56:15,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-12-22 21:56:36,432 INFO [train.py:886] (0/4) Epoch 26, batch 1100, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4923026.61 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:56:40,905 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.668e+01 3.073e+01 3.229e+01 3.408e+01 3.887e+01, threshold=6.457e+01, percent-clipped=0.0 2023-12-22 21:56:48,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2023-12-22 21:57:09,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=801866.6666666666, ans=0.07 2023-12-22 21:57:18,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801933.3333333334, ans=0.1 2023-12-22 21:57:19,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=801933.3333333334, ans=0.125 2023-12-22 21:57:21,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801933.3333333334, ans=0.1 2023-12-22 21:57:21,910 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:57:28,142 INFO [train.py:886] (0/4) Epoch 26, batch 1150, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4926097.44 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:58:15,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=802266.6666666666, ans=0.1 2023-12-22 21:58:18,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=802266.6666666666, ans=0.1 2023-12-22 21:58:20,319 INFO [train.py:886] (0/4) Epoch 26, batch 1200, loss[loss=0.01541, audio_tagging_loss=0.01541, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4934609.07 frames. 
], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:58:24,027 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 3.101e+01 3.239e+01 3.403e+01 4.197e+01, threshold=6.477e+01, percent-clipped=0.0 2023-12-22 21:58:32,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=802400.0, ans=0.125 2023-12-22 21:58:42,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=802466.6666666666, ans=0.125 2023-12-22 21:58:57,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.44 vs. limit=12.0 2023-12-22 21:59:11,799 INFO [train.py:886] (0/4) Epoch 26, batch 1250, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4933638.78 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:59:17,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=802666.6666666666, ans=0.125 2023-12-22 21:59:27,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2023-12-22 21:59:30,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=802733.3333333334, ans=0.125 2023-12-22 22:00:03,372 INFO [train.py:886] (0/4) Epoch 26, batch 1300, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4936675.22 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 22:00:07,860 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.752e+01 3.138e+01 3.281e+01 3.471e+01 4.385e+01, threshold=6.561e+01, percent-clipped=0.0 2023-12-22 22:00:26,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.83 vs. limit=15.0 2023-12-22 22:00:47,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-12-22 22:00:50,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.35 vs. limit=22.5 2023-12-22 22:00:55,949 INFO [train.py:886] (0/4) Epoch 26, batch 1350, loss[loss=0.01446, audio_tagging_loss=0.01446, over 25000.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4941580.48 frames. 
], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 22:01:07,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=803400.0, ans=0.0 2023-12-22 22:01:15,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=803400.0, ans=0.0 2023-12-22 22:01:17,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=803466.6666666666, ans=0.125 2023-12-22 22:01:17,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=803466.6666666666, ans=0.1 2023-12-22 22:01:39,268 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:01:41,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=803600.0, ans=0.2 2023-12-22 22:01:46,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=803600.0, ans=0.125 2023-12-22 22:01:48,082 INFO [train.py:886] (0/4) Epoch 26, batch 1400, loss[loss=0.01272, audio_tagging_loss=0.01272, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4937178.08 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 22:01:51,855 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 3.025e+01 3.160e+01 3.304e+01 3.712e+01, threshold=6.320e+01, percent-clipped=0.0 2023-12-22 22:02:13,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.82 vs. limit=22.5 2023-12-22 22:02:22,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=12.0 2023-12-22 22:02:23,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=803866.6666666666, ans=0.125 2023-12-22 22:02:31,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=803933.3333333334, ans=0.1 2023-12-22 22:02:32,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=803933.3333333334, ans=0.125 2023-12-22 22:02:32,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=803933.3333333334, ans=0.0 2023-12-22 22:02:36,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs. limit=22.5 2023-12-22 22:02:37,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=803933.3333333334, ans=0.0 2023-12-22 22:02:38,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=804000.0, ans=0.125 2023-12-22 22:02:39,168 INFO [train.py:886] (0/4) Epoch 26, batch 1450, loss[loss=0.01078, audio_tagging_loss=0.01078, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4941694.86 frames. 
], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:02:41,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804000.0, ans=0.1 2023-12-22 22:03:23,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804266.6666666666, ans=0.1 2023-12-22 22:03:31,158 INFO [train.py:886] (0/4) Epoch 26, batch 1500, loss[loss=0.01234, audio_tagging_loss=0.01234, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4946455.52 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:03:32,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=804333.3333333334, ans=0.125 2023-12-22 22:03:35,694 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.803e+01 3.046e+01 3.201e+01 3.307e+01 3.722e+01, threshold=6.402e+01, percent-clipped=0.0 2023-12-22 22:03:47,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=804400.0, ans=0.125 2023-12-22 22:03:49,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804400.0, ans=0.1 2023-12-22 22:03:58,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=804466.6666666666, ans=0.125 2023-12-22 22:04:17,625 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:04:19,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=804600.0, ans=0.125 2023-12-22 22:04:23,289 INFO [train.py:886] (0/4) Epoch 26, batch 1550, loss[loss=0.01409, audio_tagging_loss=0.01409, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4944670.44 frames. ], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:04:25,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.47 vs. limit=10.0 2023-12-22 22:04:29,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=804666.6666666666, ans=0.125 2023-12-22 22:04:40,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=804733.3333333334, ans=0.0 2023-12-22 22:04:46,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=804800.0, ans=0.125 2023-12-22 22:05:14,680 INFO [train.py:886] (0/4) Epoch 26, batch 1600, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4943504.21 frames. ], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:05:17,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. 
limit=15.0 2023-12-22 22:05:18,338 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.137e+01 3.270e+01 3.392e+01 3.802e+01, threshold=6.540e+01, percent-clipped=0.0 2023-12-22 22:05:40,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805133.3333333334, ans=0.1 2023-12-22 22:05:43,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=805133.3333333334, ans=0.125 2023-12-22 22:05:46,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=805200.0, ans=0.125 2023-12-22 22:06:07,025 INFO [train.py:886] (0/4) Epoch 26, batch 1650, loss[loss=0.01202, audio_tagging_loss=0.01202, over 24750.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4945448.17 frames. ], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:06:15,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.34 vs. limit=6.0 2023-12-22 22:06:26,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2023-12-22 22:06:54,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=805600.0, ans=0.125 2023-12-22 22:06:57,927 INFO [train.py:886] (0/4) Epoch 26, batch 1700, loss[loss=0.01084, audio_tagging_loss=0.01084, over 24059.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4937912.93 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:07:02,414 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+01 3.061e+01 3.187e+01 3.330e+01 3.966e+01, threshold=6.373e+01, percent-clipped=0.0 2023-12-22 22:07:30,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.51 vs. limit=22.5 2023-12-22 22:07:35,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2023-12-22 22:07:41,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=805933.3333333334, ans=0.125 2023-12-22 22:07:51,031 INFO [train.py:886] (0/4) Epoch 26, batch 1750, loss[loss=0.0106, audio_tagging_loss=0.0106, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4946005.47 frames. 
], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:08:02,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=806066.6666666666, ans=0.0 2023-12-22 22:08:15,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=806133.3333333334, ans=0.125 2023-12-22 22:08:24,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=806200.0, ans=0.0 2023-12-22 22:08:36,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=806266.6666666666, ans=0.125 2023-12-22 22:08:41,207 INFO [train.py:886] (0/4) Epoch 26, batch 1800, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4946802.38 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:08:45,756 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.749e+01 3.085e+01 3.206e+01 3.379e+01 3.984e+01, threshold=6.413e+01, percent-clipped=0.0 2023-12-22 22:09:00,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806466.6666666666, ans=0.1 2023-12-22 22:09:33,099 INFO [train.py:886] (0/4) Epoch 26, batch 1850, loss[loss=0.01543, audio_tagging_loss=0.01543, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4952344.20 frames. ], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:09:44,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=806733.3333333334, ans=0.1 2023-12-22 22:09:45,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=806733.3333333334, ans=0.0 2023-12-22 22:09:52,603 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.529e-02 2023-12-22 22:10:06,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=806866.6666666666, ans=0.0 2023-12-22 22:10:10,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2023-12-22 22:10:25,614 INFO [train.py:886] (0/4) Epoch 26, batch 1900, loss[loss=0.01542, audio_tagging_loss=0.01542, over 24750.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4951138.44 frames. 
], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:10:30,065 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.174e+01 3.329e+01 3.459e+01 4.807e+01, threshold=6.658e+01, percent-clipped=0.0 2023-12-22 22:10:32,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=807000.0, ans=0.0 2023-12-22 22:10:38,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=807066.6666666666, ans=0.0 2023-12-22 22:10:40,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=807066.6666666666, ans=0.125 2023-12-22 22:10:44,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=807133.3333333334, ans=0.125 2023-12-22 22:10:51,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=807133.3333333334, ans=0.1 2023-12-22 22:10:58,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=807200.0, ans=0.125 2023-12-22 22:11:04,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2023-12-22 22:11:09,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=15.0 2023-12-22 22:11:16,639 INFO [train.py:886] (0/4) Epoch 26, batch 1950, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4952296.10 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:11:38,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=807466.6666666666, ans=0.125 2023-12-22 22:11:50,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=807533.3333333334, ans=0.04949747468305833 2023-12-22 22:12:07,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-12-22 22:12:10,191 INFO [train.py:886] (0/4) Epoch 26, batch 2000, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4948855.25 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 64.0 2023-12-22 22:12:14,053 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.056e+01 3.196e+01 3.415e+01 4.184e+01, threshold=6.392e+01, percent-clipped=0.0 2023-12-22 22:12:42,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=807866.6666666666, ans=0.125 2023-12-22 22:12:51,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-12-22 22:12:57,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-12-22 22:13:02,234 INFO [train.py:886] (0/4) Epoch 26, batch 2050, loss[loss=0.01282, audio_tagging_loss=0.01282, over 25000.00 frames. 
], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4946203.21 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:13:06,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=808000.0, ans=0.05 2023-12-22 22:13:13,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=808066.6666666666, ans=0.0 2023-12-22 22:13:27,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=15.0 2023-12-22 22:13:30,068 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:13:31,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2023-12-22 22:13:44,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=808266.6666666666, ans=0.015 2023-12-22 22:13:52,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=808333.3333333334, ans=0.0 2023-12-22 22:13:53,233 INFO [train.py:886] (0/4) Epoch 26, batch 2100, loss[loss=0.01395, audio_tagging_loss=0.01395, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4950339.36 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:13:56,967 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.782e+01 3.109e+01 3.258e+01 3.404e+01 4.082e+01, threshold=6.517e+01, percent-clipped=0.0 2023-12-22 22:13:58,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=808333.3333333334, ans=0.0 2023-12-22 22:13:58,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=808333.3333333334, ans=0.0 2023-12-22 22:14:02,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5 2023-12-22 22:14:16,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=808466.6666666666, ans=0.125 2023-12-22 22:14:30,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808533.3333333334, ans=0.1 2023-12-22 22:14:41,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2023-12-22 22:14:43,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=808600.0, ans=0.05 2023-12-22 22:14:44,686 INFO [train.py:886] (0/4) Epoch 26, batch 2150, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4952467.11 frames. 
], batch size: 99, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:14:51,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=808666.6666666666, ans=0.1 2023-12-22 22:14:59,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=808733.3333333334, ans=0.125 2023-12-22 22:15:05,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2023-12-22 22:15:32,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=808933.3333333334, ans=10.0 2023-12-22 22:15:34,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=809000.0, ans=0.035 2023-12-22 22:15:36,298 INFO [train.py:886] (0/4) Epoch 26, batch 2200, loss[loss=0.01431, audio_tagging_loss=0.01431, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4950387.18 frames. ], batch size: 99, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:15:40,869 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.712e+01 3.112e+01 3.281e+01 3.454e+01 3.979e+01, threshold=6.563e+01, percent-clipped=0.0 2023-12-22 22:15:41,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=809000.0, ans=0.0 2023-12-22 22:15:48,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=809066.6666666666, ans=10.0 2023-12-22 22:15:59,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=809133.3333333334, ans=0.0 2023-12-22 22:16:15,859 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:16:28,679 INFO [train.py:886] (0/4) Epoch 26, batch 2250, loss[loss=0.01296, audio_tagging_loss=0.01296, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4951038.75 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:16:31,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=809333.3333333334, ans=0.0 2023-12-22 22:16:31,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=809333.3333333334, ans=0.0 2023-12-22 22:16:37,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=809400.0, ans=0.125 2023-12-22 22:16:48,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.98 vs. 
limit=22.5 2023-12-22 22:16:52,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=809466.6666666666, ans=0.0 2023-12-22 22:16:58,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=809466.6666666666, ans=0.125 2023-12-22 22:17:04,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=809533.3333333334, ans=0.125 2023-12-22 22:17:05,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809533.3333333334, ans=0.1 2023-12-22 22:17:12,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=809600.0, ans=0.125 2023-12-22 22:17:20,278 INFO [train.py:886] (0/4) Epoch 26, batch 2300, loss[loss=0.01303, audio_tagging_loss=0.01303, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4948991.10 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:17:24,687 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.740e+01 3.089e+01 3.215e+01 3.416e+01 4.133e+01, threshold=6.430e+01, percent-clipped=0.0 2023-12-22 22:17:39,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=809800.0, ans=0.125 2023-12-22 22:17:41,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=809800.0, ans=0.0 2023-12-22 22:17:47,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=809800.0, ans=0.125 2023-12-22 22:17:59,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=809866.6666666666, ans=0.125 2023-12-22 22:18:07,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809933.3333333334, ans=0.1 2023-12-22 22:18:11,850 INFO [train.py:886] (0/4) Epoch 26, batch 2350, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4955340.86 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:19:03,981 INFO [train.py:886] (0/4) Epoch 26, batch 2400, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4957908.99 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:19:07,783 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.847e+01 3.068e+01 3.224e+01 3.358e+01 4.468e+01, threshold=6.448e+01, percent-clipped=0.0 2023-12-22 22:19:36,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=810533.3333333334, ans=0.04949747468305833 2023-12-22 22:19:51,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.46 vs. 
limit=15.0 2023-12-22 22:19:55,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=810666.6666666666, ans=0.0 2023-12-22 22:19:56,220 INFO [train.py:886] (0/4) Epoch 26, batch 2450, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4961406.73 frames. ], batch size: 99, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:20:01,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=810666.6666666666, ans=0.0 2023-12-22 22:20:03,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2023-12-22 22:20:24,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. limit=15.0 2023-12-22 22:20:35,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2023-12-22 22:20:47,798 INFO [train.py:886] (0/4) Epoch 26, batch 2500, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4955995.61 frames. ], batch size: 99, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:20:51,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=811000.0, ans=0.125 2023-12-22 22:20:52,293 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.780e+01 3.125e+01 3.301e+01 3.409e+01 3.789e+01, threshold=6.601e+01, percent-clipped=0.0 2023-12-22 22:21:07,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=811133.3333333334, ans=0.0 2023-12-22 22:21:07,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=811133.3333333334, ans=0.2 2023-12-22 22:21:12,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=811133.3333333334, ans=0.2 2023-12-22 22:21:18,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=811200.0, ans=0.0 2023-12-22 22:21:18,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2023-12-22 22:21:19,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=811200.0, ans=0.2 2023-12-22 22:21:19,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=811200.0, ans=0.0 2023-12-22 22:21:38,990 INFO [train.py:886] (0/4) Epoch 26, batch 2550, loss[loss=0.01444, audio_tagging_loss=0.01444, over 24750.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4951250.81 frames. ], batch size: 99, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:21:42,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.29 vs. 
limit=15.0 2023-12-22 22:22:09,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=811533.3333333334, ans=0.125 2023-12-22 22:22:17,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=811533.3333333334, ans=0.2 2023-12-22 22:22:30,087 INFO [train.py:886] (0/4) Epoch 26, batch 2600, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4950938.89 frames. ], batch size: 99, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:22:35,137 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.819e+01 3.115e+01 3.236e+01 3.408e+01 3.889e+01, threshold=6.471e+01, percent-clipped=0.0 2023-12-22 22:22:46,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=811733.3333333334, ans=0.0 2023-12-22 22:22:49,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2023-12-22 22:22:53,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=811800.0, ans=0.1 2023-12-22 22:23:08,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=811866.6666666666, ans=0.125 2023-12-22 22:23:22,389 INFO [train.py:886] (0/4) Epoch 26, batch 2650, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4954009.19 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:23:31,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=812066.6666666666, ans=0.0 2023-12-22 22:23:32,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=812066.6666666666, ans=0.0 2023-12-22 22:23:33,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=12.0 2023-12-22 22:24:14,621 INFO [train.py:886] (0/4) Epoch 26, batch 2700, loss[loss=0.01274, audio_tagging_loss=0.01274, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4943478.10 frames. ], batch size: 99, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:24:15,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=812333.3333333334, ans=0.2 2023-12-22 22:24:18,367 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.764e+01 3.107e+01 3.255e+01 3.402e+01 3.998e+01, threshold=6.509e+01, percent-clipped=0.0 2023-12-22 22:24:18,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=812333.3333333334, ans=0.125 2023-12-22 22:24:29,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=812400.0, ans=0.125 2023-12-22 22:24:30,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. 
limit=10.0 2023-12-22 22:24:36,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=812466.6666666666, ans=0.0 2023-12-22 22:24:37,191 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:24:38,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=812466.6666666666, ans=0.1 2023-12-22 22:24:42,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-12-22 22:24:50,161 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-12-22 22:24:55,380 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:24:57,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=812600.0, ans=0.125 2023-12-22 22:24:59,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812600.0, ans=0.1 2023-12-22 22:25:03,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0 2023-12-22 22:25:05,571 INFO [train.py:886] (0/4) Epoch 26, batch 2750, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4943633.60 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:25:22,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-12-22 22:25:37,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=812866.6666666666, ans=0.035 2023-12-22 22:25:55,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2023-12-22 22:25:55,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=812933.3333333334, ans=0.2 2023-12-22 22:25:58,393 INFO [train.py:886] (0/4) Epoch 26, batch 2800, loss[loss=0.01395, audio_tagging_loss=0.01395, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4943232.89 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:26:02,083 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.802e+01 3.113e+01 3.268e+01 3.447e+01 3.793e+01, threshold=6.536e+01, percent-clipped=0.0 2023-12-22 22:26:05,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=813000.0, ans=0.2 2023-12-22 22:26:32,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=813200.0, ans=0.2 2023-12-22 22:26:48,958 INFO [train.py:886] (0/4) Epoch 26, batch 2850, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4943228.95 frames. 
], batch size: 99, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:26:49,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=813333.3333333334, ans=0.125 2023-12-22 22:26:51,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=12.0 2023-12-22 22:27:04,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=813400.0, ans=0.0 2023-12-22 22:27:08,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=813400.0, ans=0.04949747468305833 2023-12-22 22:27:11,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=813466.6666666666, ans=0.125 2023-12-22 22:27:12,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=813466.6666666666, ans=0.1 2023-12-22 22:27:16,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=813466.6666666666, ans=0.0 2023-12-22 22:27:40,978 INFO [train.py:886] (0/4) Epoch 26, batch 2900, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4943619.00 frames. ], batch size: 99, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:27:44,731 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.685e+01 3.068e+01 3.241e+01 3.417e+01 3.879e+01, threshold=6.482e+01, percent-clipped=0.0 2023-12-22 22:28:12,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=813866.6666666666, ans=0.125 2023-12-22 22:28:13,480 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:28:16,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=813866.6666666666, ans=0.0 2023-12-22 22:28:19,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=813866.6666666666, ans=0.0 2023-12-22 22:28:33,261 INFO [train.py:886] (0/4) Epoch 26, batch 2950, loss[loss=0.01164, audio_tagging_loss=0.01164, over 22868.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4941269.59 frames. ], batch size: 107, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:28:43,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=814066.6666666666, ans=0.2 2023-12-22 22:28:52,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=814133.3333333334, ans=0.125 2023-12-22 22:29:01,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814133.3333333334, ans=0.1 2023-12-22 22:29:09,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. 
limit=6.0 2023-12-22 22:29:13,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=814266.6666666666, ans=0.0 2023-12-22 22:29:16,873 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.426e-02 2023-12-22 22:29:23,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=814333.3333333334, ans=0.2 2023-12-22 22:29:24,366 INFO [train.py:886] (0/4) Epoch 26, batch 3000, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4945512.15 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:29:24,368 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 22:29:31,354 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5559, 3.9662, 4.0821, 3.5996], device='cuda:0') 2023-12-22 22:29:45,070 INFO [train.py:917] (0/4) Epoch 26, validation: loss=0.03227, audio_tagging_loss=0.03227, over 3737520.00 frames. 2023-12-22 22:29:45,071 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 22:29:48,817 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.801e+01 3.069e+01 3.220e+01 3.381e+01 3.833e+01, threshold=6.439e+01, percent-clipped=0.0 2023-12-22 22:30:36,598 INFO [train.py:886] (0/4) Epoch 26, batch 3050, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4948468.09 frames. ], batch size: 99, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:31:00,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0 2023-12-22 22:31:11,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=814866.6666666666, ans=0.125 2023-12-22 22:31:25,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.56 vs. limit=15.0 2023-12-22 22:31:28,245 INFO [train.py:886] (0/4) Epoch 26, batch 3100, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4946084.10 frames. 
], batch size: 100, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:31:32,716 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.774e+01 3.119e+01 3.253e+01 3.438e+01 3.800e+01, threshold=6.506e+01, percent-clipped=0.0
2023-12-22 22:31:33,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=815000.0, ans=0.0
2023-12-22 22:31:33,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=815000.0, ans=0.125
2023-12-22 22:31:35,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=815000.0, ans=0.125
2023-12-22 22:31:46,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=815066.6666666666, ans=0.0
2023-12-22 22:32:09,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=815200.0, ans=0.0
2023-12-22 22:32:13,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0
2023-12-22 22:32:21,413 INFO [train.py:886] (0/4) Epoch 26, batch 3150, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4942507.62 frames. ], batch size: 99, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:32:31,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=815400.0, ans=0.125
2023-12-22 22:32:32,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=815400.0, ans=0.0
2023-12-22 22:32:37,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=815400.0, ans=0.0
2023-12-22 22:32:42,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815466.6666666666, ans=0.1
2023-12-22 22:33:03,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.51 vs. limit=22.5
2023-12-22 22:33:09,305 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.03 vs. limit=15.0
2023-12-22 22:33:13,060 INFO [train.py:886] (0/4) Epoch 26, batch 3200, loss[loss=0.01598, audio_tagging_loss=0.01598, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4939378.96 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:33:16,941 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.137e+01 3.242e+01 3.409e+01 4.105e+01, threshold=6.485e+01, percent-clipped=0.0
2023-12-22 22:33:29,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=815733.3333333334, ans=0.125
2023-12-22 22:33:32,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=815800.0, ans=0.125
2023-12-22 22:33:59,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=815933.3333333334, ans=0.125
2023-12-22 22:34:04,552 INFO [train.py:886] (0/4) Epoch 26, batch 3250, loss[loss=0.01452, audio_tagging_loss=0.01452, over 22250.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4944956.02 frames. ], batch size: 107, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:34:16,632 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:34:19,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0
2023-12-22 22:34:44,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=816266.6666666666, ans=0.125
2023-12-22 22:34:48,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=816266.6666666666, ans=0.0
2023-12-22 22:34:55,455 INFO [train.py:886] (0/4) Epoch 26, batch 3300, loss[loss=0.01129, audio_tagging_loss=0.01129, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4948771.24 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:34:56,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=816333.3333333334, ans=0.0
2023-12-22 22:35:00,011 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.721e+01 3.062e+01 3.233e+01 3.383e+01 3.983e+01, threshold=6.466e+01, percent-clipped=0.0
2023-12-22 22:35:01,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=816333.3333333334, ans=0.0
2023-12-22 22:35:05,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816400.0, ans=0.1
2023-12-22 22:35:15,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=816466.6666666666, ans=0.125
2023-12-22 22:35:24,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=816466.6666666666, ans=0.2
2023-12-22 22:35:27,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=816533.3333333334, ans=0.125
2023-12-22 22:35:47,759 INFO [train.py:886] (0/4) Epoch 26, batch 3350, loss[loss=0.01083, audio_tagging_loss=0.01083, over 25000.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4949709.67 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:36:07,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=816800.0, ans=0.0
2023-12-22 22:36:14,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=816800.0, ans=0.125
2023-12-22 22:36:21,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=816866.6666666666, ans=0.0
2023-12-22 22:36:26,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0
2023-12-22 22:36:32,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0
2023-12-22 22:36:39,801 INFO [train.py:886] (0/4) Epoch 26, batch 3400, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4959216.65 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:36:44,330 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.878e+01 3.135e+01 3.242e+01 3.478e+01 3.841e+01, threshold=6.483e+01, percent-clipped=0.0
2023-12-22 22:36:53,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=817066.6666666666, ans=0.0
2023-12-22 22:36:53,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=817066.6666666666, ans=0.0
2023-12-22 22:37:00,914 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:37:18,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.20 vs. limit=10.0
2023-12-22 22:37:25,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.73 vs. limit=10.0
2023-12-22 22:37:31,879 INFO [train.py:886] (0/4) Epoch 26, batch 3450, loss[loss=0.01514, audio_tagging_loss=0.01514, over 24946.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4951342.07 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:37:33,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=817333.3333333334, ans=0.125
2023-12-22 22:37:39,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=817333.3333333334, ans=0.07
2023-12-22 22:37:44,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=817400.0, ans=0.0
2023-12-22 22:37:46,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=817400.0, ans=0.2
2023-12-22 22:37:47,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=817400.0, ans=0.0
2023-12-22 22:37:50,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=817400.0, ans=0.2
2023-12-22 22:38:09,473 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0
2023-12-22 22:38:10,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=817533.3333333334, ans=0.125
2023-12-22 22:38:10,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=817533.3333333334, ans=0.025
2023-12-22 22:38:12,589 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:38:17,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=817600.0, ans=0.125
2023-12-22 22:38:23,646 INFO [train.py:886] (0/4) Epoch 26, batch 3500, loss[loss=0.01447, audio_tagging_loss=0.01447, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4945901.48 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:38:23,909 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:38:28,152 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.949e+01 3.137e+01 3.263e+01 3.427e+01 4.088e+01, threshold=6.525e+01, percent-clipped=0.0
2023-12-22 22:38:42,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=817733.3333333334, ans=0.2
2023-12-22 22:39:15,368 INFO [train.py:886] (0/4) Epoch 26, batch 3550, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24750.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4949063.05 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:39:19,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=818000.0, ans=0.125
2023-12-22 22:39:36,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=818133.3333333334, ans=0.125
2023-12-22 22:40:08,438 INFO [train.py:886] (0/4) Epoch 26, batch 3600, loss[loss=0.015, audio_tagging_loss=0.015, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4951043.96 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:40:12,303 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.685e+01 3.075e+01 3.238e+01 3.372e+01 3.764e+01, threshold=6.477e+01, percent-clipped=0.0
2023-12-22 22:40:21,831 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:40:23,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=818400.0, ans=0.125
2023-12-22 22:40:30,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=818466.6666666666, ans=0.05
2023-12-22 22:40:30,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=818466.6666666666, ans=0.125
2023-12-22 22:40:37,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=818466.6666666666, ans=0.09899494936611666
2023-12-22 22:40:38,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=818533.3333333334, ans=0.0
2023-12-22 22:40:43,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=818533.3333333334, ans=0.09899494936611666
2023-12-22 22:40:51,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=818600.0, ans=0.2
2023-12-22 22:41:00,184 INFO [train.py:886] (0/4) Epoch 26, batch 3650, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4952300.83 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:41:21,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.49 vs. limit=8.0
2023-12-22 22:41:29,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=818800.0, ans=0.1
2023-12-22 22:41:34,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=818866.6666666666, ans=0.125
2023-12-22 22:41:34,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818866.6666666666, ans=0.1
2023-12-22 22:41:45,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=12.0
2023-12-22 22:41:52,788 INFO [train.py:886] (0/4) Epoch 26, batch 3700, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4953378.71 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:41:56,505 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.693e+01 3.107e+01 3.222e+01 3.395e+01 4.051e+01, threshold=6.444e+01, percent-clipped=0.0
2023-12-22 22:41:57,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819000.0, ans=0.1
2023-12-22 22:42:00,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. limit=6.0
2023-12-22 22:42:06,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=819066.6666666666, ans=0.2
2023-12-22 22:42:27,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.31 vs. limit=15.0
2023-12-22 22:42:33,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819266.6666666666, ans=0.1
2023-12-22 22:42:35,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=819266.6666666666, ans=0.0
2023-12-22 22:42:37,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=819266.6666666666, ans=0.125
2023-12-22 22:42:43,637 INFO [train.py:886] (0/4) Epoch 26, batch 3750, loss[loss=0.01319, audio_tagging_loss=0.01319, over 24750.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4950559.55 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:42:44,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=819333.3333333334, ans=0.125
2023-12-22 22:42:50,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=12.0
2023-12-22 22:42:50,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0
2023-12-22 22:43:17,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=819533.3333333334, ans=0.125
2023-12-22 22:43:17,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=819533.3333333334, ans=0.125
2023-12-22 22:43:35,750 INFO [train.py:886] (0/4) Epoch 26, batch 3800, loss[loss=0.01406, audio_tagging_loss=0.01406, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4943149.86 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:43:38,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0
2023-12-22 22:43:39,545 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.766e+01 3.124e+01 3.288e+01 3.411e+01 4.142e+01, threshold=6.577e+01, percent-clipped=0.0
2023-12-22 22:43:44,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0
2023-12-22 22:43:50,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0
2023-12-22 22:43:59,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0
2023-12-22 22:44:14,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=819866.6666666666, ans=0.125
2023-12-22 22:44:24,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=819933.3333333334, ans=0.2
2023-12-22 22:44:28,474 INFO [train.py:886] (0/4) Epoch 26, batch 3850, loss[loss=0.01684, audio_tagging_loss=0.01684, over 24926.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4935025.42 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:44:31,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=820000.0, ans=0.125
2023-12-22 22:44:36,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=820000.0, ans=0.0
2023-12-22 22:44:42,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=820066.6666666666, ans=0.125
2023-12-22 22:44:48,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0
2023-12-22 22:45:18,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820333.3333333334, ans=0.1
2023-12-22 22:45:19,631 INFO [train.py:886] (0/4) Epoch 26, batch 3900, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4939993.11 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:45:23,391 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.803e+01 3.084e+01 3.268e+01 3.397e+01 4.255e+01, threshold=6.537e+01, percent-clipped=0.0
2023-12-22 22:45:34,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=820400.0, ans=0.1
2023-12-22 22:45:46,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=820466.6666666666, ans=0.0
2023-12-22 22:45:51,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=820533.3333333334, ans=0.1
2023-12-22 22:45:53,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=820533.3333333334, ans=0.0
2023-12-22 22:46:11,691 INFO [train.py:886] (0/4) Epoch 26, batch 3950, loss[loss=0.01158, audio_tagging_loss=0.01158, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4944740.61 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:46:14,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.26 vs. limit=10.0
2023-12-22 22:46:33,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=820800.0, ans=0.0
2023-12-22 22:46:37,354 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:47:03,347 INFO [train.py:886] (0/4) Epoch 26, batch 4000, loss[loss=0.01329, audio_tagging_loss=0.01329, over 24750.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4948672.19 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 128.0
2023-12-22 22:47:04,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5
2023-12-22 22:47:07,817 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.131e+01 3.262e+01 3.397e+01 4.571e+01, threshold=6.525e+01, percent-clipped=0.0
2023-12-22 22:47:10,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=821000.0, ans=0.125
2023-12-22 22:47:29,197 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:47:30,473 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0
2023-12-22 22:47:33,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=821200.0, ans=0.125
2023-12-22 22:47:38,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821200.0, ans=0.1
2023-12-22 22:47:38,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=821200.0, ans=0.0
2023-12-22 22:47:44,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=821266.6666666666, ans=0.2
2023-12-22 22:47:44,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=12.0
2023-12-22 22:47:47,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=821266.6666666666, ans=0.125
2023-12-22 22:47:51,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0
2023-12-22 22:47:55,611 INFO [train.py:886] (0/4) Epoch 26, batch 4050, loss[loss=0.01381, audio_tagging_loss=0.01381, over 21237.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4946769.32 frames. ], batch size: 107, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:47:57,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=821333.3333333334, ans=0.125
2023-12-22 22:48:18,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.25 vs. limit=10.0
2023-12-22 22:48:19,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821466.6666666666, ans=0.1
2023-12-22 22:48:31,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=821533.3333333334, ans=0.125
2023-12-22 22:48:47,404 INFO [train.py:886] (0/4) Epoch 26, batch 4100, loss[loss=0.011, audio_tagging_loss=0.011, over 24750.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4944711.71 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:48:48,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=821666.6666666666, ans=0.125
2023-12-22 22:48:52,856 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.137e+01 3.284e+01 3.434e+01 4.127e+01, threshold=6.569e+01, percent-clipped=0.0
2023-12-22 22:49:06,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=821733.3333333334, ans=0.125
2023-12-22 22:49:30,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=821933.3333333334, ans=0.125
2023-12-22 22:49:35,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=821933.3333333334, ans=0.125
2023-12-22 22:49:38,963 INFO [train.py:886] (0/4) Epoch 26, batch 4150, loss[loss=0.01056, audio_tagging_loss=0.01056, over 24750.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4936681.48 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:49:49,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=822066.6666666666, ans=0.125
2023-12-22 22:49:51,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=822066.6666666666, ans=0.2
2023-12-22 22:49:55,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=822066.6666666666, ans=0.2
2023-12-22 22:50:10,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=822200.0, ans=0.125
2023-12-22 22:50:31,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=822333.3333333334, ans=0.0
2023-12-22 22:50:32,180 INFO [train.py:886] (0/4) Epoch 26, batch 4200, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4942326.54 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:50:33,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=822333.3333333334, ans=0.125
2023-12-22 22:50:37,014 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.765e+01 3.069e+01 3.218e+01 3.405e+01 4.073e+01, threshold=6.437e+01, percent-clipped=0.0
2023-12-22 22:50:49,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=822400.0, ans=0.0
2023-12-22 22:51:07,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=822533.3333333334, ans=0.0
2023-12-22 22:51:08,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=822533.3333333334, ans=0.0
2023-12-22 22:51:15,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=822600.0, ans=0.0
2023-12-22 22:51:19,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=822600.0, ans=0.125
2023-12-22 22:51:23,433 INFO [train.py:886] (0/4) Epoch 26, batch 4250, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4947861.49 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:51:29,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=15.0
2023-12-22 22:51:43,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=822800.0, ans=0.2
2023-12-22 22:52:03,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822866.6666666666, ans=0.1
2023-12-22 22:52:05,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=822933.3333333334, ans=0.0
2023-12-22 22:52:14,948 INFO [train.py:886] (0/4) Epoch 26, batch 4300, loss[loss=0.0144, audio_tagging_loss=0.0144, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4951927.38 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:52:15,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=823000.0, ans=0.125
2023-12-22 22:52:19,687 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.718e+01 3.099e+01 3.212e+01 3.358e+01 3.889e+01, threshold=6.424e+01, percent-clipped=0.0
2023-12-22 22:52:33,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=823066.6666666666, ans=0.125
2023-12-22 22:52:42,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=823133.3333333334, ans=0.0
2023-12-22 22:52:42,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=823133.3333333334, ans=0.125
2023-12-22 22:52:49,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=823200.0, ans=0.0
2023-12-22 22:52:51,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=22.5
2023-12-22 22:52:52,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=823200.0, ans=0.0
2023-12-22 22:52:53,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5
2023-12-22 22:53:06,681 INFO [train.py:886] (0/4) Epoch 26, batch 4350, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24942.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4960564.73 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0
2023-12-22 22:53:09,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=823333.3333333334, ans=0.0
2023-12-22 22:53:22,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=823400.0, ans=0.1
2023-12-22 22:53:33,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=823466.6666666666, ans=0.125
2023-12-22 22:53:35,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=823466.6666666666, ans=0.95
2023-12-22 22:53:43,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=823533.3333333334, ans=0.2
2023-12-22 22:53:50,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.31 vs. limit=22.5
2023-12-22 22:53:51,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=823600.0, ans=0.2
2023-12-22 22:53:55,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=823600.0, ans=0.1
2023-12-22 22:53:59,342 INFO [train.py:886] (0/4) Epoch 26, batch 4400, loss[loss=0.01216, audio_tagging_loss=0.01216, over 24005.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4953370.48 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0
2023-12-22 22:54:00,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=22.5
2023-12-22 22:54:04,076 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.224e+01 3.341e+01 3.511e+01 3.923e+01, threshold=6.682e+01, percent-clipped=0.0
2023-12-22 22:54:11,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=823733.3333333334, ans=0.125
2023-12-22 22:54:22,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=22.5
2023-12-22 22:54:38,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=823866.6666666666, ans=0.5
2023-12-22 22:54:43,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=823933.3333333334, ans=0.025
2023-12-22 22:54:51,670 INFO [train.py:886] (0/4) Epoch 26, batch 4450, loss[loss=0.01238, audio_tagging_loss=0.01238, over 24750.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4953560.06 frames. ], batch size: 99, lr: 4.12e-03, grad_scale: 64.0
2023-12-22 22:55:01,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0
2023-12-22 22:55:08,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=824066.6666666666, ans=0.125
2023-12-22 22:55:16,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=824133.3333333334, ans=0.0
2023-12-22 22:55:23,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=824200.0, ans=0.125
2023-12-22 22:55:40,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=824266.6666666666, ans=0.0
2023-12-22 22:55:43,461 INFO [train.py:886] (0/4) Epoch 26, batch 4500, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4948134.93 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0
2023-12-22 22:55:48,314 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.836e+01 3.106e+01 3.253e+01 3.392e+01 3.736e+01, threshold=6.505e+01, percent-clipped=0.0
2023-12-22 22:56:11,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=824466.6666666666, ans=0.125
2023-12-22 22:56:32,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.70 vs. limit=15.0
2023-12-22 22:56:35,054 INFO [train.py:886] (0/4) Epoch 26, batch 4550, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4946253.59 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0
2023-12-22 22:56:35,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.81 vs. limit=22.5
2023-12-22 22:56:41,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=824666.6666666666, ans=0.125
2023-12-22 22:56:50,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=824733.3333333334, ans=10.0
2023-12-22 22:56:51,837 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:57:00,225 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:57:15,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=824933.3333333334, ans=0.125
2023-12-22 22:57:23,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=824933.3333333334, ans=0.02
2023-12-22 22:57:26,762 INFO [train.py:886] (0/4) Epoch 26, batch 4600, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4955194.45 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0
2023-12-22 22:57:32,095 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.119e+01 3.242e+01 3.395e+01 3.973e+01, threshold=6.484e+01, percent-clipped=0.0
2023-12-22 22:57:55,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=825133.3333333334, ans=0.07
2023-12-22 22:58:01,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=825200.0, ans=0.125
2023-12-22 22:58:19,446 INFO [train.py:886] (0/4) Epoch 26, batch 4650, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4954004.91 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0
2023-12-22 22:58:38,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=825466.6666666666, ans=0.125
2023-12-22 22:58:44,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0
2023-12-22 22:58:51,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=825533.3333333334, ans=0.125
2023-12-22 22:59:09,846 INFO [train.py:886] (0/4) Epoch 26, batch 4700, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24943.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4946194.47 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0
2023-12-22 22:59:14,957 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.179e+01 3.317e+01 3.439e+01 3.833e+01, threshold=6.634e+01, percent-clipped=0.0
2023-12-22 22:59:19,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=825733.3333333334, ans=0.125
2023-12-22 22:59:26,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=825733.3333333334, ans=0.125
2023-12-22 22:59:42,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=825866.6666666666, ans=0.125
2023-12-22 22:59:55,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=825933.3333333334, ans=0.2
2023-12-22 22:59:57,201 INFO [train.py:886] (0/4) Epoch 26, batch 4750, loss[loss=0.01047, audio_tagging_loss=0.01047, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4947490.38 frames. ], batch size: 99, lr: 4.12e-03, grad_scale: 64.0
2023-12-22 23:00:01,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.45 vs. limit=22.5
2023-12-22 23:00:12,691 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-26.pt
2023-12-22 23:00:32,997 INFO [train.py:886] (0/4) Epoch 27, batch 0, loss[loss=0.02715, audio_tagging_loss=0.02715, over 25000.00 frames. ], tot_loss[loss=0.02715, audio_tagging_loss=0.02715, over 25000.00 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:00:32,998 INFO [train.py:909] (0/4) Computing validation loss
2023-12-22 23:00:46,039 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3318, 4.5771, 5.2192, 4.7638], device='cuda:0')
2023-12-22 23:00:51,868 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5567, 4.0587, 4.1100, 3.5450], device='cuda:0')
2023-12-22 23:00:53,967 INFO [train.py:917] (0/4) Epoch 27, validation: loss=0.03314, audio_tagging_loss=0.03314, over 3737520.00 frames.
2023-12-22 23:00:53,968 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-22 23:00:55,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=826106.6666666666, ans=0.125
2023-12-22 23:00:57,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.92 vs. limit=15.0
2023-12-22 23:01:21,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=826240.0, ans=0.125
2023-12-22 23:01:22,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=826240.0, ans=0.125
2023-12-22 23:01:22,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0
2023-12-22 23:01:25,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=826306.6666666666, ans=0.0
2023-12-22 23:01:30,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=826306.6666666666, ans=0.125
2023-12-22 23:01:32,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=826306.6666666666, ans=0.0
2023-12-22 23:01:35,333 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.888e+01 3.296e+01 3.604e+01 4.648e+01 9.057e+01, threshold=7.208e+01, percent-clipped=9.0
2023-12-22 23:01:44,733 INFO [train.py:886] (0/4) Epoch 27, batch 50, loss[loss=0.0176, audio_tagging_loss=0.0176, over 25000.00 frames. ], tot_loss[loss=0.02055, audio_tagging_loss=0.02055, over 1120190.85 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:01:49,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=826440.0, ans=0.125
2023-12-22 23:01:50,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=826440.0, ans=0.125
2023-12-22 23:01:52,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=826440.0, ans=0.2
2023-12-22 23:01:56,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0
2023-12-22 23:02:01,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826506.6666666666, ans=0.1
2023-12-22 23:02:11,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=826573.3333333334, ans=0.2
2023-12-22 23:02:17,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=826640.0, ans=0.0
2023-12-22 23:02:18,687 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-124000.pt
2023-12-22 23:02:39,486 INFO [train.py:886] (0/4) Epoch 27, batch 100, loss[loss=0.01367, audio_tagging_loss=0.01367, over 25000.00 frames. ], tot_loss[loss=0.0179, audio_tagging_loss=0.0179, over 1967904.16 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:02:43,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=826773.3333333334, ans=0.0
2023-12-22 23:02:43,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=826773.3333333334, ans=0.125
2023-12-22 23:02:44,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=826773.3333333334, ans=0.125
2023-12-22 23:02:50,105 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:02:54,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=826840.0, ans=0.02
2023-12-22 23:02:55,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=826840.0, ans=0.0
2023-12-22 23:03:09,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826973.3333333334, ans=0.1
2023-12-22 23:03:19,780 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 3.375e+01 3.585e+01 3.841e+01 4.539e+01, threshold=7.169e+01, percent-clipped=0.0
2023-12-22 23:03:27,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=827040.0, ans=0.0
2023-12-22 23:03:29,227 INFO [train.py:886] (0/4) Epoch 27, batch 150, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.01614, audio_tagging_loss=0.01614, over 2633039.80 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:03:29,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=12.0
2023-12-22 23:03:53,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=827240.0, ans=0.1
2023-12-22 23:04:06,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=827306.6666666666, ans=0.125
2023-12-22 23:04:08,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=827306.6666666666, ans=0.125
2023-12-22 23:04:11,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=827373.3333333334, ans=0.0
2023-12-22 23:04:21,173 INFO [train.py:886] (0/4) Epoch 27, batch 200, loss[loss=0.0128, audio_tagging_loss=0.0128, over 25000.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 3153457.46 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:04:32,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=827506.6666666666, ans=0.0
2023-12-22 23:04:34,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=827506.6666666666, ans=0.1
2023-12-22 23:04:38,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=827506.6666666666, ans=0.04949747468305833
2023-12-22 23:04:54,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=827640.0, ans=0.2
2023-12-22 23:05:02,028 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.845e+01 3.150e+01 3.268e+01 3.453e+01 3.797e+01, threshold=6.536e+01, percent-clipped=0.0
2023-12-22 23:05:07,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. limit=6.0
2023-12-22 23:05:12,174 INFO [train.py:886] (0/4) Epoch 27, batch 250, loss[loss=0.01518, audio_tagging_loss=0.01518, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 3555966.41 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:05:42,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. limit=6.0
2023-12-22 23:05:50,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=827973.3333333334, ans=0.125
2023-12-22 23:05:51,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=827973.3333333334, ans=0.0
2023-12-22 23:05:54,409 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:05:54,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=828040.0, ans=0.0
2023-12-22 23:06:04,766 INFO [train.py:886] (0/4) Epoch 27, batch 300, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24750.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 3861690.37 frames. ], batch size: 99, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:06:28,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=828240.0, ans=0.0
2023-12-22 23:06:41,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=828306.6666666666, ans=0.2
2023-12-22 23:06:43,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=828306.6666666666, ans=0.05
2023-12-22 23:06:45,460 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.889e+01 3.171e+01 3.297e+01 3.491e+01 3.918e+01, threshold=6.593e+01, percent-clipped=0.0
2023-12-22 23:06:52,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=828373.3333333334, ans=0.07
2023-12-22 23:06:55,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828373.3333333334, ans=0.1
2023-12-22 23:06:57,084 INFO [train.py:886] (0/4) Epoch 27, batch 350, loss[loss=0.01503, audio_tagging_loss=0.01503, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4097252.71 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:07:05,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=828506.6666666666, ans=0.0
2023-12-22 23:07:09,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=828506.6666666666, ans=0.0
2023-12-22 23:07:13,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=828506.6666666666, ans=0.0
2023-12-22 23:07:22,201 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:07:25,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828573.3333333334, ans=0.1
2023-12-22 23:07:33,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=828640.0, ans=0.05
2023-12-22 23:07:47,945 INFO [train.py:886] (0/4) Epoch 27, batch 400, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4284720.68 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:07:53,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.96 vs. limit=22.5
2023-12-22 23:07:54,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=828773.3333333334, ans=0.05
2023-12-22 23:07:54,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=828773.3333333334, ans=0.0
2023-12-22 23:08:14,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.24 vs. limit=22.5
2023-12-22 23:08:30,058 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.745e+01 3.115e+01 3.244e+01 3.373e+01 3.819e+01, threshold=6.489e+01, percent-clipped=0.0
2023-12-22 23:08:40,200 INFO [train.py:886] (0/4) Epoch 27, batch 450, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4430613.75 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:08:46,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0
2023-12-22 23:08:51,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.33 vs. limit=22.5
2023-12-22 23:09:16,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829306.6666666666, ans=0.1
2023-12-22 23:09:26,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=829373.3333333334, ans=0.0
2023-12-22 23:09:31,759 INFO [train.py:886] (0/4) Epoch 27, batch 500, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4551605.15 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:09:34,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=829440.0, ans=0.0
2023-12-22 23:09:45,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0
2023-12-22 23:09:51,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=829573.3333333334, ans=0.125
2023-12-22 23:09:52,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=829573.3333333334, ans=0.125
2023-12-22 23:09:58,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=829573.3333333334, ans=0.125
2023-12-22 23:10:00,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=829573.3333333334, ans=0.09899494936611666
2023-12-22 23:10:11,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=829640.0, ans=0.125
2023-12-22 23:10:13,346 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.842e+01 3.068e+01 3.193e+01 3.340e+01 3.913e+01, threshold=6.386e+01, percent-clipped=0.0
2023-12-22 23:10:23,514 INFO [train.py:886] (0/4) Epoch 27, batch 550, loss[loss=0.01305, audio_tagging_loss=0.01305, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4644493.17 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:10:24,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=15.0
2023-12-22 23:10:25,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
2023-12-22 23:10:26,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=829773.3333333334, ans=0.2
2023-12-22 23:10:32,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.67 vs. limit=22.5
2023-12-22 23:10:33,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0
2023-12-22 23:10:43,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=829906.6666666666, ans=0.0
2023-12-22 23:10:48,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=829906.6666666666, ans=0.0
2023-12-22 23:10:48,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=829906.6666666666, ans=0.1
2023-12-22 23:11:00,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829973.3333333334, ans=0.1
2023-12-22 23:11:14,343 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:11:15,951 INFO [train.py:886] (0/4) Epoch 27, batch 600, loss[loss=0.01453, audio_tagging_loss=0.01453, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4717024.13 frames. ], batch size: 99, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:11:19,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=830106.6666666666, ans=0.125
2023-12-22 23:11:31,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=830173.3333333334, ans=0.2
2023-12-22 23:11:35,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=830240.0, ans=0.125
2023-12-22 23:11:38,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0
2023-12-22 23:11:57,273 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.845e+01 3.188e+01 3.320e+01 3.491e+01 4.138e+01, threshold=6.639e+01, percent-clipped=0.0
2023-12-22 23:12:02,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830373.3333333334, ans=0.1
2023-12-22 23:12:07,458 INFO [train.py:886] (0/4) Epoch 27, batch 650, loss[loss=0.01305, audio_tagging_loss=0.01305, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4763822.97 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:12:22,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=830506.6666666666, ans=0.0
2023-12-22 23:12:28,580 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:12:33,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=830573.3333333334, ans=0.1
2023-12-22 23:12:45,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=830640.0, ans=0.0
2023-12-22 23:12:45,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=12.0
2023-12-22 23:13:00,469 INFO [train.py:886] (0/4) Epoch 27, batch 700, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4800734.55 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:13:15,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=830840.0, ans=0.125
2023-12-22 23:13:36,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=830973.3333333334, ans=0.0
2023-12-22 23:13:38,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0
2023-12-22 23:13:38,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=830973.3333333334, ans=0.125
2023-12-22 23:13:39,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=830973.3333333334, ans=0.125
2023-12-22 23:13:41,262 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.146e+01 3.254e+01 3.417e+01 3.974e+01, threshold=6.508e+01, percent-clipped=0.0
2023-12-22 23:13:53,008 INFO [train.py:886] (0/4) Epoch 27, batch 750, loss[loss=0.01362, audio_tagging_loss=0.01362, over 25000.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4829576.89 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:14:18,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=831240.0, ans=0.07
2023-12-22 23:14:44,362 INFO [train.py:886] (0/4) Epoch 27, batch 800, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4857376.27 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:14:50,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=831440.0, ans=0.1
2023-12-22 23:15:01,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=831506.6666666666, ans=0.5
2023-12-22 23:15:26,458 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.771e+01 3.100e+01 3.260e+01 3.401e+01 4.213e+01, threshold=6.521e+01, percent-clipped=0.0
2023-12-22 23:15:34,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=831706.6666666666, ans=0.125
2023-12-22 23:15:34,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=831706.6666666666, ans=0.0
2023-12-22 23:15:35,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.93 vs. limit=15.0
2023-12-22 23:15:36,708 INFO [train.py:886] (0/4) Epoch 27, batch 850, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4883907.97 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:15:36,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=831773.3333333334, ans=0.125
2023-12-22 23:15:44,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=831773.3333333334, ans=0.0
2023-12-22 23:15:55,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=831840.0, ans=0.0
2023-12-22 23:16:01,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=831906.6666666666, ans=0.2
2023-12-22 23:16:28,941 INFO [train.py:886] (0/4) Epoch 27, batch 900, loss[loss=0.0153, audio_tagging_loss=0.0153, over 24946.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4905773.01 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:16:29,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=832106.6666666666, ans=0.125
2023-12-22 23:16:35,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=832106.6666666666, ans=0.035
2023-12-22 23:16:35,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=12.0
2023-12-22 23:16:42,051 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:16:45,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=832173.3333333334, ans=0.0
2023-12-22 23:16:47,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=832173.3333333334, ans=0.125
2023-12-22 23:16:55,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832240.0, ans=0.1
2023-12-22 23:16:55,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=832240.0, ans=0.0
2023-12-22 23:16:56,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=832240.0, ans=0.0
2023-12-22 23:16:58,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=832240.0, ans=0.125
2023-12-22 23:17:11,217 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.850e+01 3.162e+01 3.282e+01 3.411e+01 3.879e+01, threshold=6.564e+01, percent-clipped=0.0
2023-12-22 23:17:13,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=832373.3333333334, ans=0.0
2023-12-22 23:17:20,754 INFO [train.py:886] (0/4) Epoch 27, batch 950, loss[loss=0.01384, audio_tagging_loss=0.01384, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4913448.96 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:17:27,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=832440.0, ans=0.125
2023-12-22 23:17:35,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-12-22 23:17:36,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.72 vs. limit=10.0
2023-12-22 23:17:48,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0
2023-12-22 23:18:04,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=832706.6666666666, ans=0.0
2023-12-22 23:18:13,708 INFO [train.py:886] (0/4) Epoch 27, batch 1000, loss[loss=0.01166, audio_tagging_loss=0.01166, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4914663.78 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:18:15,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.44 vs.
limit=12.0 2023-12-22 23:18:16,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=832773.3333333334, ans=0.0 2023-12-22 23:18:31,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=832840.0, ans=0.125 2023-12-22 23:18:34,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.54 vs. limit=15.0 2023-12-22 23:18:52,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=832973.3333333334, ans=0.125 2023-12-22 23:18:54,479 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.825e+01 3.109e+01 3.263e+01 3.461e+01 3.831e+01, threshold=6.526e+01, percent-clipped=0.0 2023-12-22 23:18:57,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=833040.0, ans=0.125 2023-12-22 23:19:04,677 INFO [train.py:886] (0/4) Epoch 27, batch 1050, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4926137.24 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:19:04,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=833106.6666666666, ans=0.125 2023-12-22 23:19:09,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2023-12-22 23:19:57,230 INFO [train.py:886] (0/4) Epoch 27, batch 1100, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4929596.42 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:20:04,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=833440.0, ans=0.125 2023-12-22 23:20:22,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=833573.3333333334, ans=0.07 2023-12-22 23:20:37,343 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.804e+01 3.138e+01 3.223e+01 3.425e+01 4.180e+01, threshold=6.446e+01, percent-clipped=0.0 2023-12-22 23:20:49,032 INFO [train.py:886] (0/4) Epoch 27, batch 1150, loss[loss=0.01438, audio_tagging_loss=0.01438, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4940341.69 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:21:20,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. 
limit=15.0 2023-12-22 23:21:31,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=834040.0, ans=0.125 2023-12-22 23:21:34,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=834040.0, ans=0.2 2023-12-22 23:21:34,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=834040.0, ans=0.0 2023-12-22 23:21:40,011 INFO [train.py:886] (0/4) Epoch 27, batch 1200, loss[loss=0.01243, audio_tagging_loss=0.01243, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4945113.12 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:21:41,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=834106.6666666666, ans=0.0 2023-12-22 23:21:46,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834106.6666666666, ans=0.1 2023-12-22 23:22:00,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=12.0 2023-12-22 23:22:04,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=834240.0, ans=0.125 2023-12-22 23:22:08,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=834240.0, ans=0.0 2023-12-22 23:22:14,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=834306.6666666666, ans=0.2 2023-12-22 23:22:20,813 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.906e+01 3.144e+01 3.270e+01 3.430e+01 4.377e+01, threshold=6.540e+01, percent-clipped=0.0 2023-12-22 23:22:28,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=834373.3333333334, ans=0.125 2023-12-22 23:22:31,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=834440.0, ans=0.125 2023-12-22 23:22:32,517 INFO [train.py:886] (0/4) Epoch 27, batch 1250, loss[loss=0.01164, audio_tagging_loss=0.01164, over 22335.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4938611.33 frames. ], batch size: 107, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:22:38,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=834440.0, ans=0.2 2023-12-22 23:22:38,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=22.5 2023-12-22 23:22:49,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=834506.6666666666, ans=0.2 2023-12-22 23:23:01,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834573.3333333334, ans=0.1 2023-12-22 23:23:07,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.28 vs. 
limit=15.0 2023-12-22 23:23:14,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0 2023-12-22 23:23:16,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.14 vs. limit=10.0 2023-12-22 23:23:23,904 INFO [train.py:886] (0/4) Epoch 27, batch 1300, loss[loss=0.01514, audio_tagging_loss=0.01514, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4933066.66 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:23:26,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.52 vs. limit=6.0 2023-12-22 23:23:32,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2023-12-22 23:24:02,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=834973.3333333334, ans=0.125 2023-12-22 23:24:05,916 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.886e+01 3.125e+01 3.290e+01 3.447e+01 4.331e+01, threshold=6.579e+01, percent-clipped=0.0 2023-12-22 23:24:12,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=835040.0, ans=0.1 2023-12-22 23:24:15,424 INFO [train.py:886] (0/4) Epoch 27, batch 1350, loss[loss=0.01034, audio_tagging_loss=0.01034, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4933930.36 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:24:31,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=835173.3333333334, ans=0.1 2023-12-22 23:24:32,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.81 vs. limit=10.0 2023-12-22 23:24:39,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2023-12-22 23:24:51,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=15.0 2023-12-22 23:24:51,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2023-12-22 23:24:53,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=835306.6666666666, ans=0.125 2023-12-22 23:25:07,494 INFO [train.py:886] (0/4) Epoch 27, batch 1400, loss[loss=0.01403, audio_tagging_loss=0.01403, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4940964.38 frames. 
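[Annotation] The Whitening lines compare a per-module statistic of the activations against a scheduled limit (6.0, 10.0, 15.0, 22.5 above). One plausible definition of such a metric, stated here as an assumption rather than a quote from scaling.py, is d * ||C||_F^2 / trace(C)^2 for the d-dimensional channel covariance C: it equals 1.0 for perfectly white features and grows as a few directions dominate, which fits the way the logged metric hovers near its limit:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Hypothetical whitening metric: for feature frames x of shape
    (N, C), split the channels into groups, form each group's
    covariance C, and measure d * ||C||_F^2 / trace(C)^2.  This is
    1.0 when the covariance is a multiple of the identity ("white")
    and grows as some directions dominate."""
    n, c = x.shape
    d = c // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * d:(g + 1) * d]
        xg = xg - xg.mean(dim=0, keepdim=True)
        cov = (xg.T @ xg) / n
        metrics.append(d * (cov * cov).sum() / cov.trace() ** 2)
    return torch.stack(metrics).mean().item()

x = torch.randn(1000, 512)   # nearly white features
print(whitening_metric(x))   # close to 1.0
x[:, 0] *= 30.0              # one dominant channel
print(whitening_metric(x))   # far larger, like the big early-training metrics
```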
], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:25:20,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=835506.6666666666, ans=0.125 2023-12-22 23:25:45,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=835640.0, ans=0.05 2023-12-22 23:25:48,891 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.757e+01 3.050e+01 3.190e+01 3.364e+01 4.111e+01, threshold=6.380e+01, percent-clipped=0.0 2023-12-22 23:25:58,447 INFO [train.py:886] (0/4) Epoch 27, batch 1450, loss[loss=0.01383, audio_tagging_loss=0.01383, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4945739.15 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:25:58,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=835773.3333333334, ans=0.05 2023-12-22 23:26:02,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0 2023-12-22 23:26:14,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=835840.0, ans=0.125 2023-12-22 23:26:19,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=835906.6666666666, ans=0.0 2023-12-22 23:26:35,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=835973.3333333334, ans=0.2 2023-12-22 23:26:43,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=12.0 2023-12-22 23:26:51,921 INFO [train.py:886] (0/4) Epoch 27, batch 1500, loss[loss=0.01514, audio_tagging_loss=0.01514, over 24750.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4946675.77 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:26:52,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2023-12-22 23:26:59,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=836106.6666666666, ans=0.0 2023-12-22 23:27:23,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0 2023-12-22 23:27:32,533 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.805e+01 3.110e+01 3.282e+01 3.449e+01 4.134e+01, threshold=6.564e+01, percent-clipped=0.0 2023-12-22 23:27:33,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=836373.3333333334, ans=0.0 2023-12-22 23:27:39,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=836373.3333333334, ans=0.09899494936611666 2023-12-22 23:27:40,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.34 vs. 
limit=15.0 2023-12-22 23:27:42,849 INFO [train.py:886] (0/4) Epoch 27, batch 1550, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4939405.68 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:27:50,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=836440.0, ans=0.125 2023-12-22 23:27:54,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=836506.6666666666, ans=0.125 2023-12-22 23:27:57,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=836506.6666666666, ans=0.125 2023-12-22 23:28:29,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=836706.6666666666, ans=0.2 2023-12-22 23:28:35,346 INFO [train.py:886] (0/4) Epoch 27, batch 1600, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4939128.14 frames. ], batch size: 99, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:28:35,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=836773.3333333334, ans=0.2 2023-12-22 23:28:35,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=836773.3333333334, ans=0.0 2023-12-22 23:28:42,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=836773.3333333334, ans=0.2 2023-12-22 23:29:16,723 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.807e+01 3.145e+01 3.260e+01 3.442e+01 4.134e+01, threshold=6.520e+01, percent-clipped=0.0 2023-12-22 23:29:19,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=837040.0, ans=0.125 2023-12-22 23:29:27,601 INFO [train.py:886] (0/4) Epoch 27, batch 1650, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4938328.77 frames. ], batch size: 99, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:29:30,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2023-12-22 23:29:40,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=837173.3333333334, ans=0.125 2023-12-22 23:29:43,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-12-22 23:29:51,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=837240.0, ans=0.125 2023-12-22 23:30:14,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=837373.3333333334, ans=0.0 2023-12-22 23:30:19,124 INFO [train.py:886] (0/4) Epoch 27, batch 1700, loss[loss=0.01302, audio_tagging_loss=0.01302, over 24750.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4942806.33 frames. 
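[Annotation] Several scheduled names above, such as bypass.scale_min, bypass_mid.scale_min, and bypass.skip_rate, belong to the bypass (residual) connections around each encoder layer. Below is a minimal sketch of one way such a connection can behave, assuming out = x + s * (y - x) with the learned per-channel scale s clamped from below by the scheduled scale_min; the clamping details are assumptions, not the actual Zipformer module:

```python
import torch
import torch.nn as nn

class Bypass(nn.Module):
    """Sketch of a learnable bypass connection, inferred from the
    logged parameter names rather than copied from zipformer.py:
    the output interpolates between the block input x and the block
    output y with a per-channel scale clamped to [scale_min, 1.0].
    scale_min (logged as bypass.scale_min, e.g. ans=0.2) is a
    scheduled value that changes with batch_count."""

    def __init__(self, num_channels: int, scale_min: float = 0.2):
        super().__init__()
        self.scale = nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min = scale_min

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        s = self.scale.clamp(min=self.scale_min, max=1.0)
        return x + s * (y - x)

layer = Bypass(num_channels=256, scale_min=0.2)
x, y = torch.randn(10, 256), torch.randn(10, 256)
print(layer(x, y).shape)  # torch.Size([10, 256])
```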
], batch size: 99, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:30:41,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=837573.3333333334, ans=0.125 2023-12-22 23:30:44,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=837573.3333333334, ans=0.125 2023-12-22 23:30:51,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=837640.0, ans=0.125 2023-12-22 23:30:51,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=837640.0, ans=0.0 2023-12-22 23:31:01,544 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.884e+01 3.145e+01 3.289e+01 3.408e+01 4.323e+01, threshold=6.579e+01, percent-clipped=0.0 2023-12-22 23:31:03,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=837706.6666666666, ans=0.0 2023-12-22 23:31:06,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=837706.6666666666, ans=0.125 2023-12-22 23:31:11,068 INFO [train.py:886] (0/4) Epoch 27, batch 1750, loss[loss=0.01524, audio_tagging_loss=0.01524, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4945593.64 frames. ], batch size: 99, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:31:19,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=837773.3333333334, ans=0.125 2023-12-22 23:31:21,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=837840.0, ans=0.2 2023-12-22 23:31:48,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=837973.3333333334, ans=0.025 2023-12-22 23:31:49,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=837973.3333333334, ans=0.0 2023-12-22 23:31:52,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=838040.0, ans=0.0 2023-12-22 23:31:52,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=838040.0, ans=0.0 2023-12-22 23:31:56,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=838040.0, ans=0.125 2023-12-22 23:31:57,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=838040.0, ans=0.0 2023-12-22 23:32:03,068 INFO [train.py:886] (0/4) Epoch 27, batch 1800, loss[loss=0.01606, audio_tagging_loss=0.01606, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4950394.87 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:32:14,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=838173.3333333334, ans=0.125 2023-12-22 23:32:15,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. 
limit=22.5 2023-12-22 23:32:34,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=838306.6666666666, ans=0.125 2023-12-22 23:32:43,911 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.896e+01 3.173e+01 3.284e+01 3.409e+01 4.176e+01, threshold=6.568e+01, percent-clipped=0.0 2023-12-22 23:32:48,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=838373.3333333334, ans=0.125 2023-12-22 23:32:54,019 INFO [train.py:886] (0/4) Epoch 27, batch 1850, loss[loss=0.01538, audio_tagging_loss=0.01538, over 24945.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4955417.16 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:33:01,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=838440.0, ans=0.125 2023-12-22 23:33:14,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.19 vs. limit=10.0 2023-12-22 23:33:25,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=838640.0, ans=0.125 2023-12-22 23:33:32,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=838640.0, ans=0.125 2023-12-22 23:33:45,746 INFO [train.py:886] (0/4) Epoch 27, batch 1900, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4951071.02 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:33:49,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=838773.3333333334, ans=0.0 2023-12-22 23:33:58,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=838840.0, ans=0.2 2023-12-22 23:34:11,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.03 vs. limit=6.0 2023-12-22 23:34:19,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2023-12-22 23:34:25,799 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.935e+01 3.098e+01 3.249e+01 3.476e+01 4.111e+01, threshold=6.498e+01, percent-clipped=0.0 2023-12-22 23:34:36,041 INFO [train.py:886] (0/4) Epoch 27, batch 1950, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24750.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4941575.72 frames. 
], batch size: 99, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:34:44,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=839106.6666666666, ans=0.2 2023-12-22 23:34:52,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=839173.3333333334, ans=0.125 2023-12-22 23:34:53,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=839173.3333333334, ans=0.0 2023-12-22 23:34:57,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=839240.0, ans=0.125 2023-12-22 23:35:11,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=839306.6666666666, ans=0.125 2023-12-22 23:35:14,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=839306.6666666666, ans=0.125 2023-12-22 23:35:28,642 INFO [train.py:886] (0/4) Epoch 27, batch 2000, loss[loss=0.01337, audio_tagging_loss=0.01337, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4943630.68 frames. ], batch size: 99, lr: 4.01e-03, grad_scale: 64.0 2023-12-22 23:35:30,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839440.0, ans=0.1 2023-12-22 23:35:32,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=839440.0, ans=0.125 2023-12-22 23:35:37,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839440.0, ans=0.1 2023-12-22 23:35:38,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839506.6666666666, ans=0.1 2023-12-22 23:35:44,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=839506.6666666666, ans=0.125 2023-12-22 23:36:02,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0 2023-12-22 23:36:02,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=839640.0, ans=0.1 2023-12-22 23:36:09,866 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+01 3.085e+01 3.253e+01 3.416e+01 3.912e+01, threshold=6.506e+01, percent-clipped=0.0 2023-12-22 23:36:21,422 INFO [train.py:886] (0/4) Epoch 27, batch 2050, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4943941.28 frames. 
], batch size: 100, lr: 4.01e-03, grad_scale: 64.0 2023-12-22 23:36:28,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=839773.3333333334, ans=0.125 2023-12-22 23:36:44,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=839906.6666666666, ans=0.125 2023-12-22 23:36:46,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=839906.6666666666, ans=0.125 2023-12-22 23:36:52,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=839973.3333333334, ans=0.0 2023-12-22 23:36:58,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=839973.3333333334, ans=0.2 2023-12-22 23:37:01,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840040.0, ans=0.1 2023-12-22 23:37:11,546 INFO [train.py:886] (0/4) Epoch 27, batch 2100, loss[loss=0.01075, audio_tagging_loss=0.01075, over 25000.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4947415.86 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 64.0 2023-12-22 23:37:12,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=840106.6666666666, ans=0.125 2023-12-22 23:37:14,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=840106.6666666666, ans=0.1 2023-12-22 23:37:19,305 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5 2023-12-22 23:37:34,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=840240.0, ans=0.2 2023-12-22 23:37:53,522 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.144e+01 3.259e+01 3.394e+01 3.827e+01, threshold=6.517e+01, percent-clipped=0.0 2023-12-22 23:37:54,682 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:38:00,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=840373.3333333334, ans=0.125 2023-12-22 23:38:03,696 INFO [train.py:886] (0/4) Epoch 27, batch 2150, loss[loss=0.01465, audio_tagging_loss=0.01465, over 24750.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4950178.01 frames. ], batch size: 99, lr: 4.01e-03, grad_scale: 64.0 2023-12-22 23:38:18,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=840506.6666666666, ans=0.0 2023-12-22 23:38:43,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=840706.6666666666, ans=0.125 2023-12-22 23:38:53,875 INFO [train.py:886] (0/4) Epoch 27, batch 2200, loss[loss=0.01493, audio_tagging_loss=0.01493, over 24750.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4952038.50 frames. 
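[Annotation] In the train.py progress lines, loss[...] describes the current batch while tot_loss[...] is a smoothed aggregate. The frame counts suggest an exponentially decayed, frame-weighted average: with decay 1 - 1/200 the accumulated frame count saturates near 200 * 25,000 = 5.0e6, matching the ~4.95e6 figures logged here. A small sketch under that assumption:

```python
class RunningLoss:
    """Sketch of a frame-weighted, exponentially decayed loss tracker.
    With decay = 1 - 1/200 the accumulated frame count saturates near
    200 * frames_per_batch, i.e. about 25,000 * 200 = 5.0e6, which is
    consistent with the 'over ~4.95e6 frames' tot_loss entries above."""

    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
for _ in range(2000):
    tracker.update(batch_loss=0.013, batch_frames=25000)
print(round(tracker.frames))  # ~5.0e6, the saturation point
```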
], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:39:29,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2023-12-22 23:39:35,835 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.899e+01 3.169e+01 3.290e+01 3.439e+01 3.968e+01, threshold=6.580e+01, percent-clipped=0.0 2023-12-22 23:39:45,339 INFO [train.py:886] (0/4) Epoch 27, batch 2250, loss[loss=0.01109, audio_tagging_loss=0.01109, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4949292.36 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:39:54,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.36 vs. limit=22.5 2023-12-22 23:40:01,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=841173.3333333334, ans=0.125 2023-12-22 23:40:10,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.60 vs. limit=22.5 2023-12-22 23:40:11,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=841240.0, ans=0.05 2023-12-22 23:40:37,974 INFO [train.py:886] (0/4) Epoch 27, batch 2300, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4948685.67 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:40:41,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2023-12-22 23:41:03,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=841573.3333333334, ans=0.0 2023-12-22 23:41:08,751 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:41:16,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2023-12-22 23:41:18,131 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.111e+01 3.239e+01 3.412e+01 3.909e+01, threshold=6.478e+01, percent-clipped=0.0 2023-12-22 23:41:26,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=841706.6666666666, ans=0.0 2023-12-22 23:41:27,654 INFO [train.py:886] (0/4) Epoch 27, batch 2350, loss[loss=0.01354, audio_tagging_loss=0.01354, over 24750.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4951790.75 frames. 
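[Annotation] The WithLoss lines report the accumulated value of an auxiliary loss attached to an intermediate tensor, here various self_attn_weights; loss-sum=0.000e+00 says the attached term currently contributes nothing. A speculative sketch of the attach-and-log pattern (the real mechanism presumably also feeds the penalty into the backward pass, which this sketch does not):

```python
import torch

class AuxLossRecorder:
    """Illustrative bookkeeping for auxiliary losses attached to
    intermediate tensors, mirroring the 'WithLoss: name=...,
    loss-sum=...' entries. Purely a logging sketch: it records the
    penalty value but does not inject it into training."""

    def __init__(self):
        self.sums = {}

    def attach(self, name: str, x: torch.Tensor, weight: float = 0.0) -> torch.Tensor:
        # Hypothetical penalty on attention weights: their squared sum.
        penalty = weight * float((x ** 2).sum())
        self.sums[name] = self.sums.get(name, 0.0) + penalty
        return x

recorder = AuxLossRecorder()
attn = torch.softmax(torch.randn(4, 10, 10), dim=-1)
attn = recorder.attach("encoders.3.layers.2.self_attn_weights", attn)
print(recorder.sums)  # weight=0.0 gives loss-sum=0.0, as in the log
```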
], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:41:29,825 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:41:38,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=841840.0, ans=0.0 2023-12-22 23:41:39,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=841840.0, ans=15.0 2023-12-22 23:41:49,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=841906.6666666666, ans=0.0 2023-12-22 23:41:51,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=841906.6666666666, ans=0.0 2023-12-22 23:41:51,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=841906.6666666666, ans=0.2 2023-12-22 23:42:06,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.24 vs. limit=10.0 2023-12-22 23:42:13,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2023-12-22 23:42:20,079 INFO [train.py:886] (0/4) Epoch 27, batch 2400, loss[loss=0.0127, audio_tagging_loss=0.0127, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4956898.52 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:42:21,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=842106.6666666666, ans=0.0 2023-12-22 23:42:37,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=842173.3333333334, ans=0.125 2023-12-22 23:42:54,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.79 vs. limit=15.0 2023-12-22 23:43:00,771 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.830e+01 3.113e+01 3.225e+01 3.408e+01 4.094e+01, threshold=6.450e+01, percent-clipped=0.0 2023-12-22 23:43:03,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=842373.3333333334, ans=0.05 2023-12-22 23:43:11,000 INFO [train.py:886] (0/4) Epoch 27, batch 2450, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4961837.83 frames. 
], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:43:18,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=842440.0, ans=0.125 2023-12-22 23:43:21,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=842506.6666666666, ans=0.2 2023-12-22 23:43:37,234 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:43:38,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=842573.3333333334, ans=0.125 2023-12-22 23:43:41,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=842640.0, ans=0.0 2023-12-22 23:43:45,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=842640.0, ans=0.125 2023-12-22 23:43:50,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=842640.0, ans=0.04949747468305833 2023-12-22 23:44:03,520 INFO [train.py:886] (0/4) Epoch 27, batch 2500, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4959922.50 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:44:04,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-12-22 23:44:11,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2023-12-22 23:44:17,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=842840.0, ans=0.125 2023-12-22 23:44:18,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=842840.0, ans=0.07 2023-12-22 23:44:27,739 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2023-12-22 23:44:28,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842906.6666666666, ans=0.1 2023-12-22 23:44:40,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=842973.3333333334, ans=0.125 2023-12-22 23:44:44,154 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.187e+01 3.317e+01 3.432e+01 3.909e+01, threshold=6.634e+01, percent-clipped=0.0 2023-12-22 23:44:53,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2023-12-22 23:44:55,752 INFO [train.py:886] (0/4) Epoch 27, batch 2550, loss[loss=0.01445, audio_tagging_loss=0.01445, over 24942.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4946767.42 frames. 
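[Annotation] Most ScheduledFloat names ending in balancer.prob, min_positive, min_abs, or max_abs configure activation balancers that keep per-channel statistics inside bounds, applied stochastically with probability prob. The sketch below only measures violations of such bounds to make the logged values concrete; the real module enforces them by modifying gradients rather than by adding a loss term:

```python
import torch

def balancer_violation(x: torch.Tensor,
                       min_positive: float = 0.05,
                       max_abs: float = 5.0) -> torch.Tensor:
    """Measure how badly activations x of shape (N, C) violate two
    balancer-style constraints seen in the log: each channel's
    fraction of positive values should stay above min_positive
    (e.g. ans=0.05), and its mean |activation| should stay below
    max_abs. Returns the summed violation, 0.0 when all constraints
    hold."""
    pos_frac = (x > 0).float().mean(dim=0)      # per-channel positivity
    abs_mean = x.abs().mean(dim=0)              # per-channel magnitude
    viol_pos = (min_positive - pos_frac).clamp(min=0).sum()
    viol_abs = (abs_mean - max_abs).clamp(min=0).sum()
    return viol_pos + viol_abs

x = torch.randn(1000, 256) * 8.0   # magnitudes well above max_abs
print(balancer_violation(x))       # nonzero: the bound is violated
```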
], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:45:01,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=843106.6666666666, ans=0.125 2023-12-22 23:45:27,395 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0 2023-12-22 23:45:46,010 INFO [train.py:886] (0/4) Epoch 27, batch 2600, loss[loss=0.01352, audio_tagging_loss=0.01352, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4946303.75 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:46:19,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=843640.0, ans=0.1 2023-12-22 23:46:23,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.56 vs. limit=22.5 2023-12-22 23:46:27,694 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.891e+01 3.099e+01 3.252e+01 3.438e+01 3.884e+01, threshold=6.504e+01, percent-clipped=0.0 2023-12-22 23:46:37,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=843773.3333333334, ans=0.05 2023-12-22 23:46:37,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2023-12-22 23:46:38,019 INFO [train.py:886] (0/4) Epoch 27, batch 2650, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4948823.15 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:46:42,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.66 vs. limit=15.0 2023-12-22 23:46:45,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=843773.3333333334, ans=0.125 2023-12-22 23:46:52,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=843840.0, ans=0.125 2023-12-22 23:47:02,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=843906.6666666666, ans=0.0 2023-12-22 23:47:14,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2023-12-22 23:47:28,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=844040.0, ans=0.0 2023-12-22 23:47:30,278 INFO [train.py:886] (0/4) Epoch 27, batch 2700, loss[loss=0.009688, audio_tagging_loss=0.009688, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4949426.86 frames. 
], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:47:32,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=844106.6666666666, ans=0.0 2023-12-22 23:47:39,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844173.3333333334, ans=0.1 2023-12-22 23:47:56,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=844240.0, ans=0.125 2023-12-22 23:48:07,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=12.0 2023-12-22 23:48:10,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=844373.3333333334, ans=0.0 2023-12-22 23:48:11,088 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.927e+01 3.112e+01 3.261e+01 3.405e+01 4.046e+01, threshold=6.522e+01, percent-clipped=0.0 2023-12-22 23:48:21,322 INFO [train.py:886] (0/4) Epoch 27, batch 2750, loss[loss=0.01474, audio_tagging_loss=0.01474, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4955346.01 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:48:31,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=844506.6666666666, ans=0.0 2023-12-22 23:48:40,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=844506.6666666666, ans=0.125 2023-12-22 23:48:43,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=844573.3333333334, ans=0.125 2023-12-22 23:48:57,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844640.0, ans=0.1 2023-12-22 23:49:11,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=844706.6666666666, ans=0.0 2023-12-22 23:49:13,587 INFO [train.py:886] (0/4) Epoch 27, batch 2800, loss[loss=0.009518, audio_tagging_loss=0.009518, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4952308.36 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:49:13,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=844773.3333333334, ans=0.0 2023-12-22 23:49:29,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=844840.0, ans=0.5 2023-12-22 23:49:29,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2023-12-22 23:49:32,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.32 vs. 
limit=15.0 2023-12-22 23:49:35,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=844906.6666666666, ans=0.0 2023-12-22 23:49:39,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=844906.6666666666, ans=0.2 2023-12-22 23:49:42,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=844906.6666666666, ans=0.0 2023-12-22 23:49:48,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=844973.3333333334, ans=0.2 2023-12-22 23:49:54,269 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.754e+01 3.194e+01 3.305e+01 3.452e+01 3.886e+01, threshold=6.610e+01, percent-clipped=0.0 2023-12-22 23:49:58,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=845040.0, ans=0.05 2023-12-22 23:49:58,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2023-12-22 23:50:01,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=845040.0, ans=0.125 2023-12-22 23:50:02,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2023-12-22 23:50:05,154 INFO [train.py:886] (0/4) Epoch 27, batch 2850, loss[loss=0.01329, audio_tagging_loss=0.01329, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4947838.40 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:50:09,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=845106.6666666666, ans=0.0 2023-12-22 23:50:12,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=845106.6666666666, ans=0.2 2023-12-22 23:50:15,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0 2023-12-22 23:50:18,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=845173.3333333334, ans=0.2 2023-12-22 23:50:37,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=845306.6666666666, ans=0.0 2023-12-22 23:50:52,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0 2023-12-22 23:50:56,333 INFO [train.py:886] (0/4) Epoch 27, batch 2900, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4948946.46 frames. 
], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:50:58,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=845440.0, ans=0.125 2023-12-22 23:51:09,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=845506.6666666666, ans=0.125 2023-12-22 23:51:23,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5 2023-12-22 23:51:24,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=22.5 2023-12-22 23:51:27,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=845640.0, ans=0.09899494936611666 2023-12-22 23:51:37,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=845706.6666666666, ans=0.125 2023-12-22 23:51:37,798 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.117e+01 3.211e+01 3.387e+01 4.000e+01, threshold=6.423e+01, percent-clipped=0.0 2023-12-22 23:51:48,684 INFO [train.py:886] (0/4) Epoch 27, batch 2950, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4958000.52 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:51:50,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=845773.3333333334, ans=0.125 2023-12-22 23:51:55,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=845773.3333333334, ans=0.0 2023-12-22 23:52:03,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=845840.0, ans=0.1 2023-12-22 23:52:06,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=845840.0, ans=0.125 2023-12-22 23:52:19,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0 2023-12-22 23:52:37,153 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:52:39,819 INFO [train.py:886] (0/4) Epoch 27, batch 3000, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4951830.61 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:52:39,820 INFO [train.py:909] (0/4) Computing validation loss 2023-12-22 23:53:00,360 INFO [train.py:917] (0/4) Epoch 27, validation: loss=0.03311, audio_tagging_loss=0.03311, over 3737520.00 frames. 
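[Annotation] Every valid_interval batches the trainer pauses, runs the dev loader without gradients, and logs a frame-weighted validation loss (always over the same 3,737,520 dev frames) followed by the peak GPU memory. A hedged sketch of that loop; the model and compute_loss interfaces are hypothetical stand-ins, not the train.py API:

```python
import torch

class ToyModel(torch.nn.Module):
    # Stand-in for the real model; compute_loss is a hypothetical API.
    def compute_loss(self, batch):
        feats, num_frames = batch
        return feats.pow(2).mean(), num_frames

def compute_validation_loss(model, dev_loader):
    """Iterate the dev loader under no_grad, accumulate a
    frame-weighted loss, then report it together with peak GPU
    memory, echoing the log lines above."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, n = model.compute_loss(batch)
            tot_loss += float(loss) * n
            tot_frames += n
    model.train()
    mem_mb = (torch.cuda.max_memory_allocated() // 2**20
              if torch.cuda.is_available() else 0)
    print(f"validation: loss={tot_loss / tot_frames:.5f}, "
          f"over {tot_frames:.2f} frames. "
          f"Maximum memory allocated so far is {mem_mb}MB")
    return tot_loss / tot_frames

dev_loader = [(torch.randn(8, 80), 800.0) for _ in range(5)]
compute_validation_loss(ToyModel(), dev_loader)
```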
2023-12-22 23:53:00,361 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-22 23:53:15,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=846173.3333333334, ans=0.125 2023-12-22 23:53:30,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=846306.6666666666, ans=0.0 2023-12-22 23:53:42,510 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.897e+01 3.118e+01 3.260e+01 3.397e+01 4.017e+01, threshold=6.519e+01, percent-clipped=0.0 2023-12-22 23:53:42,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=846373.3333333334, ans=0.015 2023-12-22 23:53:47,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=846373.3333333334, ans=0.125 2023-12-22 23:53:53,392 INFO [train.py:886] (0/4) Epoch 27, batch 3050, loss[loss=0.01251, audio_tagging_loss=0.01251, over 22696.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4952863.55 frames. ], batch size: 107, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:54:06,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2023-12-22 23:54:11,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=846506.6666666666, ans=0.125 2023-12-22 23:54:23,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=846640.0, ans=0.125 2023-12-22 23:54:44,003 INFO [train.py:886] (0/4) Epoch 27, batch 3100, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4956191.66 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:54:47,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=846773.3333333334, ans=0.035 2023-12-22 23:54:47,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=846773.3333333334, ans=0.125 2023-12-22 23:55:04,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=846906.6666666666, ans=0.2 2023-12-22 23:55:26,095 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.855e+01 3.155e+01 3.323e+01 3.468e+01 4.091e+01, threshold=6.646e+01, percent-clipped=0.0 2023-12-22 23:55:35,704 INFO [train.py:886] (0/4) Epoch 27, batch 3150, loss[loss=0.01362, audio_tagging_loss=0.01362, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4943256.84 frames. 
], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:55:50,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=847173.3333333334, ans=0.5 2023-12-22 23:55:56,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=847240.0, ans=0.125 2023-12-22 23:56:13,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=847306.6666666666, ans=0.0 2023-12-22 23:56:17,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=847373.3333333334, ans=0.125 2023-12-22 23:56:27,857 INFO [train.py:886] (0/4) Epoch 27, batch 3200, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4945748.97 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:56:42,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=847506.6666666666, ans=0.0 2023-12-22 23:57:07,719 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:57:08,443 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.746e+01 3.162e+01 3.265e+01 3.446e+01 4.055e+01, threshold=6.529e+01, percent-clipped=0.0 2023-12-22 23:57:11,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=847706.6666666666, ans=0.2 2023-12-22 23:57:17,861 INFO [train.py:886] (0/4) Epoch 27, batch 3250, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4944332.82 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:57:18,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=847773.3333333334, ans=0.0 2023-12-22 23:57:24,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=847773.3333333334, ans=0.0 2023-12-22 23:57:26,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=847773.3333333334, ans=0.125 2023-12-22 23:57:34,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=847840.0, ans=0.0 2023-12-22 23:57:51,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=847973.3333333334, ans=0.1 2023-12-22 23:57:55,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=847973.3333333334, ans=0.125 2023-12-22 23:57:56,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=847973.3333333334, ans=0.09899494936611666 2023-12-22 23:58:10,818 INFO [train.py:886] (0/4) Epoch 27, batch 3300, loss[loss=0.01129, audio_tagging_loss=0.01129, over 21635.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4945667.06 frames. 
], batch size: 107, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:58:23,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=848173.3333333334, ans=0.0 2023-12-22 23:58:37,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=848240.0, ans=22.5 2023-12-22 23:58:46,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=12.0 2023-12-22 23:58:51,328 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.800e+01 3.157e+01 3.288e+01 3.417e+01 4.683e+01, threshold=6.576e+01, percent-clipped=0.0 2023-12-22 23:59:02,298 INFO [train.py:886] (0/4) Epoch 27, batch 3350, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4948294.47 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:59:04,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=848440.0, ans=0.1 2023-12-22 23:59:05,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=848440.0, ans=0.0 2023-12-22 23:59:17,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=848506.6666666666, ans=0.05 2023-12-22 23:59:28,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-12-22 23:59:28,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=848573.3333333334, ans=0.125 2023-12-22 23:59:32,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=848640.0, ans=0.035 2023-12-22 23:59:53,236 INFO [train.py:886] (0/4) Epoch 27, batch 3400, loss[loss=0.01034, audio_tagging_loss=0.01034, over 24025.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4954110.43 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-23 00:00:05,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=848840.0, ans=0.0 2023-12-23 00:00:12,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=848840.0, ans=0.0 2023-12-23 00:00:12,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2023-12-23 00:00:33,028 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.887e+01 3.185e+01 3.323e+01 3.461e+01 4.226e+01, threshold=6.645e+01, percent-clipped=0.0 2023-12-23 00:00:33,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=849040.0, ans=0.0 2023-12-23 00:00:45,417 INFO [train.py:886] (0/4) Epoch 27, batch 3450, loss[loss=0.01416, audio_tagging_loss=0.01416, over 23973.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4950591.79 frames. 
], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-23 00:00:48,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=849106.6666666666, ans=0.0 2023-12-23 00:00:51,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=849106.6666666666, ans=0.125 2023-12-23 00:00:58,083 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:00:58,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=849173.3333333334, ans=0.125 2023-12-23 00:01:00,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.74 vs. limit=22.5 2023-12-23 00:01:01,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=849173.3333333334, ans=0.07 2023-12-23 00:01:17,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=849306.6666666666, ans=0.0 2023-12-23 00:01:35,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=849440.0, ans=10.0 2023-12-23 00:01:36,108 INFO [train.py:886] (0/4) Epoch 27, batch 3500, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4948888.57 frames. ], batch size: 99, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:01:36,369 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:01:44,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=849440.0, ans=0.0 2023-12-23 00:01:47,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5 2023-12-23 00:01:49,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=849506.6666666666, ans=0.125 2023-12-23 00:01:55,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=849506.6666666666, ans=0.0 2023-12-23 00:02:08,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=849640.0, ans=0.5 2023-12-23 00:02:20,115 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 3.138e+01 3.283e+01 3.482e+01 4.072e+01, threshold=6.566e+01, percent-clipped=0.0 2023-12-23 00:02:29,536 INFO [train.py:886] (0/4) Epoch 27, batch 3550, loss[loss=0.0128, audio_tagging_loss=0.0128, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4944173.54 frames. 
], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:02:37,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=849773.3333333334, ans=0.125 2023-12-23 00:02:44,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=849840.0, ans=0.0 2023-12-23 00:02:50,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=849906.6666666666, ans=0.1 2023-12-23 00:02:59,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=849906.6666666666, ans=0.125 2023-12-23 00:03:00,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=849973.3333333334, ans=0.125 2023-12-23 00:03:09,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=850040.0, ans=0.125 2023-12-23 00:03:22,007 INFO [train.py:886] (0/4) Epoch 27, batch 3600, loss[loss=0.01016, audio_tagging_loss=0.01016, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4949748.76 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:03:28,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850106.6666666666, ans=0.1 2023-12-23 00:03:35,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=850173.3333333334, ans=0.09899494936611666 2023-12-23 00:03:42,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=850240.0, ans=0.0 2023-12-23 00:03:50,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=850240.0, ans=0.125 2023-12-23 00:03:54,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.95 vs. limit=22.5 2023-12-23 00:04:03,113 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.828e+01 3.137e+01 3.279e+01 3.466e+01 4.135e+01, threshold=6.558e+01, percent-clipped=0.0 2023-12-23 00:04:03,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.70 vs. limit=8.0 2023-12-23 00:04:12,584 INFO [train.py:886] (0/4) Epoch 27, batch 3650, loss[loss=0.009558, audio_tagging_loss=0.009558, over 23973.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4945507.10 frames. 
], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:04:12,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=850440.0, ans=0.0 2023-12-23 00:04:16,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=850440.0, ans=0.0 2023-12-23 00:04:23,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=850506.6666666666, ans=0.125 2023-12-23 00:04:27,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=850506.6666666666, ans=0.125 2023-12-23 00:04:40,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=850573.3333333334, ans=0.125 2023-12-23 00:04:43,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=850640.0, ans=0.125 2023-12-23 00:04:47,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=850640.0, ans=0.0 2023-12-23 00:04:51,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=850640.0, ans=0.125 2023-12-23 00:05:01,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=850706.6666666666, ans=0.025 2023-12-23 00:05:04,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850773.3333333334, ans=0.1 2023-12-23 00:05:05,190 INFO [train.py:886] (0/4) Epoch 27, batch 3700, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4947723.14 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:05:22,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=850840.0, ans=0.125 2023-12-23 00:05:36,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=850973.3333333334, ans=0.0 2023-12-23 00:05:46,645 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.190e+01 3.298e+01 3.442e+01 3.856e+01, threshold=6.597e+01, percent-clipped=0.0 2023-12-23 00:05:53,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=851040.0, ans=0.125 2023-12-23 00:05:55,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=851106.6666666666, ans=0.0 2023-12-23 00:05:56,017 INFO [train.py:886] (0/4) Epoch 27, batch 3750, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24085.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4948891.67 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:06:03,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.05 vs. 
limit=15.0 2023-12-23 00:06:05,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851106.6666666666, ans=0.1 2023-12-23 00:06:30,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=851306.6666666666, ans=0.2 2023-12-23 00:06:35,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=851306.6666666666, ans=0.0 2023-12-23 00:06:48,650 INFO [train.py:886] (0/4) Epoch 27, batch 3800, loss[loss=0.01037, audio_tagging_loss=0.01037, over 24750.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4946135.55 frames. ], batch size: 99, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:06:50,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=851440.0, ans=0.125 2023-12-23 00:06:51,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2023-12-23 00:06:59,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.89 vs. limit=10.0 2023-12-23 00:07:06,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=851506.6666666666, ans=0.0 2023-12-23 00:07:12,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851573.3333333334, ans=0.1 2023-12-23 00:07:29,305 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.838e+01 3.145e+01 3.329e+01 3.462e+01 3.871e+01, threshold=6.658e+01, percent-clipped=0.0 2023-12-23 00:07:29,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=851706.6666666666, ans=0.125 2023-12-23 00:07:29,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=851706.6666666666, ans=0.125 2023-12-23 00:07:40,923 INFO [train.py:886] (0/4) Epoch 27, batch 3850, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4942689.42 frames. ], batch size: 99, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:07:54,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=851840.0, ans=0.2 2023-12-23 00:08:00,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. 
limit=15.0 2023-12-23 00:08:01,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=851906.6666666666, ans=0.1 2023-12-23 00:08:09,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=851906.6666666666, ans=0.2 2023-12-23 00:08:12,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=851973.3333333334, ans=0.125 2023-12-23 00:08:13,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=851973.3333333334, ans=0.0 2023-12-23 00:08:32,556 INFO [train.py:886] (0/4) Epoch 27, batch 3900, loss[loss=0.01101, audio_tagging_loss=0.01101, over 25000.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4947542.52 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:08:40,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5 2023-12-23 00:08:47,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.44 vs. limit=22.5 2023-12-23 00:09:11,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=852306.6666666666, ans=0.2 2023-12-23 00:09:15,477 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.748e+01 3.126e+01 3.278e+01 3.398e+01 3.984e+01, threshold=6.555e+01, percent-clipped=0.0 2023-12-23 00:09:24,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=852440.0, ans=0.2 2023-12-23 00:09:25,089 INFO [train.py:886] (0/4) Epoch 27, batch 3950, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4949485.52 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:09:25,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=852440.0, ans=0.0 2023-12-23 00:09:30,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=852440.0, ans=0.0 2023-12-23 00:09:36,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=852506.6666666666, ans=0.125 2023-12-23 00:09:38,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=852506.6666666666, ans=10.0 2023-12-23 00:09:48,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=852573.3333333334, ans=0.125 2023-12-23 00:10:00,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2023-12-23 00:10:16,914 INFO [train.py:886] (0/4) Epoch 27, batch 4000, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4950888.20 frames. 
], batch size: 100, lr: 3.98e-03, grad_scale: 128.0 2023-12-23 00:10:21,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0 2023-12-23 00:10:24,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=852773.3333333334, ans=0.125 2023-12-23 00:10:25,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=15.0 2023-12-23 00:10:26,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=852773.3333333334, ans=0.2 2023-12-23 00:10:27,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=852840.0, ans=0.95 2023-12-23 00:10:28,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=852840.0, ans=0.035 2023-12-23 00:10:34,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=852840.0, ans=0.125 2023-12-23 00:10:37,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=852906.6666666666, ans=0.2 2023-12-23 00:10:50,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=852973.3333333334, ans=0.05 2023-12-23 00:10:59,971 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.833e+01 3.188e+01 3.348e+01 3.452e+01 3.928e+01, threshold=6.695e+01, percent-clipped=0.0 2023-12-23 00:11:08,495 INFO [train.py:886] (0/4) Epoch 27, batch 4050, loss[loss=0.01417, audio_tagging_loss=0.01417, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4950171.27 frames. ], batch size: 99, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:11:24,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=853173.3333333334, ans=0.0 2023-12-23 00:11:35,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=853240.0, ans=0.125 2023-12-23 00:11:39,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853306.6666666666, ans=0.1 2023-12-23 00:11:42,781 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-128000.pt 2023-12-23 00:11:49,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2023-12-23 00:12:02,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=853440.0, ans=0.1 2023-12-23 00:12:02,800 INFO [train.py:886] (0/4) Epoch 27, batch 4100, loss[loss=0.01347, audio_tagging_loss=0.01347, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4947159.77 frames. 
], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:12:06,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=853440.0, ans=0.0 2023-12-23 00:12:10,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=853440.0, ans=0.125 2023-12-23 00:12:20,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0 2023-12-23 00:12:44,653 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.808e+01 3.219e+01 3.295e+01 3.462e+01 3.940e+01, threshold=6.591e+01, percent-clipped=0.0 2023-12-23 00:12:44,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=853706.6666666666, ans=0.025 2023-12-23 00:12:49,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2023-12-23 00:12:54,568 INFO [train.py:886] (0/4) Epoch 27, batch 4150, loss[loss=0.01186, audio_tagging_loss=0.01186, over 24750.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4941493.62 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:13:03,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=853773.3333333334, ans=0.0 2023-12-23 00:13:03,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=853773.3333333334, ans=0.125 2023-12-23 00:13:37,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.67 vs. limit=10.0 2023-12-23 00:13:40,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854040.0, ans=0.1 2023-12-23 00:13:42,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5 2023-12-23 00:13:46,159 INFO [train.py:886] (0/4) Epoch 27, batch 4200, loss[loss=0.01401, audio_tagging_loss=0.01401, over 24750.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4948574.68 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:13:46,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=854106.6666666666, ans=0.125 2023-12-23 00:14:04,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=854173.3333333334, ans=0.0 2023-12-23 00:14:11,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.60 vs. limit=10.0 2023-12-23 00:14:17,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2023-12-23 00:14:19,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. 
limit=15.0 2023-12-23 00:14:27,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854373.3333333334, ans=0.1 2023-12-23 00:14:28,654 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.773e+01 3.191e+01 3.309e+01 3.476e+01 4.229e+01, threshold=6.619e+01, percent-clipped=0.0 2023-12-23 00:14:31,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=854373.3333333334, ans=0.95 2023-12-23 00:14:38,656 INFO [train.py:886] (0/4) Epoch 27, batch 4250, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4951757.69 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:14:44,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=854440.0, ans=0.125 2023-12-23 00:15:12,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=854640.0, ans=0.125 2023-12-23 00:15:15,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2023-12-23 00:15:16,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=854640.0, ans=0.125 2023-12-23 00:15:18,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=854640.0, ans=0.0 2023-12-23 00:15:29,738 INFO [train.py:886] (0/4) Epoch 27, batch 4300, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4961408.00 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:15:44,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=854840.0, ans=0.0 2023-12-23 00:15:50,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=854906.6666666666, ans=0.0 2023-12-23 00:15:59,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=854906.6666666666, ans=0.09899494936611666 2023-12-23 00:16:13,510 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.183e+01 3.299e+01 3.496e+01 3.905e+01, threshold=6.598e+01, percent-clipped=0.0 2023-12-23 00:16:19,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=855040.0, ans=0.125 2023-12-23 00:16:22,698 INFO [train.py:886] (0/4) Epoch 27, batch 4350, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4963932.95 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:16:30,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.26 vs. 
limit=15.0 2023-12-23 00:16:37,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=855173.3333333334, ans=0.0 2023-12-23 00:16:46,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5 2023-12-23 00:17:13,772 INFO [train.py:886] (0/4) Epoch 27, batch 4400, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4957990.31 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:17:26,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=855506.6666666666, ans=0.015 2023-12-23 00:17:26,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=855506.6666666666, ans=0.125 2023-12-23 00:17:31,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-12-23 00:17:32,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=855506.6666666666, ans=0.0 2023-12-23 00:17:36,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=855573.3333333334, ans=0.2 2023-12-23 00:17:44,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=855640.0, ans=0.125 2023-12-23 00:17:54,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=855706.6666666666, ans=0.125 2023-12-23 00:17:55,452 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.876e+01 3.175e+01 3.302e+01 3.505e+01 4.332e+01, threshold=6.604e+01, percent-clipped=0.0 2023-12-23 00:18:03,947 INFO [train.py:886] (0/4) Epoch 27, batch 4450, loss[loss=0.01269, audio_tagging_loss=0.01269, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4957324.63 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:18:14,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=855840.0, ans=0.125 2023-12-23 00:18:55,917 INFO [train.py:886] (0/4) Epoch 27, batch 4500, loss[loss=0.0124, audio_tagging_loss=0.0124, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4959548.26 frames. 
], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:18:57,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=856106.6666666666, ans=0.0 2023-12-23 00:19:06,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=856173.3333333334, ans=0.0 2023-12-23 00:19:06,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=856173.3333333334, ans=0.125 2023-12-23 00:19:08,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=856173.3333333334, ans=0.125 2023-12-23 00:19:12,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=856173.3333333334, ans=0.95 2023-12-23 00:19:20,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=856240.0, ans=0.04949747468305833 2023-12-23 00:19:27,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856306.6666666666, ans=0.1 2023-12-23 00:19:36,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=856373.3333333334, ans=0.2 2023-12-23 00:19:37,242 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.151e+01 3.331e+01 3.446e+01 3.976e+01, threshold=6.663e+01, percent-clipped=0.0 2023-12-23 00:19:39,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=12.0 2023-12-23 00:19:45,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2023-12-23 00:19:45,690 INFO [train.py:886] (0/4) Epoch 27, batch 4550, loss[loss=0.01181, audio_tagging_loss=0.01181, over 24066.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4949394.81 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:20:07,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=856573.3333333334, ans=0.0 2023-12-23 00:20:32,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=856706.6666666666, ans=0.1 2023-12-23 00:20:38,707 INFO [train.py:886] (0/4) Epoch 27, batch 4600, loss[loss=0.01247, audio_tagging_loss=0.01247, over 21816.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4946590.17 frames. 
], batch size: 107, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:20:44,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=856773.3333333334, ans=0.125 2023-12-23 00:20:46,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=856773.3333333334, ans=0.125 2023-12-23 00:21:14,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=856973.3333333334, ans=0.0 2023-12-23 00:21:20,174 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.671e+01 3.147e+01 3.251e+01 3.413e+01 3.679e+01, threshold=6.502e+01, percent-clipped=0.0 2023-12-23 00:21:21,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=857040.0, ans=0.2 2023-12-23 00:21:28,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=857106.6666666666, ans=0.0 2023-12-23 00:21:30,182 INFO [train.py:886] (0/4) Epoch 27, batch 4650, loss[loss=0.01133, audio_tagging_loss=0.01133, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4957179.98 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:21:32,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.56 vs. limit=22.5 2023-12-23 00:21:55,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=857240.0, ans=0.125 2023-12-23 00:22:20,065 INFO [train.py:886] (0/4) Epoch 27, batch 4700, loss[loss=0.01229, audio_tagging_loss=0.01229, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4950237.58 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:22:57,959 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.951e+01 3.169e+01 3.333e+01 3.511e+01 4.698e+01, threshold=6.666e+01, percent-clipped=0.0 2023-12-23 00:23:07,082 INFO [train.py:886] (0/4) Epoch 27, batch 4750, loss[loss=0.01186, audio_tagging_loss=0.01186, over 24750.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4947363.99 frames. ], batch size: 99, lr: 3.96e-03, grad_scale: 64.0 2023-12-23 00:23:07,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=857773.3333333334, ans=0.125 2023-12-23 00:23:12,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=857773.3333333334, ans=0.125 2023-12-23 00:23:16,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=857840.0, ans=0.0 2023-12-23 00:23:22,610 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-27.pt 2023-12-23 00:23:42,401 INFO [train.py:886] (0/4) Epoch 28, batch 0, loss[loss=0.02546, audio_tagging_loss=0.02546, over 25000.00 frames. ], tot_loss[loss=0.02546, audio_tagging_loss=0.02546, over 25000.00 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:23:42,403 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 00:24:03,709 INFO [train.py:917] (0/4) Epoch 28, validation: loss=0.03329, audio_tagging_loss=0.03329, over 3737520.00 frames. 
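
The recurring `optim.py:484` WARNING entries track gradient clipping: the five numbers after `grad-norm quartiles` are the min/25%/median/75%/max of recent gradient norms, and in every entry above the reported threshold equals `Clipping_scale` times the median (e.g. 2.0 x 3.251e+01 = 6.502e+01 just above), with `percent-clipped` giving the percentage of recent steps whose norm exceeded it. A minimal sketch of clipping on that rule, assuming a sliding window of per-step norms (illustrative, not the actual optim.py implementation):

    from collections import deque
    import torch

    class QuartileClipper:
        """Scale gradients down when their total norm exceeds
        clipping_scale * median-of-recent-norms (an assumed reconstruction)."""

        def __init__(self, clipping_scale=2.0, window=200):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent total grad norms
            self.num_clipped = 0
            self.num_steps = 0

        def clip_(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            # Total l2 norm over all parameters, as in clip_grad_norm_.
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            quartiles = torch.quantile(
                torch.tensor(list(self.norms)),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * quartiles[2].item()  # 2 x median
            self.num_steps += 1
            if norm > threshold:
                self.num_clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)
            return quartiles, threshold, 100.0 * self.num_clipped / self.num_steps

On this reading, `percent-clipped=0.0` in the entries above says no step in the reporting window exceeded the threshold, while the `percent-clipped=9.0` logged shortly after the epoch-28 restart below reflects the transiently larger gradient norms at the start of the new epoch.
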
2023-12-23 00:24:03,709 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 00:24:26,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=858013.3333333334, ans=0.125 2023-12-23 00:24:53,493 INFO [train.py:886] (0/4) Epoch 28, batch 50, loss[loss=0.01928, audio_tagging_loss=0.01928, over 25000.00 frames. ], tot_loss[loss=0.02007, audio_tagging_loss=0.02007, over 1119016.49 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:25:08,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.37 vs. limit=22.5 2023-12-23 00:25:20,166 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.879e+01 3.617e+01 4.003e+01 4.694e+01 1.109e+02, threshold=8.005e+01, percent-clipped=9.0 2023-12-23 00:25:40,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2023-12-23 00:25:43,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=858480.0, ans=0.04949747468305833 2023-12-23 00:25:43,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=858480.0, ans=0.1 2023-12-23 00:25:44,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=858546.6666666666, ans=0.125 2023-12-23 00:25:45,109 INFO [train.py:886] (0/4) Epoch 28, batch 100, loss[loss=0.02003, audio_tagging_loss=0.02003, over 25000.00 frames. ], tot_loss[loss=0.01742, audio_tagging_loss=0.01742, over 1967659.62 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:26:10,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858680.0, ans=0.1 2023-12-23 00:26:35,728 INFO [train.py:886] (0/4) Epoch 28, batch 150, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 2637925.59 frames. ], batch size: 99, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:26:39,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.46 vs. limit=10.0 2023-12-23 00:26:58,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859013.3333333334, ans=0.1 2023-12-23 00:27:02,516 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.280e+01 3.440e+01 3.594e+01 4.041e+01, threshold=6.880e+01, percent-clipped=0.0 2023-12-23 00:27:10,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859080.0, ans=0.1 2023-12-23 00:27:25,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.98 vs. 
limit=15.0 2023-12-23 00:27:26,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=859213.3333333334, ans=0.04949747468305833 2023-12-23 00:27:27,346 INFO [train.py:886] (0/4) Epoch 28, batch 200, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 3150400.80 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:27:29,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.32 vs. limit=10.0 2023-12-23 00:27:36,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=859280.0, ans=0.125 2023-12-23 00:27:45,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859280.0, ans=0.1 2023-12-23 00:27:47,345 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:28:10,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=859480.0, ans=0.125 2023-12-23 00:28:17,871 INFO [train.py:886] (0/4) Epoch 28, batch 250, loss[loss=0.01375, audio_tagging_loss=0.01375, over 25000.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 3553290.83 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:28:26,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=859546.6666666666, ans=0.125 2023-12-23 00:28:43,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2023-12-23 00:28:43,837 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.923e+01 3.198e+01 3.303e+01 3.432e+01 3.896e+01, threshold=6.605e+01, percent-clipped=0.0 2023-12-23 00:28:59,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=859813.3333333334, ans=0.125 2023-12-23 00:29:00,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859813.3333333334, ans=0.1 2023-12-23 00:29:08,729 INFO [train.py:886] (0/4) Epoch 28, batch 300, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 3863245.54 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:29:13,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.88 vs. 
limit=22.5 2023-12-23 00:29:31,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860013.3333333334, ans=0.1 2023-12-23 00:29:57,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=860146.6666666666, ans=0.2 2023-12-23 00:29:59,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=860146.6666666666, ans=0.0 2023-12-23 00:30:01,101 INFO [train.py:886] (0/4) Epoch 28, batch 350, loss[loss=0.01358, audio_tagging_loss=0.01358, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4099333.26 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:30:09,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=860213.3333333334, ans=0.035 2023-12-23 00:30:09,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=860213.3333333334, ans=0.125 2023-12-23 00:30:24,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860346.6666666666, ans=0.1 2023-12-23 00:30:28,421 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.837e+01 3.152e+01 3.318e+01 3.475e+01 4.174e+01, threshold=6.637e+01, percent-clipped=0.0 2023-12-23 00:30:40,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=860413.3333333334, ans=0.125 2023-12-23 00:30:44,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=860480.0, ans=0.2 2023-12-23 00:30:48,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=860480.0, ans=0.1 2023-12-23 00:30:51,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860546.6666666666, ans=0.1 2023-12-23 00:30:52,538 INFO [train.py:886] (0/4) Epoch 28, batch 400, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4288157.40 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:30:58,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=860546.6666666666, ans=0.0 2023-12-23 00:31:02,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.76 vs. 
limit=15.0 2023-12-23 00:31:16,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=860680.0, ans=0.2 2023-12-23 00:31:16,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860680.0, ans=0.1 2023-12-23 00:31:18,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=860680.0, ans=0.125 2023-12-23 00:31:29,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=860746.6666666666, ans=0.2 2023-12-23 00:31:29,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=860746.6666666666, ans=0.1 2023-12-23 00:31:44,308 INFO [train.py:886] (0/4) Epoch 28, batch 450, loss[loss=0.01049, audio_tagging_loss=0.01049, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4437920.89 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:31:48,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=860880.0, ans=0.0 2023-12-23 00:31:50,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=15.0 2023-12-23 00:32:06,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=861013.3333333334, ans=0.125 2023-12-23 00:32:06,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=861013.3333333334, ans=0.2 2023-12-23 00:32:11,564 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.702e+01 3.063e+01 3.224e+01 3.410e+01 3.950e+01, threshold=6.447e+01, percent-clipped=0.0 2023-12-23 00:32:21,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=861080.0, ans=0.125 2023-12-23 00:32:26,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-23 00:32:31,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2023-12-23 00:32:34,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.37 vs. limit=15.0 2023-12-23 00:32:37,186 INFO [train.py:886] (0/4) Epoch 28, batch 500, loss[loss=0.01323, audio_tagging_loss=0.01323, over 22095.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4549913.31 frames. 
], batch size: 107, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:32:37,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=861213.3333333334, ans=0.1 2023-12-23 00:32:43,259 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:32:49,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=861280.0, ans=0.125 2023-12-23 00:33:10,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.58 vs. limit=22.5 2023-12-23 00:33:28,207 INFO [train.py:886] (0/4) Epoch 28, batch 550, loss[loss=0.01183, audio_tagging_loss=0.01183, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4645478.28 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:33:32,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=861546.6666666666, ans=0.125 2023-12-23 00:33:39,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=861613.3333333334, ans=0.0 2023-12-23 00:33:45,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2023-12-23 00:33:54,955 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.118e+01 3.304e+01 3.442e+01 3.885e+01, threshold=6.607e+01, percent-clipped=0.0 2023-12-23 00:34:00,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861746.6666666666, ans=0.1 2023-12-23 00:34:02,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2023-12-23 00:34:14,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=861813.3333333334, ans=0.125 2023-12-23 00:34:19,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.60 vs. limit=12.0 2023-12-23 00:34:20,598 INFO [train.py:886] (0/4) Epoch 28, batch 600, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24750.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4709719.34 frames. 
], batch size: 99, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:34:27,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=861880.0, ans=0.125 2023-12-23 00:34:35,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=861946.6666666666, ans=0.125 2023-12-23 00:34:55,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=862080.0, ans=0.125 2023-12-23 00:35:01,569 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:35:01,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=862146.6666666666, ans=0.0 2023-12-23 00:35:12,859 INFO [train.py:886] (0/4) Epoch 28, batch 650, loss[loss=0.01098, audio_tagging_loss=0.01098, over 24064.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4757009.91 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:35:16,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=862213.3333333334, ans=0.2 2023-12-23 00:35:20,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=862213.3333333334, ans=0.1 2023-12-23 00:35:26,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=862280.0, ans=0.125 2023-12-23 00:35:27,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=862280.0, ans=0.0 2023-12-23 00:35:38,965 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.866e+01 3.196e+01 3.336e+01 3.480e+01 3.806e+01, threshold=6.671e+01, percent-clipped=0.0 2023-12-23 00:35:45,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=862413.3333333334, ans=0.05 2023-12-23 00:36:03,755 INFO [train.py:886] (0/4) Epoch 28, batch 700, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24750.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4800442.64 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:36:12,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=862546.6666666666, ans=0.0 2023-12-23 00:36:20,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=862613.3333333334, ans=0.0 2023-12-23 00:36:52,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=862813.3333333334, ans=0.125 2023-12-23 00:36:55,332 INFO [train.py:886] (0/4) Epoch 28, batch 750, loss[loss=0.01188, audio_tagging_loss=0.01188, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4839734.20 frames. 
2023-12-23 00:36:57,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=862880.0, ans=0.125
2023-12-23 00:36:59,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=862880.0, ans=10.0
2023-12-23 00:37:15,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=863013.3333333334, ans=0.1
2023-12-23 00:37:23,192 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.734e+01 3.104e+01 3.286e+01 3.429e+01 3.776e+01, threshold=6.573e+01, percent-clipped=0.0
2023-12-23 00:37:26,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=863080.0, ans=0.125
2023-12-23 00:37:34,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=863080.0, ans=0.0
2023-12-23 00:37:45,978 INFO [train.py:886] (0/4) Epoch 28, batch 800, loss[loss=0.0132, audio_tagging_loss=0.0132, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4865112.01 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:37:47,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=863213.3333333334, ans=0.125
2023-12-23 00:38:13,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=863346.6666666666, ans=0.125
2023-12-23 00:38:14,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=863346.6666666666, ans=0.0
2023-12-23 00:38:14,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=863346.6666666666, ans=0.0
2023-12-23 00:38:15,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=863346.6666666666, ans=0.2
2023-12-23 00:38:21,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=863413.3333333334, ans=0.125
2023-12-23 00:38:39,115 INFO [train.py:886] (0/4) Epoch 28, batch 850, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4889404.46 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:38:50,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=863613.3333333334, ans=0.07
2023-12-23 00:39:06,378 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.957e+01 3.161e+01 3.315e+01 3.496e+01 3.961e+01, threshold=6.630e+01, percent-clipped=0.0
2023-12-23 00:39:09,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=863746.6666666666, ans=0.125
2023-12-23 00:39:11,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=863746.6666666666, ans=0.125
2023-12-23 00:39:31,530 INFO [train.py:886] (0/4) Epoch 28, batch 900, loss[loss=0.01433, audio_tagging_loss=0.01433, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4893092.16 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0
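
Annotation: the scaling.py:213 ScheduledFloat lines that dominate this log report module hyperparameters (ans=...) that are scheduled as a function of batch_count, such as dropout probabilities and skip rates that anneal as training progresses; by batch_count around 861k most of them have long since settled at their final values. The sketch below is a simplified stand-in for such a piecewise-linear schedule, not icefall's actual ScheduledFloat, and the breakpoints are made up for illustration.

    class PiecewiseLinearFloat:
        # Simplified stand-in: linearly interpolate between
        # (batch_count, value) breakpoints, clamping at both ends.
        def __init__(self, *points):
            self.points = sorted(points)
            self.batch_count = 0.0

        def __float__(self):
            pts = self.points
            if self.batch_count <= pts[0][0]:
                return pts[0][1]
            if self.batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= self.batch_count <= x1:
                    t = (self.batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]  # unreachable fallback

    # Made-up schedule: decay a dropout from 0.3 to 0.1 over 20k batches.
    dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
    dropout_p.batch_count = 861746.67   # far past the last breakpoint
    assert float(dropout_p) == 0.1      # flat, matching the repeated ans=0.1 above
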
], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4893092.16 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:39:35,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=863880.0, ans=0.125
2023-12-23 00:39:37,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.24 vs. limit=12.0
2023-12-23 00:39:41,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=12.0
2023-12-23 00:39:42,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0
2023-12-23 00:40:06,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=864080.0, ans=0.125
2023-12-23 00:40:10,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=864080.0, ans=0.125
2023-12-23 00:40:21,519 INFO [train.py:886] (0/4) Epoch 28, batch 950, loss[loss=0.01252, audio_tagging_loss=0.01252, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4897573.51 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:40:28,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=864213.3333333334, ans=0.0
2023-12-23 00:40:30,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=864213.3333333334, ans=10.0
2023-12-23 00:40:35,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=864280.0, ans=0.125
2023-12-23 00:40:37,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=864280.0, ans=0.5
2023-12-23 00:40:38,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=864280.0, ans=0.125
2023-12-23 00:40:42,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=864346.6666666666, ans=0.1
2023-12-23 00:40:44,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=864346.6666666666, ans=0.125
2023-12-23 00:40:47,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864346.6666666666, ans=0.1
2023-12-23 00:40:48,394 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.699e+01 3.226e+01 3.347e+01 3.519e+01 4.208e+01, threshold=6.694e+01, percent-clipped=0.0
2023-12-23 00:40:53,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0
2023-12-23 00:40:55,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.07 vs.
limit=22.5 2023-12-23 00:41:12,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=864480.0, ans=0.125 2023-12-23 00:41:14,084 INFO [train.py:886] (0/4) Epoch 28, batch 1000, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4906171.50 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:41:18,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=864546.6666666666, ans=0.5 2023-12-23 00:41:20,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=864546.6666666666, ans=0.0 2023-12-23 00:41:41,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=864680.0, ans=0.0 2023-12-23 00:41:53,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=864746.6666666666, ans=0.125 2023-12-23 00:41:56,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=864813.3333333334, ans=0.0 2023-12-23 00:42:05,068 INFO [train.py:886] (0/4) Epoch 28, batch 1050, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4917128.85 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:42:17,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=864946.6666666666, ans=0.95 2023-12-23 00:42:31,253 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.791e+01 3.118e+01 3.211e+01 3.407e+01 4.036e+01, threshold=6.422e+01, percent-clipped=0.0 2023-12-23 00:42:56,968 INFO [train.py:886] (0/4) Epoch 28, batch 1100, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4928908.77 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:43:03,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=865213.3333333334, ans=22.5 2023-12-23 00:43:24,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865346.6666666666, ans=0.1 2023-12-23 00:43:43,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5 2023-12-23 00:43:48,804 INFO [train.py:886] (0/4) Epoch 28, batch 1150, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4934515.85 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:43:51,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=865546.6666666666, ans=15.0 2023-12-23 00:44:10,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. 
limit=10.0
2023-12-23 00:44:14,775 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.161e+01 3.263e+01 3.387e+01 3.814e+01, threshold=6.527e+01, percent-clipped=0.0
2023-12-23 00:44:35,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=865813.3333333334, ans=0.125
2023-12-23 00:44:38,945 INFO [train.py:886] (0/4) Epoch 28, batch 1200, loss[loss=0.01109, audio_tagging_loss=0.01109, over 24750.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4944273.85 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:44:56,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=865946.6666666666, ans=0.0
2023-12-23 00:45:21,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=866146.6666666666, ans=0.2
2023-12-23 00:45:23,035 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.57 vs. limit=15.0
2023-12-23 00:45:30,752 INFO [train.py:886] (0/4) Epoch 28, batch 1250, loss[loss=0.01451, audio_tagging_loss=0.01451, over 21665.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4937634.37 frames. ], batch size: 107, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:45:34,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=12.0
2023-12-23 00:45:47,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0
2023-12-23 00:45:53,207 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 00:45:53,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=866346.6666666666, ans=0.0
2023-12-23 00:45:56,741 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.732e+01 3.240e+01 3.398e+01 3.546e+01 3.975e+01, threshold=6.796e+01, percent-clipped=0.0
2023-12-23 00:46:06,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.04 vs. limit=22.5
2023-12-23 00:46:08,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=866413.3333333334, ans=0.1
2023-12-23 00:46:10,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=866480.0, ans=0.0
2023-12-23 00:46:14,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0
2023-12-23 00:46:21,499 INFO [train.py:886] (0/4) Epoch 28, batch 1300, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4932172.48 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0
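
Annotation: each scaling.py:1022 Whitening line compares a measured statistic of a module's activations against a limit (e.g. metric=15.57 vs. limit=15.0 above); when the metric exceeds the limit, the Whiten module applies a corrective gradient that pushes the activation covariance back toward a flatter, "whiter" eigenvalue spectrum. The statistic below is only one plausible whiteness proxy, written for illustration; icefall's actual metric may be defined differently.

    import torch

    def whiteness_metric(x: torch.Tensor) -> float:
        # Hypothetical proxy: mean squared eigenvalue of the feature
        # covariance divided by the squared mean eigenvalue. Equals 1.0
        # for a perfectly isotropic covariance and grows when a few
        # directions dominate, qualitatively like the logged metric.
        x = x.reshape(-1, x.shape[-1])            # (frames, channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(1000, 384)
    print(whiteness_metric(x))                                   # near 1: already white
    print(whiteness_metric(x * torch.linspace(0.1, 3.0, 384)))   # skewed spectrum: larger
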
2023-12-23 00:46:34,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=866613.3333333334, ans=0.125
2023-12-23 00:46:50,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0
2023-12-23 00:47:09,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=866813.3333333334, ans=0.1
2023-12-23 00:47:11,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=866880.0, ans=0.025
2023-12-23 00:47:12,305 INFO [train.py:886] (0/4) Epoch 28, batch 1350, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4932970.41 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:47:15,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=866880.0, ans=0.02
2023-12-23 00:47:32,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0
2023-12-23 00:47:33,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.35 vs. limit=22.5
2023-12-23 00:47:39,125 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.836e+01 3.136e+01 3.279e+01 3.497e+01 4.059e+01, threshold=6.558e+01, percent-clipped=0.0
2023-12-23 00:47:45,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=867080.0, ans=0.125
2023-12-23 00:47:55,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=867146.6666666666, ans=0.125
2023-12-23 00:47:57,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=867146.6666666666, ans=0.035
2023-12-23 00:48:00,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=867146.6666666666, ans=0.125
2023-12-23 00:48:03,362 INFO [train.py:886] (0/4) Epoch 28, batch 1400, loss[loss=0.01424, audio_tagging_loss=0.01424, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4941180.85 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:48:04,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=867213.3333333334, ans=0.2
2023-12-23 00:48:26,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=867346.6666666666, ans=0.09899494936611666
2023-12-23 00:48:28,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=867346.6666666666, ans=0.125
2023-12-23 00:48:31,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs.
limit=15.0 2023-12-23 00:48:33,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5 2023-12-23 00:48:38,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=867413.3333333334, ans=0.125 2023-12-23 00:48:43,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2023-12-23 00:48:51,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=867480.0, ans=0.125 2023-12-23 00:48:54,805 INFO [train.py:886] (0/4) Epoch 28, batch 1450, loss[loss=0.01334, audio_tagging_loss=0.01334, over 25000.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4944663.65 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:48:55,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=867546.6666666666, ans=0.0 2023-12-23 00:48:57,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=867546.6666666666, ans=0.0 2023-12-23 00:49:02,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=867546.6666666666, ans=0.125 2023-12-23 00:49:04,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=867546.6666666666, ans=0.125 2023-12-23 00:49:08,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=867613.3333333334, ans=0.0 2023-12-23 00:49:22,066 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.828e+01 3.154e+01 3.288e+01 3.472e+01 3.862e+01, threshold=6.577e+01, percent-clipped=0.0 2023-12-23 00:49:23,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=867680.0, ans=0.0 2023-12-23 00:49:26,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867746.6666666666, ans=0.1 2023-12-23 00:49:46,851 INFO [train.py:886] (0/4) Epoch 28, batch 1500, loss[loss=0.01198, audio_tagging_loss=0.01198, over 25000.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4949057.94 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:49:47,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=867880.0, ans=0.05 2023-12-23 00:49:56,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=867946.6666666666, ans=0.0 2023-12-23 00:50:15,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. 
limit=15.0 2023-12-23 00:50:20,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=868080.0, ans=0.125 2023-12-23 00:50:38,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=868213.3333333334, ans=0.035 2023-12-23 00:50:39,575 INFO [train.py:886] (0/4) Epoch 28, batch 1550, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4954311.22 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:50:46,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.56 vs. limit=5.0 2023-12-23 00:50:56,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=868280.0, ans=0.0 2023-12-23 00:51:04,678 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.216e+01 3.352e+01 3.508e+01 3.923e+01, threshold=6.705e+01, percent-clipped=0.0 2023-12-23 00:51:29,615 INFO [train.py:886] (0/4) Epoch 28, batch 1600, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4949813.37 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:51:30,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-12-23 00:51:44,502 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2023-12-23 00:51:45,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2023-12-23 00:51:52,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0 2023-12-23 00:52:01,041 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:52:20,934 INFO [train.py:886] (0/4) Epoch 28, batch 1650, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4951487.81 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:52:23,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=868880.0, ans=0.125 2023-12-23 00:52:38,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. 
limit=6.0 2023-12-23 00:52:42,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=869013.3333333334, ans=0.2 2023-12-23 00:52:47,482 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.836e+01 3.157e+01 3.317e+01 3.477e+01 3.959e+01, threshold=6.634e+01, percent-clipped=0.0 2023-12-23 00:52:59,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=869080.0, ans=0.0 2023-12-23 00:53:04,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=869146.6666666666, ans=0.09899494936611666 2023-12-23 00:53:11,750 INFO [train.py:886] (0/4) Epoch 28, batch 1700, loss[loss=0.009928, audio_tagging_loss=0.009928, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4950015.54 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:53:17,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=869213.3333333334, ans=0.0 2023-12-23 00:53:21,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=869280.0, ans=0.025 2023-12-23 00:53:47,555 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:53:51,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.59 vs. limit=10.0 2023-12-23 00:53:53,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=869480.0, ans=0.1 2023-12-23 00:54:02,267 INFO [train.py:886] (0/4) Epoch 28, batch 1750, loss[loss=0.01367, audio_tagging_loss=0.01367, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4958247.41 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:54:05,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=869546.6666666666, ans=0.125 2023-12-23 00:54:20,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=869613.3333333334, ans=0.125 2023-12-23 00:54:27,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=869680.0, ans=0.0 2023-12-23 00:54:29,675 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.178e+01 3.276e+01 3.403e+01 3.949e+01, threshold=6.552e+01, percent-clipped=0.0 2023-12-23 00:54:30,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=869680.0, ans=0.125 2023-12-23 00:54:46,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=869813.3333333334, ans=0.2 2023-12-23 00:54:54,480 INFO [train.py:886] (0/4) Epoch 28, batch 1800, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4955088.51 frames. 
], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:55:28,977 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:55:36,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=870146.6666666666, ans=0.0 2023-12-23 00:55:43,834 INFO [train.py:886] (0/4) Epoch 28, batch 1850, loss[loss=0.01436, audio_tagging_loss=0.01436, over 24750.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4952909.08 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 32.0 2023-12-23 00:55:59,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.66 vs. limit=15.0 2023-12-23 00:56:00,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5 2023-12-23 00:56:02,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=870280.0, ans=0.0 2023-12-23 00:56:03,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=870280.0, ans=0.125 2023-12-23 00:56:06,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.45 vs. limit=22.5 2023-12-23 00:56:10,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=870346.6666666666, ans=0.0 2023-12-23 00:56:11,269 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.825e+01 3.215e+01 3.384e+01 3.493e+01 4.271e+01, threshold=6.769e+01, percent-clipped=0.0 2023-12-23 00:56:12,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=870346.6666666666, ans=0.0 2023-12-23 00:56:14,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=870413.3333333334, ans=0.125 2023-12-23 00:56:16,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870413.3333333334, ans=0.1 2023-12-23 00:56:28,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.56 vs. limit=22.5 2023-12-23 00:56:36,123 INFO [train.py:886] (0/4) Epoch 28, batch 1900, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4946192.02 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 32.0 2023-12-23 00:56:47,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=870613.3333333334, ans=0.125 2023-12-23 00:56:48,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=870613.3333333334, ans=0.2 2023-12-23 00:57:04,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=870680.0, ans=0.2 2023-12-23 00:57:10,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.68 vs. 
limit=15.0 2023-12-23 00:57:25,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=870813.3333333334, ans=0.125 2023-12-23 00:57:28,663 INFO [train.py:886] (0/4) Epoch 28, batch 1950, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4945249.31 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 32.0 2023-12-23 00:57:35,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=870880.0, ans=0.2 2023-12-23 00:57:53,802 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+01 3.143e+01 3.252e+01 3.432e+01 3.799e+01, threshold=6.504e+01, percent-clipped=0.0 2023-12-23 00:58:11,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=871146.6666666666, ans=0.0 2023-12-23 00:58:14,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=871146.6666666666, ans=0.0 2023-12-23 00:58:17,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=871146.6666666666, ans=0.1 2023-12-23 00:58:19,549 INFO [train.py:886] (0/4) Epoch 28, batch 2000, loss[loss=0.01176, audio_tagging_loss=0.01176, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4952470.16 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 00:58:31,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2023-12-23 00:58:32,450 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:58:32,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=871280.0, ans=0.125 2023-12-23 00:58:33,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=871280.0, ans=0.0 2023-12-23 00:58:35,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.13 vs. limit=22.5 2023-12-23 00:58:40,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=871346.6666666666, ans=0.0 2023-12-23 00:58:51,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=871413.3333333334, ans=0.0 2023-12-23 00:58:53,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=871413.3333333334, ans=0.025 2023-12-23 00:58:54,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. 
limit=15.0
2023-12-23 00:59:11,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=871546.6666666666, ans=0.125
2023-12-23 00:59:11,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=871546.6666666666, ans=0.125
2023-12-23 00:59:11,803 INFO [train.py:886] (0/4) Epoch 28, batch 2050, loss[loss=0.01113, audio_tagging_loss=0.01113, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4950382.26 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 00:59:13,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=871546.6666666666, ans=0.0
2023-12-23 00:59:17,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.49 vs. limit=10.0
2023-12-23 00:59:22,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=871613.3333333334, ans=0.125
2023-12-23 00:59:27,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=871613.3333333334, ans=0.125
2023-12-23 00:59:29,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=871613.3333333334, ans=0.125
2023-12-23 00:59:35,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0
2023-12-23 00:59:38,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=871680.0, ans=0.125
2023-12-23 00:59:38,686 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.794e+01 3.157e+01 3.303e+01 3.497e+01 3.860e+01, threshold=6.607e+01, percent-clipped=0.0
2023-12-23 00:59:41,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.10 vs. limit=10.0
2023-12-23 00:59:41,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=871680.0, ans=0.0
2023-12-23 00:59:48,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=871746.6666666666, ans=0.125
2023-12-23 01:00:00,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=871813.3333333334, ans=0.07
2023-12-23 01:00:02,003 INFO [train.py:886] (0/4) Epoch 28, batch 2100, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4955799.59 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0
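
Annotation: grad_scale in the per-batch lines is the dynamic loss scale used for fp16 mixed-precision training; it doubled from 32.0 to 64.0 at batch 2000, presumably because the scaler grows the scale after a long enough run of overflow-free steps. Below is a minimal sketch of the standard torch.cuda.amp pattern that produces this behavior; the init_scale and growth_interval values, and the model/criterion names, are placeholders rather than this recipe's settings.

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=32.0, growth_interval=2000)  # placeholder values

    def train_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with autocast():                      # run the forward pass in fp16 where safe
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()         # backward on the scaled loss
        scaler.step(optimizer)                # unscales grads; skips the step on inf/nan
        scaler.update()                       # doubles the scale after enough clean steps
        return loss.detach(), scaler.get_scale()   # get_scale() is the logged grad_scale
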
2023-12-23 01:00:04,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=871880.0, ans=0.0
2023-12-23 01:00:15,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=871946.6666666666, ans=0.0
2023-12-23 01:00:17,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=871946.6666666666, ans=0.0
2023-12-23 01:00:18,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=871946.6666666666, ans=0.015
2023-12-23 01:00:20,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=871946.6666666666, ans=0.0
2023-12-23 01:00:22,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=872013.3333333334, ans=0.125
2023-12-23 01:00:30,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=872013.3333333334, ans=0.125
2023-12-23 01:00:30,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0
2023-12-23 01:00:35,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=872080.0, ans=0.2
2023-12-23 01:00:37,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=12.0
2023-12-23 01:00:41,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0
2023-12-23 01:00:54,486 INFO [train.py:886] (0/4) Epoch 28, batch 2150, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4957199.87 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 01:01:21,866 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.234e+01 3.354e+01 3.492e+01 4.264e+01, threshold=6.708e+01, percent-clipped=0.0
2023-12-23 01:01:46,228 INFO [train.py:886] (0/4) Epoch 28, batch 2200, loss[loss=0.01248, audio_tagging_loss=0.01248, over 23975.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4950969.54 frames.
], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:01:48,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=872546.6666666666, ans=0.0 2023-12-23 01:01:58,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=872613.3333333334, ans=0.0 2023-12-23 01:01:59,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=872613.3333333334, ans=0.125 2023-12-23 01:02:30,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=872813.3333333334, ans=0.0 2023-12-23 01:02:31,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=872813.3333333334, ans=0.2 2023-12-23 01:02:38,063 INFO [train.py:886] (0/4) Epoch 28, batch 2250, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4946942.95 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:02:46,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=872880.0, ans=10.0 2023-12-23 01:03:04,778 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.837e+01 3.155e+01 3.335e+01 3.468e+01 4.219e+01, threshold=6.670e+01, percent-clipped=0.0 2023-12-23 01:03:11,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=873080.0, ans=0.0 2023-12-23 01:03:14,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-12-23 01:03:15,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-12-23 01:03:16,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=873080.0, ans=0.0 2023-12-23 01:03:30,847 INFO [train.py:886] (0/4) Epoch 28, batch 2300, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4944568.12 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:03:33,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873213.3333333334, ans=0.1 2023-12-23 01:03:35,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=873213.3333333334, ans=0.95 2023-12-23 01:03:47,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=873280.0, ans=0.0 2023-12-23 01:03:51,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=873346.6666666666, ans=0.125 2023-12-23 01:04:13,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873480.0, ans=0.1 2023-12-23 01:04:22,665 INFO [train.py:886] (0/4) Epoch 28, batch 2350, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. 
], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4950781.04 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:04:29,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=873546.6666666666, ans=0.125 2023-12-23 01:04:48,802 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.776e+01 3.130e+01 3.254e+01 3.391e+01 3.968e+01, threshold=6.508e+01, percent-clipped=0.0 2023-12-23 01:04:57,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873746.6666666666, ans=0.1 2023-12-23 01:05:02,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873813.3333333334, ans=0.1 2023-12-23 01:05:04,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873813.3333333334, ans=0.1 2023-12-23 01:05:05,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=873813.3333333334, ans=0.125 2023-12-23 01:05:13,635 INFO [train.py:886] (0/4) Epoch 28, batch 2400, loss[loss=0.01279, audio_tagging_loss=0.01279, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4953910.55 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:05:19,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=873880.0, ans=0.125 2023-12-23 01:05:58,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=874146.6666666666, ans=0.1 2023-12-23 01:06:02,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=874146.6666666666, ans=0.125 2023-12-23 01:06:03,940 INFO [train.py:886] (0/4) Epoch 28, batch 2450, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4960539.99 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:06:17,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=874280.0, ans=0.125 2023-12-23 01:06:31,240 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.962e+01 3.185e+01 3.322e+01 3.528e+01 4.944e+01, threshold=6.645e+01, percent-clipped=0.0 2023-12-23 01:06:39,790 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:06:55,570 INFO [train.py:886] (0/4) Epoch 28, batch 2500, loss[loss=0.0141, audio_tagging_loss=0.0141, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4958633.70 frames. 
], batch size: 99, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:07:11,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=874613.3333333334, ans=0.125 2023-12-23 01:07:12,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=874613.3333333334, ans=0.0 2023-12-23 01:07:13,624 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:07:18,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=874680.0, ans=0.125 2023-12-23 01:07:23,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=874680.0, ans=0.2 2023-12-23 01:07:29,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=874746.6666666666, ans=0.125 2023-12-23 01:07:34,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=874746.6666666666, ans=0.2 2023-12-23 01:07:36,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=874813.3333333334, ans=0.125 2023-12-23 01:07:46,628 INFO [train.py:886] (0/4) Epoch 28, batch 2550, loss[loss=0.01346, audio_tagging_loss=0.01346, over 24932.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4954253.13 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:08:04,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=874946.6666666666, ans=0.1 2023-12-23 01:08:14,113 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.835e+01 3.213e+01 3.387e+01 3.518e+01 3.975e+01, threshold=6.773e+01, percent-clipped=0.0 2023-12-23 01:08:16,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=875013.3333333334, ans=0.09899494936611666 2023-12-23 01:08:23,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=875080.0, ans=0.125 2023-12-23 01:08:27,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=875146.6666666666, ans=0.125 2023-12-23 01:08:38,589 INFO [train.py:886] (0/4) Epoch 28, batch 2600, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4953394.70 frames. 
], batch size: 99, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:08:49,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=875280.0, ans=0.1 2023-12-23 01:09:03,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=875346.6666666666, ans=0.0 2023-12-23 01:09:04,927 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:09:15,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=875413.3333333334, ans=0.125 2023-12-23 01:09:17,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0 2023-12-23 01:09:20,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2023-12-23 01:09:24,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875480.0, ans=0.1 2023-12-23 01:09:30,270 INFO [train.py:886] (0/4) Epoch 28, batch 2650, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4946791.20 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:09:49,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=875613.3333333334, ans=0.125 2023-12-23 01:09:56,413 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.754e+01 3.093e+01 3.250e+01 3.446e+01 3.869e+01, threshold=6.500e+01, percent-clipped=0.0 2023-12-23 01:10:01,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875746.6666666666, ans=0.1 2023-12-23 01:10:18,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.06 vs. limit=10.0 2023-12-23 01:10:21,854 INFO [train.py:886] (0/4) Epoch 28, batch 2700, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4950760.53 frames. 
], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:10:25,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875880.0, ans=0.1 2023-12-23 01:10:37,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=875946.6666666666, ans=0.0 2023-12-23 01:10:47,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=876013.3333333334, ans=0.2 2023-12-23 01:10:56,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=876080.0, ans=0.125 2023-12-23 01:10:59,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876080.0, ans=0.1 2023-12-23 01:11:00,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=876080.0, ans=0.125 2023-12-23 01:11:12,600 INFO [train.py:886] (0/4) Epoch 28, batch 2750, loss[loss=0.0144, audio_tagging_loss=0.0144, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4957795.44 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:11:12,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=876213.3333333334, ans=0.125 2023-12-23 01:11:22,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=876280.0, ans=0.125 2023-12-23 01:11:24,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=22.5 2023-12-23 01:11:39,279 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.779e+01 3.198e+01 3.346e+01 3.453e+01 3.797e+01, threshold=6.692e+01, percent-clipped=0.0 2023-12-23 01:11:42,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=876346.6666666666, ans=0.125 2023-12-23 01:11:42,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=876346.6666666666, ans=0.125 2023-12-23 01:12:04,157 INFO [train.py:886] (0/4) Epoch 28, batch 2800, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4956198.80 frames. ], batch size: 99, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:12:09,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=876546.6666666666, ans=0.035 2023-12-23 01:12:30,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. 
limit=15.0 2023-12-23 01:12:39,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=876746.6666666666, ans=0.0 2023-12-23 01:12:52,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=876813.3333333334, ans=0.125 2023-12-23 01:12:55,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=876813.3333333334, ans=0.2 2023-12-23 01:12:56,765 INFO [train.py:886] (0/4) Epoch 28, batch 2850, loss[loss=0.01348, audio_tagging_loss=0.01348, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4950066.72 frames. ], batch size: 99, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:12:56,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=876880.0, ans=0.015 2023-12-23 01:13:08,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=876946.6666666666, ans=0.125 2023-12-23 01:13:20,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=877013.3333333334, ans=0.0 2023-12-23 01:13:23,945 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.944e+01 3.205e+01 3.349e+01 3.516e+01 3.954e+01, threshold=6.699e+01, percent-clipped=0.0 2023-12-23 01:13:26,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=877013.3333333334, ans=0.125 2023-12-23 01:13:47,382 INFO [train.py:886] (0/4) Epoch 28, batch 2900, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4942276.72 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:14:23,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=15.0 2023-12-23 01:14:38,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=877480.0, ans=0.0 2023-12-23 01:14:41,104 INFO [train.py:886] (0/4) Epoch 28, batch 2950, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4945414.15 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:15:08,593 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.855e+01 3.109e+01 3.285e+01 3.424e+01 3.829e+01, threshold=6.571e+01, percent-clipped=0.0 2023-12-23 01:15:13,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=877746.6666666666, ans=0.0 2023-12-23 01:15:14,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2023-12-23 01:15:16,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=877746.6666666666, ans=0.1 2023-12-23 01:15:19,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=12.0 2023-12-23 01:15:32,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.69 vs. 
limit=15.0
2023-12-23 01:15:33,434 INFO [train.py:886] (0/4) Epoch 28, batch 3000, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4953463.92 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:15:33,435 INFO [train.py:909] (0/4) Computing validation loss
2023-12-23 01:15:51,141 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0187, 2.0555, 3.1017, 2.3506, 3.6317, 2.5938, 1.2866, 2.1279], device='cuda:0')
2023-12-23 01:15:54,365 INFO [train.py:917] (0/4) Epoch 28, validation: loss=0.03338, audio_tagging_loss=0.03338, over 3737520.00 frames.
2023-12-23 01:15:54,366 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-23 01:16:16,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=878013.3333333334, ans=0.0
2023-12-23 01:16:29,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.98 vs. limit=22.5
2023-12-23 01:16:31,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=878080.0, ans=0.04949747468305833
2023-12-23 01:16:46,338 INFO [train.py:886] (0/4) Epoch 28, batch 3050, loss[loss=0.01421, audio_tagging_loss=0.01421, over 25000.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4955197.91 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:16:46,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.65 vs. limit=10.0
2023-12-23 01:16:56,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. limit=10.0
2023-12-23 01:17:09,216 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=15.0
2023-12-23 01:17:14,049 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.873e+01 3.142e+01 3.296e+01 3.491e+01 4.100e+01, threshold=6.591e+01, percent-clipped=0.0
2023-12-23 01:17:20,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=878413.3333333334, ans=0.09899494936611666
2023-12-23 01:17:25,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=878413.3333333334, ans=0.0
2023-12-23 01:17:34,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=878480.0, ans=0.125
2023-12-23 01:17:37,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=878546.6666666666, ans=0.2
2023-12-23 01:17:38,118 INFO [train.py:886] (0/4) Epoch 28, batch 3100, loss[loss=0.01229, audio_tagging_loss=0.01229, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4956833.07 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
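
Annotation: the train.py:909/917/918 records above show the periodic validation pass: training pauses, the loss is computed over the whole validation set (3737520.00 frames), and the peak GPU memory so far (14759MB) is reported; zipformer.py:1858 also dumps the entropy of one layer's attention weights as a diagnostic. A generic sketch of such a frame-weighted validation pass follows; the batch fields and criterion are placeholders, not icefall's exact train.py code.

    import torch

    def compute_validation_loss(model, valid_loader, criterion, device):
        # Generic sketch: frame-weighted average loss over the dev set,
        # plus the peak-memory figure reported in the log.
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                feats = batch["inputs"].to(device)    # (N, T, num_mel_bins)
                labels = batch["targets"].to(device)
                num_frames = feats.shape[0] * feats.shape[1]
                tot_loss += criterion(model(feats), labels).item() * num_frames
                tot_frames += num_frames
        model.train()
        max_mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        return tot_loss / tot_frames, max_mem_mb
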
2023-12-23 01:17:43,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=878546.6666666666, ans=0.0
2023-12-23 01:17:59,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=878680.0, ans=0.125
2023-12-23 01:18:01,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=878680.0, ans=0.0
2023-12-23 01:18:06,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=878680.0, ans=0.125
2023-12-23 01:18:12,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878746.6666666666, ans=0.1
2023-12-23 01:18:29,893 INFO [train.py:886] (0/4) Epoch 28, batch 3150, loss[loss=0.01782, audio_tagging_loss=0.01782, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4951866.27 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:18:57,079 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.894e+01 3.254e+01 3.356e+01 3.478e+01 4.076e+01, threshold=6.712e+01, percent-clipped=0.0
2023-12-23 01:19:09,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=879080.0, ans=0.2
2023-12-23 01:19:17,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=879146.6666666666, ans=0.0
2023-12-23 01:19:20,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=879146.6666666666, ans=0.05
2023-12-23 01:19:22,608 INFO [train.py:886] (0/4) Epoch 28, batch 3200, loss[loss=0.009951, audio_tagging_loss=0.009951, over 25000.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4945549.40 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:19:33,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=879280.0, ans=0.0
2023-12-23 01:19:45,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5
2023-12-23 01:19:49,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=879346.6666666666, ans=0.125
2023-12-23 01:20:04,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0
2023-12-23 01:20:13,634 INFO [train.py:886] (0/4) Epoch 28, batch 3250, loss[loss=0.01323, audio_tagging_loss=0.01323, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4948916.21 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:20:20,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=879546.6666666666, ans=0.0
2023-12-23 01:20:20,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.27 vs. limit=22.5
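The recurring optim.py:484 WARNING lines report the distribution of recent per-batch gradient norms as five quantiles (min, 25%, median, 75%, max) plus an adaptive clipping threshold; in these logs the threshold tracks Clipping_scale times the median, and percent-clipped is the share of batches whose norm exceeded it. A sketch of how such statistics can be computed and printed; the scale-times-median rule is inferred from the logged numbers, not taken from optim.py.

    import torch

    def clipping_stats(grad_norms, scale=2.0):
        """Quantiles of recent per-batch gradient norms and an adaptive clipping
        threshold (assumed rule: scale x median, which matches the logged values)."""
        t = torch.tensor(grad_norms, dtype=torch.float32)
        q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = scale * q[2].item()
        clipped = (t > threshold).float().mean().item() * 100.0
        print(
            "Clipping_scale=%.1f, grad-norm quartiles %.3e %.3e %.3e %.3e %.3e, "
            "threshold=%.3e, percent-clipped=%.1f"
            % (scale, *q.tolist(), threshold, clipped)
        )

    # e.g. with a median norm around 33, the threshold lands near 6.7e+01,
    # as in the WARNING lines above:
    clipping_stats([29.4, 32.1, 33.5, 35.2, 39.5])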
2023-12-23 01:20:32,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=879613.3333333334, ans=10.0
2023-12-23 01:20:36,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=879680.0, ans=0.0
2023-12-23 01:20:40,466 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.146e+01 3.294e+01 3.408e+01 4.034e+01, threshold=6.589e+01, percent-clipped=0.0
2023-12-23 01:21:05,286 INFO [train.py:886] (0/4) Epoch 28, batch 3300, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4953242.42 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:21:18,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=879946.6666666666, ans=0.125
2023-12-23 01:21:22,958 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-132000.pt
2023-12-23 01:21:38,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=880080.0, ans=0.0
2023-12-23 01:21:55,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=880146.6666666666, ans=0.125
2023-12-23 01:21:59,415 INFO [train.py:886] (0/4) Epoch 28, batch 3350, loss[loss=0.013, audio_tagging_loss=0.013, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4957709.52 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:21:59,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0
2023-12-23 01:22:04,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=880213.3333333334, ans=0.125
2023-12-23 01:22:10,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=880280.0, ans=0.125
2023-12-23 01:22:13,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880280.0, ans=0.1
2023-12-23 01:22:26,187 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 3.184e+01 3.328e+01 3.464e+01 4.025e+01, threshold=6.657e+01, percent-clipped=0.0
2023-12-23 01:22:50,983 INFO [train.py:886] (0/4) Epoch 28, batch 3400, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4959551.51 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:22:57,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=880546.6666666666, ans=0.125
2023-12-23 01:23:31,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880813.3333333334, ans=0.1
2023-12-23 01:23:35,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0
2023-12-23 01:23:43,681 INFO [train.py:886] (0/4) Epoch 28, batch 3450, loss[loss=0.01113, audio_tagging_loss=0.01113, over 24750.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4947396.48 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0
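The checkpoint.py:75 line above writes checkpoint-132000.pt; with save_every_n=4000 in the run's configuration, these batch-indexed checkpoints land every 4000 training batches, alongside the per-epoch epoch-N.pt files seen later in the log. A hedged sketch of that bookkeeping (field names and the exact contents of the saved dict are illustrative):

    import torch
    from pathlib import Path

    def maybe_save_checkpoint(model, optimizer, batch_idx_train,
                              exp_dir="zipformer/exp_at_as_full", save_every_n=4000):
        """Sketch of periodic checkpointing: every `save_every_n` training batches,
        write checkpoint-<batch>.pt under the experiment directory."""
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return
        ckpt = {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        }
        path = Path(exp_dir) / f"checkpoint-{batch_idx_train}.pt"
        torch.save(ckpt, path)
        print(f"Saving checkpoint to {path}")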
2023-12-23 01:23:46,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=880880.0, ans=0.0
2023-12-23 01:24:10,423 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.974e+01 3.236e+01 3.395e+01 3.566e+01 3.957e+01, threshold=6.789e+01, percent-clipped=0.0
2023-12-23 01:24:27,687 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 01:24:30,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. limit=6.0
2023-12-23 01:24:35,510 INFO [train.py:886] (0/4) Epoch 28, batch 3500, loss[loss=0.01097, audio_tagging_loss=0.01097, over 24750.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4944850.51 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:25:10,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=881413.3333333334, ans=0.125
2023-12-23 01:25:27,303 INFO [train.py:886] (0/4) Epoch 28, batch 3550, loss[loss=0.01176, audio_tagging_loss=0.01176, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4943150.33 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:25:31,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=881546.6666666666, ans=0.2
2023-12-23 01:25:46,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=881613.3333333334, ans=0.2
2023-12-23 01:25:47,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=881680.0, ans=0.0
2023-12-23 01:25:49,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0
2023-12-23 01:25:53,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0
2023-12-23 01:25:54,595 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.851e+01 3.124e+01 3.262e+01 3.428e+01 4.045e+01, threshold=6.525e+01, percent-clipped=0.0
2023-12-23 01:25:55,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=881680.0, ans=0.0
2023-12-23 01:26:00,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=881746.6666666666, ans=0.125
2023-12-23 01:26:12,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=881813.3333333334, ans=0.0
2023-12-23 01:26:19,148 INFO [train.py:886] (0/4) Epoch 28, batch 3600, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4948067.97 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:26:20,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=881880.0, ans=0.0
2023-12-23 01:27:00,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=882146.6666666666, ans=0.0
2023-12-23 01:27:10,031 INFO [train.py:886] (0/4) Epoch 28, batch 3650, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4955305.35 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:27:10,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=882213.3333333334, ans=0.2
2023-12-23 01:27:13,699 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 01:27:16,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0
2023-12-23 01:27:24,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0
2023-12-23 01:27:27,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=882280.0, ans=0.04949747468305833
2023-12-23 01:27:29,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=882280.0, ans=0.0
2023-12-23 01:27:31,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=882346.6666666666, ans=0.1
2023-12-23 01:27:32,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=882346.6666666666, ans=0.125
2023-12-23 01:27:33,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=882346.6666666666, ans=0.125
2023-12-23 01:27:34,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=882346.6666666666, ans=0.125
2023-12-23 01:27:37,444 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.830e+01 3.117e+01 3.248e+01 3.424e+01 3.950e+01, threshold=6.496e+01, percent-clipped=0.0
2023-12-23 01:27:38,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=882346.6666666666, ans=0.0
2023-12-23 01:27:40,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=882413.3333333334, ans=0.125
2023-12-23 01:28:02,354 INFO [train.py:886] (0/4) Epoch 28, batch 3700, loss[loss=0.01388, audio_tagging_loss=0.01388, over 21629.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4950327.72 frames. ], batch size: 107, lr: 3.84e-03, grad_scale: 64.0
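Each train.py:886 line pairs the current batch's loss (over that batch's frames) with tot_loss, a frame-weighted running average that hovers around ~4.95M frames in this stretch. One way to produce such a statistic is an exponentially decayed, frame-weighted average; the sketch below is an assumption about the bookkeeping, not train.py's actual code, but note that with decay=0.995 and ~25000 frames per batch the steady-state window is 25000/0.005 = 5.0e6 frames, matching the logged denominators.

    class FrameWeightedAverage:
        """Sketch of the `tot_loss[... over N frames]` statistic: an exponentially
        decayed, frame-weighted running average of the per-frame loss."""

        def __init__(self, decay=0.995):  # decay is an assumed value
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of (per-frame loss x frames)
            self.frames = 0.0     # decayed frame count

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames, self.frames

    # usage: avg, n = tracker.update(0.01348, 24750.0)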
2023-12-23 01:28:07,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=882546.6666666666, ans=0.125
2023-12-23 01:28:09,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=882546.6666666666, ans=0.0
2023-12-23 01:28:13,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=882613.3333333334, ans=0.0
2023-12-23 01:28:34,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=882746.6666666666, ans=0.0
2023-12-23 01:28:45,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=882813.3333333334, ans=0.125
2023-12-23 01:28:49,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=882813.3333333334, ans=0.125
2023-12-23 01:28:54,362 INFO [train.py:886] (0/4) Epoch 28, batch 3750, loss[loss=0.01146, audio_tagging_loss=0.01146, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4946535.33 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:28:56,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=882880.0, ans=0.125
2023-12-23 01:28:59,885 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 01:29:12,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=882946.6666666666, ans=0.1
2023-12-23 01:29:21,073 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.970e+01 3.212e+01 3.329e+01 3.465e+01 4.032e+01, threshold=6.658e+01, percent-clipped=0.0
2023-12-23 01:29:23,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0
2023-12-23 01:29:29,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=883080.0, ans=0.0
2023-12-23 01:29:33,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=883080.0, ans=0.125
2023-12-23 01:29:44,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=883146.6666666666, ans=0.2
2023-12-23 01:29:45,892 INFO [train.py:886] (0/4) Epoch 28, batch 3800, loss[loss=0.01175, audio_tagging_loss=0.01175, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4941804.16 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:29:48,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883213.3333333334, ans=0.1
2023-12-23 01:30:38,183 INFO [train.py:886] (0/4) Epoch 28, batch 3850, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4944724.17 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:30:54,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=883613.3333333334, ans=0.125
2023-12-23 01:31:05,057 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.155e+01 3.335e+01 3.521e+01 4.328e+01, threshold=6.670e+01, percent-clipped=0.0
2023-12-23 01:31:11,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=883746.6666666666, ans=0.125
2023-12-23 01:31:29,956 INFO [train.py:886] (0/4) Epoch 28, batch 3900, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4941755.36 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:31:35,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=883880.0, ans=0.0
2023-12-23 01:31:35,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=883880.0, ans=0.07
2023-12-23 01:31:37,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=883880.0, ans=0.09899494936611666
2023-12-23 01:32:03,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=884080.0, ans=0.125
2023-12-23 01:32:21,902 INFO [train.py:886] (0/4) Epoch 28, batch 3950, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4945865.33 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:32:23,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=884213.3333333334, ans=0.5
2023-12-23 01:32:31,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=884280.0, ans=0.125
2023-12-23 01:32:35,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=884280.0, ans=0.5
2023-12-23 01:32:41,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=884280.0, ans=0.05
2023-12-23 01:32:47,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=884346.6666666666, ans=0.5
2023-12-23 01:32:49,270 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.873e+01 3.204e+01 3.330e+01 3.435e+01 3.816e+01, threshold=6.660e+01, percent-clipped=0.0
2023-12-23 01:32:54,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.54 vs. limit=22.5
2023-12-23 01:32:57,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=884413.3333333334, ans=0.09899494936611666
2023-12-23 01:33:10,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0
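The scaling.py:1022 Whitening lines compare a per-module whitening metric against a limit; the metric measures how far the channel covariance of a module's output is from isotropic ("white"), and a corrective penalty only engages when the metric exceeds the limit. One plausible form of such a metric, stated as an assumption rather than icefall's exact formula: the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which equals 1.0 for perfectly white features and grows when variance concentrates in a few directions.

    import torch

    def whitening_metric(x, num_groups=1):
        """Assumed whitening proxy for x of shape [frames, channels]:
        per group, trace(C @ C) * d / trace(C)^2 with C the channel covariance.
        Equals 1.0 for white features; large values mean a few directions
        dominate the variance."""
        metrics = []
        for g in x.chunk(num_groups, dim=-1):
            g = g - g.mean(dim=0, keepdim=True)
            c = (g.T @ g) / g.shape[0]   # channel covariance, d x d
            d = c.shape[0]
            metrics.append((c @ c).trace() * d / c.trace() ** 2)
        return torch.stack(metrics).mean()

    x = torch.randn(1000, 512)           # near-white input
    print(whitening_metric(x).item())    # ~1.0, far below a limit like 15.0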
2023-12-23 01:33:12,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=884480.0, ans=0.2
2023-12-23 01:33:13,909 INFO [train.py:886] (0/4) Epoch 28, batch 4000, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4950527.33 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 128.0
2023-12-23 01:33:40,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=884680.0, ans=0.125
2023-12-23 01:33:50,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5
2023-12-23 01:33:56,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=884813.3333333334, ans=0.125
2023-12-23 01:33:58,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=884813.3333333334, ans=0.125
2023-12-23 01:34:03,576 INFO [train.py:886] (0/4) Epoch 28, batch 4050, loss[loss=0.01444, audio_tagging_loss=0.01444, over 24750.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4947457.60 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:34:09,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=884880.0, ans=0.0
2023-12-23 01:34:11,209 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 01:34:14,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=884946.6666666666, ans=0.125
2023-12-23 01:34:24,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=885013.3333333334, ans=0.125
2023-12-23 01:34:31,372 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.870e+01 3.168e+01 3.309e+01 3.446e+01 3.884e+01, threshold=6.618e+01, percent-clipped=0.0
2023-12-23 01:34:32,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=885013.3333333334, ans=0.125
2023-12-23 01:34:33,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=885080.0, ans=0.2
2023-12-23 01:34:34,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=885080.0, ans=0.125
2023-12-23 01:34:35,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=885080.0, ans=0.125
2023-12-23 01:34:41,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=885080.0, ans=0.2
2023-12-23 01:34:54,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0
2023-12-23 01:34:55,288 INFO [train.py:886] (0/4) Epoch 28, batch 4100, loss[loss=0.01597, audio_tagging_loss=0.01597, over 24750.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4948916.61 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0
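Note the grad_scale field jumping from 64.0 to 128.0 at batch 4000 and back to 64.0 by batch 4050: the run uses fp16 (use_fp16 in the configuration), so the loss is multiplied by a dynamic scale that is grown periodically while steps stay finite and halved whenever inf/NaN gradients appear. The standard PyTorch mechanism looks like the sketch below; the growth_interval value is an assumption, the rest mirrors the documented torch.cuda.amp.GradScaler API.

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=64.0,       # illustrative; matches the grad_scale in the log
        growth_factor=2.0,     # scale doubles after `growth_interval` clean steps
        backoff_factor=0.5,    # scale halves when a step overflows
        growth_interval=2000,  # assumed; the trainer's setting may differ
    )

    # inside the training loop:
    #   with torch.cuda.amp.autocast():
    #       loss = model(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()
    #   print(scaler.get_scale())  # the `grad_scale` value logged each batch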
2023-12-23 01:35:08,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=885280.0, ans=0.125
2023-12-23 01:35:10,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885280.0, ans=0.1
2023-12-23 01:35:26,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0
2023-12-23 01:35:46,238 INFO [train.py:886] (0/4) Epoch 28, batch 4150, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4950726.29 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:35:51,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=885546.6666666666, ans=0.125
2023-12-23 01:36:04,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885613.3333333334, ans=0.1
2023-12-23 01:36:12,414 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.213e+01 3.310e+01 3.521e+01 4.267e+01, threshold=6.621e+01, percent-clipped=0.0
2023-12-23 01:36:29,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=885813.3333333334, ans=0.125
2023-12-23 01:36:30,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.01 vs. limit=22.5
2023-12-23 01:36:34,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=885813.3333333334, ans=0.125
2023-12-23 01:36:35,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0
2023-12-23 01:36:36,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=885880.0, ans=0.125
2023-12-23 01:36:37,005 INFO [train.py:886] (0/4) Epoch 28, batch 4200, loss[loss=0.009854, audio_tagging_loss=0.009854, over 24750.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4948954.84 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:36:52,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=885946.6666666666, ans=0.0
2023-12-23 01:36:54,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=885946.6666666666, ans=0.2
2023-12-23 01:37:08,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=886080.0, ans=0.125
2023-12-23 01:37:10,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=886080.0, ans=0.125
2023-12-23 01:37:27,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=886146.6666666666, ans=0.2
2023-12-23 01:37:29,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=886213.3333333334, ans=0.0
2023-12-23 01:37:29,995 INFO [train.py:886] (0/4) Epoch 28, batch 4250, loss[loss=0.01633, audio_tagging_loss=0.01633, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4957073.64 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:37:39,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=886280.0, ans=0.0
2023-12-23 01:37:41,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=886280.0, ans=0.125
2023-12-23 01:37:56,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=886346.6666666666, ans=0.125
2023-12-23 01:37:57,041 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.165e+01 3.374e+01 3.516e+01 4.346e+01, threshold=6.749e+01, percent-clipped=0.0
2023-12-23 01:38:04,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=886413.3333333334, ans=0.125
2023-12-23 01:38:15,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=886480.0, ans=0.0
2023-12-23 01:38:20,007 INFO [train.py:886] (0/4) Epoch 28, batch 4300, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4956019.54 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:38:24,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=886546.6666666666, ans=0.05
2023-12-23 01:38:26,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0
2023-12-23 01:38:39,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=886613.3333333334, ans=0.125
2023-12-23 01:38:42,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=886680.0, ans=0.2
2023-12-23 01:39:13,172 INFO [train.py:886] (0/4) Epoch 28, batch 4350, loss[loss=0.01444, audio_tagging_loss=0.01444, over 24750.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4963950.43 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:39:26,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=886946.6666666666, ans=0.125
2023-12-23 01:39:30,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=886946.6666666666, ans=0.2
2023-12-23 01:39:30,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=886946.6666666666, ans=0.2
2023-12-23 01:39:41,273 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.916e+01 3.191e+01 3.325e+01 3.447e+01 4.229e+01, threshold=6.650e+01, percent-clipped=0.0
2023-12-23 01:39:42,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=887013.3333333334, ans=0.125
2023-12-23 01:39:44,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0
2023-12-23 01:39:51,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=887080.0, ans=0.035
2023-12-23 01:39:53,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=887146.6666666666, ans=0.0
2023-12-23 01:40:02,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=887146.6666666666, ans=0.125
2023-12-23 01:40:04,482 INFO [train.py:886] (0/4) Epoch 28, batch 4400, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4957024.81 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:40:07,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=887213.3333333334, ans=0.125
2023-12-23 01:40:15,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=887280.0, ans=0.0
2023-12-23 01:40:16,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=22.5
2023-12-23 01:40:24,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0
2023-12-23 01:40:54,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=887546.6666666666, ans=0.1
2023-12-23 01:40:55,223 INFO [train.py:886] (0/4) Epoch 28, batch 4450, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4952185.65 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:41:23,619 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.801e+01 3.227e+01 3.347e+01 3.540e+01 3.940e+01, threshold=6.695e+01, percent-clipped=0.0
2023-12-23 01:41:26,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887746.6666666666, ans=0.1
2023-12-23 01:41:27,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=887746.6666666666, ans=0.2
2023-12-23 01:41:42,348 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 01:41:43,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=887813.3333333334, ans=0.125
2023-12-23 01:41:48,286 INFO [train.py:886] (0/4) Epoch 28, batch 4500, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4946246.75 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:41:49,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.48 vs. limit=5.0
2023-12-23 01:42:21,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0
2023-12-23 01:42:23,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=888080.0, ans=0.125
2023-12-23 01:42:38,510 INFO [train.py:886] (0/4) Epoch 28, batch 4550, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4946405.84 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:42:41,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=888213.3333333334, ans=0.1
2023-12-23 01:42:54,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0
2023-12-23 01:43:06,889 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.744e+01 3.196e+01 3.308e+01 3.514e+01 4.018e+01, threshold=6.616e+01, percent-clipped=0.0
2023-12-23 01:43:09,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=888413.3333333334, ans=0.125
2023-12-23 01:43:31,507 INFO [train.py:886] (0/4) Epoch 28, batch 4600, loss[loss=0.01492, audio_tagging_loss=0.01492, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4950459.51 frames. ], batch size: 100, lr: 3.82e-03, grad_scale: 64.0
2023-12-23 01:43:42,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=888613.3333333334, ans=0.125
2023-12-23 01:43:49,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=888613.3333333334, ans=0.035
2023-12-23 01:43:51,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=888680.0, ans=0.0
2023-12-23 01:43:55,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=888680.0, ans=0.125
2023-12-23 01:43:57,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=888680.0, ans=0.0
2023-12-23 01:44:05,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5
2023-12-23 01:44:05,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=888746.6666666666, ans=0.2
2023-12-23 01:44:23,027 INFO [train.py:886] (0/4) Epoch 28, batch 4650, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4952301.94 frames. ], batch size: 100, lr: 3.82e-03, grad_scale: 64.0
2023-12-23 01:44:29,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=15.0
2023-12-23 01:44:48,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=889013.3333333334, ans=0.0
2023-12-23 01:44:50,547 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.857e+01 3.209e+01 3.315e+01 3.492e+01 4.117e+01, threshold=6.630e+01, percent-clipped=0.0
2023-12-23 01:44:54,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=889080.0, ans=0.0
2023-12-23 01:44:56,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=889080.0, ans=0.125
2023-12-23 01:44:58,164 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.390e-03
2023-12-23 01:45:13,970 INFO [train.py:886] (0/4) Epoch 28, batch 4700, loss[loss=0.0149, audio_tagging_loss=0.0149, over 24750.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4947366.12 frames. ], batch size: 99, lr: 3.82e-03, grad_scale: 64.0
2023-12-23 01:45:15,211 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0
2023-12-23 01:45:15,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=889213.3333333334, ans=0.125
2023-12-23 01:45:38,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=889346.6666666666, ans=0.125
2023-12-23 01:45:54,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=889480.0, ans=0.125
2023-12-23 01:46:00,586 INFO [train.py:886] (0/4) Epoch 28, batch 4750, loss[loss=0.0122, audio_tagging_loss=0.0122, over 24750.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4948680.28 frames. ], batch size: 99, lr: 3.82e-03, grad_scale: 64.0
2023-12-23 01:46:09,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=889613.3333333334, ans=0.1
2023-12-23 01:46:16,195 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-28.pt
2023-12-23 01:46:36,938 INFO [train.py:886] (0/4) Epoch 29, batch 0, loss[loss=0.03115, audio_tagging_loss=0.03115, over 21335.00 frames. ], tot_loss[loss=0.03115, audio_tagging_loss=0.03115, over 21335.00 frames. ], batch size: 107, lr: 3.75e-03, grad_scale: 32.0
2023-12-23 01:46:36,939 INFO [train.py:909] (0/4) Computing validation loss
2023-12-23 01:46:48,464 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0575, 3.1990, 2.7425, 3.2804], device='cuda:0')
2023-12-23 01:46:49,801 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7066, 3.4058, 3.8747, 3.8465], device='cuda:0')
2023-12-23 01:46:58,157 INFO [train.py:917] (0/4) Epoch 29, validation: loss=0.03319, audio_tagging_loss=0.03319, over 3737520.00 frames.
2023-12-23 01:46:58,158 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-23 01:46:59,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=22.5
2023-12-23 01:47:08,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=889720.0, ans=0.0
2023-12-23 01:47:10,289 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.924e+01 3.242e+01 3.406e+01 3.707e+01 9.005e+01, threshold=6.813e+01, percent-clipped=9.0
2023-12-23 01:47:34,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.74 vs. limit=22.5
2023-12-23 01:47:40,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2023-12-23 01:47:41,256 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.20 vs. limit=10.0
2023-12-23 01:47:49,189 INFO [train.py:886] (0/4) Epoch 29, batch 50, loss[loss=0.01553, audio_tagging_loss=0.01553, over 25000.00 frames. ], tot_loss[loss=0.02048, audio_tagging_loss=0.02048, over 1110743.34 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0
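The zipformer.py:1858 lines printed during the validation pass report attn_weights_entropy, one value per attention head: entropy near log(key_len) means a head attends almost uniformly, while entropy near zero means it is sharply peaked. A hedged sketch of that diagnostic; the tensor layout is an assumption for illustration.

    import torch

    def attn_weights_entropy(attn_weights, eps=1e-20):
        """Per-head entropy of attention distributions.
        attn_weights: (num_heads, query_len, key_len), rows summing to 1.
        Returns one entropy per head, averaged over queries."""
        p = attn_weights.clamp(min=eps)
        ent = -(p * p.log()).sum(dim=-1)   # (num_heads, query_len)
        return ent.mean(dim=-1)            # (num_heads,)

    w = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_weights_entropy(w))  # four per-head entropies, like the logged tensors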
2023-12-23 01:48:12,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=890120.0, ans=0.125
2023-12-23 01:48:16,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890120.0, ans=0.1
2023-12-23 01:48:16,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=890120.0, ans=0.125
2023-12-23 01:48:20,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.05 vs. limit=22.5
2023-12-23 01:48:28,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=890186.6666666666, ans=0.0
2023-12-23 01:48:29,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0
2023-12-23 01:48:33,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=22.5
2023-12-23 01:48:41,284 INFO [train.py:886] (0/4) Epoch 29, batch 100, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.01778, audio_tagging_loss=0.01778, over 1970620.61 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0
2023-12-23 01:48:44,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=890320.0, ans=0.125
2023-12-23 01:48:53,292 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.271e+01 3.674e+01 3.939e+01 4.263e+01 5.538e+01, threshold=7.878e+01, percent-clipped=0.0
2023-12-23 01:49:00,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=890453.3333333334, ans=0.125
2023-12-23 01:49:11,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=890520.0, ans=0.0
2023-12-23 01:49:16,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=890520.0, ans=0.0
2023-12-23 01:49:30,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=890586.6666666666, ans=0.125
2023-12-23 01:49:32,034 INFO [train.py:886] (0/4) Epoch 29, batch 150, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 2635730.06 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0
2023-12-23 01:50:24,553 INFO [train.py:886] (0/4) Epoch 29, batch 200, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.01525, audio_tagging_loss=0.01525, over 3154475.71 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0
2023-12-23 01:50:35,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0
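The logged lr decays slowly within an epoch (3.85e-03 down to 3.82e-03 across epoch 28) and drops at the epoch boundary (to 3.75e-03 as epoch 29 begins), consistent with a schedule discounted by both the step count and the epoch count, using base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the run's configuration. The Eden-style rule below is an assumed form, not lifted from the trainer's scheduler; plugging in step~132000 (the batch index of the checkpoint above) and epoch 28 gives roughly 3.8e-03, the same order as the logged lr.

    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        """Assumed Eden-style learning-rate rule:
        lr = base_lr
             * ((step^2  + lr_batches^2) / lr_batches^2) ** -0.25
             * ((epoch^2 + lr_epochs^2)  / lr_epochs^2)  ** -0.25"""
        step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * step_factor * epoch_factor

    print(eden_lr(0.045, step=132000, epoch=28))  # ~3.8e-03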
limit=15.0 2023-12-23 01:50:36,596 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.903e+01 3.220e+01 3.343e+01 3.508e+01 4.197e+01, threshold=6.685e+01, percent-clipped=0.0 2023-12-23 01:50:48,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=891120.0, ans=0.125 2023-12-23 01:51:02,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=891186.6666666666, ans=0.09899494936611666 2023-12-23 01:51:16,767 INFO [train.py:886] (0/4) Epoch 29, batch 250, loss[loss=0.01218, audio_tagging_loss=0.01218, over 22117.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 3557095.28 frames. ], batch size: 107, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:51:20,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=891320.0, ans=0.125 2023-12-23 01:51:34,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=891386.6666666666, ans=0.125 2023-12-23 01:51:35,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=891386.6666666666, ans=0.125 2023-12-23 01:51:37,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=891453.3333333334, ans=0.0 2023-12-23 01:51:55,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=891520.0, ans=0.1 2023-12-23 01:52:08,238 INFO [train.py:886] (0/4) Epoch 29, batch 300, loss[loss=0.0119, audio_tagging_loss=0.0119, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 3870201.71 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:52:10,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=891653.3333333334, ans=0.1 2023-12-23 01:52:20,957 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.835e+01 3.236e+01 3.377e+01 3.518e+01 3.995e+01, threshold=6.753e+01, percent-clipped=0.0 2023-12-23 01:52:24,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=891720.0, ans=0.05 2023-12-23 01:52:24,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=891720.0, ans=0.125 2023-12-23 01:52:27,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=891720.0, ans=0.0 2023-12-23 01:52:40,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=891853.3333333334, ans=0.0 2023-12-23 01:53:00,220 INFO [train.py:886] (0/4) Epoch 29, batch 350, loss[loss=0.01416, audio_tagging_loss=0.01416, over 24750.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4102340.57 frames. 
], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:53:02,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=891986.6666666666, ans=0.5 2023-12-23 01:53:04,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=891986.6666666666, ans=0.0 2023-12-23 01:53:18,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=892053.3333333334, ans=0.04949747468305833 2023-12-23 01:53:21,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=892120.0, ans=0.125 2023-12-23 01:53:21,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=892120.0, ans=0.125 2023-12-23 01:53:48,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892253.3333333334, ans=0.1 2023-12-23 01:53:51,858 INFO [train.py:886] (0/4) Epoch 29, batch 400, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4284510.88 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:53:54,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=892320.0, ans=0.125 2023-12-23 01:54:02,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892386.6666666666, ans=0.1 2023-12-23 01:54:02,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=892386.6666666666, ans=0.0 2023-12-23 01:54:02,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=892386.6666666666, ans=0.125 2023-12-23 01:54:04,651 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.891e+01 3.197e+01 3.293e+01 3.447e+01 4.008e+01, threshold=6.587e+01, percent-clipped=0.0 2023-12-23 01:54:06,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=892386.6666666666, ans=0.0 2023-12-23 01:54:27,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=892520.0, ans=0.0 2023-12-23 01:54:43,573 INFO [train.py:886] (0/4) Epoch 29, batch 450, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4433002.90 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:55:06,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=892786.6666666666, ans=0.125 2023-12-23 01:55:15,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=892853.3333333334, ans=0.125 2023-12-23 01:55:25,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=892920.0, ans=0.125 2023-12-23 01:55:36,735 INFO [train.py:886] (0/4) Epoch 29, batch 500, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4548484.47 frames. 
], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:55:48,272 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.779e+01 3.131e+01 3.260e+01 3.424e+01 4.258e+01, threshold=6.520e+01, percent-clipped=0.0 2023-12-23 01:56:20,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=893253.3333333334, ans=0.1 2023-12-23 01:56:24,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0 2023-12-23 01:56:27,097 INFO [train.py:886] (0/4) Epoch 29, batch 550, loss[loss=0.01333, audio_tagging_loss=0.01333, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4641518.68 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:56:43,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=893386.6666666666, ans=0.1 2023-12-23 01:56:44,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=893386.6666666666, ans=0.0 2023-12-23 01:56:44,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2023-12-23 01:56:47,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=893386.6666666666, ans=0.0 2023-12-23 01:57:20,498 INFO [train.py:886] (0/4) Epoch 29, batch 600, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24052.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4709381.19 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:57:29,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2023-12-23 01:57:31,798 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.900e+01 3.207e+01 3.346e+01 3.493e+01 4.560e+01, threshold=6.691e+01, percent-clipped=0.0 2023-12-23 01:57:46,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=893786.6666666666, ans=0.0 2023-12-23 01:57:54,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=893853.3333333334, ans=0.2 2023-12-23 01:58:12,567 INFO [train.py:886] (0/4) Epoch 29, batch 650, loss[loss=0.01522, audio_tagging_loss=0.01522, over 24750.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4757229.14 frames. 
2023-12-23 01:58:21,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=893986.6666666666, ans=0.0
2023-12-23 01:58:21,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893986.6666666666, ans=0.1
2023-12-23 01:58:29,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=894053.3333333334, ans=0.04949747468305833
2023-12-23 01:58:37,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=894120.0, ans=0.125
2023-12-23 01:58:47,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894186.6666666666, ans=0.1
2023-12-23 01:59:03,461 INFO [train.py:886] (0/4) Epoch 29, batch 700, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4796264.51 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 01:59:04,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=894320.0, ans=0.125
2023-12-23 01:59:08,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=894320.0, ans=0.125
2023-12-23 01:59:16,885 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.862e+01 3.181e+01 3.381e+01 3.506e+01 3.965e+01, threshold=6.761e+01, percent-clipped=0.0
2023-12-23 01:59:45,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=894586.6666666666, ans=0.2
2023-12-23 01:59:54,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=894586.6666666666, ans=0.0
2023-12-23 01:59:56,273 INFO [train.py:886] (0/4) Epoch 29, batch 750, loss[loss=0.01114, audio_tagging_loss=0.01114, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4830591.13 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:00:19,559 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 02:00:37,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=22.5
2023-12-23 02:00:39,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=894920.0, ans=0.125
2023-12-23 02:00:45,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=894986.6666666666, ans=0.125
2023-12-23 02:00:46,092 INFO [train.py:886] (0/4) Epoch 29, batch 800, loss[loss=0.01399, audio_tagging_loss=0.01399, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4860544.83 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:00:50,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=894986.6666666666, ans=0.125
2023-12-23 02:00:59,626 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.158e+01 3.303e+01 3.490e+01 4.206e+01, threshold=6.605e+01, percent-clipped=0.0
2023-12-23 02:01:03,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=895053.3333333334, ans=0.0
2023-12-23 02:01:12,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=895120.0, ans=0.0
2023-12-23 02:01:38,412 INFO [train.py:886] (0/4) Epoch 29, batch 850, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4883968.04 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:01:43,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=895320.0, ans=0.0
2023-12-23 02:02:19,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=895586.6666666666, ans=0.0
2023-12-23 02:02:29,866 INFO [train.py:886] (0/4) Epoch 29, batch 900, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4903079.15 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:02:42,535 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.824e+01 3.196e+01 3.316e+01 3.462e+01 4.110e+01, threshold=6.632e+01, percent-clipped=0.0
2023-12-23 02:02:45,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.36 vs. limit=15.0
2023-12-23 02:02:46,601 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.018e+00
2023-12-23 02:02:47,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=895720.0, ans=0.125
2023-12-23 02:02:59,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=895786.6666666666, ans=0.0
2023-12-23 02:03:17,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=895920.0, ans=0.125
2023-12-23 02:03:20,995 INFO [train.py:886] (0/4) Epoch 29, batch 950, loss[loss=0.01676, audio_tagging_loss=0.01676, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4909345.65 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
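Each scaling.py:213 line prints a ScheduledFloat: a scalar hyper-parameter (a dropout probability, skip rate, scale floor, ...) whose value is scheduled against the global batch_count, with ans= being the value currently in effect. A plausible re-implementation of the idea, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real scaling.py may differ in detail:

    class ScheduledFloat:
        """Piecewise-linear schedule over the global batch count (sketch)."""
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches:
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert abs(dropout_p.value(10000.0) - 0.2) < 1e-9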
2023-12-23 02:03:39,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=896053.3333333334, ans=0.0
2023-12-23 02:03:41,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=896053.3333333334, ans=0.0
2023-12-23 02:03:45,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=896120.0, ans=0.0
2023-12-23 02:03:51,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=896186.6666666666, ans=0.2
2023-12-23 02:04:02,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=896253.3333333334, ans=0.0
2023-12-23 02:04:13,319 INFO [train.py:886] (0/4) Epoch 29, batch 1000, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4914360.78 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:04:18,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=896320.0, ans=0.125
2023-12-23 02:04:24,550 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.901e+01 3.234e+01 3.376e+01 3.564e+01 4.018e+01, threshold=6.752e+01, percent-clipped=0.0
2023-12-23 02:04:35,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=896453.3333333334, ans=0.125
2023-12-23 02:04:44,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=896520.0, ans=0.125
2023-12-23 02:04:46,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=896520.0, ans=0.125
2023-12-23 02:04:55,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=896586.6666666666, ans=0.125
2023-12-23 02:05:03,425 INFO [train.py:886] (0/4) Epoch 29, batch 1050, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4920304.73 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:05:03,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=896653.3333333334, ans=0.0
2023-12-23 02:05:05,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=896653.3333333334, ans=0.125
2023-12-23 02:05:09,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=896653.3333333334, ans=0.125
2023-12-23 02:05:18,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=896720.0, ans=0.05
2023-12-23 02:05:38,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0
2023-12-23 02:05:44,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=896920.0, ans=0.125
2023-12-23 02:05:55,167 INFO [train.py:886] (0/4) Epoch 29, batch 1100, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4929025.70 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:06:03,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896986.6666666666, ans=0.1
2023-12-23 02:06:07,973 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.822e+01 3.193e+01 3.319e+01 3.484e+01 4.077e+01, threshold=6.637e+01, percent-clipped=0.0
2023-12-23 02:06:23,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=897120.0, ans=0.0
2023-12-23 02:06:23,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=897120.0, ans=0.125
2023-12-23 02:06:45,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.25 vs. limit=10.0
2023-12-23 02:06:46,464 INFO [train.py:886] (0/4) Epoch 29, batch 1150, loss[loss=0.01178, audio_tagging_loss=0.01178, over 24750.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4930712.93 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:06:50,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0
2023-12-23 02:07:11,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.75 vs. limit=10.0
2023-12-23 02:07:36,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.41 vs. limit=15.0
2023-12-23 02:07:38,467 INFO [train.py:886] (0/4) Epoch 29, batch 1200, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4940775.89 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:07:40,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897653.3333333334, ans=0.1
2023-12-23 02:07:50,591 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.246e+01 3.372e+01 3.513e+01 4.009e+01, threshold=6.745e+01, percent-clipped=0.0
2023-12-23 02:07:51,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=897720.0, ans=0.125
2023-12-23 02:08:16,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=897853.3333333334, ans=0.2
2023-12-23 02:08:18,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=897853.3333333334, ans=0.2
2023-12-23 02:08:29,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.74 vs. limit=10.0
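The scaling.py:1022 lines come from a whitening regularizer: the covariance of each named activation is summarized by a single "metric" and compared against a scheduled limit (hence "vs."), with a penalty applied only when the limit is exceeded. The exact formula is not visible in the log; the sketch below is an assumed stand-in, chosen so that perfectly white (isotropic) features score about 1.0 and anisotropic ones score higher:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels), channels split into groups
        n, c = x.shape
        cg = c // num_groups
        g = x.reshape(n, num_groups, cg).permute(1, 0, 2)   # (groups, n, cg)
        g = g - g.mean(dim=1, keepdim=True)
        covar = g.transpose(1, 2) @ g / n                    # (groups, cg, cg)
        diag_mean = covar.diagonal(dim1=1, dim2=2).mean()
        # ratio of mean squared covariance entry to squared mean variance,
        # scaled so an identity-like covariance gives 1.0
        return ((covar ** 2).mean() * cg / (diag_mean ** 2)).item()

    print(whitening_metric(torch.randn(1000, 128), num_groups=4))  # ~1.0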
2023-12-23 02:08:29,804 INFO [train.py:886] (0/4) Epoch 29, batch 1250, loss[loss=0.01286, audio_tagging_loss=0.01286, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4940424.65 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:09:21,418 INFO [train.py:886] (0/4) Epoch 29, batch 1300, loss[loss=0.01185, audio_tagging_loss=0.01185, over 24076.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4938792.86 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:09:31,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.42 vs. limit=22.5
2023-12-23 02:09:33,508 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.906e+01 3.213e+01 3.404e+01 3.516e+01 4.030e+01, threshold=6.807e+01, percent-clipped=0.0
2023-12-23 02:09:43,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=898453.3333333334, ans=0.07
2023-12-23 02:09:45,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=898453.3333333334, ans=0.125
2023-12-23 02:10:05,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0
2023-12-23 02:10:06,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=898586.6666666666, ans=0.2
2023-12-23 02:10:11,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=898653.3333333334, ans=0.125
2023-12-23 02:10:12,326 INFO [train.py:886] (0/4) Epoch 29, batch 1350, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4934989.73 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:10:19,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=898653.3333333334, ans=0.0
2023-12-23 02:10:19,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0
2023-12-23 02:10:23,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=898720.0, ans=0.125
2023-12-23 02:10:34,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=898786.6666666666, ans=0.0
2023-12-23 02:10:34,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=898786.6666666666, ans=0.125
2023-12-23 02:10:41,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.60 vs. limit=22.5
2023-12-23 02:10:52,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0
2023-12-23 02:11:03,477 INFO [train.py:886] (0/4) Epoch 29, batch 1400, loss[loss=0.01155, audio_tagging_loss=0.01155, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4940901.30 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:11:14,757 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.186e+01 3.286e+01 3.462e+01 3.963e+01, threshold=6.572e+01, percent-clipped=0.0
2023-12-23 02:11:22,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=899120.0, ans=0.125
2023-12-23 02:11:43,039 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 02:11:48,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0
2023-12-23 02:11:53,857 INFO [train.py:886] (0/4) Epoch 29, batch 1450, loss[loss=0.01237, audio_tagging_loss=0.01237, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4945593.57 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:12:11,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=899386.6666666666, ans=0.125
2023-12-23 02:12:11,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=899386.6666666666, ans=0.125
2023-12-23 02:12:44,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=899653.3333333334, ans=0.0
2023-12-23 02:12:44,703 INFO [train.py:886] (0/4) Epoch 29, batch 1500, loss[loss=0.01263, audio_tagging_loss=0.01263, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4944871.02 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:12:47,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0
2023-12-23 02:12:56,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0
2023-12-23 02:12:56,601 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.983e+01 3.233e+01 3.348e+01 3.462e+01 4.143e+01, threshold=6.696e+01, percent-clipped=0.0
2023-12-23 02:12:57,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=899720.0, ans=0.0
2023-12-23 02:13:15,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=899853.3333333334, ans=0.0
2023-12-23 02:13:22,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=899853.3333333334, ans=0.2
2023-12-23 02:13:36,104 INFO [train.py:886] (0/4) Epoch 29, batch 1550, loss[loss=0.01537, audio_tagging_loss=0.01537, over 24945.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4937825.49 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
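In the train.py:886 lines, loss[...] is the current batch (frame-weighted) and tot_loss[...] is a running average whose effective frame count hovers around five million. That is consistent with a decayed running sum; the decay constant below is an assumption, picked so that ~25k-frame batches give roughly 25000 / (1 - 0.995) = 5M effective frames:

    class RunningLoss:
        """Decayed, frame-weighted running average (sketch)."""
        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)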
2023-12-23 02:13:57,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=900120.0, ans=0.125
2023-12-23 02:14:10,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=900186.6666666666, ans=0.1
2023-12-23 02:14:13,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=900186.6666666666, ans=0.0
2023-12-23 02:14:14,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0
2023-12-23 02:14:18,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=900253.3333333334, ans=0.1
2023-12-23 02:14:27,184 INFO [train.py:886] (0/4) Epoch 29, batch 1600, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4927403.07 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:14:40,713 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.938e+01 3.277e+01 3.394e+01 3.581e+01 4.487e+01, threshold=6.788e+01, percent-clipped=0.0
2023-12-23 02:14:48,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0
2023-12-23 02:15:19,553 INFO [train.py:886] (0/4) Epoch 29, batch 1650, loss[loss=0.01454, audio_tagging_loss=0.01454, over 25000.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4927901.41 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:15:25,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=900653.3333333334, ans=0.05
2023-12-23 02:15:38,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.37 vs. limit=15.0
2023-12-23 02:15:53,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5
2023-12-23 02:16:04,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=900920.0, ans=0.1
2023-12-23 02:16:06,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=900920.0, ans=0.125
2023-12-23 02:16:10,153 INFO [train.py:886] (0/4) Epoch 29, batch 1700, loss[loss=0.01557, audio_tagging_loss=0.01557, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4928748.32 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:16:11,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=900986.6666666666, ans=0.2
2023-12-23 02:16:22,984 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.862e+01 3.200e+01 3.336e+01 3.521e+01 4.401e+01, threshold=6.671e+01, percent-clipped=0.0
2023-12-23 02:16:23,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=901053.3333333334, ans=0.125
2023-12-23 02:16:36,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=901120.0, ans=0.125
2023-12-23 02:16:44,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=901186.6666666666, ans=0.125
2023-12-23 02:16:50,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.20 vs. limit=10.0
2023-12-23 02:16:57,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=901253.3333333334, ans=0.0
2023-12-23 02:17:01,609 INFO [train.py:886] (0/4) Epoch 29, batch 1750, loss[loss=0.01168, audio_tagging_loss=0.01168, over 21945.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4933363.68 frames. ], batch size: 107, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:17:19,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=901386.6666666666, ans=0.125
2023-12-23 02:17:23,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=901453.3333333334, ans=0.125
2023-12-23 02:17:26,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=901453.3333333334, ans=0.0
2023-12-23 02:17:34,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=901520.0, ans=0.0
2023-12-23 02:17:36,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=901520.0, ans=0.125
2023-12-23 02:17:38,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=901520.0, ans=0.0
2023-12-23 02:17:42,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=901586.6666666666, ans=0.125
2023-12-23 02:17:42,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=901586.6666666666, ans=0.0
2023-12-23 02:17:45,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901586.6666666666, ans=0.1
2023-12-23 02:17:49,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901586.6666666666, ans=0.1
2023-12-23 02:17:53,506 INFO [train.py:886] (0/4) Epoch 29, batch 1800, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4939459.49 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:18:05,609 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.853e+01 3.204e+01 3.323e+01 3.491e+01 3.903e+01, threshold=6.647e+01, percent-clipped=0.0
2023-12-23 02:18:15,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=901786.6666666666, ans=0.125
2023-12-23 02:18:25,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=901853.3333333334, ans=0.2
2023-12-23 02:18:27,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=901853.3333333334, ans=0.0
2023-12-23 02:18:39,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=901920.0, ans=0.0
2023-12-23 02:18:44,438 INFO [train.py:886] (0/4) Epoch 29, batch 1850, loss[loss=0.01274, audio_tagging_loss=0.01274, over 24010.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4940949.59 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:18:45,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=901986.6666666666, ans=0.0
2023-12-23 02:18:47,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=901986.6666666666, ans=0.2
2023-12-23 02:18:52,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901986.6666666666, ans=0.1
2023-12-23 02:19:04,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=902053.3333333334, ans=0.2
2023-12-23 02:19:11,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=902120.0, ans=0.125
2023-12-23 02:19:17,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=902186.6666666666, ans=0.0
2023-12-23 02:19:28,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=902253.3333333334, ans=0.125
2023-12-23 02:19:37,444 INFO [train.py:886] (0/4) Epoch 29, batch 1900, loss[loss=0.01398, audio_tagging_loss=0.01398, over 24750.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4934168.73 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:19:43,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.12 vs. limit=10.0
2023-12-23 02:19:48,705 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.018e+01 3.307e+01 3.435e+01 3.560e+01 3.951e+01, threshold=6.871e+01, percent-clipped=0.0
2023-12-23 02:19:59,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=902453.3333333334, ans=0.0
2023-12-23 02:20:10,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=902520.0, ans=0.0
2023-12-23 02:20:28,811 INFO [train.py:886] (0/4) Epoch 29, batch 1950, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4937099.81 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:20:41,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0
2023-12-23 02:20:41,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=902720.0, ans=0.125
2023-12-23 02:20:45,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=902720.0, ans=0.0
2023-12-23 02:21:01,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=902853.3333333334, ans=0.125
2023-12-23 02:21:19,620 INFO [train.py:886] (0/4) Epoch 29, batch 2000, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4938856.56 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 64.0
2023-12-23 02:21:22,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. limit=10.0
2023-12-23 02:21:23,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=902986.6666666666, ans=0.0
2023-12-23 02:21:32,361 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.952e+01 3.186e+01 3.326e+01 3.515e+01 4.262e+01, threshold=6.651e+01, percent-clipped=0.0
2023-12-23 02:21:32,619 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 02:21:45,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=903120.0, ans=0.125
2023-12-23 02:22:07,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=903253.3333333334, ans=0.125
2023-12-23 02:22:10,987 INFO [train.py:886] (0/4) Epoch 29, batch 2050, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4937736.16 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 64.0
2023-12-23 02:22:22,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=903386.6666666666, ans=0.125
2023-12-23 02:22:22,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=903386.6666666666, ans=0.0
2023-12-23 02:22:27,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=903386.6666666666, ans=0.0
2023-12-23 02:22:32,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=903453.3333333334, ans=0.125
2023-12-23 02:22:44,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=903520.0, ans=0.125
2023-12-23 02:22:45,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.06 vs. limit=15.0
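grad_scale is the loss scale of mixed-precision (fp16) training; it reads 32.0 up through batch 1950 and 64.0 from batch 2000 onward. That doubling is what torch.cuda.amp.GradScaler does after a fixed number of consecutive overflow-free steps; the growth interval used below is an assumption chosen to match the observed jump:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       growth_interval=2000)

    # per training step (model/optimizer/batch omitted for brevity):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()              # doubles the scale when due
    #   print(scaler.get_scale())    # -> the logged grad_scale value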
2023-12-23 02:22:46,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=903520.0, ans=0.125
2023-12-23 02:22:50,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=903520.0, ans=0.125
2023-12-23 02:22:54,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=903586.6666666666, ans=0.125
2023-12-23 02:22:59,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=903586.6666666666, ans=0.0
2023-12-23 02:23:02,152 INFO [train.py:886] (0/4) Epoch 29, batch 2100, loss[loss=0.01208, audio_tagging_loss=0.01208, over 21725.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4937526.86 frames. ], batch size: 107, lr: 3.73e-03, grad_scale: 64.0
2023-12-23 02:23:14,809 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.861e+01 3.167e+01 3.386e+01 3.538e+01 3.863e+01, threshold=6.772e+01, percent-clipped=0.0
2023-12-23 02:23:26,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.32 vs. limit=22.5
2023-12-23 02:23:26,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=903786.6666666666, ans=0.0
2023-12-23 02:23:40,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0
2023-12-23 02:23:44,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0
2023-12-23 02:23:44,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=903920.0, ans=0.0
2023-12-23 02:23:54,322 INFO [train.py:886] (0/4) Epoch 29, batch 2150, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4944240.21 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:24:07,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=904053.3333333334, ans=0.015
2023-12-23 02:24:07,975 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.150e-03
2023-12-23 02:24:09,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=904053.3333333334, ans=0.09899494936611666
2023-12-23 02:24:20,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=904120.0, ans=0.0
2023-12-23 02:24:34,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=12.0
2023-12-23 02:24:40,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0
2023-12-23 02:24:45,878 INFO [train.py:886] (0/4) Epoch 29, batch 2200, loss[loss=0.01235, audio_tagging_loss=0.01235, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4931861.90 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:24:47,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0
2023-12-23 02:24:56,261 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 02:24:57,916 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.956e+01 3.244e+01 3.385e+01 3.592e+01 6.696e+01, threshold=6.770e+01, percent-clipped=0.0
2023-12-23 02:25:00,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=904386.6666666666, ans=0.0
2023-12-23 02:25:03,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=22.5
2023-12-23 02:25:07,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=904453.3333333334, ans=0.1
2023-12-23 02:25:28,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=904586.6666666666, ans=0.125
2023-12-23 02:25:37,566 INFO [train.py:886] (0/4) Epoch 29, batch 2250, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4925291.25 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:25:44,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0
2023-12-23 02:26:00,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=904786.6666666666, ans=0.0
2023-12-23 02:26:15,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=904853.3333333334, ans=0.035
2023-12-23 02:26:29,633 INFO [train.py:886] (0/4) Epoch 29, batch 2300, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4928550.83 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:26:35,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=904986.6666666666, ans=0.0
2023-12-23 02:26:40,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=905053.3333333334, ans=0.125
2023-12-23 02:26:41,538 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.860e+01 3.170e+01 3.367e+01 3.537e+01 3.962e+01, threshold=6.735e+01, percent-clipped=0.0
2023-12-23 02:26:42,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=905053.3333333334, ans=0.2
2023-12-23 02:26:47,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=905053.3333333334, ans=0.125
2023-12-23 02:26:57,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=905120.0, ans=0.025
2023-12-23 02:26:59,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=905120.0, ans=0.5
2023-12-23 02:27:15,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=905253.3333333334, ans=0.0
2023-12-23 02:27:21,210 INFO [train.py:886] (0/4) Epoch 29, batch 2350, loss[loss=0.01129, audio_tagging_loss=0.01129, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4936844.93 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:27:21,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=905320.0, ans=0.125
2023-12-23 02:27:35,665 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 02:27:40,466 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 02:27:46,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5
2023-12-23 02:28:02,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=905586.6666666666, ans=0.125
2023-12-23 02:28:12,509 INFO [train.py:886] (0/4) Epoch 29, batch 2400, loss[loss=0.01026, audio_tagging_loss=0.01026, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4934857.59 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:28:20,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=905653.3333333334, ans=0.125
2023-12-23 02:28:25,399 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.824e+01 3.172e+01 3.352e+01 3.505e+01 4.147e+01, threshold=6.703e+01, percent-clipped=0.0
2023-12-23 02:28:31,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=905720.0, ans=0.0
2023-12-23 02:28:33,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=905786.6666666666, ans=0.125
2023-12-23 02:28:35,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=905786.6666666666, ans=0.125
2023-12-23 02:28:55,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=905920.0, ans=0.125
2023-12-23 02:29:02,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=905986.6666666666, ans=0.1
2023-12-23 02:29:03,273 INFO [train.py:886] (0/4) Epoch 29, batch 2450, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4943634.98 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:29:15,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.12 vs. limit=22.5
2023-12-23 02:29:22,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=906120.0, ans=0.1
2023-12-23 02:29:22,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=906120.0, ans=0.2
2023-12-23 02:29:31,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0
2023-12-23 02:29:36,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=906186.6666666666, ans=0.125
2023-12-23 02:29:41,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0
2023-12-23 02:29:54,560 INFO [train.py:886] (0/4) Epoch 29, batch 2500, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24017.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4943115.81 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:30:05,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=906386.6666666666, ans=0.1
2023-12-23 02:30:07,119 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.911e+01 3.261e+01 3.366e+01 3.535e+01 4.148e+01, threshold=6.733e+01, percent-clipped=0.0
2023-12-23 02:30:07,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0
2023-12-23 02:30:46,632 INFO [train.py:886] (0/4) Epoch 29, batch 2550, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4939931.20 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:30:47,917 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-136000.pt
2023-12-23 02:31:01,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=906720.0, ans=0.0
2023-12-23 02:31:19,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=906853.3333333334, ans=0.035
2023-12-23 02:31:19,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0
2023-12-23 02:31:19,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0
2023-12-23 02:31:21,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=906853.3333333334, ans=0.2
2023-12-23 02:31:23,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906853.3333333334, ans=0.1
2023-12-23 02:31:32,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=906920.0, ans=0.1
2023-12-23 02:31:41,255 INFO [train.py:886] (0/4) Epoch 29, batch 2600, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24052.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4941190.74 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:31:53,149 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.908e+01 3.274e+01 3.410e+01 3.573e+01 4.224e+01, threshold=6.821e+01, percent-clipped=0.0
2023-12-23 02:31:58,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=907053.3333333334, ans=0.0
2023-12-23 02:32:28,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=907253.3333333334, ans=15.0
2023-12-23 02:32:31,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=907320.0, ans=0.2
2023-12-23 02:32:32,187 INFO [train.py:886] (0/4) Epoch 29, batch 2650, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4944681.97 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:33:25,048 INFO [train.py:886] (0/4) Epoch 29, batch 2700, loss[loss=0.01449, audio_tagging_loss=0.01449, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4951405.26 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
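The checkpoint.py:75 line saves zipformer/exp_at_as_full/checkpoint-136000.pt, i.e. checkpoints are keyed by the global training-batch index (136000 is a multiple of 4000). A sketch of that cadence; the function name and the save_every_n value are assumptions for illustration:

    import torch
    from pathlib import Path

    def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                              exp_dir: Path, save_every_n: int = 4000):
        """Save a batch-indexed checkpoint every save_every_n batches."""
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return
        ckpt = {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        }
        torch.save(ckpt, exp_dir / f"checkpoint-{batch_idx_train}.pt")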
2023-12-23 02:33:27,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=907653.3333333334, ans=0.1
2023-12-23 02:33:28,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=907653.3333333334, ans=0.04949747468305833
2023-12-23 02:33:36,308 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.174e+01 3.334e+01 3.479e+01 4.050e+01, threshold=6.667e+01, percent-clipped=0.0
2023-12-23 02:33:44,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=907786.6666666666, ans=0.2
2023-12-23 02:34:14,952 INFO [train.py:886] (0/4) Epoch 29, batch 2750, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4956346.74 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:34:20,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0
2023-12-23 02:34:59,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5
2023-12-23 02:34:59,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=908253.3333333334, ans=0.0
2023-12-23 02:35:06,897 INFO [train.py:886] (0/4) Epoch 29, batch 2800, loss[loss=0.01368, audio_tagging_loss=0.01368, over 24750.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4955013.14 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:35:10,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=908320.0, ans=0.0
2023-12-23 02:35:18,185 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.855e+01 3.185e+01 3.347e+01 3.494e+01 4.048e+01, threshold=6.695e+01, percent-clipped=0.0
2023-12-23 02:35:24,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.94 vs. limit=10.0
2023-12-23 02:35:44,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=908520.0, ans=0.05
2023-12-23 02:35:49,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=908586.6666666666, ans=0.0
2023-12-23 02:35:54,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=908586.6666666666, ans=0.0
2023-12-23 02:35:58,179 INFO [train.py:886] (0/4) Epoch 29, batch 2850, loss[loss=0.01231, audio_tagging_loss=0.01231, over 24750.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4946240.14 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:36:00,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0
2023-12-23 02:36:01,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=908653.3333333334, ans=0.1
2023-12-23 02:36:11,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=908720.0, ans=0.0
2023-12-23 02:36:16,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=908720.0, ans=0.0
2023-12-23 02:36:36,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=908853.3333333334, ans=0.0
2023-12-23 02:36:36,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=908853.3333333334, ans=0.0
2023-12-23 02:36:45,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=908920.0, ans=0.1
2023-12-23 02:36:48,905 INFO [train.py:886] (0/4) Epoch 29, batch 2900, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4945679.00 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:36:54,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=908986.6666666666, ans=0.0
2023-12-23 02:37:02,340 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.712e+01 3.255e+01 3.355e+01 3.547e+01 3.874e+01, threshold=6.710e+01, percent-clipped=0.0
2023-12-23 02:37:09,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=909053.3333333334, ans=0.0
2023-12-23 02:37:10,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=909120.0, ans=0.125
2023-12-23 02:37:15,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=909120.0, ans=0.1
2023-12-23 02:37:41,167 INFO [train.py:886] (0/4) Epoch 29, batch 2950, loss[loss=0.0114, audio_tagging_loss=0.0114, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4940076.17 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:37:49,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. limit=10.0
2023-12-23 02:38:12,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=909520.0, ans=0.0
2023-12-23 02:38:14,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=909520.0, ans=0.0
2023-12-23 02:38:27,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=909586.6666666666, ans=0.125
2023-12-23 02:38:32,194 INFO [train.py:886] (0/4) Epoch 29, batch 3000, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4943349.57 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:38:32,196 INFO [train.py:909] (0/4) Computing validation loss
2023-12-23 02:38:52,609 INFO [train.py:917] (0/4) Epoch 29, validation: loss=0.03351, audio_tagging_loss=0.03351, over 3737520.00 frames.
2023-12-23 02:38:52,609 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-23 02:39:06,001 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.927e+01 3.194e+01 3.342e+01 3.489e+01 4.333e+01, threshold=6.683e+01, percent-clipped=0.0
2023-12-23 02:39:21,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=909786.6666666666, ans=0.0
2023-12-23 02:39:45,468 INFO [train.py:886] (0/4) Epoch 29, batch 3050, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4947186.93 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:39:58,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=910053.3333333334, ans=0.125
2023-12-23 02:40:01,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=910053.3333333334, ans=0.035
2023-12-23 02:40:05,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=910120.0, ans=0.125
2023-12-23 02:40:05,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=910120.0, ans=0.125
2023-12-23 02:40:21,491 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 02:40:22,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=910186.6666666666, ans=0.0
2023-12-23 02:40:25,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5
2023-12-23 02:40:36,996 INFO [train.py:886] (0/4) Epoch 29, batch 3100, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4950149.04 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:40:44,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0
2023-12-23 02:40:45,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=910320.0, ans=0.0
2023-12-23 02:40:45,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0
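Every 3000 batches the trainer pauses to compute a validation loss over the full dev set (3737520 frames) and then reports peak CUDA memory. A sketch of that pass; the assumption that the model returns (loss, num_frames) per batch is illustrative, not the actual train.py interface:

    import torch

    def compute_validation_loss(model, valid_dl, device) -> float:
        model.eval()
        loss_sum, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                loss, num_frames = model(batch)   # assumed interface
                loss_sum += loss.item() * num_frames
                frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={loss_sum / frames:.5f}; "
              f"max memory allocated {mem_mb}MB")
        return loss_sum / frames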
2023-12-23 02:40:48,917 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.900e+01 3.214e+01 3.353e+01 3.491e+01 4.472e+01, threshold=6.707e+01, percent-clipped=0.0
2023-12-23 02:40:52,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=910386.6666666666, ans=0.05
2023-12-23 02:40:55,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=910453.3333333334, ans=0.125
2023-12-23 02:41:01,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=910453.3333333334, ans=0.04949747468305833
2023-12-23 02:41:09,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=910520.0, ans=0.125
2023-12-23 02:41:27,511 INFO [train.py:886] (0/4) Epoch 29, batch 3150, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4951231.64 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:41:29,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=910653.3333333334, ans=0.2
2023-12-23 02:41:29,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=910653.3333333334, ans=0.0
2023-12-23 02:41:48,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=910786.6666666666, ans=0.0
2023-12-23 02:41:54,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=910786.6666666666, ans=0.125
2023-12-23 02:42:19,858 INFO [train.py:886] (0/4) Epoch 29, batch 3200, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4953689.46 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:42:21,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0
2023-12-23 02:42:24,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=910986.6666666666, ans=0.0
2023-12-23 02:42:26,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0
2023-12-23 02:42:30,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=911053.3333333334, ans=0.1
2023-12-23 02:42:31,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=911053.3333333334, ans=0.125
2023-12-23 02:42:31,881 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.238e+01 3.417e+01 3.574e+01 4.125e+01, threshold=6.834e+01, percent-clipped=0.0
2023-12-23 02:42:35,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=911053.3333333334, ans=0.125
2023-12-23 02:43:10,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=911320.0, ans=22.5
2023-12-23 02:43:12,006 INFO [train.py:886] (0/4) Epoch 29, batch 3250, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4954024.16 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:43:16,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=911320.0, ans=0.2
2023-12-23 02:43:23,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=911386.6666666666, ans=0.125
2023-12-23 02:43:25,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=911386.6666666666, ans=0.0
2023-12-23 02:43:43,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=911520.0, ans=0.125
2023-12-23 02:43:46,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=911520.0, ans=0.125
2023-12-23 02:43:58,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0
2023-12-23 02:44:02,794 INFO [train.py:886] (0/4) Epoch 29, batch 3300, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24908.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4955382.58 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
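The lr field decays slowly (3.75e-03 at the top of this span, 3.71e-03 here), consistent with a schedule that is polynomial in both the batch index and the fractional epoch, as in icefall's Eden scheduler. A sketch of that shape; the constants below are placeholders and are not calibrated to reproduce the logged values:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden-style decay in both batch index and epoch (sketch)."""
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    # decays very gently this late in training; illustrative numbers only:
    print(eden_lr(0.045, batch=908000, epoch=29.0))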
], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:44:08,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=911653.3333333334, ans=0.0 2023-12-23 02:44:12,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=911720.0, ans=0.035 2023-12-23 02:44:14,883 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.877e+01 3.159e+01 3.341e+01 3.490e+01 4.241e+01, threshold=6.682e+01, percent-clipped=0.0 2023-12-23 02:44:17,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=911720.0, ans=0.125 2023-12-23 02:44:33,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=911853.3333333334, ans=0.125 2023-12-23 02:44:37,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=911853.3333333334, ans=0.125 2023-12-23 02:44:40,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=12.0 2023-12-23 02:44:41,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-12-23 02:44:43,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=911920.0, ans=0.0 2023-12-23 02:44:53,537 INFO [train.py:886] (0/4) Epoch 29, batch 3350, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4950714.26 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:45:12,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=912053.3333333334, ans=0.0 2023-12-23 02:45:27,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=912186.6666666666, ans=0.125 2023-12-23 02:45:45,264 INFO [train.py:886] (0/4) Epoch 29, batch 3400, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4953848.55 frames. 
], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:45:45,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=912320.0, ans=0.0 2023-12-23 02:45:57,954 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.777e+01 3.196e+01 3.334e+01 3.498e+01 4.044e+01, threshold=6.669e+01, percent-clipped=0.0 2023-12-23 02:46:01,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=912386.6666666666, ans=0.1 2023-12-23 02:46:04,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=912386.6666666666, ans=0.125 2023-12-23 02:46:15,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=912520.0, ans=0.0 2023-12-23 02:46:16,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=912520.0, ans=0.125 2023-12-23 02:46:23,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=912520.0, ans=0.125 2023-12-23 02:46:23,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=912520.0, ans=0.125 2023-12-23 02:46:30,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2023-12-23 02:46:31,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=912586.6666666666, ans=0.125 2023-12-23 02:46:36,485 INFO [train.py:886] (0/4) Epoch 29, batch 3450, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4949681.60 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:46:51,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=912720.0, ans=0.07 2023-12-23 02:46:52,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=912720.0, ans=0.0 2023-12-23 02:47:05,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. limit=10.0 2023-12-23 02:47:28,180 INFO [train.py:886] (0/4) Epoch 29, batch 3500, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4947788.18 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:47:40,254 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.822e+01 3.263e+01 3.374e+01 3.511e+01 3.785e+01, threshold=6.748e+01, percent-clipped=0.0 2023-12-23 02:47:53,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=913120.0, ans=0.125 2023-12-23 02:48:09,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2023-12-23 02:48:18,277 INFO [train.py:886] (0/4) Epoch 29, batch 3550, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. 
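The optim.py warnings summarize the recent distribution of gradient norms (the five values read naturally as the 0/25/50/75/100th percentiles) together with the clipping threshold and the fraction of recent batches that were clipped. A rough sketch of that bookkeeping, assuming the threshold is some fixed multiple of the median norm; the exact rule inside icefall's optimizer may well differ:

    from collections import deque
    import torch

    class GradNormClipper:
        def __init__(self, model, window=128, scale=2.0):
            self.params = [p for p in model.parameters() if p.requires_grad]
            self.norms = deque(maxlen=window)   # recent global grad norms
            self.scale = scale                  # cf. Clipping_scale=2.0 above
            self.clipped = 0
            self.seen = 0

        def clip_and_report(self):
            norm = torch.norm(torch.stack(
                [p.grad.norm() for p in self.params if p.grad is not None])).item()
            self.norms.append(norm)
            self.seen += 1
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.scale * q[2].item()  # assumption: scale x median
            if norm > threshold:
                self.clipped += 1
                for p in self.params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            print(f"grad-norm quartiles {q.tolist()}, threshold={threshold:.3e}, "
                  f"percent-clipped={100.0 * self.clipped / self.seen:.1f}")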
2023-12-23 02:48:44,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=913453.3333333334, ans=0.0
2023-12-23 02:48:50,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=913520.0, ans=0.025
2023-12-23 02:48:58,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=913520.0, ans=0.07
2023-12-23 02:49:11,424 INFO [train.py:886] (0/4) Epoch 29, batch 3600, loss[loss=0.01408, audio_tagging_loss=0.01408, over 24750.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4950035.96 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:49:16,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=913653.3333333334, ans=0.125
2023-12-23 02:49:22,664 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.831e+01 3.175e+01 3.361e+01 3.491e+01 3.935e+01, threshold=6.721e+01, percent-clipped=0.0
2023-12-23 02:49:41,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=913853.3333333334, ans=0.0
2023-12-23 02:49:41,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=913853.3333333334, ans=0.125
2023-12-23 02:49:45,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0
2023-12-23 02:49:50,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=913853.3333333334, ans=0.125
2023-12-23 02:50:02,207 INFO [train.py:886] (0/4) Epoch 29, batch 3650, loss[loss=0.0152, audio_tagging_loss=0.0152, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4956429.32 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:50:13,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=914053.3333333334, ans=0.125
2023-12-23 02:50:29,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=914120.0, ans=0.125
2023-12-23 02:50:30,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=914120.0, ans=0.125
2023-12-23 02:50:34,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=914186.6666666666, ans=0.125
2023-12-23 02:50:54,414 INFO [train.py:886] (0/4) Epoch 29, batch 3700, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4961063.96 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:51:05,787 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.954e+01 3.187e+01 3.336e+01 3.523e+01 3.935e+01, threshold=6.671e+01, percent-clipped=0.0
2023-12-23 02:51:18,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=914453.3333333334, ans=0.125
2023-12-23 02:51:21,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=914453.3333333334, ans=0.0
2023-12-23 02:51:25,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=914520.0, ans=0.1
2023-12-23 02:51:32,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=914520.0, ans=0.0
2023-12-23 02:51:38,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=914586.6666666666, ans=0.2
2023-12-23 02:51:46,766 INFO [train.py:886] (0/4) Epoch 29, batch 3750, loss[loss=0.01151, audio_tagging_loss=0.01151, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4958934.30 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:51:51,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=914653.3333333334, ans=10.0
2023-12-23 02:51:54,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=914653.3333333334, ans=10.0
2023-12-23 02:52:13,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0
2023-12-23 02:52:19,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=914853.3333333334, ans=0.0
2023-12-23 02:52:26,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=914853.3333333334, ans=0.2
2023-12-23 02:52:26,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0
2023-12-23 02:52:37,224 INFO [train.py:886] (0/4) Epoch 29, batch 3800, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4953294.91 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:52:48,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=915053.3333333334, ans=0.125
2023-12-23 02:52:50,530 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.921e+01 3.295e+01 3.394e+01 3.553e+01 4.650e+01, threshold=6.788e+01, percent-clipped=0.0
2023-12-23 02:52:55,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=915053.3333333334, ans=0.0
2023-12-23 02:53:02,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=915120.0, ans=0.125
2023-12-23 02:53:11,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0
2023-12-23 02:53:29,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=915320.0, ans=0.125
2023-12-23 02:53:29,909 INFO [train.py:886] (0/4) Epoch 29, batch 3850, loss[loss=0.01404, audio_tagging_loss=0.01404, over 24750.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4951992.38 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:53:32,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=915320.0, ans=0.125
2023-12-23 02:53:42,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=915386.6666666666, ans=0.1
2023-12-23 02:53:53,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=915453.3333333334, ans=0.0
2023-12-23 02:53:57,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=915453.3333333334, ans=0.125
2023-12-23 02:54:20,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=22.5
2023-12-23 02:54:21,520 INFO [train.py:886] (0/4) Epoch 29, batch 3900, loss[loss=0.01422, audio_tagging_loss=0.01422, over 22717.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4949973.07 frames. ], batch size: 107, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:54:23,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=915653.3333333334, ans=0.125
2023-12-23 02:54:30,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0
2023-12-23 02:54:34,394 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.806e+01 3.127e+01 3.290e+01 3.423e+01 3.887e+01, threshold=6.579e+01, percent-clipped=0.0
2023-12-23 02:54:46,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=915786.6666666666, ans=0.0
2023-12-23 02:54:46,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=915786.6666666666, ans=0.125
2023-12-23 02:55:13,359 INFO [train.py:886] (0/4) Epoch 29, batch 3950, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4953474.81 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:55:17,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=915986.6666666666, ans=0.1
2023-12-23 02:55:20,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=915986.6666666666, ans=0.2
2023-12-23 02:55:22,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=916053.3333333334, ans=0.1
2023-12-23 02:55:44,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=12.0
2023-12-23 02:56:05,084 INFO [train.py:886] (0/4) Epoch 29, batch 4000, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4956393.35 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 128.0
2023-12-23 02:56:08,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=916320.0, ans=0.0
2023-12-23 02:56:09,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.61 vs. limit=22.5
2023-12-23 02:56:17,057 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.016e+01 3.298e+01 3.392e+01 3.535e+01 4.607e+01, threshold=6.784e+01, percent-clipped=0.0
2023-12-23 02:56:22,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.66 vs. limit=10.0
2023-12-23 02:56:39,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=916520.0, ans=22.5
2023-12-23 02:56:43,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=916520.0, ans=0.2
2023-12-23 02:56:48,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=916586.6666666666, ans=0.125
2023-12-23 02:56:50,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=916586.6666666666, ans=0.05
2023-12-23 02:56:54,920 INFO [train.py:886] (0/4) Epoch 29, batch 4050, loss[loss=0.01762, audio_tagging_loss=0.01762, over 24942.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4959953.77 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0
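grad_scale is the fp16 loss-scaling factor used with use_fp16 training: it is grown periodically while gradients stay finite (64.0 to 128.0 at batch 4000 above) and shrunk again when an overflow is hit (back to 64.0 by batch 4050). PyTorch's torch.cuda.amp.GradScaler follows a similar grow/backoff policy; a hand-rolled sketch of the logic, with illustrative constants:

    class LossScaler:
        """Double the scale every `growth_interval` finite steps; halve on overflow."""
        def __init__(self, scale=64.0, growth_interval=1000):
            self.scale = scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def step(self, grad_is_finite: bool) -> bool:
            """Returns True if the optimizer step should be applied."""
            if not grad_is_finite:
                self.scale /= 2.0   # overflow: shrink the scale and skip this step
                self.good_steps = 0
                return False
            self.good_steps += 1
            if self.good_steps % self.growth_interval == 0:
                self.scale *= 2.0   # long run of finite grads: grow the scale
            return True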
2023-12-23 02:57:07,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=916720.0, ans=0.0
2023-12-23 02:57:15,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=916720.0, ans=0.0
2023-12-23 02:57:22,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=916786.6666666666, ans=0.125
2023-12-23 02:57:24,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=916786.6666666666, ans=0.125
2023-12-23 02:57:43,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.69 vs. limit=10.0
2023-12-23 02:57:47,878 INFO [train.py:886] (0/4) Epoch 29, batch 4100, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4952920.19 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:57:52,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=916986.6666666666, ans=0.125
2023-12-23 02:57:53,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0
2023-12-23 02:58:00,223 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.035e+01 3.272e+01 3.403e+01 3.607e+01 4.047e+01, threshold=6.806e+01, percent-clipped=0.0
2023-12-23 02:58:03,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0
2023-12-23 02:58:10,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0
2023-12-23 02:58:13,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=917120.0, ans=0.0
2023-12-23 02:58:29,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=917253.3333333334, ans=0.125
2023-12-23 02:58:33,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=917253.3333333334, ans=0.125
2023-12-23 02:58:33,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=917253.3333333334, ans=0.5
2023-12-23 02:58:39,289 INFO [train.py:886] (0/4) Epoch 29, batch 4150, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4951920.65 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:58:49,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5
2023-12-23 02:58:52,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=917386.6666666666, ans=0.0
2023-12-23 02:58:57,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=917386.6666666666, ans=0.125
2023-12-23 02:59:09,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=917520.0, ans=0.125
2023-12-23 02:59:19,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=917586.6666666666, ans=0.125
2023-12-23 02:59:27,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=917586.6666666666, ans=0.125
2023-12-23 02:59:30,202 INFO [train.py:886] (0/4) Epoch 29, batch 4200, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4951237.69 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 02:59:41,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=917720.0, ans=0.0
2023-12-23 02:59:42,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=917720.0, ans=0.07
2023-12-23 02:59:43,288 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.970e+01 3.179e+01 3.334e+01 3.481e+01 3.882e+01, threshold=6.668e+01, percent-clipped=0.0
2023-12-23 02:59:58,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=917786.6666666666, ans=0.125
2023-12-23 03:00:00,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=917853.3333333334, ans=0.0
2023-12-23 03:00:21,202 INFO [train.py:886] (0/4) Epoch 29, batch 4250, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4951872.66 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 03:00:50,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=918120.0, ans=0.125
2023-12-23 03:00:53,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=918186.6666666666, ans=0.125
2023-12-23 03:00:54,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=918186.6666666666, ans=0.0
2023-12-23 03:00:55,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=918186.6666666666, ans=0.125
2023-12-23 03:00:59,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=918186.6666666666, ans=0.0
2023-12-23 03:00:59,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0
2023-12-23 03:01:01,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.54 vs. limit=10.0
2023-12-23 03:01:04,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=918253.3333333334, ans=0.0
2023-12-23 03:01:06,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=918253.3333333334, ans=0.0
2023-12-23 03:01:11,959 INFO [train.py:886] (0/4) Epoch 29, batch 4300, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4954010.41 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0
2023-12-23 03:01:14,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=918320.0, ans=0.1
2023-12-23 03:01:17,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0
2023-12-23 03:01:22,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=918386.6666666666, ans=0.07
2023-12-23 03:01:25,827 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.896e+01 3.260e+01 3.361e+01 3.479e+01 4.825e+01, threshold=6.722e+01, percent-clipped=0.0
2023-12-23 03:01:47,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=918520.0, ans=0.125
2023-12-23 03:02:02,821 INFO [train.py:886] (0/4) Epoch 29, batch 4350, loss[loss=0.0137, audio_tagging_loss=0.0137, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4956324.94 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0
2023-12-23 03:02:04,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0
2023-12-23 03:02:31,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=918786.6666666666, ans=0.1
2023-12-23 03:02:35,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=918853.3333333334, ans=0.0
2023-12-23 03:02:39,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=918853.3333333334, ans=0.0
2023-12-23 03:02:39,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=918853.3333333334, ans=0.1
2023-12-23 03:02:42,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=918920.0, ans=0.2
2023-12-23 03:02:54,085 INFO [train.py:886] (0/4) Epoch 29, batch 4400, loss[loss=0.0127, audio_tagging_loss=0.0127, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4949259.82 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0
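The Whitening lines fire when a measured statistic of a module's activations exceeds its limit (the whitening_limit values themselves appear above as ScheduledFloat entries). The exact metric is defined in icefall's scaling.py; a purely illustrative stand-in is to measure how far the channel covariance is from a multiple of the identity via the ratio of the mean squared eigenvalue to the squared mean eigenvalue:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels) activations.
        Returns ~1.0 for 'white' features (covariance close to c * I) and
        grows as the eigenvalue spectrum becomes more lopsided. Illustrative
        only; not necessarily the metric logged by scaling.py."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return ((eigs * eigs).mean() / (eigs.mean() ** 2 + 1e-20)).item()

    feats = torch.randn(1000, 384)            # near-white input -> metric near 1
    print(whitening_metric(feats))
    print(whitening_metric(feats * torch.tensor([10.0] + [1.0] * 383)))  # lopsided -> large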
2023-12-23 03:02:58,858 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:03:06,951 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.260e+01 3.393e+01 3.638e+01 4.099e+01, threshold=6.787e+01, percent-clipped=0.0
2023-12-23 03:03:22,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=919120.0, ans=0.125
2023-12-23 03:03:32,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=919186.6666666666, ans=0.125
2023-12-23 03:03:45,723 INFO [train.py:886] (0/4) Epoch 29, batch 4450, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4948232.01 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0
2023-12-23 03:03:48,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=919320.0, ans=0.2
2023-12-23 03:03:54,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=919320.0, ans=0.5
2023-12-23 03:04:04,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=919386.6666666666, ans=0.125
2023-12-23 03:04:06,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=919453.3333333334, ans=0.1
2023-12-23 03:04:23,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=919520.0, ans=0.125
2023-12-23 03:04:30,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=919586.6666666666, ans=0.125
2023-12-23 03:04:38,050 INFO [train.py:886] (0/4) Epoch 29, batch 4500, loss[loss=0.01301, audio_tagging_loss=0.01301, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4953174.49 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0
2023-12-23 03:04:51,067 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.905e+01 3.193e+01 3.342e+01 3.524e+01 4.106e+01, threshold=6.683e+01, percent-clipped=0.0
2023-12-23 03:05:10,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=919853.3333333334, ans=0.125
2023-12-23 03:05:29,608 INFO [train.py:886] (0/4) Epoch 29, batch 4550, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4953458.91 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0
2023-12-23 03:05:52,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=920120.0, ans=0.125
2023-12-23 03:06:02,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=920186.6666666666, ans=0.125
2023-12-23 03:06:06,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=920186.6666666666, ans=0.125
2023-12-23 03:06:08,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=920186.6666666666, ans=0.125
2023-12-23 03:06:21,697 INFO [train.py:886] (0/4) Epoch 29, batch 4600, loss[loss=0.01433, audio_tagging_loss=0.01433, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4955435.08 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0
2023-12-23 03:06:26,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=920320.0, ans=0.125
2023-12-23 03:06:31,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=920386.6666666666, ans=0.2
2023-12-23 03:06:35,366 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.907e+01 3.235e+01 3.344e+01 3.450e+01 3.985e+01, threshold=6.687e+01, percent-clipped=0.0
2023-12-23 03:06:37,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=920386.6666666666, ans=0.125
2023-12-23 03:06:40,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=920386.6666666666, ans=0.125
2023-12-23 03:06:53,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=920520.0, ans=0.125
2023-12-23 03:07:12,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=920586.6666666666, ans=0.0
2023-12-23 03:07:13,865 INFO [train.py:886] (0/4) Epoch 29, batch 4650, loss[loss=0.01512, audio_tagging_loss=0.01512, over 25000.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4956966.19 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0
2023-12-23 03:07:14,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=920653.3333333334, ans=0.07
2023-12-23 03:07:37,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=920786.6666666666, ans=0.0
2023-12-23 03:07:38,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=920786.6666666666, ans=0.125
2023-12-23 03:07:49,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=920853.3333333334, ans=0.2
2023-12-23 03:08:02,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=920920.0, ans=0.0
2023-12-23 03:08:04,171 INFO [train.py:886] (0/4) Epoch 29, batch 4700, loss[loss=0.01394, audio_tagging_loss=0.01394, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4956777.97 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0
2023-12-23 03:08:07,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. limit=10.0
2023-12-23 03:08:15,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0
2023-12-23 03:08:15,993 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+01 3.319e+01 3.435e+01 3.584e+01 4.089e+01, threshold=6.869e+01, percent-clipped=0.0
2023-12-23 03:08:47,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=921253.3333333334, ans=0.2
2023-12-23 03:08:51,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=921320.0, ans=0.1
2023-12-23 03:08:51,969 INFO [train.py:886] (0/4) Epoch 29, batch 4750, loss[loss=0.01166, audio_tagging_loss=0.01166, over 24750.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4950088.08 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0
2023-12-23 03:09:00,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0
2023-12-23 03:09:03,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=921386.6666666666, ans=0.125
2023-12-23 03:09:07,142 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-29.pt
2023-12-23 03:09:26,348 INFO [train.py:886] (0/4) Epoch 30, batch 0, loss[loss=0.03192, audio_tagging_loss=0.03192, over 25000.00 frames. ], tot_loss[loss=0.03192, audio_tagging_loss=0.03192, over 25000.00 frames. ], batch size: 100, lr: 3.63e-03, grad_scale: 32.0
2023-12-23 03:09:26,349 INFO [train.py:909] (0/4) Computing validation loss
2023-12-23 03:09:33,853 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0778, 5.8562, 5.7942, 5.9696], device='cuda:0')
2023-12-23 03:09:47,395 INFO [train.py:917] (0/4) Epoch 30, validation: loss=0.03363, audio_tagging_loss=0.03363, over 3737520.00 frames.
2023-12-23 03:09:47,396 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-23 03:10:03,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=921493.3333333334, ans=0.0
2023-12-23 03:10:06,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=921560.0, ans=0.0
2023-12-23 03:10:22,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=921626.6666666666, ans=0.0
2023-12-23 03:10:22,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0
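The entries above mark the epoch boundary: a full checkpoint is written to epoch-29.pt, and epoch 30 opens with a validation pass (the high tot_loss at batch 0 is consistent with the running average restarting at the new epoch). A sketch of that boundary logic; compute_loss here is a hypothetical helper, not the icefall function:

    import torch

    def run_epoch(epoch, model, optimizer, train_loader, valid_loader, exp_dir):
        model.train()
        for batch_idx, batch in enumerate(train_loader):
            if batch_idx == 0:
                # Validation is run at the first batch of each epoch,
                # mirroring the "Computing validation loss" lines above.
                model.eval()
                with torch.no_grad():
                    losses = [float(compute_loss(model, b)) for b in valid_loader]
                print(f"Epoch {epoch}, validation: loss={sum(losses) / len(losses):.5f}")
                model.train()
            loss = compute_loss(model, batch)  # hypothetical helper
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # End of epoch: write a resumable checkpoint, as in
        # "Saving checkpoint to zipformer/exp_at_as_full/epoch-29.pt".
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "epoch": epoch},
            f"{exp_dir}/epoch-{epoch}.pt",
        )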
2023-12-23 03:10:35,649 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.955e+01 3.394e+01 3.725e+01 4.735e+01 9.451e+01, threshold=7.450e+01, percent-clipped=7.0
2023-12-23 03:10:36,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=921760.0, ans=0.125
2023-12-23 03:10:37,596 INFO [train.py:886] (0/4) Epoch 30, batch 50, loss[loss=0.01841, audio_tagging_loss=0.01841, over 25000.00 frames. ], tot_loss[loss=0.02009, audio_tagging_loss=0.02009, over 1115883.10 frames. ], batch size: 100, lr: 3.63e-03, grad_scale: 32.0
2023-12-23 03:10:38,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=921760.0, ans=0.2
2023-12-23 03:10:40,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. limit=10.0
2023-12-23 03:11:03,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=921893.3333333334, ans=0.125
2023-12-23 03:11:07,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=921893.3333333334, ans=0.0
2023-12-23 03:11:15,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.92 vs. limit=10.0
2023-12-23 03:11:20,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=922026.6666666666, ans=0.125
2023-12-23 03:11:30,718 INFO [train.py:886] (0/4) Epoch 30, batch 100, loss[loss=0.01595, audio_tagging_loss=0.01595, over 25000.00 frames. ], tot_loss[loss=0.01724, audio_tagging_loss=0.01724, over 1973360.36 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:11:32,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=22.5
2023-12-23 03:11:34,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=922093.3333333334, ans=0.2
2023-12-23 03:11:41,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=922160.0, ans=0.0
2023-12-23 03:11:45,310 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:12:12,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=922360.0, ans=0.125
2023-12-23 03:12:14,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=922360.0, ans=0.0
2023-12-23 03:12:17,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=922360.0, ans=0.07
2023-12-23 03:12:18,431 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.004e+01 3.542e+01 3.730e+01 3.937e+01 4.567e+01, threshold=7.459e+01, percent-clipped=0.0
2023-12-23 03:12:20,992 INFO [train.py:886] (0/4) Epoch 30, batch 150, loss[loss=0.01315, audio_tagging_loss=0.01315, over 24067.00 frames. ], tot_loss[loss=0.01566, audio_tagging_loss=0.01566, over 2638100.84 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:12:32,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=922493.3333333334, ans=0.0
2023-12-23 03:12:48,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.52 vs. limit=15.0
2023-12-23 03:12:53,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=15.0
2023-12-23 03:12:59,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=922626.6666666666, ans=0.1
2023-12-23 03:13:05,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=922693.3333333334, ans=0.2
2023-12-23 03:13:09,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=922693.3333333334, ans=0.0
2023-12-23 03:13:10,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=922693.3333333334, ans=0.125
2023-12-23 03:13:12,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=922760.0, ans=0.0
2023-12-23 03:13:13,250 INFO [train.py:886] (0/4) Epoch 30, batch 200, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 3149149.75 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:13:14,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=922760.0, ans=0.1
2023-12-23 03:14:02,674 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.057e+01 3.258e+01 3.386e+01 3.497e+01 4.082e+01, threshold=6.772e+01, percent-clipped=0.0
2023-12-23 03:14:05,232 INFO [train.py:886] (0/4) Epoch 30, batch 250, loss[loss=0.01107, audio_tagging_loss=0.01107, over 21964.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 3554694.40 frames. ], batch size: 107, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:14:07,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=923093.3333333334, ans=0.2
2023-12-23 03:14:26,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=15.0
2023-12-23 03:14:28,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=923226.6666666666, ans=0.1
2023-12-23 03:14:30,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0
2023-12-23 03:14:30,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923226.6666666666, ans=0.1
2023-12-23 03:14:49,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=12.0
2023-12-23 03:14:56,017 INFO [train.py:886] (0/4) Epoch 30, batch 300, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 3861924.52 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
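Each train.py:886 line reports the current batch's loss next to tot_loss, a frame-weighted running average. The frame counts hover near 5M during epoch 29 and grow from 25k toward that plateau at the start of epoch 30, which is consistent with a decayed accumulator rather than a whole-epoch sum. A minimal sketch under that assumption (the decay constant here is illustrative):

    class DecayedLoss:
        """Exponentially decayed sums; tot_loss reflects roughly the last
        1/alpha batches. With ~25000-frame batches and alpha=1/200 the frame
        count plateaus near 5,000,000, matching the log entries above."""
        def __init__(self, alpha=1.0 / 200):
            self.alpha = alpha
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float):
            decay = 1.0 - self.alpha
            self.loss_sum = self.loss_sum * decay + batch_loss * batch_frames
            self.frames = self.frames * decay + batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    stats = DecayedLoss()
    stats.update(0.01315, 24067.0)  # e.g. batch 150's loss over 24067 frames
    print(f"tot_loss[loss={stats.tot_loss:.5f}, over {stats.frames:.2f} frames.]")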
2023-12-23 03:15:10,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=923493.3333333334, ans=0.125
2023-12-23 03:15:12,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=923493.3333333334, ans=0.0
2023-12-23 03:15:15,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=923493.3333333334, ans=0.125
2023-12-23 03:15:19,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.34 vs. limit=10.0
2023-12-23 03:15:23,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=923560.0, ans=0.125
2023-12-23 03:15:26,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=923626.6666666666, ans=0.2
2023-12-23 03:15:44,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=923693.3333333334, ans=0.125
2023-12-23 03:15:46,800 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.964e+01 3.195e+01 3.394e+01 3.540e+01 4.201e+01, threshold=6.789e+01, percent-clipped=0.0
2023-12-23 03:15:48,769 INFO [train.py:886] (0/4) Epoch 30, batch 350, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4098836.12 frames. ], batch size: 99, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:15:49,926 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:15:50,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=923760.0, ans=0.0
2023-12-23 03:16:07,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0
2023-12-23 03:16:26,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=923960.0, ans=0.1
2023-12-23 03:16:37,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0
2023-12-23 03:16:39,591 INFO [train.py:886] (0/4) Epoch 30, batch 400, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4285967.61 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:16:45,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=924093.3333333334, ans=0.0
2023-12-23 03:17:22,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=924360.0, ans=0.125
2023-12-23 03:17:29,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=924360.0, ans=0.2
2023-12-23 03:17:30,134 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.158e+01 3.345e+01 3.496e+01 3.973e+01, threshold=6.691e+01, percent-clipped=0.0
2023-12-23 03:17:32,070 INFO [train.py:886] (0/4) Epoch 30, batch 450, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4435513.93 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:17:45,369 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:17:49,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0
2023-12-23 03:17:53,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=924560.0, ans=0.125
2023-12-23 03:18:01,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.48 vs. limit=22.5
2023-12-23 03:18:18,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=924693.3333333334, ans=0.0
2023-12-23 03:18:24,820 INFO [train.py:886] (0/4) Epoch 30, batch 500, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4555105.02 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:18:36,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=924826.6666666666, ans=0.1
2023-12-23 03:18:49,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=924893.3333333334, ans=0.0
2023-12-23 03:19:11,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=925026.6666666666, ans=0.125
2023-12-23 03:19:13,977 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.840e+01 3.274e+01 3.359e+01 3.528e+01 4.075e+01, threshold=6.718e+01, percent-clipped=0.0
2023-12-23 03:19:15,862 INFO [train.py:886] (0/4) Epoch 30, batch 550, loss[loss=0.01618, audio_tagging_loss=0.01618, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4649471.40 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:19:36,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=925226.6666666666, ans=0.125
2023-12-23 03:19:43,565 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:19:59,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=925360.0, ans=0.125
2023-12-23 03:20:08,785 INFO [train.py:886] (0/4) Epoch 30, batch 600, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4712349.05 frames. ], batch size: 99, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:20:10,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=925426.6666666666, ans=0.125
2023-12-23 03:20:12,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=925426.6666666666, ans=0.125
2023-12-23 03:20:16,430 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:20:18,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=925493.3333333334, ans=0.1
2023-12-23 03:20:20,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0
2023-12-23 03:20:29,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0
2023-12-23 03:20:50,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=925693.3333333334, ans=0.125
2023-12-23 03:20:51,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=15.0
2023-12-23 03:20:57,412 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.002e+01 3.230e+01 3.382e+01 3.569e+01 4.053e+01, threshold=6.765e+01, percent-clipped=0.0
2023-12-23 03:20:59,380 INFO [train.py:886] (0/4) Epoch 30, batch 650, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4759971.89 frames. ], batch size: 99, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:21:15,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. limit=10.0
2023-12-23 03:21:35,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=925960.0, ans=0.125
2023-12-23 03:21:36,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=925960.0, ans=0.0
2023-12-23 03:21:40,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=926026.6666666666, ans=10.0
2023-12-23 03:21:50,679 INFO [train.py:886] (0/4) Epoch 30, batch 700, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4798687.26 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:21:59,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=926093.3333333334, ans=0.125
2023-12-23 03:22:04,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=926160.0, ans=0.1
2023-12-23 03:22:14,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=926226.6666666666, ans=0.0
2023-12-23 03:22:17,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=926226.6666666666, ans=0.125
2023-12-23 03:22:19,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=926226.6666666666, ans=0.0
2023-12-23 03:22:25,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=926293.3333333334, ans=0.0
2023-12-23 03:22:37,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=926360.0, ans=0.2
2023-12-23 03:22:41,096 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.229e+01 3.388e+01 3.605e+01 3.882e+01, threshold=6.777e+01, percent-clipped=0.0
2023-12-23 03:22:43,039 INFO [train.py:886] (0/4) Epoch 30, batch 750, loss[loss=0.01148, audio_tagging_loss=0.01148, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4833069.47 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:22:51,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=926426.6666666666, ans=0.125
2023-12-23 03:22:58,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=926493.3333333334, ans=0.125
2023-12-23 03:22:59,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=926493.3333333334, ans=0.125
2023-12-23 03:23:01,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.43 vs. limit=22.5
2023-12-23 03:23:01,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0
2023-12-23 03:23:06,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=926560.0, ans=0.2
2023-12-23 03:23:34,698 INFO [train.py:886] (0/4) Epoch 30, batch 800, loss[loss=0.01321, audio_tagging_loss=0.01321, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4859907.83 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0
2023-12-23 03:23:43,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0
2023-12-23 03:23:51,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=926826.6666666666, ans=0.0
2023-12-23 03:24:13,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=926960.0, ans=0.125
2023-12-23 03:24:23,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.98 vs. limit=22.5
2023-12-23 03:24:24,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0
2023-12-23 03:24:24,602 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.909e+01 3.234e+01 3.358e+01 3.542e+01 4.031e+01, threshold=6.717e+01, percent-clipped=0.0
2023-12-23 03:24:26,565 INFO [train.py:886] (0/4) Epoch 30, batch 850, loss[loss=0.01454, audio_tagging_loss=0.01454, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4886426.65 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0
2023-12-23 03:24:33,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=927093.3333333334, ans=0.1
2023-12-23 03:24:41,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=927160.0, ans=10.0
2023-12-23 03:24:53,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=927226.6666666666, ans=0.125
2023-12-23 03:24:56,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=927226.6666666666, ans=0.1
2023-12-23 03:25:03,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=927293.3333333334, ans=0.2
2023-12-23 03:25:19,747 INFO [train.py:886] (0/4) Epoch 30, batch 900, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24750.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4899998.93 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0
2023-12-23 03:25:20,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=927426.6666666666, ans=0.125
2023-12-23 03:25:22,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=927426.6666666666, ans=0.0
2023-12-23 03:25:23,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0
2023-12-23 03:25:34,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=927493.3333333334, ans=0.125
2023-12-23 03:25:44,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=927560.0, ans=0.0
2023-12-23 03:26:07,767 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.944e+01 3.263e+01 3.411e+01 3.563e+01 4.036e+01, threshold=6.823e+01, percent-clipped=0.0
2023-12-23 03:26:10,370 INFO [train.py:886] (0/4) Epoch 30, batch 950, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4904878.46 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0
2023-12-23 03:26:15,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=927760.0, ans=0.0
2023-12-23 03:26:18,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=927760.0, ans=0.125
2023-12-23 03:26:18,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0
2023-12-23 03:26:27,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=927826.6666666666, ans=0.5
2023-12-23 03:26:33,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=927893.3333333334, ans=10.0
2023-12-23 03:26:41,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0
2023-12-23 03:26:48,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=927960.0, ans=0.5
2023-12-23 03:26:49,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.94 vs. limit=10.0
2023-12-23 03:26:51,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=928026.6666666666, ans=0.05
2023-12-23 03:27:02,697 INFO [train.py:886] (0/4) Epoch 30, batch 1000, loss[loss=0.01016, audio_tagging_loss=0.01016, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4909880.99 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0
2023-12-23 03:27:04,877 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:27:15,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=928160.0, ans=0.0
2023-12-23 03:27:20,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=928160.0, ans=0.125
2023-12-23 03:27:37,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=928293.3333333334, ans=0.2
2023-12-23 03:27:45,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.89 vs. limit=22.5
2023-12-23 03:27:52,138 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.168e+01 3.324e+01 3.493e+01 4.167e+01, threshold=6.648e+01, percent-clipped=0.0
2023-12-23 03:27:54,028 INFO [train.py:886] (0/4) Epoch 30, batch 1050, loss[loss=0.01199, audio_tagging_loss=0.01199, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4919626.62 frames.
], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:28:15,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=928560.0, ans=0.1 2023-12-23 03:28:44,576 INFO [train.py:886] (0/4) Epoch 30, batch 1100, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4925741.63 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:28:45,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=12.0 2023-12-23 03:28:49,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=928760.0, ans=0.0 2023-12-23 03:29:08,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=928893.3333333334, ans=0.2 2023-12-23 03:29:13,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=928893.3333333334, ans=0.07 2023-12-23 03:29:27,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2023-12-23 03:29:35,375 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.223e+01 3.331e+01 3.541e+01 4.042e+01, threshold=6.662e+01, percent-clipped=0.0 2023-12-23 03:29:37,296 INFO [train.py:886] (0/4) Epoch 30, batch 1150, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4935128.44 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:29:47,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.46 vs. limit=5.0 2023-12-23 03:30:10,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=929293.3333333334, ans=0.125 2023-12-23 03:30:27,921 INFO [train.py:886] (0/4) Epoch 30, batch 1200, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4940663.34 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:30:30,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=929426.6666666666, ans=0.0 2023-12-23 03:31:02,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=929626.6666666666, ans=0.125 2023-12-23 03:31:11,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0 2023-12-23 03:31:18,622 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.002e+01 3.244e+01 3.373e+01 3.542e+01 4.199e+01, threshold=6.745e+01, percent-clipped=0.0 2023-12-23 03:31:20,509 INFO [train.py:886] (0/4) Epoch 30, batch 1250, loss[loss=0.0129, audio_tagging_loss=0.0129, over 24750.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4939745.66 frames. 
], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:31:27,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=929760.0, ans=0.125 2023-12-23 03:31:30,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=929826.6666666666, ans=0.125 2023-12-23 03:31:46,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=929893.3333333334, ans=0.025 2023-12-23 03:31:47,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=929893.3333333334, ans=0.1 2023-12-23 03:31:53,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=929960.0, ans=0.0 2023-12-23 03:32:12,533 INFO [train.py:886] (0/4) Epoch 30, batch 1300, loss[loss=0.0152, audio_tagging_loss=0.0152, over 25000.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4939022.86 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:32:16,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=930093.3333333334, ans=0.0 2023-12-23 03:32:22,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=930160.0, ans=0.125 2023-12-23 03:32:30,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=930160.0, ans=0.1 2023-12-23 03:32:36,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=930226.6666666666, ans=0.125 2023-12-23 03:32:40,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=930226.6666666666, ans=0.125 2023-12-23 03:32:41,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=930293.3333333334, ans=0.125 2023-12-23 03:32:48,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=930293.3333333334, ans=0.125 2023-12-23 03:32:49,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=930293.3333333334, ans=0.125 2023-12-23 03:33:01,544 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.974e+01 3.237e+01 3.393e+01 3.536e+01 4.080e+01, threshold=6.786e+01, percent-clipped=0.0 2023-12-23 03:33:03,504 INFO [train.py:886] (0/4) Epoch 30, batch 1350, loss[loss=0.0153, audio_tagging_loss=0.0153, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4942282.34 frames. 
], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:33:04,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=930426.6666666666, ans=0.0 2023-12-23 03:33:21,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=930493.3333333334, ans=0.0 2023-12-23 03:33:32,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=930560.0, ans=0.125 2023-12-23 03:33:36,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=930626.6666666666, ans=0.04949747468305833 2023-12-23 03:33:53,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=930693.3333333334, ans=0.125 2023-12-23 03:33:55,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.10 vs. limit=10.0 2023-12-23 03:33:55,840 INFO [train.py:886] (0/4) Epoch 30, batch 1400, loss[loss=0.01507, audio_tagging_loss=0.01507, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4948783.26 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:34:01,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0 2023-12-23 03:34:12,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=930826.6666666666, ans=0.2 2023-12-23 03:34:14,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=930893.3333333334, ans=0.2 2023-12-23 03:34:29,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.70 vs. limit=15.0 2023-12-23 03:34:34,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=930960.0, ans=0.2 2023-12-23 03:34:39,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=931026.6666666666, ans=0.2 2023-12-23 03:34:44,810 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.927e+01 3.190e+01 3.312e+01 3.498e+01 3.963e+01, threshold=6.624e+01, percent-clipped=0.0 2023-12-23 03:34:46,707 INFO [train.py:886] (0/4) Epoch 30, batch 1450, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4951893.98 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:34:48,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. 
limit=6.0 2023-12-23 03:34:48,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=931093.3333333334, ans=0.0 2023-12-23 03:34:53,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=931093.3333333334, ans=0.1 2023-12-23 03:34:58,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=931160.0, ans=0.09899494936611666 2023-12-23 03:35:00,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=931160.0, ans=0.0 2023-12-23 03:35:05,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0 2023-12-23 03:35:36,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=931360.0, ans=0.125 2023-12-23 03:35:38,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=931426.6666666666, ans=0.125 2023-12-23 03:35:38,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=931426.6666666666, ans=0.0 2023-12-23 03:35:39,327 INFO [train.py:886] (0/4) Epoch 30, batch 1500, loss[loss=0.01104, audio_tagging_loss=0.01104, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4949703.40 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:35:41,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=931426.6666666666, ans=0.2 2023-12-23 03:35:47,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=931426.6666666666, ans=0.09899494936611666 2023-12-23 03:35:51,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2023-12-23 03:36:08,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=931560.0, ans=0.1 2023-12-23 03:36:12,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.18 vs. limit=15.0 2023-12-23 03:36:21,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=12.0 2023-12-23 03:36:22,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=931693.3333333334, ans=0.1 2023-12-23 03:36:29,272 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.252e+01 3.425e+01 3.586e+01 4.549e+01, threshold=6.850e+01, percent-clipped=0.0 2023-12-23 03:36:31,152 INFO [train.py:886] (0/4) Epoch 30, batch 1550, loss[loss=0.009677, audio_tagging_loss=0.009677, over 24750.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4950742.39 frames. 
], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:36:40,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=931760.0, ans=0.1 2023-12-23 03:36:41,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=931826.6666666666, ans=0.07 2023-12-23 03:36:59,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=931893.3333333334, ans=0.5 2023-12-23 03:37:02,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=931960.0, ans=0.0 2023-12-23 03:37:02,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=931960.0, ans=0.0 2023-12-23 03:37:06,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.48 vs. limit=15.0 2023-12-23 03:37:12,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=932026.6666666666, ans=0.2 2023-12-23 03:37:19,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=932026.6666666666, ans=0.125 2023-12-23 03:37:23,207 INFO [train.py:886] (0/4) Epoch 30, batch 1600, loss[loss=0.01152, audio_tagging_loss=0.01152, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4945968.01 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:37:33,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-12-23 03:37:44,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=932226.6666666666, ans=0.2 2023-12-23 03:37:49,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2023-12-23 03:38:08,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=932360.0, ans=0.125 2023-12-23 03:38:14,419 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.061e+01 3.234e+01 3.380e+01 3.507e+01 4.339e+01, threshold=6.761e+01, percent-clipped=0.0 2023-12-23 03:38:16,307 INFO [train.py:886] (0/4) Epoch 30, batch 1650, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4942024.62 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:38:21,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=932426.6666666666, ans=0.125 2023-12-23 03:38:31,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=932493.3333333334, ans=0.1 2023-12-23 03:38:35,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.93 vs. 
limit=22.5 2023-12-23 03:38:41,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=932560.0, ans=0.125 2023-12-23 03:38:43,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=932560.0, ans=0.125 2023-12-23 03:38:57,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=932693.3333333334, ans=0.2 2023-12-23 03:38:58,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=932693.3333333334, ans=0.0 2023-12-23 03:39:07,496 INFO [train.py:886] (0/4) Epoch 30, batch 1700, loss[loss=0.01094, audio_tagging_loss=0.01094, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4943521.63 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:39:08,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=932760.0, ans=22.5 2023-12-23 03:39:26,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=932826.6666666666, ans=0.125 2023-12-23 03:39:46,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=932960.0, ans=0.1 2023-12-23 03:39:54,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=933026.6666666666, ans=0.0 2023-12-23 03:39:58,100 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.929e+01 3.264e+01 3.396e+01 3.507e+01 4.450e+01, threshold=6.793e+01, percent-clipped=0.0 2023-12-23 03:40:00,113 INFO [train.py:886] (0/4) Epoch 30, batch 1750, loss[loss=0.01299, audio_tagging_loss=0.01299, over 24750.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4938866.70 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:40:08,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=933093.3333333334, ans=0.0 2023-12-23 03:40:13,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=15.0 2023-12-23 03:40:17,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.70 vs. limit=22.5 2023-12-23 03:40:36,708 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-140000.pt 2023-12-23 03:40:39,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=933293.3333333334, ans=0.0 2023-12-23 03:40:45,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=933360.0, ans=0.0 2023-12-23 03:40:46,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. 
limit=15.0 2023-12-23 03:40:52,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=933360.0, ans=0.125 2023-12-23 03:40:55,028 INFO [train.py:886] (0/4) Epoch 30, batch 1800, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4948508.96 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:41:28,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.99 vs. limit=6.0 2023-12-23 03:41:43,725 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.809e+01 3.246e+01 3.412e+01 3.538e+01 4.076e+01, threshold=6.825e+01, percent-clipped=0.0 2023-12-23 03:41:45,630 INFO [train.py:886] (0/4) Epoch 30, batch 1850, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4951468.51 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:41:46,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=933760.0, ans=0.1 2023-12-23 03:41:53,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=933760.0, ans=0.1 2023-12-23 03:42:00,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2023-12-23 03:42:17,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=933960.0, ans=0.1 2023-12-23 03:42:27,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=934026.6666666666, ans=0.125 2023-12-23 03:42:37,475 INFO [train.py:886] (0/4) Epoch 30, batch 1900, loss[loss=0.01576, audio_tagging_loss=0.01576, over 24750.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4948294.97 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:42:52,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=934160.0, ans=0.0 2023-12-23 03:42:58,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=934226.6666666666, ans=0.025 2023-12-23 03:43:24,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=934360.0, ans=0.125 2023-12-23 03:43:26,845 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.989e+01 3.318e+01 3.464e+01 3.638e+01 4.261e+01, threshold=6.929e+01, percent-clipped=0.0 2023-12-23 03:43:29,480 INFO [train.py:886] (0/4) Epoch 30, batch 1950, loss[loss=0.009379, audio_tagging_loss=0.009379, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4942629.13 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:43:37,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.76 vs. 
limit=6.0 2023-12-23 03:43:39,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=934493.3333333334, ans=0.0 2023-12-23 03:44:06,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=934626.6666666666, ans=0.0 2023-12-23 03:44:08,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2023-12-23 03:44:12,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=934693.3333333334, ans=0.0 2023-12-23 03:44:16,077 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:44:21,277 INFO [train.py:886] (0/4) Epoch 30, batch 2000, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4945295.96 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:44:28,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=934760.0, ans=0.0 2023-12-23 03:44:28,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=934760.0, ans=0.125 2023-12-23 03:44:38,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=934826.6666666666, ans=0.09899494936611666 2023-12-23 03:44:53,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=934960.0, ans=0.09899494936611666 2023-12-23 03:44:54,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=934960.0, ans=0.0 2023-12-23 03:44:59,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=934960.0, ans=0.125 2023-12-23 03:45:11,877 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.193e+01 3.321e+01 3.490e+01 4.233e+01, threshold=6.642e+01, percent-clipped=0.0 2023-12-23 03:45:13,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=935093.3333333334, ans=0.125 2023-12-23 03:45:13,808 INFO [train.py:886] (0/4) Epoch 30, batch 2050, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4950829.10 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:46:05,440 INFO [train.py:886] (0/4) Epoch 30, batch 2100, loss[loss=0.01375, audio_tagging_loss=0.01375, over 21908.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4948583.21 frames. ], batch size: 107, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:46:31,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2023-12-23 03:46:39,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.64 vs. 
limit=15.0 2023-12-23 03:46:54,847 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.885e+01 3.237e+01 3.407e+01 3.545e+01 4.095e+01, threshold=6.814e+01, percent-clipped=0.0 2023-12-23 03:46:56,775 INFO [train.py:886] (0/4) Epoch 30, batch 2150, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4951914.09 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:47:01,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=935760.0, ans=0.125 2023-12-23 03:47:02,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=935760.0, ans=0.125 2023-12-23 03:47:05,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=935760.0, ans=0.2 2023-12-23 03:47:20,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=935893.3333333334, ans=0.0 2023-12-23 03:47:21,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=935893.3333333334, ans=0.125 2023-12-23 03:47:29,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=935960.0, ans=0.125 2023-12-23 03:47:40,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2023-12-23 03:47:49,774 INFO [train.py:886] (0/4) Epoch 30, batch 2200, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4944987.44 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:47:51,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=936093.3333333334, ans=0.0 2023-12-23 03:47:54,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=936093.3333333334, ans=0.125 2023-12-23 03:47:56,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=936093.3333333334, ans=0.125 2023-12-23 03:48:18,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=936226.6666666666, ans=0.1 2023-12-23 03:48:37,955 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.929e+01 3.275e+01 3.471e+01 3.594e+01 4.076e+01, threshold=6.942e+01, percent-clipped=0.0 2023-12-23 03:48:39,974 INFO [train.py:886] (0/4) Epoch 30, batch 2250, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4941845.08 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:48:45,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2023-12-23 03:49:13,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. 
limit=15.0 2023-12-23 03:49:16,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=936626.6666666666, ans=0.09899494936611666 2023-12-23 03:49:16,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=936626.6666666666, ans=0.125 2023-12-23 03:49:16,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=936626.6666666666, ans=0.0 2023-12-23 03:49:21,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=936693.3333333334, ans=0.125 2023-12-23 03:49:25,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=936693.3333333334, ans=0.1 2023-12-23 03:49:32,505 INFO [train.py:886] (0/4) Epoch 30, batch 2300, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4942579.38 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:49:33,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=936760.0, ans=0.0 2023-12-23 03:49:38,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=936760.0, ans=0.1 2023-12-23 03:49:41,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=936826.6666666666, ans=0.125 2023-12-23 03:49:53,612 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:49:58,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=936893.3333333334, ans=0.0 2023-12-23 03:50:01,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=936893.3333333334, ans=0.1 2023-12-23 03:50:21,590 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.901e+01 3.201e+01 3.316e+01 3.475e+01 3.980e+01, threshold=6.632e+01, percent-clipped=0.0 2023-12-23 03:50:24,259 INFO [train.py:886] (0/4) Epoch 30, batch 2350, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4945950.23 frames. 
], batch size: 100, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:50:41,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=937160.0, ans=0.125 2023-12-23 03:50:45,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=937226.6666666666, ans=0.125 2023-12-23 03:50:51,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=937226.6666666666, ans=0.125 2023-12-23 03:50:54,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=937293.3333333334, ans=0.125 2023-12-23 03:51:12,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=937360.0, ans=0.0 2023-12-23 03:51:15,743 INFO [train.py:886] (0/4) Epoch 30, batch 2400, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24009.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4949008.13 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:51:37,630 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:51:47,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=937626.6666666666, ans=0.125 2023-12-23 03:51:47,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=937626.6666666666, ans=0.125 2023-12-23 03:51:51,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2023-12-23 03:51:52,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=937626.6666666666, ans=0.1 2023-12-23 03:52:03,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=937693.3333333334, ans=0.025 2023-12-23 03:52:05,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=937693.3333333334, ans=0.2 2023-12-23 03:52:05,952 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.939e+01 3.251e+01 3.352e+01 3.497e+01 5.027e+01, threshold=6.704e+01, percent-clipped=0.0 2023-12-23 03:52:08,553 INFO [train.py:886] (0/4) Epoch 30, batch 2450, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4955342.99 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:52:10,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=937760.0, ans=0.125 2023-12-23 03:52:26,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=937893.3333333334, ans=0.125 2023-12-23 03:52:26,739 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:52:31,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. 
limit=22.5 2023-12-23 03:52:33,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=937893.3333333334, ans=0.125 2023-12-23 03:52:48,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=938026.6666666666, ans=0.2 2023-12-23 03:52:58,798 INFO [train.py:886] (0/4) Epoch 30, batch 2500, loss[loss=0.01527, audio_tagging_loss=0.01527, over 22009.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4945741.81 frames. ], batch size: 107, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:53:00,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=938093.3333333334, ans=0.2 2023-12-23 03:53:00,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2023-12-23 03:53:07,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=938093.3333333334, ans=0.1 2023-12-23 03:53:14,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=938160.0, ans=0.125 2023-12-23 03:53:21,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=12.0 2023-12-23 03:53:21,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=938226.6666666666, ans=0.125 2023-12-23 03:53:29,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=938293.3333333334, ans=0.0 2023-12-23 03:53:31,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=938293.3333333334, ans=0.2 2023-12-23 03:53:49,318 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.342e+01 3.462e+01 3.638e+01 4.220e+01, threshold=6.925e+01, percent-clipped=0.0 2023-12-23 03:53:51,303 INFO [train.py:886] (0/4) Epoch 30, batch 2550, loss[loss=0.0117, audio_tagging_loss=0.0117, over 22742.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4943924.27 frames. ], batch size: 107, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:53:56,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=938426.6666666666, ans=0.125 2023-12-23 03:53:57,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=938426.6666666666, ans=0.125 2023-12-23 03:53:58,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=938426.6666666666, ans=0.04949747468305833 2023-12-23 03:54:06,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=938493.3333333334, ans=10.0 2023-12-23 03:54:09,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=938493.3333333334, ans=0.1 2023-12-23 03:54:42,917 INFO [train.py:886] (0/4) Epoch 30, batch 2600, loss[loss=0.01315, audio_tagging_loss=0.01315, over 24075.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4946807.57 frames. 
], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:54:48,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=938760.0, ans=0.125 2023-12-23 03:55:12,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=938893.3333333334, ans=0.125 2023-12-23 03:55:30,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=939026.6666666666, ans=0.125 2023-12-23 03:55:32,985 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.923e+01 3.221e+01 3.370e+01 3.530e+01 4.052e+01, threshold=6.740e+01, percent-clipped=0.0 2023-12-23 03:55:34,924 INFO [train.py:886] (0/4) Epoch 30, batch 2650, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4954343.88 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:55:54,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2023-12-23 03:56:03,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=939226.6666666666, ans=0.125 2023-12-23 03:56:06,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=939293.3333333334, ans=0.05 2023-12-23 03:56:18,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.92 vs. limit=15.0 2023-12-23 03:56:19,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=939360.0, ans=0.0 2023-12-23 03:56:26,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=939360.0, ans=0.125 2023-12-23 03:56:28,533 INFO [train.py:886] (0/4) Epoch 30, batch 2700, loss[loss=0.009332, audio_tagging_loss=0.009332, over 24071.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4954966.89 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:56:29,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=939426.6666666666, ans=0.1 2023-12-23 03:56:47,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=939560.0, ans=0.1 2023-12-23 03:56:50,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=939560.0, ans=0.2 2023-12-23 03:56:50,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=939560.0, ans=0.125 2023-12-23 03:57:16,652 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.960e+01 3.256e+01 3.372e+01 3.519e+01 4.379e+01, threshold=6.744e+01, percent-clipped=0.0 2023-12-23 03:57:18,574 INFO [train.py:886] (0/4) Epoch 30, batch 2750, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4956930.88 frames. 
], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:57:24,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=939760.0, ans=0.1 2023-12-23 03:57:32,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=939826.6666666666, ans=0.125 2023-12-23 03:57:34,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.24 vs. limit=15.0 2023-12-23 03:57:55,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=939960.0, ans=0.1 2023-12-23 03:58:04,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=940026.6666666666, ans=0.5 2023-12-23 03:58:04,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=940026.6666666666, ans=0.125 2023-12-23 03:58:05,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=940026.6666666666, ans=0.2 2023-12-23 03:58:08,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. limit=6.0 2023-12-23 03:58:09,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-12-23 03:58:11,691 INFO [train.py:886] (0/4) Epoch 30, batch 2800, loss[loss=0.009603, audio_tagging_loss=0.009603, over 23930.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4958442.47 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:58:13,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=940093.3333333334, ans=0.1 2023-12-23 03:58:13,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=940093.3333333334, ans=0.0 2023-12-23 03:58:20,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=940160.0, ans=0.125 2023-12-23 03:58:20,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.52 vs. limit=12.0 2023-12-23 03:58:48,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=940293.3333333334, ans=0.125 2023-12-23 03:58:49,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0 2023-12-23 03:59:01,824 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.885e+01 3.252e+01 3.401e+01 3.539e+01 4.080e+01, threshold=6.801e+01, percent-clipped=0.0 2023-12-23 03:59:03,684 INFO [train.py:886] (0/4) Epoch 30, batch 2850, loss[loss=0.01227, audio_tagging_loss=0.01227, over 24750.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4957411.47 frames. 
], batch size: 99, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:59:26,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=940560.0, ans=0.125 2023-12-23 03:59:51,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=940693.3333333334, ans=0.0 2023-12-23 03:59:54,502 INFO [train.py:886] (0/4) Epoch 30, batch 2900, loss[loss=0.01452, audio_tagging_loss=0.01452, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4948762.55 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:00:02,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=940760.0, ans=0.125 2023-12-23 04:00:14,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=12.0 2023-12-23 04:00:23,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. limit=10.0 2023-12-23 04:00:27,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2023-12-23 04:00:28,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=940960.0, ans=0.1 2023-12-23 04:00:33,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=940960.0, ans=0.09899494936611666 2023-12-23 04:00:43,900 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.852e+01 3.200e+01 3.351e+01 3.510e+01 3.990e+01, threshold=6.702e+01, percent-clipped=0.0 2023-12-23 04:00:45,810 INFO [train.py:886] (0/4) Epoch 30, batch 2950, loss[loss=0.01061, audio_tagging_loss=0.01061, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4951587.17 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:00:47,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=941093.3333333334, ans=0.2 2023-12-23 04:01:01,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=941160.0, ans=0.0 2023-12-23 04:01:21,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=941293.3333333334, ans=0.125 2023-12-23 04:01:25,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=941293.3333333334, ans=0.125 2023-12-23 04:01:33,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=941360.0, ans=0.125 2023-12-23 04:01:37,756 INFO [train.py:886] (0/4) Epoch 30, batch 3000, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24750.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4951080.90 frames. ], batch size: 99, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:01:37,758 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 04:01:59,246 INFO [train.py:917] (0/4) Epoch 30, validation: loss=0.03287, audio_tagging_loss=0.03287, over 3737520.00 frames. 
2023-12-23 04:01:59,247 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 04:02:08,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=941426.6666666666, ans=0.125 2023-12-23 04:02:12,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=941493.3333333334, ans=0.125 2023-12-23 04:02:15,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=941493.3333333334, ans=0.125 2023-12-23 04:02:38,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=941626.6666666666, ans=0.125 2023-12-23 04:02:48,227 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.895e+01 3.215e+01 3.346e+01 3.573e+01 4.041e+01, threshold=6.693e+01, percent-clipped=0.0 2023-12-23 04:02:49,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=941760.0, ans=0.125 2023-12-23 04:02:50,141 INFO [train.py:886] (0/4) Epoch 30, batch 3050, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4954453.24 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:02:55,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=941760.0, ans=0.125 2023-12-23 04:03:08,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=941826.6666666666, ans=0.0 2023-12-23 04:03:11,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=941893.3333333334, ans=0.125 2023-12-23 04:03:18,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=15.0 2023-12-23 04:03:41,696 INFO [train.py:886] (0/4) Epoch 30, batch 3100, loss[loss=0.01221, audio_tagging_loss=0.01221, over 24750.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4949206.64 frames. ], batch size: 99, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:03:58,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2023-12-23 04:04:04,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=942226.6666666666, ans=0.125 2023-12-23 04:04:04,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=15.0 2023-12-23 04:04:23,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. 
limit=15.0 2023-12-23 04:04:24,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=942360.0, ans=0.1 2023-12-23 04:04:25,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=942360.0, ans=0.2 2023-12-23 04:04:30,574 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.053e+01 3.332e+01 3.451e+01 3.634e+01 4.318e+01, threshold=6.902e+01, percent-clipped=0.0 2023-12-23 04:04:32,514 INFO [train.py:886] (0/4) Epoch 30, batch 3150, loss[loss=0.01186, audio_tagging_loss=0.01186, over 22213.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4942177.20 frames. ], batch size: 107, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:04:48,235 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:04:49,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=942493.3333333334, ans=0.95 2023-12-23 04:04:58,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=942560.0, ans=0.0 2023-12-23 04:05:00,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=942560.0, ans=0.2 2023-12-23 04:05:24,515 INFO [train.py:886] (0/4) Epoch 30, batch 3200, loss[loss=0.01127, audio_tagging_loss=0.01127, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4938571.56 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:05:30,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=942760.0, ans=0.125 2023-12-23 04:05:48,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942893.3333333334, ans=0.1 2023-12-23 04:05:48,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=942893.3333333334, ans=0.125 2023-12-23 04:05:51,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=942893.3333333334, ans=0.0 2023-12-23 04:05:54,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=942960.0, ans=0.1 2023-12-23 04:06:04,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-12-23 04:06:12,109 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.760e+01 3.281e+01 3.417e+01 3.634e+01 4.167e+01, threshold=6.835e+01, percent-clipped=0.0 2023-12-23 04:06:14,694 INFO [train.py:886] (0/4) Epoch 30, batch 3250, loss[loss=0.01211, audio_tagging_loss=0.01211, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4940364.03 frames. 
], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:06:19,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=943093.3333333334, ans=0.125 2023-12-23 04:06:27,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=943160.0, ans=0.2 2023-12-23 04:06:44,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=943226.6666666666, ans=0.0 2023-12-23 04:06:54,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2023-12-23 04:06:58,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=943360.0, ans=0.0 2023-12-23 04:07:06,673 INFO [train.py:886] (0/4) Epoch 30, batch 3300, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4948797.66 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:07:11,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2023-12-23 04:07:14,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=943426.6666666666, ans=0.125 2023-12-23 04:07:16,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=943493.3333333334, ans=0.125 2023-12-23 04:07:28,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=943560.0, ans=0.0 2023-12-23 04:07:33,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=943560.0, ans=0.0 2023-12-23 04:07:55,900 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.982e+01 3.270e+01 3.395e+01 3.544e+01 4.890e+01, threshold=6.790e+01, percent-clipped=0.0 2023-12-23 04:07:58,513 INFO [train.py:886] (0/4) Epoch 30, batch 3350, loss[loss=0.01173, audio_tagging_loss=0.01173, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4954128.64 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:08:06,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=12.0 2023-12-23 04:08:08,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=943826.6666666666, ans=0.0 2023-12-23 04:08:09,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=943826.6666666666, ans=0.125 2023-12-23 04:08:26,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=943893.3333333334, ans=0.125 2023-12-23 04:08:39,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=944026.6666666666, ans=0.0 2023-12-23 04:08:40,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. 
limit=15.0 2023-12-23 04:08:44,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=944026.6666666666, ans=0.0 2023-12-23 04:08:48,480 INFO [train.py:886] (0/4) Epoch 30, batch 3400, loss[loss=0.01209, audio_tagging_loss=0.01209, over 24750.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4957290.27 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:08:49,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=944093.3333333334, ans=0.1 2023-12-23 04:08:56,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2023-12-23 04:09:01,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=944160.0, ans=0.2 2023-12-23 04:09:04,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=944160.0, ans=0.125 2023-12-23 04:09:05,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=944160.0, ans=0.0 2023-12-23 04:09:05,463 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0 2023-12-23 04:09:16,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=944226.6666666666, ans=0.0 2023-12-23 04:09:18,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.47 vs. limit=10.0 2023-12-23 04:09:22,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=944293.3333333334, ans=0.125 2023-12-23 04:09:22,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=944293.3333333334, ans=6.0 2023-12-23 04:09:23,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=944293.3333333334, ans=0.125 2023-12-23 04:09:37,165 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.991e+01 3.324e+01 3.423e+01 3.624e+01 4.414e+01, threshold=6.845e+01, percent-clipped=0.0 2023-12-23 04:09:39,090 INFO [train.py:886] (0/4) Epoch 30, batch 3450, loss[loss=0.0109, audio_tagging_loss=0.0109, over 24750.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4951031.11 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:09:41,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.73 vs. 
limit=15.0 2023-12-23 04:09:45,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=944426.6666666666, ans=0.2 2023-12-23 04:09:49,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=944493.3333333334, ans=0.07 2023-12-23 04:09:52,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=944493.3333333334, ans=0.125 2023-12-23 04:09:56,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=944493.3333333334, ans=0.125 2023-12-23 04:10:08,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=944626.6666666666, ans=0.0 2023-12-23 04:10:20,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=944693.3333333334, ans=22.5 2023-12-23 04:10:30,256 INFO [train.py:886] (0/4) Epoch 30, batch 3500, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4949054.44 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:10:49,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=944893.3333333334, ans=0.125 2023-12-23 04:10:50,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=944893.3333333334, ans=0.025 2023-12-23 04:10:52,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0 2023-12-23 04:10:54,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=944893.3333333334, ans=0.0 2023-12-23 04:11:19,704 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.755e+01 3.226e+01 3.344e+01 3.619e+01 4.102e+01, threshold=6.688e+01, percent-clipped=0.0 2023-12-23 04:11:21,608 INFO [train.py:886] (0/4) Epoch 30, batch 3550, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4946007.86 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:11:38,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=945160.0, ans=0.0 2023-12-23 04:12:00,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=945293.3333333334, ans=0.125 2023-12-23 04:12:14,348 INFO [train.py:886] (0/4) Epoch 30, batch 3600, loss[loss=0.01446, audio_tagging_loss=0.01446, over 25000.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4944994.26 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:12:16,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.19 vs. 
limit=22.5 2023-12-23 04:12:29,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=945493.3333333334, ans=0.0 2023-12-23 04:12:41,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2023-12-23 04:12:47,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2023-12-23 04:12:52,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=945626.6666666666, ans=0.125 2023-12-23 04:12:56,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=945693.3333333334, ans=0.0 2023-12-23 04:13:02,252 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.867e+01 3.251e+01 3.392e+01 3.502e+01 4.191e+01, threshold=6.784e+01, percent-clipped=0.0 2023-12-23 04:13:04,186 INFO [train.py:886] (0/4) Epoch 30, batch 3650, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24015.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4948502.60 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:13:10,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=945760.0, ans=0.0 2023-12-23 04:13:21,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=945826.6666666666, ans=0.0 2023-12-23 04:13:26,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=945893.3333333334, ans=0.125 2023-12-23 04:13:33,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=945893.3333333334, ans=0.125 2023-12-23 04:13:36,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=945960.0, ans=0.0 2023-12-23 04:13:37,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=945960.0, ans=0.125 2023-12-23 04:13:40,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. limit=6.0 2023-12-23 04:13:42,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=945960.0, ans=0.125 2023-12-23 04:13:57,396 INFO [train.py:886] (0/4) Epoch 30, batch 3700, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4950193.42 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:14:05,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5 2023-12-23 04:14:07,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. 
limit=15.0 2023-12-23 04:14:10,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=946160.0, ans=0.125 2023-12-23 04:14:12,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=946160.0, ans=0.1 2023-12-23 04:14:13,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=946160.0, ans=0.0 2023-12-23 04:14:28,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=946293.3333333334, ans=0.125 2023-12-23 04:14:38,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=946360.0, ans=0.125 2023-12-23 04:14:45,607 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.248e+01 3.376e+01 3.563e+01 4.172e+01, threshold=6.751e+01, percent-clipped=0.0 2023-12-23 04:14:47,531 INFO [train.py:886] (0/4) Epoch 30, batch 3750, loss[loss=0.01383, audio_tagging_loss=0.01383, over 24750.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4949070.77 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:14:55,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=946426.6666666666, ans=0.125 2023-12-23 04:15:07,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=946560.0, ans=0.0 2023-12-23 04:15:11,811 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:15:22,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=946626.6666666666, ans=0.125 2023-12-23 04:15:33,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=946693.3333333334, ans=0.0 2023-12-23 04:15:34,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=946693.3333333334, ans=0.0 2023-12-23 04:15:39,174 INFO [train.py:886] (0/4) Epoch 30, batch 3800, loss[loss=0.01345, audio_tagging_loss=0.01345, over 24750.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4946895.24 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:15:41,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=946760.0, ans=0.125 2023-12-23 04:15:51,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2023-12-23 04:15:56,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. 
limit=15.0 2023-12-23 04:16:00,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=946893.3333333334, ans=0.0 2023-12-23 04:16:07,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=946893.3333333334, ans=0.0 2023-12-23 04:16:29,393 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.901e+01 3.289e+01 3.427e+01 3.573e+01 5.060e+01, threshold=6.854e+01, percent-clipped=0.0 2023-12-23 04:16:31,292 INFO [train.py:886] (0/4) Epoch 30, batch 3850, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4946485.24 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:16:40,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=947160.0, ans=0.0 2023-12-23 04:16:47,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=947160.0, ans=10.0 2023-12-23 04:16:56,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=947226.6666666666, ans=0.0 2023-12-23 04:16:57,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.63 vs. limit=10.0 2023-12-23 04:17:01,864 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:17:11,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=947360.0, ans=0.0 2023-12-23 04:17:13,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2023-12-23 04:17:19,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=947360.0, ans=0.125 2023-12-23 04:17:20,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=947360.0, ans=0.1 2023-12-23 04:17:22,777 INFO [train.py:886] (0/4) Epoch 30, batch 3900, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4946835.01 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:17:28,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=947426.6666666666, ans=0.2 2023-12-23 04:17:29,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=947426.6666666666, ans=0.0 2023-12-23 04:17:30,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=947426.6666666666, ans=0.05 2023-12-23 04:17:42,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=947493.3333333334, ans=0.1 2023-12-23 04:17:48,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. 
limit=15.0 2023-12-23 04:17:48,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.35 vs. limit=22.5 2023-12-23 04:17:57,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=947626.6666666666, ans=0.125 2023-12-23 04:18:04,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=947693.3333333334, ans=0.125 2023-12-23 04:18:12,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.60 vs. limit=22.5 2023-12-23 04:18:12,358 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.238e+01 3.413e+01 3.556e+01 4.213e+01, threshold=6.826e+01, percent-clipped=0.0 2023-12-23 04:18:13,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0 2023-12-23 04:18:14,272 INFO [train.py:886] (0/4) Epoch 30, batch 3950, loss[loss=0.01327, audio_tagging_loss=0.01327, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4948307.43 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:18:17,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=947760.0, ans=0.5 2023-12-23 04:18:18,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=947760.0, ans=0.2 2023-12-23 04:18:26,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=947826.6666666666, ans=0.125 2023-12-23 04:18:34,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=947826.6666666666, ans=0.125 2023-12-23 04:18:34,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=947826.6666666666, ans=0.125 2023-12-23 04:18:42,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=947893.3333333334, ans=0.125 2023-12-23 04:18:43,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=947893.3333333334, ans=0.0 2023-12-23 04:19:07,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=948093.3333333334, ans=0.0 2023-12-23 04:19:07,785 INFO [train.py:886] (0/4) Epoch 30, batch 4000, loss[loss=0.01447, audio_tagging_loss=0.01447, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4947715.69 frames. 
], batch size: 100, lr: 3.57e-03, grad_scale: 128.0 2023-12-23 04:19:17,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=948160.0, ans=0.1 2023-12-23 04:19:23,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=948160.0, ans=0.125 2023-12-23 04:19:57,729 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.995e+01 3.306e+01 3.420e+01 3.603e+01 4.772e+01, threshold=6.841e+01, percent-clipped=0.0 2023-12-23 04:19:58,688 INFO [train.py:886] (0/4) Epoch 30, batch 4050, loss[loss=0.01245, audio_tagging_loss=0.01245, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4950813.93 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:20:04,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=948426.6666666666, ans=0.125 2023-12-23 04:20:13,420 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2023-12-23 04:20:36,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0 2023-12-23 04:20:41,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=948693.3333333334, ans=0.125 2023-12-23 04:20:42,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=948693.3333333334, ans=0.1 2023-12-23 04:20:51,069 INFO [train.py:886] (0/4) Epoch 30, batch 4100, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4942750.49 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:20:52,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=948760.0, ans=0.2 2023-12-23 04:20:54,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=948760.0, ans=0.125 2023-12-23 04:20:54,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=948760.0, ans=0.125 2023-12-23 04:20:58,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=948760.0, ans=0.0 2023-12-23 04:21:15,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=948893.3333333334, ans=0.1 2023-12-23 04:21:22,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=948960.0, ans=0.1 2023-12-23 04:21:25,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=948960.0, ans=0.125 2023-12-23 04:21:25,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=948960.0, ans=0.2 2023-12-23 04:21:41,579 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.94 vs. 
limit=12.0 2023-12-23 04:21:41,983 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.264e+01 3.402e+01 3.586e+01 4.088e+01, threshold=6.804e+01, percent-clipped=0.0 2023-12-23 04:21:43,646 INFO [train.py:886] (0/4) Epoch 30, batch 4150, loss[loss=0.01191, audio_tagging_loss=0.01191, over 22656.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4939352.53 frames. ], batch size: 107, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:21:45,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=949093.3333333334, ans=0.2 2023-12-23 04:21:52,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=949093.3333333334, ans=0.125 2023-12-23 04:21:58,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=949160.0, ans=0.035 2023-12-23 04:21:58,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=949160.0, ans=0.2 2023-12-23 04:21:59,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=949160.0, ans=0.1 2023-12-23 04:22:24,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=949360.0, ans=0.125 2023-12-23 04:22:34,544 INFO [train.py:886] (0/4) Epoch 30, batch 4200, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4945863.02 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:23:05,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=949626.6666666666, ans=0.125 2023-12-23 04:23:06,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=949626.6666666666, ans=0.07 2023-12-23 04:23:25,798 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.859e+01 3.208e+01 3.393e+01 3.521e+01 4.162e+01, threshold=6.786e+01, percent-clipped=0.0 2023-12-23 04:23:26,749 INFO [train.py:886] (0/4) Epoch 30, batch 4250, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4945966.63 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:23:29,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=949760.0, ans=0.125 2023-12-23 04:23:45,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=949826.6666666666, ans=15.0 2023-12-23 04:23:46,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-12-23 04:24:17,244 INFO [train.py:886] (0/4) Epoch 30, batch 4300, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4952857.27 frames. 
], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:24:20,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=950093.3333333334, ans=0.125 2023-12-23 04:24:26,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=950093.3333333334, ans=0.0 2023-12-23 04:24:30,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=950160.0, ans=0.1 2023-12-23 04:24:54,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=950293.3333333334, ans=0.025 2023-12-23 04:24:55,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=950293.3333333334, ans=0.2 2023-12-23 04:25:08,196 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.009e+01 3.316e+01 3.415e+01 3.581e+01 5.246e+01, threshold=6.831e+01, percent-clipped=0.0 2023-12-23 04:25:09,180 INFO [train.py:886] (0/4) Epoch 30, batch 4350, loss[loss=0.01594, audio_tagging_loss=0.01594, over 24750.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4959923.01 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:25:11,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=950426.6666666666, ans=0.2 2023-12-23 04:26:02,660 INFO [train.py:886] (0/4) Epoch 30, batch 4400, loss[loss=0.01437, audio_tagging_loss=0.01437, over 24750.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4951823.40 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:26:15,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=950826.6666666666, ans=0.1 2023-12-23 04:26:16,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=950826.6666666666, ans=0.125 2023-12-23 04:26:20,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2023-12-23 04:26:21,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.02 vs. 
limit=12.0 2023-12-23 04:26:22,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=950893.3333333334, ans=0.0 2023-12-23 04:26:26,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=950893.3333333334, ans=0.125 2023-12-23 04:26:35,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=950960.0, ans=0.1 2023-12-23 04:26:37,237 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:26:46,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=951026.6666666666, ans=0.125 2023-12-23 04:26:50,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.44 vs. limit=22.5 2023-12-23 04:26:51,211 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.941e+01 3.268e+01 3.436e+01 3.608e+01 4.178e+01, threshold=6.872e+01, percent-clipped=0.0 2023-12-23 04:26:52,192 INFO [train.py:886] (0/4) Epoch 30, batch 4450, loss[loss=0.01488, audio_tagging_loss=0.01488, over 22371.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4944444.38 frames. ], batch size: 107, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:27:18,841 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:27:26,504 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:27:45,071 INFO [train.py:886] (0/4) Epoch 30, batch 4500, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4947552.59 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:27:46,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=951426.6666666666, ans=0.125 2023-12-23 04:27:55,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5 2023-12-23 04:28:08,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=951560.0, ans=0.2 2023-12-23 04:28:09,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5 2023-12-23 04:28:21,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=951626.6666666666, ans=0.125 2023-12-23 04:28:29,362 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:28:34,497 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+01 3.270e+01 3.347e+01 3.646e+01 4.061e+01, threshold=6.694e+01, percent-clipped=0.0 2023-12-23 04:28:35,441 INFO [train.py:886] (0/4) Epoch 30, batch 4550, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4948945.35 frames. 
], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:28:39,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=951760.0, ans=0.0 2023-12-23 04:28:53,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=951826.6666666666, ans=0.1 2023-12-23 04:29:22,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=952026.6666666666, ans=0.0 2023-12-23 04:29:27,426 INFO [train.py:886] (0/4) Epoch 30, batch 4600, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4953491.61 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:29:36,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2023-12-23 04:29:44,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=952160.0, ans=0.125 2023-12-23 04:29:46,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=952160.0, ans=0.1 2023-12-23 04:30:06,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.67 vs. limit=22.5 2023-12-23 04:30:14,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.29 vs. limit=10.0 2023-12-23 04:30:19,065 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.884e+01 3.239e+01 3.388e+01 3.549e+01 4.736e+01, threshold=6.775e+01, percent-clipped=0.0 2023-12-23 04:30:20,032 INFO [train.py:886] (0/4) Epoch 30, batch 4650, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4953881.30 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:30:25,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=952426.6666666666, ans=0.05 2023-12-23 04:30:28,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=952426.6666666666, ans=0.0 2023-12-23 04:30:35,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=952493.3333333334, ans=0.1 2023-12-23 04:30:39,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=952560.0, ans=0.0 2023-12-23 04:31:05,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=952693.3333333334, ans=0.125 2023-12-23 04:31:08,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=952693.3333333334, ans=0.125 2023-12-23 04:31:10,386 INFO [train.py:886] (0/4) Epoch 30, batch 4700, loss[loss=0.01185, audio_tagging_loss=0.01185, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4953759.99 frames. 
], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:31:15,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=952760.0, ans=0.125 2023-12-23 04:31:15,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=952760.0, ans=0.125 2023-12-23 04:31:26,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=952826.6666666666, ans=0.0 2023-12-23 04:31:26,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.86 vs. limit=8.0 2023-12-23 04:31:37,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=952960.0, ans=0.04949747468305833 2023-12-23 04:31:42,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=952960.0, ans=0.1 2023-12-23 04:31:47,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=952960.0, ans=0.1 2023-12-23 04:31:49,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=953026.6666666666, ans=0.0 2023-12-23 04:31:56,727 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.041e+01 3.362e+01 3.489e+01 3.665e+01 4.317e+01, threshold=6.977e+01, percent-clipped=0.0 2023-12-23 04:31:57,646 INFO [train.py:886] (0/4) Epoch 30, batch 4750, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4951830.81 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:32:02,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=12.0 2023-12-23 04:32:13,201 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-30.pt 2023-12-23 04:32:32,841 INFO [train.py:886] (0/4) Epoch 31, batch 0, loss[loss=0.02668, audio_tagging_loss=0.02668, over 25000.00 frames. ], tot_loss[loss=0.02668, audio_tagging_loss=0.02668, over 25000.00 frames. ], batch size: 100, lr: 3.51e-03, grad_scale: 32.0 2023-12-23 04:32:32,842 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 04:32:44,351 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.6788, 2.8663, 2.4806, 2.2748, 3.8316, 3.3834, 4.0464, 2.3378], device='cuda:0') 2023-12-23 04:32:44,969 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3988, 3.4474, 3.7318, 4.0137], device='cuda:0') 2023-12-23 04:32:54,307 INFO [train.py:917] (0/4) Epoch 31, validation: loss=0.03297, audio_tagging_loss=0.03297, over 3737520.00 frames. 
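The recurring WARNING [optim.py:484] lines report the 0/25/50/75/100th percentiles of recently observed gradient norms. The printed threshold is consistent with Clipping_scale times the median (for example, 2.0 * 3.351e+01 = 6.702e+01 earlier in this epoch), and percent-clipped is the share of recent batches whose norm exceeded it; it jumps to 8.0 right after epoch 31 starts below, when gradient norms spike. A rough sketch of that bookkeeping, with hypothetical class and method names rather than the actual optim.py code:

# Quartile-based gradient clipping consistent with the logged numbers
# (assumed mechanics; names and window size are hypothetical).
import torch

class QuartileClipper:
    def __init__(self, clipping_scale=2.0, window=50):
        self.clipping_scale = clipping_scale
        self.window = window      # how many recent norms to remember
        self.recent_norms = []

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.detach().norm() for g in grads]))
        self.recent_norms = (self.recent_norms + [float(total_norm)])[-self.window:]
        # 0/25/50/75/100th percentiles, as printed in the WARNING lines.
        q = torch.quantile(torch.tensor(self.recent_norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        if float(total_norm) > threshold:
            # This batch counts toward percent-clipped.
            for g in grads:
                g.mul_(threshold / float(total_norm))
        return q, threshold

Because the threshold tracks the median of recent norms, percent-clipped stays at 0.0 during steady training and only rises when norms shift abruptly, e.g. at an epoch boundary after a checkpoint reload.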
2023-12-23 04:32:54,308 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 04:32:56,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=953200.0, ans=0.125 2023-12-23 04:33:01,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=953200.0, ans=0.125 2023-12-23 04:33:10,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2023-12-23 04:33:10,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2023-12-23 04:33:26,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=953400.0, ans=0.125 2023-12-23 04:33:30,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=953400.0, ans=0.07 2023-12-23 04:33:31,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=953400.0, ans=0.0 2023-12-23 04:33:40,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=953466.6666666666, ans=0.125 2023-12-23 04:33:42,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=953466.6666666666, ans=0.0 2023-12-23 04:33:44,884 INFO [train.py:886] (0/4) Epoch 31, batch 50, loss[loss=0.01597, audio_tagging_loss=0.01597, over 25000.00 frames. ], tot_loss[loss=0.01949, audio_tagging_loss=0.01949, over 1114298.85 frames. ], batch size: 100, lr: 3.51e-03, grad_scale: 32.0 2023-12-23 04:33:46,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=953533.3333333334, ans=0.0 2023-12-23 04:34:13,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2023-12-23 04:34:16,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=953733.3333333334, ans=0.125 2023-12-23 04:34:16,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=953733.3333333334, ans=0.125 2023-12-23 04:34:17,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=953733.3333333334, ans=0.0 2023-12-23 04:34:20,595 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.241e+01 3.845e+01 4.121e+01 4.670e+01 9.872e+01, threshold=8.242e+01, percent-clipped=8.0 2023-12-23 04:34:20,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=953733.3333333334, ans=0.125 2023-12-23 04:34:37,994 INFO [train.py:886] (0/4) Epoch 31, batch 100, loss[loss=0.01382, audio_tagging_loss=0.01382, over 25000.00 frames. ], tot_loss[loss=0.01703, audio_tagging_loss=0.01703, over 1968503.20 frames. 
], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:35:01,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=954000.0, ans=0.125 2023-12-23 04:35:11,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2023-12-23 04:35:20,148 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:35:22,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=954133.3333333334, ans=0.0 2023-12-23 04:35:29,354 INFO [train.py:886] (0/4) Epoch 31, batch 150, loss[loss=0.0143, audio_tagging_loss=0.0143, over 25000.00 frames. ], tot_loss[loss=0.01577, audio_tagging_loss=0.01577, over 2634378.33 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:35:29,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=954200.0, ans=0.125 2023-12-23 04:35:40,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=954266.6666666666, ans=0.2 2023-12-23 04:35:44,064 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:36:05,256 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.100e+01 3.397e+01 3.570e+01 3.705e+01 4.340e+01, threshold=7.141e+01, percent-clipped=0.0 2023-12-23 04:36:10,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=954466.6666666666, ans=0.2 2023-12-23 04:36:20,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=954466.6666666666, ans=0.125 2023-12-23 04:36:22,031 INFO [train.py:886] (0/4) Epoch 31, batch 200, loss[loss=0.01289, audio_tagging_loss=0.01289, over 25000.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 3146030.02 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:36:24,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=954533.3333333334, ans=0.0 2023-12-23 04:36:33,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.76 vs. limit=15.0 2023-12-23 04:37:14,758 INFO [train.py:886] (0/4) Epoch 31, batch 250, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 3552864.79 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:37:31,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.13 vs. limit=12.0 2023-12-23 04:37:35,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=955000.0, ans=0.125 2023-12-23 04:37:50,812 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.801e+01 3.236e+01 3.404e+01 3.581e+01 4.137e+01, threshold=6.809e+01, percent-clipped=0.0 2023-12-23 04:38:06,946 INFO [train.py:886] (0/4) Epoch 31, batch 300, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 3859152.95 frames. 
], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:38:23,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=12.0 2023-12-23 04:38:27,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=955333.3333333334, ans=0.0 2023-12-23 04:38:38,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=955400.0, ans=0.05 2023-12-23 04:38:59,472 INFO [train.py:886] (0/4) Epoch 31, batch 350, loss[loss=0.009777, audio_tagging_loss=0.009777, over 24750.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4097941.79 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:39:08,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=955533.3333333334, ans=0.0 2023-12-23 04:39:11,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=955600.0, ans=0.0 2023-12-23 04:39:35,045 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.321e+01 3.436e+01 3.601e+01 4.104e+01, threshold=6.873e+01, percent-clipped=0.0 2023-12-23 04:39:37,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=955733.3333333334, ans=0.07 2023-12-23 04:39:38,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=955733.3333333334, ans=0.07 2023-12-23 04:39:39,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.02 vs. limit=6.0 2023-12-23 04:39:44,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=955800.0, ans=0.1 2023-12-23 04:39:52,384 INFO [train.py:886] (0/4) Epoch 31, batch 400, loss[loss=0.01292, audio_tagging_loss=0.01292, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4288240.26 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:40:21,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=956000.0, ans=0.5 2023-12-23 04:40:24,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. 
limit=15.0 2023-12-23 04:40:29,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=956066.6666666666, ans=0.125 2023-12-23 04:40:34,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=956133.3333333334, ans=0.125 2023-12-23 04:40:36,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=956133.3333333334, ans=0.1 2023-12-23 04:40:38,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=956133.3333333334, ans=0.125 2023-12-23 04:40:39,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=956133.3333333334, ans=0.5 2023-12-23 04:40:44,907 INFO [train.py:886] (0/4) Epoch 31, batch 450, loss[loss=0.01413, audio_tagging_loss=0.01413, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4437238.97 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:41:20,614 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.863e+01 3.195e+01 3.396e+01 3.625e+01 4.055e+01, threshold=6.792e+01, percent-clipped=0.0 2023-12-23 04:41:21,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=956400.0, ans=0.125 2023-12-23 04:41:32,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=956466.6666666666, ans=0.5 2023-12-23 04:41:38,114 INFO [train.py:886] (0/4) Epoch 31, batch 500, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4547092.00 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:41:39,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=956533.3333333334, ans=0.125 2023-12-23 04:41:44,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=956533.3333333334, ans=0.125 2023-12-23 04:41:51,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=956600.0, ans=0.0 2023-12-23 04:41:52,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=956600.0, ans=0.125 2023-12-23 04:42:01,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=956666.6666666666, ans=0.0 2023-12-23 04:42:21,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=956800.0, ans=0.125 2023-12-23 04:42:24,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=956800.0, ans=0.125 2023-12-23 04:42:30,462 INFO [train.py:886] (0/4) Epoch 31, batch 550, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4639680.00 frames. 
], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:42:30,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=956866.6666666666, ans=0.125 2023-12-23 04:43:05,867 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.280e+01 3.449e+01 3.593e+01 5.125e+01, threshold=6.898e+01, percent-clipped=0.0 2023-12-23 04:43:06,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=957066.6666666666, ans=0.1 2023-12-23 04:43:11,549 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:43:22,655 INFO [train.py:886] (0/4) Epoch 31, batch 600, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24750.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4708236.02 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:43:24,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=957200.0, ans=0.07 2023-12-23 04:43:24,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=957200.0, ans=0.0 2023-12-23 04:43:32,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=957266.6666666666, ans=0.125 2023-12-23 04:43:57,436 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:44:04,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=957466.6666666666, ans=0.025 2023-12-23 04:44:12,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2023-12-23 04:44:15,883 INFO [train.py:886] (0/4) Epoch 31, batch 650, loss[loss=0.01646, audio_tagging_loss=0.01646, over 24750.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4758005.85 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:44:18,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=957533.3333333334, ans=0.035 2023-12-23 04:44:26,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=957600.0, ans=0.125 2023-12-23 04:44:37,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=957666.6666666666, ans=0.125 2023-12-23 04:44:41,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=957666.6666666666, ans=0.2 2023-12-23 04:44:49,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=957733.3333333334, ans=0.125 2023-12-23 04:44:51,803 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.011e+01 3.328e+01 3.481e+01 3.624e+01 4.466e+01, threshold=6.961e+01, percent-clipped=0.0 2023-12-23 04:45:06,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.53 vs. 
limit=22.5 2023-12-23 04:45:07,155 INFO [train.py:886] (0/4) Epoch 31, batch 700, loss[loss=0.01409, audio_tagging_loss=0.01409, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4796361.26 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:45:11,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=957866.6666666666, ans=0.0 2023-12-23 04:45:29,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=958000.0, ans=0.0 2023-12-23 04:45:32,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=958000.0, ans=0.125 2023-12-23 04:45:34,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0 2023-12-23 04:45:58,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=958133.3333333334, ans=0.07 2023-12-23 04:46:00,015 INFO [train.py:886] (0/4) Epoch 31, batch 750, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4831578.57 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:46:04,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=22.5 2023-12-23 04:46:05,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=958200.0, ans=0.0 2023-12-23 04:46:07,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=958200.0, ans=0.125 2023-12-23 04:46:16,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=958266.6666666666, ans=0.125 2023-12-23 04:46:20,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=958333.3333333334, ans=0.125 2023-12-23 04:46:31,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0 2023-12-23 04:46:32,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=958400.0, ans=0.0 2023-12-23 04:46:35,659 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.016e+01 3.246e+01 3.400e+01 3.570e+01 4.142e+01, threshold=6.799e+01, percent-clipped=0.0 2023-12-23 04:46:37,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=958400.0, ans=0.1 2023-12-23 04:46:39,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=958400.0, ans=0.125 2023-12-23 04:46:39,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=958400.0, ans=0.0 2023-12-23 04:46:52,361 INFO [train.py:886] (0/4) Epoch 31, batch 800, loss[loss=0.01203, audio_tagging_loss=0.01203, over 25000.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4859275.28 frames. 
], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:47:01,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=958533.3333333334, ans=0.1 2023-12-23 04:47:11,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=958600.0, ans=0.125 2023-12-23 04:47:13,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=958666.6666666666, ans=0.125 2023-12-23 04:47:27,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=958733.3333333334, ans=0.125 2023-12-23 04:47:44,841 INFO [train.py:886] (0/4) Epoch 31, batch 850, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4882103.29 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:48:19,407 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.919e+01 3.282e+01 3.408e+01 3.536e+01 4.077e+01, threshold=6.816e+01, percent-clipped=0.0 2023-12-23 04:48:20,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.58 vs. limit=22.5 2023-12-23 04:48:36,938 INFO [train.py:886] (0/4) Epoch 31, batch 900, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24929.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4897946.48 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:48:44,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.52 vs. limit=10.0 2023-12-23 04:48:50,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=959266.6666666666, ans=0.125 2023-12-23 04:49:05,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=959333.3333333334, ans=0.125 2023-12-23 04:49:14,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=959400.0, ans=0.125 2023-12-23 04:49:17,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=959466.6666666666, ans=0.2 2023-12-23 04:49:17,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=959466.6666666666, ans=0.0 2023-12-23 04:49:26,584 INFO [train.py:886] (0/4) Epoch 31, batch 950, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4906082.81 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:49:46,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=959600.0, ans=0.125 2023-12-23 04:49:50,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=959666.6666666666, ans=0.125 2023-12-23 04:49:58,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.49 vs. 
limit=12.0 2023-12-23 04:50:02,955 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.982e+01 3.282e+01 3.466e+01 3.627e+01 4.372e+01, threshold=6.931e+01, percent-clipped=0.0 2023-12-23 04:50:15,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=959800.0, ans=0.0 2023-12-23 04:50:16,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=959800.0, ans=0.125 2023-12-23 04:50:20,521 INFO [train.py:886] (0/4) Epoch 31, batch 1000, loss[loss=0.01192, audio_tagging_loss=0.01192, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4913872.97 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:50:35,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=959933.3333333334, ans=0.1 2023-12-23 04:50:36,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2023-12-23 04:50:40,746 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-144000.pt 2023-12-23 04:50:53,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=960000.0, ans=0.035 2023-12-23 04:51:00,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=12.0 2023-12-23 04:51:03,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=960133.3333333334, ans=0.0 2023-12-23 04:51:16,530 INFO [train.py:886] (0/4) Epoch 31, batch 1050, loss[loss=0.01074, audio_tagging_loss=0.01074, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4918961.95 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:51:27,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=960266.6666666666, ans=0.0 2023-12-23 04:51:29,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=960266.6666666666, ans=0.125 2023-12-23 04:51:38,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=960333.3333333334, ans=0.125 2023-12-23 04:51:44,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=960333.3333333334, ans=0.1 2023-12-23 04:51:52,740 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.955e+01 3.261e+01 3.393e+01 3.606e+01 4.316e+01, threshold=6.786e+01, percent-clipped=0.0 2023-12-23 04:51:59,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2023-12-23 04:52:00,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=960466.6666666666, ans=0.0 2023-12-23 04:52:07,919 INFO [train.py:886] (0/4) Epoch 31, batch 1100, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4927378.29 frames. 
], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:52:08,478 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.48 vs. limit=22.5 2023-12-23 04:52:11,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=960533.3333333334, ans=0.0 2023-12-23 04:52:19,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=960600.0, ans=0.0 2023-12-23 04:52:31,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=960666.6666666666, ans=0.125 2023-12-23 04:53:01,555 INFO [train.py:886] (0/4) Epoch 31, batch 1150, loss[loss=0.01375, audio_tagging_loss=0.01375, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4926030.82 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:53:02,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=15.0 2023-12-23 04:53:11,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=960933.3333333334, ans=0.1 2023-12-23 04:53:26,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=961000.0, ans=0.125 2023-12-23 04:53:36,112 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.954e+01 3.303e+01 3.399e+01 3.563e+01 3.907e+01, threshold=6.798e+01, percent-clipped=0.0 2023-12-23 04:53:42,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=961133.3333333334, ans=0.0 2023-12-23 04:53:46,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=961133.3333333334, ans=0.1 2023-12-23 04:53:51,375 INFO [train.py:886] (0/4) Epoch 31, batch 1200, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4930823.16 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:54:05,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=961266.6666666666, ans=0.0 2023-12-23 04:54:07,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2023-12-23 04:54:14,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-12-23 04:54:32,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=8.0 2023-12-23 04:54:44,882 INFO [train.py:886] (0/4) Epoch 31, batch 1250, loss[loss=0.01297, audio_tagging_loss=0.01297, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4930487.49 frames. 
], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:54:46,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=961533.3333333334, ans=0.1 2023-12-23 04:55:14,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=961666.6666666666, ans=0.125 2023-12-23 04:55:17,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=961733.3333333334, ans=0.0 2023-12-23 04:55:19,942 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.041e+01 3.396e+01 3.498e+01 3.625e+01 4.153e+01, threshold=6.995e+01, percent-clipped=0.0 2023-12-23 04:55:36,755 INFO [train.py:886] (0/4) Epoch 31, batch 1300, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4935933.70 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:55:39,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=961866.6666666666, ans=0.0 2023-12-23 04:55:50,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=961933.3333333334, ans=0.125 2023-12-23 04:55:54,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=961933.3333333334, ans=0.125 2023-12-23 04:56:11,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=962066.6666666666, ans=0.125 2023-12-23 04:56:17,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=962066.6666666666, ans=0.2 2023-12-23 04:56:18,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=962133.3333333334, ans=0.125 2023-12-23 04:56:28,654 INFO [train.py:886] (0/4) Epoch 31, batch 1350, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4936904.01 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:56:29,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=962200.0, ans=0.035 2023-12-23 04:56:39,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=962266.6666666666, ans=0.2 2023-12-23 04:57:04,304 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.922e+01 3.247e+01 3.434e+01 3.583e+01 4.287e+01, threshold=6.867e+01, percent-clipped=0.0 2023-12-23 04:57:05,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=962400.0, ans=0.1 2023-12-23 04:57:17,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=962466.6666666666, ans=0.2 2023-12-23 04:57:22,571 INFO [train.py:886] (0/4) Epoch 31, batch 1400, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4942279.06 frames. 
], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:57:31,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=962600.0, ans=0.0 2023-12-23 04:57:32,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=962600.0, ans=0.125 2023-12-23 04:57:33,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=962600.0, ans=0.125 2023-12-23 04:57:35,176 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:57:39,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=962600.0, ans=0.125 2023-12-23 04:58:14,715 INFO [train.py:886] (0/4) Epoch 31, batch 1450, loss[loss=0.0119, audio_tagging_loss=0.0119, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4940981.73 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:58:21,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=12.0 2023-12-23 04:58:30,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=962933.3333333334, ans=0.125 2023-12-23 04:58:32,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=962933.3333333334, ans=0.125 2023-12-23 04:58:33,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=962933.3333333334, ans=0.0 2023-12-23 04:58:49,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=963066.6666666666, ans=0.0 2023-12-23 04:58:50,672 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.018e+01 3.252e+01 3.359e+01 3.465e+01 3.821e+01, threshold=6.718e+01, percent-clipped=0.0 2023-12-23 04:58:50,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=963066.6666666666, ans=0.125 2023-12-23 04:58:54,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=963066.6666666666, ans=0.0 2023-12-23 04:59:00,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=963133.3333333334, ans=0.125 2023-12-23 04:59:07,363 INFO [train.py:886] (0/4) Epoch 31, batch 1500, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4952142.09 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:59:15,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=963200.0, ans=0.125 2023-12-23 04:59:19,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. 
limit=6.0 2023-12-23 04:59:27,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=963333.3333333334, ans=0.0 2023-12-23 04:59:35,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=963333.3333333334, ans=0.1 2023-12-23 04:59:47,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2023-12-23 04:59:56,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=963466.6666666666, ans=0.1 2023-12-23 04:59:57,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=963466.6666666666, ans=0.1 2023-12-23 04:59:59,916 INFO [train.py:886] (0/4) Epoch 31, batch 1550, loss[loss=0.01542, audio_tagging_loss=0.01542, over 24750.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4951034.49 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:00:19,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=963666.6666666666, ans=0.0 2023-12-23 05:00:27,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=963666.6666666666, ans=0.0 2023-12-23 05:00:28,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2023-12-23 05:00:31,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=963733.3333333334, ans=0.1 2023-12-23 05:00:35,105 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.972e+01 3.345e+01 3.487e+01 3.654e+01 4.145e+01, threshold=6.974e+01, percent-clipped=0.0 2023-12-23 05:00:37,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=963733.3333333334, ans=0.0 2023-12-23 05:00:50,978 INFO [train.py:886] (0/4) Epoch 31, batch 1600, loss[loss=0.01077, audio_tagging_loss=0.01077, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4940597.26 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:01:05,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=963933.3333333334, ans=0.125 2023-12-23 05:01:06,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.93 vs. limit=15.0 2023-12-23 05:01:33,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=964133.3333333334, ans=0.07 2023-12-23 05:01:43,457 INFO [train.py:886] (0/4) Epoch 31, batch 1650, loss[loss=0.01211, audio_tagging_loss=0.01211, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4939699.63 frames. 
], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:01:43,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=964200.0, ans=0.1 2023-12-23 05:01:46,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=964200.0, ans=0.125 2023-12-23 05:01:53,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=964266.6666666666, ans=0.0 2023-12-23 05:01:54,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=964266.6666666666, ans=0.125 2023-12-23 05:01:58,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2023-12-23 05:02:00,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=964266.6666666666, ans=0.125 2023-12-23 05:02:04,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.59 vs. limit=15.0 2023-12-23 05:02:07,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-12-23 05:02:09,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0 2023-12-23 05:02:14,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.54 vs. limit=12.0 2023-12-23 05:02:18,491 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.266e+01 3.423e+01 3.548e+01 4.336e+01, threshold=6.845e+01, percent-clipped=0.0 2023-12-23 05:02:20,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.74 vs. limit=22.5 2023-12-23 05:02:33,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=964466.6666666666, ans=0.0 2023-12-23 05:02:35,988 INFO [train.py:886] (0/4) Epoch 31, batch 1700, loss[loss=0.01095, audio_tagging_loss=0.01095, over 24750.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4942842.20 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:02:58,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=964666.6666666666, ans=0.1 2023-12-23 05:03:00,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.13 vs. limit=10.0 2023-12-23 05:03:06,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=964733.3333333334, ans=0.0 2023-12-23 05:03:27,706 INFO [train.py:886] (0/4) Epoch 31, batch 1750, loss[loss=0.01468, audio_tagging_loss=0.01468, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4946279.26 frames. 
], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:03:28,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=964866.6666666666, ans=0.2 2023-12-23 05:03:29,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=964866.6666666666, ans=0.1 2023-12-23 05:03:30,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=964866.6666666666, ans=0.0 2023-12-23 05:03:37,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=964933.3333333334, ans=0.0 2023-12-23 05:03:45,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=964933.3333333334, ans=0.125 2023-12-23 05:03:51,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=965000.0, ans=0.2 2023-12-23 05:03:54,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0 2023-12-23 05:04:03,083 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.955e+01 3.266e+01 3.414e+01 3.599e+01 4.382e+01, threshold=6.827e+01, percent-clipped=0.0 2023-12-23 05:04:14,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-12-23 05:04:20,474 INFO [train.py:886] (0/4) Epoch 31, batch 1800, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4955811.72 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:04:24,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=965200.0, ans=0.0 2023-12-23 05:04:25,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=965200.0, ans=0.0 2023-12-23 05:04:35,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=965266.6666666666, ans=0.025 2023-12-23 05:04:37,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=965266.6666666666, ans=15.0 2023-12-23 05:04:38,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=965266.6666666666, ans=0.125 2023-12-23 05:04:39,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=965333.3333333334, ans=0.0 2023-12-23 05:04:48,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.96 vs. limit=22.5 2023-12-23 05:04:55,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965400.0, ans=0.1 2023-12-23 05:05:09,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.84 vs. 
limit=15.0 2023-12-23 05:05:11,667 INFO [train.py:886] (0/4) Epoch 31, batch 1850, loss[loss=0.01118, audio_tagging_loss=0.01118, over 24750.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4952536.57 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:05:12,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=965533.3333333334, ans=0.0 2023-12-23 05:05:38,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965666.6666666666, ans=0.1 2023-12-23 05:05:42,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=965733.3333333334, ans=0.125 2023-12-23 05:05:44,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=965733.3333333334, ans=0.125 2023-12-23 05:05:47,722 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.028e+01 3.346e+01 3.530e+01 3.672e+01 4.173e+01, threshold=7.061e+01, percent-clipped=0.0 2023-12-23 05:05:48,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=965733.3333333334, ans=0.2 2023-12-23 05:06:04,202 INFO [train.py:886] (0/4) Epoch 31, batch 1900, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4953360.14 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:06:54,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0 2023-12-23 05:06:57,413 INFO [train.py:886] (0/4) Epoch 31, batch 1950, loss[loss=0.01203, audio_tagging_loss=0.01203, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4948883.98 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:06:58,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=966200.0, ans=0.0 2023-12-23 05:06:59,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=966200.0, ans=0.0 2023-12-23 05:06:59,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.04 vs. limit=22.5 2023-12-23 05:07:01,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=966200.0, ans=0.0 2023-12-23 05:07:05,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=966200.0, ans=0.1 2023-12-23 05:07:32,862 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.281e+01 3.424e+01 3.602e+01 4.114e+01, threshold=6.849e+01, percent-clipped=0.0 2023-12-23 05:07:36,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=966400.0, ans=0.0 2023-12-23 05:07:46,219 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:07:47,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.13 vs. 
limit=15.0 2023-12-23 05:07:47,980 INFO [train.py:886] (0/4) Epoch 31, batch 2000, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24750.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4953133.62 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:07:49,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=966533.3333333334, ans=0.0 2023-12-23 05:07:50,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=966533.3333333334, ans=0.0 2023-12-23 05:08:07,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-12-23 05:08:23,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=966733.3333333334, ans=0.0 2023-12-23 05:08:34,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=966800.0, ans=0.95 2023-12-23 05:08:41,028 INFO [train.py:886] (0/4) Epoch 31, batch 2050, loss[loss=0.01043, audio_tagging_loss=0.01043, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4954991.84 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:08:47,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=966866.6666666666, ans=0.125 2023-12-23 05:08:53,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.91 vs. limit=22.5 2023-12-23 05:09:12,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=967066.6666666666, ans=0.125 2023-12-23 05:09:14,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=967066.6666666666, ans=0.125 2023-12-23 05:09:15,613 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.914e+01 3.257e+01 3.392e+01 3.574e+01 4.327e+01, threshold=6.783e+01, percent-clipped=0.0 2023-12-23 05:09:29,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=967133.3333333334, ans=0.0 2023-12-23 05:09:31,440 INFO [train.py:886] (0/4) Epoch 31, batch 2100, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4954814.32 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:09:47,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=967266.6666666666, ans=0.2 2023-12-23 05:10:13,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=967400.0, ans=0.125 2023-12-23 05:10:23,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=967533.3333333334, ans=0.1 2023-12-23 05:10:24,302 INFO [train.py:886] (0/4) Epoch 31, batch 2150, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4961459.79 frames. 
], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:10:30,278 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:10:32,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=967533.3333333334, ans=0.2 2023-12-23 05:10:34,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=22.5 2023-12-23 05:10:46,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=967666.6666666666, ans=0.0 2023-12-23 05:10:55,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=967733.3333333334, ans=0.07 2023-12-23 05:10:59,512 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.971e+01 3.338e+01 3.483e+01 3.621e+01 4.579e+01, threshold=6.966e+01, percent-clipped=0.0 2023-12-23 05:11:08,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=22.5 2023-12-23 05:11:14,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=967800.0, ans=0.125 2023-12-23 05:11:14,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=967800.0, ans=0.2 2023-12-23 05:11:17,275 INFO [train.py:886] (0/4) Epoch 31, batch 2200, loss[loss=0.01164, audio_tagging_loss=0.01164, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4950959.53 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:11:17,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=967866.6666666666, ans=0.2 2023-12-23 05:11:22,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=967866.6666666666, ans=0.125 2023-12-23 05:11:24,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=967866.6666666666, ans=0.125 2023-12-23 05:11:26,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=967933.3333333334, ans=0.1 2023-12-23 05:11:53,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2023-12-23 05:12:08,049 INFO [train.py:886] (0/4) Epoch 31, batch 2250, loss[loss=0.01441, audio_tagging_loss=0.01441, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4951346.29 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:12:08,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=968200.0, ans=0.125 2023-12-23 05:12:10,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=968200.0, ans=0.2 2023-12-23 05:12:29,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.12 vs. 
limit=22.5 2023-12-23 05:12:32,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=968333.3333333334, ans=0.0 2023-12-23 05:12:32,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=968333.3333333334, ans=0.1 2023-12-23 05:12:40,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=968400.0, ans=0.125 2023-12-23 05:12:44,288 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.002e+01 3.260e+01 3.452e+01 3.625e+01 3.919e+01, threshold=6.903e+01, percent-clipped=0.0 2023-12-23 05:12:48,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=968400.0, ans=0.125 2023-12-23 05:12:51,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=968466.6666666666, ans=0.2 2023-12-23 05:12:55,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=15.0 2023-12-23 05:13:01,705 INFO [train.py:886] (0/4) Epoch 31, batch 2300, loss[loss=0.01379, audio_tagging_loss=0.01379, over 23977.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4945054.18 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:13:04,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=968533.3333333334, ans=0.2 2023-12-23 05:13:10,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=968600.0, ans=0.1 2023-12-23 05:13:19,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=968600.0, ans=0.0 2023-12-23 05:13:27,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=968666.6666666666, ans=0.1 2023-12-23 05:13:36,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. limit=6.0 2023-12-23 05:13:42,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-12-23 05:13:54,146 INFO [train.py:886] (0/4) Epoch 31, batch 2350, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4947377.61 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:14:21,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=969000.0, ans=0.1 2023-12-23 05:14:22,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.95 vs. 
limit=12.0 2023-12-23 05:14:27,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=969066.6666666666, ans=0.0 2023-12-23 05:14:29,949 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.885e+01 3.284e+01 3.418e+01 3.549e+01 4.229e+01, threshold=6.835e+01, percent-clipped=0.0 2023-12-23 05:14:34,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=969066.6666666666, ans=0.2 2023-12-23 05:14:45,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=969200.0, ans=0.1 2023-12-23 05:14:45,947 INFO [train.py:886] (0/4) Epoch 31, batch 2400, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4956333.68 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:14:47,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=969200.0, ans=0.125 2023-12-23 05:14:49,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.82 vs. limit=10.0 2023-12-23 05:14:53,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=969200.0, ans=0.0 2023-12-23 05:15:07,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=969333.3333333334, ans=0.0 2023-12-23 05:15:14,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=969333.3333333334, ans=0.0 2023-12-23 05:15:23,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=969400.0, ans=0.125 2023-12-23 05:15:26,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=969400.0, ans=0.125 2023-12-23 05:15:30,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=969466.6666666666, ans=0.125 2023-12-23 05:15:37,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=969466.6666666666, ans=0.0 2023-12-23 05:15:38,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=969533.3333333334, ans=0.0 2023-12-23 05:15:39,264 INFO [train.py:886] (0/4) Epoch 31, batch 2450, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4957045.98 frames. 
], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:15:54,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=969600.0, ans=0.125 2023-12-23 05:15:56,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=969600.0, ans=0.125 2023-12-23 05:15:57,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=969600.0, ans=0.125 2023-12-23 05:15:58,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=969666.6666666666, ans=0.1 2023-12-23 05:16:01,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=969666.6666666666, ans=0.0 2023-12-23 05:16:06,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=969666.6666666666, ans=0.2 2023-12-23 05:16:14,735 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.981e+01 3.300e+01 3.451e+01 3.618e+01 3.996e+01, threshold=6.901e+01, percent-clipped=0.0 2023-12-23 05:16:31,304 INFO [train.py:886] (0/4) Epoch 31, batch 2500, loss[loss=0.0104, audio_tagging_loss=0.0104, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4955303.65 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:17:19,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=970133.3333333334, ans=0.125 2023-12-23 05:17:23,729 INFO [train.py:886] (0/4) Epoch 31, batch 2550, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4952404.71 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:17:25,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0 2023-12-23 05:17:31,474 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:17:45,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=12.0 2023-12-23 05:17:59,854 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.010e+01 3.321e+01 3.428e+01 3.568e+01 4.183e+01, threshold=6.855e+01, percent-clipped=0.0 2023-12-23 05:18:09,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.20 vs. limit=15.0 2023-12-23 05:18:15,518 INFO [train.py:886] (0/4) Epoch 31, batch 2600, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4951087.92 frames. 
], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:18:41,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=970666.6666666666, ans=0.125 2023-12-23 05:18:44,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=970666.6666666666, ans=0.025 2023-12-23 05:18:48,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=970733.3333333334, ans=0.125 2023-12-23 05:18:51,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=970733.3333333334, ans=0.125 2023-12-23 05:18:58,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2023-12-23 05:19:07,500 INFO [train.py:886] (0/4) Epoch 31, batch 2650, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4955042.68 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:19:07,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=970866.6666666666, ans=0.05 2023-12-23 05:19:08,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.55 vs. limit=10.0 2023-12-23 05:19:12,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=970866.6666666666, ans=0.125 2023-12-23 05:19:16,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=970866.6666666666, ans=0.2 2023-12-23 05:19:24,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=970933.3333333334, ans=0.0 2023-12-23 05:19:25,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=970933.3333333334, ans=0.2 2023-12-23 05:19:34,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=971000.0, ans=0.07 2023-12-23 05:19:38,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.50 vs. limit=15.0 2023-12-23 05:19:40,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2023-12-23 05:19:44,614 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.992e+01 3.292e+01 3.427e+01 3.635e+01 4.044e+01, threshold=6.854e+01, percent-clipped=0.0 2023-12-23 05:20:00,182 INFO [train.py:886] (0/4) Epoch 31, batch 2700, loss[loss=0.014, audio_tagging_loss=0.014, over 21733.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4956075.67 frames. 
], batch size: 107, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:20:05,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=971200.0, ans=0.125 2023-12-23 05:20:52,250 INFO [train.py:886] (0/4) Epoch 31, batch 2750, loss[loss=0.01272, audio_tagging_loss=0.01272, over 25000.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4957062.60 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:20:53,353 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:21:28,307 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.862e+01 3.291e+01 3.432e+01 3.574e+01 3.962e+01, threshold=6.863e+01, percent-clipped=0.0 2023-12-23 05:21:42,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=971800.0, ans=10.0 2023-12-23 05:21:43,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=971866.6666666666, ans=0.2 2023-12-23 05:21:44,118 INFO [train.py:886] (0/4) Epoch 31, batch 2800, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4954705.56 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:21:46,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=971866.6666666666, ans=0.125 2023-12-23 05:21:49,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=971866.6666666666, ans=0.0 2023-12-23 05:22:00,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=971933.3333333334, ans=0.2 2023-12-23 05:22:13,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=972000.0, ans=0.1 2023-12-23 05:22:15,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=972066.6666666666, ans=0.125 2023-12-23 05:22:29,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=972133.3333333334, ans=0.1 2023-12-23 05:22:37,104 INFO [train.py:886] (0/4) Epoch 31, batch 2850, loss[loss=0.01059, audio_tagging_loss=0.01059, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4948379.25 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:22:46,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=972266.6666666666, ans=0.125 2023-12-23 05:22:47,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=972266.6666666666, ans=0.125 2023-12-23 05:23:05,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.25 vs. 
limit=22.5 2023-12-23 05:23:13,336 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.924e+01 3.280e+01 3.439e+01 3.597e+01 4.174e+01, threshold=6.878e+01, percent-clipped=0.0 2023-12-23 05:23:22,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=972466.6666666666, ans=0.125 2023-12-23 05:23:28,948 INFO [train.py:886] (0/4) Epoch 31, batch 2900, loss[loss=0.01634, audio_tagging_loss=0.01634, over 25000.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4948918.52 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:23:53,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=972666.6666666666, ans=0.2 2023-12-23 05:23:53,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2023-12-23 05:23:56,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. limit=10.0 2023-12-23 05:24:09,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-12-23 05:24:13,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=972800.0, ans=0.125 2023-12-23 05:24:16,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=972800.0, ans=0.1 2023-12-23 05:24:20,312 INFO [train.py:886] (0/4) Epoch 31, batch 2950, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4950065.95 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:24:23,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=972866.6666666666, ans=0.125 2023-12-23 05:24:30,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0 2023-12-23 05:24:37,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=972933.3333333334, ans=0.125 2023-12-23 05:24:40,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=973000.0, ans=0.1 2023-12-23 05:24:43,502 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:24:55,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2023-12-23 05:24:56,483 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.890e+01 3.283e+01 3.439e+01 3.603e+01 3.990e+01, threshold=6.877e+01, percent-clipped=0.0 2023-12-23 05:25:12,295 INFO [train.py:886] (0/4) Epoch 31, batch 3000, loss[loss=0.01152, audio_tagging_loss=0.01152, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4949094.18 frames. 
], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:25:12,297 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 05:25:33,499 INFO [train.py:917] (0/4) Epoch 31, validation: loss=0.03277, audio_tagging_loss=0.03277, over 3737520.00 frames. 2023-12-23 05:25:33,500 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 05:25:37,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=973200.0, ans=0.2 2023-12-23 05:25:46,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=973266.6666666666, ans=0.2 2023-12-23 05:25:46,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2023-12-23 05:25:47,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.75 vs. limit=10.0 2023-12-23 05:25:55,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.87 vs. limit=15.0 2023-12-23 05:26:06,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=973400.0, ans=0.0 2023-12-23 05:26:15,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=973466.6666666666, ans=0.1 2023-12-23 05:26:25,825 INFO [train.py:886] (0/4) Epoch 31, batch 3050, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4950799.11 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:26:40,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=973600.0, ans=0.125 2023-12-23 05:26:41,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=973600.0, ans=0.125 2023-12-23 05:26:50,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0 2023-12-23 05:27:00,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=973733.3333333334, ans=0.035 2023-12-23 05:27:01,677 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.915e+01 3.290e+01 3.424e+01 3.581e+01 4.026e+01, threshold=6.848e+01, percent-clipped=0.0 2023-12-23 05:27:18,186 INFO [train.py:886] (0/4) Epoch 31, batch 3100, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4951472.19 frames. 
], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:27:21,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=973866.6666666666, ans=0.0 2023-12-23 05:27:26,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=973866.6666666666, ans=0.0 2023-12-23 05:27:36,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=973933.3333333334, ans=0.125 2023-12-23 05:28:04,590 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:28:08,565 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:28:09,282 INFO [train.py:886] (0/4) Epoch 31, batch 3150, loss[loss=0.01078, audio_tagging_loss=0.01078, over 22388.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4946517.14 frames. ], batch size: 107, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:28:13,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=974200.0, ans=0.125 2023-12-23 05:28:16,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=974200.0, ans=0.1 2023-12-23 05:28:45,156 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.907e+01 3.356e+01 3.493e+01 3.608e+01 4.155e+01, threshold=6.985e+01, percent-clipped=0.0 2023-12-23 05:28:48,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=974400.0, ans=0.0 2023-12-23 05:28:55,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=974466.6666666666, ans=0.0 2023-12-23 05:29:00,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=974466.6666666666, ans=10.0 2023-12-23 05:29:01,484 INFO [train.py:886] (0/4) Epoch 31, batch 3200, loss[loss=0.01453, audio_tagging_loss=0.01453, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4943856.16 frames. 
], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:29:02,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=974533.3333333334, ans=0.5 2023-12-23 05:29:12,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=974600.0, ans=0.125 2023-12-23 05:29:16,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=974600.0, ans=0.125 2023-12-23 05:29:18,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=974600.0, ans=0.125 2023-12-23 05:29:36,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=974733.3333333334, ans=0.125 2023-12-23 05:29:41,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=974800.0, ans=0.0 2023-12-23 05:29:46,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=974800.0, ans=0.125 2023-12-23 05:29:52,906 INFO [train.py:886] (0/4) Epoch 31, batch 3250, loss[loss=0.01229, audio_tagging_loss=0.01229, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4945733.90 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:29:53,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=974866.6666666666, ans=0.1 2023-12-23 05:29:58,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=974866.6666666666, ans=0.0 2023-12-23 05:30:05,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=974933.3333333334, ans=0.09899494936611666 2023-12-23 05:30:16,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0 2023-12-23 05:30:18,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=975000.0, ans=0.0 2023-12-23 05:30:30,252 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.928e+01 3.264e+01 3.408e+01 3.519e+01 4.216e+01, threshold=6.815e+01, percent-clipped=0.0 2023-12-23 05:30:34,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=975066.6666666666, ans=0.125 2023-12-23 05:30:35,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=975133.3333333334, ans=0.125 2023-12-23 05:30:36,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=975133.3333333334, ans=0.0 2023-12-23 05:30:37,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=975133.3333333334, ans=0.125 2023-12-23 05:30:38,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.01 vs. 
limit=12.0 2023-12-23 05:30:45,252 INFO [train.py:886] (0/4) Epoch 31, batch 3300, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4941511.24 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:30:51,206 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:31:06,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.18 vs. limit=10.0 2023-12-23 05:31:12,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=975333.3333333334, ans=0.0 2023-12-23 05:31:15,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975400.0, ans=0.1 2023-12-23 05:31:16,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2023-12-23 05:31:20,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0 2023-12-23 05:31:27,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=975466.6666666666, ans=0.0 2023-12-23 05:31:27,089 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:31:37,507 INFO [train.py:886] (0/4) Epoch 31, batch 3350, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4947406.73 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:31:43,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=975533.3333333334, ans=0.125 2023-12-23 05:31:57,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=975666.6666666666, ans=0.125 2023-12-23 05:32:14,153 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.007e+01 3.277e+01 3.424e+01 3.609e+01 4.697e+01, threshold=6.848e+01, percent-clipped=0.0 2023-12-23 05:32:17,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=975733.3333333334, ans=0.2 2023-12-23 05:32:18,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2023-12-23 05:32:19,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975800.0, ans=0.1 2023-12-23 05:32:20,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2023-12-23 05:32:21,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975800.0, ans=0.1 2023-12-23 05:32:28,536 INFO [train.py:886] (0/4) Epoch 31, batch 3400, loss[loss=0.009757, audio_tagging_loss=0.009757, over 24750.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4955976.21 frames. 
], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:32:31,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=975866.6666666666, ans=0.0 2023-12-23 05:32:35,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=975866.6666666666, ans=0.125 2023-12-23 05:32:37,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=975933.3333333334, ans=0.125 2023-12-23 05:32:54,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.13 vs. limit=15.0 2023-12-23 05:33:15,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=976133.3333333334, ans=0.125 2023-12-23 05:33:21,838 INFO [train.py:886] (0/4) Epoch 31, batch 3450, loss[loss=0.01193, audio_tagging_loss=0.01193, over 22026.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4950238.19 frames. ], batch size: 107, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:33:21,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=976200.0, ans=0.0 2023-12-23 05:33:22,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=976200.0, ans=0.0 2023-12-23 05:33:39,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=976266.6666666666, ans=0.125 2023-12-23 05:33:39,104 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:33:48,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.78 vs. limit=10.0 2023-12-23 05:33:57,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=976400.0, ans=0.125 2023-12-23 05:33:58,069 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.000e+01 3.343e+01 3.465e+01 3.665e+01 4.176e+01, threshold=6.930e+01, percent-clipped=0.0 2023-12-23 05:33:59,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=976400.0, ans=0.0 2023-12-23 05:34:11,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=976466.6666666666, ans=0.125 2023-12-23 05:34:13,714 INFO [train.py:886] (0/4) Epoch 31, batch 3500, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4941424.32 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:34:37,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=976666.6666666666, ans=0.0 2023-12-23 05:35:05,384 INFO [train.py:886] (0/4) Epoch 31, batch 3550, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4940069.83 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:35:08,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.45 vs. 
limit=22.5 2023-12-23 05:35:10,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=976866.6666666666, ans=0.1 2023-12-23 05:35:24,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2023-12-23 05:35:29,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=977000.0, ans=0.0 2023-12-23 05:35:41,267 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.791e+01 3.288e+01 3.473e+01 3.624e+01 4.174e+01, threshold=6.946e+01, percent-clipped=0.0 2023-12-23 05:35:57,770 INFO [train.py:886] (0/4) Epoch 31, batch 3600, loss[loss=0.01119, audio_tagging_loss=0.01119, over 24750.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4947911.84 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:36:01,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.77 vs. limit=22.5 2023-12-23 05:36:08,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=977266.6666666666, ans=0.1 2023-12-23 05:36:32,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=977400.0, ans=0.1 2023-12-23 05:36:37,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=977400.0, ans=0.125 2023-12-23 05:36:37,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.23 vs. limit=10.0 2023-12-23 05:36:47,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=977466.6666666666, ans=0.125 2023-12-23 05:36:50,215 INFO [train.py:886] (0/4) Epoch 31, batch 3650, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4950426.73 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:37:04,449 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:37:25,683 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.024e+01 3.220e+01 3.394e+01 3.585e+01 4.042e+01, threshold=6.789e+01, percent-clipped=0.0 2023-12-23 05:37:31,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-12-23 05:37:31,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=977800.0, ans=0.0 2023-12-23 05:37:38,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-12-23 05:37:41,098 INFO [train.py:886] (0/4) Epoch 31, batch 3700, loss[loss=0.01296, audio_tagging_loss=0.01296, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4958363.34 frames. 
], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:37:50,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=977933.3333333334, ans=0.0 2023-12-23 05:37:57,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=977933.3333333334, ans=0.1 2023-12-23 05:38:21,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.42 vs. limit=15.0 2023-12-23 05:38:22,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=978133.3333333334, ans=0.125 2023-12-23 05:38:28,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=978133.3333333334, ans=0.125 2023-12-23 05:38:33,225 INFO [train.py:886] (0/4) Epoch 31, batch 3750, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4962104.47 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:38:52,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=978333.3333333334, ans=0.04949747468305833 2023-12-23 05:38:54,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=978333.3333333334, ans=0.2 2023-12-23 05:39:09,090 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.069e+01 3.376e+01 3.545e+01 3.717e+01 4.224e+01, threshold=7.090e+01, percent-clipped=0.0 2023-12-23 05:39:19,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=12.0 2023-12-23 05:39:24,134 INFO [train.py:886] (0/4) Epoch 31, batch 3800, loss[loss=0.01565, audio_tagging_loss=0.01565, over 24750.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4955679.81 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:39:27,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=978533.3333333334, ans=0.125 2023-12-23 05:39:34,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=978600.0, ans=0.125 2023-12-23 05:39:46,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=978666.6666666666, ans=0.125 2023-12-23 05:39:48,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.97 vs. limit=22.5 2023-12-23 05:40:08,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=978800.0, ans=0.1 2023-12-23 05:40:14,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=978800.0, ans=0.125 2023-12-23 05:40:15,968 INFO [train.py:886] (0/4) Epoch 31, batch 3850, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4950501.43 frames. 
], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:40:16,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=978866.6666666666, ans=0.125 2023-12-23 05:40:18,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=978866.6666666666, ans=0.0 2023-12-23 05:40:20,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=978866.6666666666, ans=0.0 2023-12-23 05:40:21,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=978866.6666666666, ans=0.125 2023-12-23 05:40:28,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2023-12-23 05:40:29,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978933.3333333334, ans=0.1 2023-12-23 05:40:40,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=979000.0, ans=0.2 2023-12-23 05:40:43,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=979000.0, ans=0.95 2023-12-23 05:40:43,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=979000.0, ans=0.125 2023-12-23 05:40:46,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=979066.6666666666, ans=0.0 2023-12-23 05:40:51,986 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.000e+01 3.321e+01 3.429e+01 3.579e+01 4.062e+01, threshold=6.857e+01, percent-clipped=0.0 2023-12-23 05:40:59,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=979133.3333333334, ans=0.1 2023-12-23 05:41:06,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.97 vs. limit=22.5 2023-12-23 05:41:07,607 INFO [train.py:886] (0/4) Epoch 31, batch 3900, loss[loss=0.01263, audio_tagging_loss=0.01263, over 24750.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4948608.72 frames. 
], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:41:08,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=979200.0, ans=0.125 2023-12-23 05:41:18,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=979266.6666666666, ans=0.1 2023-12-23 05:41:20,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=979266.6666666666, ans=0.125 2023-12-23 05:41:24,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=979266.6666666666, ans=0.125 2023-12-23 05:41:26,931 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.520e-03 2023-12-23 05:41:43,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=979400.0, ans=0.125 2023-12-23 05:41:44,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=979400.0, ans=0.0 2023-12-23 05:41:49,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=979466.6666666666, ans=0.95 2023-12-23 05:41:58,203 INFO [train.py:886] (0/4) Epoch 31, batch 3950, loss[loss=0.01125, audio_tagging_loss=0.01125, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4954170.23 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:42:15,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=979600.0, ans=0.125 2023-12-23 05:42:25,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=979666.6666666666, ans=0.0 2023-12-23 05:42:33,389 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.895e+01 3.318e+01 3.420e+01 3.631e+01 6.052e+01, threshold=6.840e+01, percent-clipped=0.0 2023-12-23 05:42:50,391 INFO [train.py:886] (0/4) Epoch 31, batch 4000, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4955354.15 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:42:56,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=979866.6666666666, ans=0.125 2023-12-23 05:42:57,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0 2023-12-23 05:43:00,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=979933.3333333334, ans=0.125 2023-12-23 05:43:09,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=980000.0, ans=0.125 2023-12-23 05:43:22,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=980066.6666666666, ans=0.125 2023-12-23 05:43:29,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.53 vs. 
limit=12.0 2023-12-23 05:43:31,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=980133.3333333334, ans=0.2 2023-12-23 05:43:33,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=980133.3333333334, ans=0.0 2023-12-23 05:43:40,174 INFO [train.py:886] (0/4) Epoch 31, batch 4050, loss[loss=0.01036, audio_tagging_loss=0.01036, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4952891.14 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:43:49,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2023-12-23 05:44:02,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=980333.3333333334, ans=0.125 2023-12-23 05:44:03,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=980333.3333333334, ans=10.0 2023-12-23 05:44:15,581 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.376e+01 3.511e+01 3.717e+01 4.360e+01, threshold=7.022e+01, percent-clipped=0.0 2023-12-23 05:44:30,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2023-12-23 05:44:31,102 INFO [train.py:886] (0/4) Epoch 31, batch 4100, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4947962.62 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:44:42,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=980600.0, ans=0.95 2023-12-23 05:44:51,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0 2023-12-23 05:44:57,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=980666.6666666666, ans=0.1 2023-12-23 05:45:14,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=980800.0, ans=0.125 2023-12-23 05:45:15,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=980800.0, ans=0.125 2023-12-23 05:45:20,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=980800.0, ans=0.125 2023-12-23 05:45:23,709 INFO [train.py:886] (0/4) Epoch 31, batch 4150, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4947558.93 frames. 
], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:45:28,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=980866.6666666666, ans=0.125 2023-12-23 05:45:41,721 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:45:59,586 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.785e+01 3.259e+01 3.391e+01 3.549e+01 4.171e+01, threshold=6.781e+01, percent-clipped=0.0 2023-12-23 05:46:03,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=981133.3333333334, ans=0.1 2023-12-23 05:46:13,796 INFO [train.py:886] (0/4) Epoch 31, batch 4200, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4944266.36 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:46:25,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=981266.6666666666, ans=0.0 2023-12-23 05:46:29,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=981266.6666666666, ans=0.125 2023-12-23 05:46:29,909 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:46:35,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=981333.3333333334, ans=0.05 2023-12-23 05:46:36,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=981333.3333333334, ans=0.125 2023-12-23 05:46:45,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=981400.0, ans=0.125 2023-12-23 05:46:49,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-12-23 05:46:59,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2023-12-23 05:47:01,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-12-23 05:47:04,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=981466.6666666666, ans=0.5 2023-12-23 05:47:06,145 INFO [train.py:886] (0/4) Epoch 31, batch 4250, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4943448.48 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:47:09,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.16 vs. 
limit=22.5 2023-12-23 05:47:13,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=981533.3333333334, ans=0.125 2023-12-23 05:47:19,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=981600.0, ans=0.125 2023-12-23 05:47:19,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=981600.0, ans=0.0 2023-12-23 05:47:27,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=981666.6666666666, ans=0.1 2023-12-23 05:47:37,059 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.562e-03 2023-12-23 05:47:41,475 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.892e+01 3.291e+01 3.423e+01 3.545e+01 3.863e+01, threshold=6.845e+01, percent-clipped=0.0 2023-12-23 05:47:54,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2023-12-23 05:47:55,767 INFO [train.py:886] (0/4) Epoch 31, batch 4300, loss[loss=0.01615, audio_tagging_loss=0.01615, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4948665.48 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:48:48,958 INFO [train.py:886] (0/4) Epoch 31, batch 4350, loss[loss=0.01028, audio_tagging_loss=0.01028, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4955278.28 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:48:49,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=982200.0, ans=0.0 2023-12-23 05:49:12,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0 2023-12-23 05:49:24,696 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.048e+01 3.354e+01 3.488e+01 3.602e+01 4.133e+01, threshold=6.977e+01, percent-clipped=0.0 2023-12-23 05:49:25,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=982400.0, ans=0.125 2023-12-23 05:49:40,822 INFO [train.py:886] (0/4) Epoch 31, batch 4400, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4952540.26 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:49:45,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=982533.3333333334, ans=0.2 2023-12-23 05:49:53,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=982600.0, ans=0.2 2023-12-23 05:49:53,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=982600.0, ans=0.0 2023-12-23 05:49:54,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. 
limit=15.0 2023-12-23 05:50:06,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=982666.6666666666, ans=15.0 2023-12-23 05:50:20,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0 2023-12-23 05:50:29,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=982800.0, ans=0.0 2023-12-23 05:50:31,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=982866.6666666666, ans=0.125 2023-12-23 05:50:31,894 INFO [train.py:886] (0/4) Epoch 31, batch 4450, loss[loss=0.01279, audio_tagging_loss=0.01279, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4941468.54 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:50:38,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=15.0 2023-12-23 05:50:42,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=982933.3333333334, ans=0.125 2023-12-23 05:50:51,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=982933.3333333334, ans=0.0 2023-12-23 05:50:58,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=983000.0, ans=0.125 2023-12-23 05:51:02,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=983066.6666666666, ans=0.1 2023-12-23 05:51:07,905 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+01 3.353e+01 3.451e+01 3.641e+01 4.281e+01, threshold=6.902e+01, percent-clipped=0.0 2023-12-23 05:51:10,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=983066.6666666666, ans=0.125 2023-12-23 05:51:25,191 INFO [train.py:886] (0/4) Epoch 31, batch 4500, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4942768.34 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:51:29,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.22 vs. limit=22.5 2023-12-23 05:51:31,136 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:51:34,913 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:52:07,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=983466.6666666666, ans=0.0 2023-12-23 05:52:08,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=983466.6666666666, ans=0.125 2023-12-23 05:52:08,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. 
limit=15.0 2023-12-23 05:52:16,178 INFO [train.py:886] (0/4) Epoch 31, batch 4550, loss[loss=0.01093, audio_tagging_loss=0.01093, over 24750.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4944288.84 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:52:44,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=983666.6666666666, ans=0.0 2023-12-23 05:52:53,977 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.975e+01 3.298e+01 3.434e+01 3.578e+01 4.234e+01, threshold=6.868e+01, percent-clipped=0.0 2023-12-23 05:52:54,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.50 vs. limit=22.5 2023-12-23 05:53:09,078 INFO [train.py:886] (0/4) Epoch 31, batch 4600, loss[loss=0.01348, audio_tagging_loss=0.01348, over 25000.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4948690.87 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:53:09,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=983866.6666666666, ans=0.125 2023-12-23 05:53:11,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2023-12-23 05:53:13,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=983866.6666666666, ans=0.125 2023-12-23 05:53:18,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.66 vs. limit=10.0 2023-12-23 05:53:23,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=983933.3333333334, ans=0.0 2023-12-23 05:53:29,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=12.0 2023-12-23 05:53:32,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=984000.0, ans=0.125 2023-12-23 05:53:46,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=984066.6666666666, ans=0.125 2023-12-23 05:54:01,446 INFO [train.py:886] (0/4) Epoch 31, batch 4650, loss[loss=0.01496, audio_tagging_loss=0.01496, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4951573.64 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:54:21,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=984333.3333333334, ans=0.2 2023-12-23 05:54:25,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=984333.3333333334, ans=0.1 2023-12-23 05:54:31,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.02 vs. 
limit=12.0 2023-12-23 05:54:34,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=984400.0, ans=0.1 2023-12-23 05:54:37,333 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.989e+01 3.324e+01 3.438e+01 3.563e+01 4.326e+01, threshold=6.876e+01, percent-clipped=0.0 2023-12-23 05:54:37,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.13 vs. limit=10.0 2023-12-23 05:54:39,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=984400.0, ans=0.0 2023-12-23 05:54:44,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=984466.6666666666, ans=0.1 2023-12-23 05:54:46,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=984466.6666666666, ans=0.125 2023-12-23 05:54:52,065 INFO [train.py:886] (0/4) Epoch 31, batch 4700, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4952981.35 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:54:53,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=984533.3333333334, ans=0.0 2023-12-23 05:54:55,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=984533.3333333334, ans=0.125 2023-12-23 05:55:17,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=984666.6666666666, ans=0.125 2023-12-23 05:55:20,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=984733.3333333334, ans=0.125 2023-12-23 05:55:39,296 INFO [train.py:886] (0/4) Epoch 31, batch 4750, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4952452.57 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:55:41,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=984866.6666666666, ans=0.0 2023-12-23 05:55:45,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=984866.6666666666, ans=0.0 2023-12-23 05:55:52,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=984933.3333333334, ans=0.0 2023-12-23 05:55:54,461 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-31.pt 2023-12-23 05:56:14,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=984973.3333333334, ans=0.1 2023-12-23 05:56:15,678 INFO [train.py:886] (0/4) Epoch 32, batch 0, loss[loss=0.03552, audio_tagging_loss=0.03552, over 20585.00 frames. ], tot_loss[loss=0.03552, audio_tagging_loss=0.03552, over 20585.00 frames. 
], batch size: 107, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:56:15,679 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 05:56:36,812 INFO [train.py:917] (0/4) Epoch 32, validation: loss=0.03288, audio_tagging_loss=0.03288, over 3737520.00 frames. 2023-12-23 05:56:36,813 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 05:56:37,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=984973.3333333334, ans=0.1 2023-12-23 05:56:42,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=984973.3333333334, ans=0.125 2023-12-23 05:56:46,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2023-12-23 05:56:47,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=985040.0, ans=0.125 2023-12-23 05:56:57,223 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.142e+01 3.370e+01 3.559e+01 3.900e+01 9.561e+01, threshold=7.118e+01, percent-clipped=7.0 2023-12-23 05:57:01,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=985106.6666666666, ans=15.0 2023-12-23 05:57:04,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=985106.6666666666, ans=0.0 2023-12-23 05:57:06,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=985173.3333333334, ans=0.2 2023-12-23 05:57:09,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=985173.3333333334, ans=0.2 2023-12-23 05:57:13,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=985173.3333333334, ans=0.125 2023-12-23 05:57:20,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=985240.0, ans=0.125 2023-12-23 05:57:23,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=985240.0, ans=0.125 2023-12-23 05:57:26,980 INFO [train.py:886] (0/4) Epoch 32, batch 50, loss[loss=0.01672, audio_tagging_loss=0.01672, over 25000.00 frames. ], tot_loss[loss=0.01956, audio_tagging_loss=0.01956, over 1114826.95 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:57:41,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=985373.3333333334, ans=0.0 2023-12-23 05:57:48,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2023-12-23 05:57:51,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=985440.0, ans=0.0 2023-12-23 05:57:58,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.29 vs. limit=22.5 2023-12-23 05:58:18,032 INFO [train.py:886] (0/4) Epoch 32, batch 100, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. 
], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 1966580.71 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:58:38,570 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.265e+01 3.723e+01 3.980e+01 4.376e+01 5.362e+01, threshold=7.961e+01, percent-clipped=0.0 2023-12-23 05:58:38,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=985773.3333333334, ans=0.0 2023-12-23 05:58:41,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=985773.3333333334, ans=0.125 2023-12-23 05:58:46,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=985773.3333333334, ans=0.2 2023-12-23 05:58:52,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=985840.0, ans=0.1 2023-12-23 05:58:55,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.77 vs. limit=22.5 2023-12-23 05:58:58,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5 2023-12-23 05:58:58,747 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.072e-02 2023-12-23 05:59:09,718 INFO [train.py:886] (0/4) Epoch 32, batch 150, loss[loss=0.01198, audio_tagging_loss=0.01198, over 25000.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 2628360.76 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:59:10,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=985973.3333333334, ans=0.0 2023-12-23 05:59:28,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=986040.0, ans=0.125 2023-12-23 05:59:57,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=986240.0, ans=0.125 2023-12-23 05:59:58,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=986240.0, ans=0.125 2023-12-23 06:00:01,205 INFO [train.py:886] (0/4) Epoch 32, batch 200, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 3148390.51 frames. 
], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:00:05,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=986306.6666666666, ans=0.125 2023-12-23 06:00:19,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=986373.3333333334, ans=0.125 2023-12-23 06:00:21,576 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.980e+01 3.390e+01 3.500e+01 3.693e+01 4.218e+01, threshold=7.000e+01, percent-clipped=0.0 2023-12-23 06:00:41,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=986573.3333333334, ans=0.125 2023-12-23 06:00:43,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=986573.3333333334, ans=0.0 2023-12-23 06:00:46,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=986573.3333333334, ans=0.125 2023-12-23 06:00:51,464 INFO [train.py:886] (0/4) Epoch 32, batch 250, loss[loss=0.0115, audio_tagging_loss=0.0115, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 3551038.13 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:00:55,214 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-148000.pt 2023-12-23 06:01:34,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=986906.6666666666, ans=0.04949747468305833 2023-12-23 06:01:45,004 INFO [train.py:886] (0/4) Epoch 32, batch 300, loss[loss=0.01094, audio_tagging_loss=0.01094, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 3859757.10 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:01:46,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=986973.3333333334, ans=0.125 2023-12-23 06:02:06,007 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.374e+01 3.482e+01 3.656e+01 5.710e+01, threshold=6.964e+01, percent-clipped=0.0 2023-12-23 06:02:27,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=987240.0, ans=0.1 2023-12-23 06:02:27,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=987240.0, ans=0.125 2023-12-23 06:02:35,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.34 vs. limit=22.5 2023-12-23 06:02:37,253 INFO [train.py:886] (0/4) Epoch 32, batch 350, loss[loss=0.01371, audio_tagging_loss=0.01371, over 24750.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4098010.56 frames. 
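
The checkpoint.py:75 line above saves checkpoint-148000.pt into the experiment directory, i.e. checkpoints are keyed by the cumulative training-batch index rather than the epoch. A minimal sketch of that pattern, assuming a fixed save_every_n cadence (4000 would be consistent with the batch-148000 filename) and the usual state_dict payload; what icefall actually stores in the checkpoint may differ:

import torch
from pathlib import Path

def maybe_save_checkpoint(model, optimizer, batch_idx_train, exp_dir,
                          save_every_n=4000):
    """Save a batch-indexed checkpoint every save_every_n training batches."""
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return None
    path = Path(exp_dir) / f"checkpoint-{batch_idx_train}.pt"
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "batch_idx_train": batch_idx_train,
    }, path)
    return path
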
], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:02:37,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=987306.6666666666, ans=0.0 2023-12-23 06:03:09,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=987506.6666666666, ans=0.125 2023-12-23 06:03:28,783 INFO [train.py:886] (0/4) Epoch 32, batch 400, loss[loss=0.01172, audio_tagging_loss=0.01172, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4287746.24 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:03:33,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=987640.0, ans=0.125 2023-12-23 06:03:49,898 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.966e+01 3.292e+01 3.425e+01 3.603e+01 4.421e+01, threshold=6.851e+01, percent-clipped=0.0 2023-12-23 06:04:16,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=987906.6666666666, ans=0.2 2023-12-23 06:04:19,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=987973.3333333334, ans=0.125 2023-12-23 06:04:19,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=987973.3333333334, ans=0.04949747468305833 2023-12-23 06:04:20,505 INFO [train.py:886] (0/4) Epoch 32, batch 450, loss[loss=0.0109, audio_tagging_loss=0.0109, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4437203.11 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:04:25,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.82 vs. limit=22.5 2023-12-23 06:04:33,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=988040.0, ans=0.125 2023-12-23 06:05:03,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2023-12-23 06:05:06,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=988240.0, ans=0.125 2023-12-23 06:05:13,748 INFO [train.py:886] (0/4) Epoch 32, batch 500, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4554795.76 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:05:19,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=22.5 2023-12-23 06:05:27,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988373.3333333334, ans=0.1 2023-12-23 06:05:27,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.74 vs. 
limit=6.0 2023-12-23 06:05:34,072 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.279e+01 3.431e+01 3.556e+01 4.457e+01, threshold=6.862e+01, percent-clipped=0.0 2023-12-23 06:05:45,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. limit=10.0 2023-12-23 06:05:48,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2023-12-23 06:05:52,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=988506.6666666666, ans=0.125 2023-12-23 06:05:56,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=988573.3333333334, ans=0.035 2023-12-23 06:05:59,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=988573.3333333334, ans=0.0 2023-12-23 06:06:00,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=988573.3333333334, ans=0.125 2023-12-23 06:06:04,454 INFO [train.py:886] (0/4) Epoch 32, batch 550, loss[loss=0.01367, audio_tagging_loss=0.01367, over 25000.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4641301.60 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:06:26,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=988773.3333333334, ans=0.0 2023-12-23 06:06:56,690 INFO [train.py:886] (0/4) Epoch 32, batch 600, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4710759.36 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:06:56,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=988973.3333333334, ans=0.125 2023-12-23 06:06:56,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=988973.3333333334, ans=0.1 2023-12-23 06:06:57,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=988973.3333333334, ans=0.1 2023-12-23 06:07:00,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0 2023-12-23 06:07:16,462 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.995e+01 3.360e+01 3.500e+01 3.662e+01 4.640e+01, threshold=6.999e+01, percent-clipped=0.0 2023-12-23 06:07:33,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.04 vs. limit=10.0 2023-12-23 06:07:36,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=989240.0, ans=0.125 2023-12-23 06:07:43,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=989240.0, ans=0.125 2023-12-23 06:07:47,458 INFO [train.py:886] (0/4) Epoch 32, batch 650, loss[loss=0.0144, audio_tagging_loss=0.0144, over 24750.00 frames. 
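
The scaling.py:1022 Whitening lines compare a per-module metric against a scheduled limit (e.g. metric=4.74 vs. limit=6.0 for whiten_keys above); presumably a penalty only kicks in once the metric exceeds the limit, which is why most entries here sit below it. One plausible anisotropy metric under that reading: the ratio of the mean squared eigenvalue of the per-group covariance to the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as the covariance departs from a multiple of the identity. This is an illustrative proxy, not the exact formula in scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """Anisotropy of per-group feature covariance; 1.0 means fully white."""
    n, c = x.shape                      # (num_frames, num_channels)
    assert c % num_groups == 0
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n     # per-group covariance matrices
    eigs = torch.linalg.eigvalsh(cov)   # real eigenvalues (cov is symmetric)
    metric = (eigs.pow(2).mean(dim=1) / eigs.mean(dim=1).pow(2)).mean()
    return metric.item()

A training-time module built on this would add a small gradient penalty only when the metric crosses the logged limit, leaving the activations alone otherwise.
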
], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4757059.46 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:08:15,170 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:08:30,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=989573.3333333334, ans=0.125 2023-12-23 06:08:36,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=989573.3333333334, ans=0.125 2023-12-23 06:08:36,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-12-23 06:08:37,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=989573.3333333334, ans=0.2 2023-12-23 06:08:38,951 INFO [train.py:886] (0/4) Epoch 32, batch 700, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4793272.60 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:08:40,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=989640.0, ans=0.95 2023-12-23 06:08:52,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2023-12-23 06:08:59,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2023-12-23 06:09:00,794 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.021e+01 3.361e+01 3.483e+01 3.638e+01 4.133e+01, threshold=6.965e+01, percent-clipped=0.0 2023-12-23 06:09:01,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=989773.3333333334, ans=0.125 2023-12-23 06:09:04,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=989773.3333333334, ans=22.5 2023-12-23 06:09:13,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=989840.0, ans=0.0 2023-12-23 06:09:19,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=989840.0, ans=0.2 2023-12-23 06:09:32,134 INFO [train.py:886] (0/4) Epoch 32, batch 750, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4831607.57 frames. 
], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:09:36,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=989973.3333333334, ans=0.125 2023-12-23 06:09:37,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=989973.3333333334, ans=0.0 2023-12-23 06:09:38,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=989973.3333333334, ans=10.0 2023-12-23 06:10:13,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.43 vs. limit=5.0 2023-12-23 06:10:14,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2023-12-23 06:10:22,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=990306.6666666666, ans=10.0 2023-12-23 06:10:23,086 INFO [train.py:886] (0/4) Epoch 32, batch 800, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4863836.04 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:10:29,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=990306.6666666666, ans=0.1 2023-12-23 06:10:36,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=990373.3333333334, ans=0.125 2023-12-23 06:10:44,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.44 vs. limit=22.5 2023-12-23 06:10:44,494 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.016e+01 3.268e+01 3.423e+01 3.577e+01 3.951e+01, threshold=6.846e+01, percent-clipped=0.0 2023-12-23 06:10:47,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990440.0, ans=0.1 2023-12-23 06:11:00,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2023-12-23 06:11:06,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=990573.3333333334, ans=0.125 2023-12-23 06:11:09,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990573.3333333334, ans=0.1 2023-12-23 06:11:11,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2023-12-23 06:11:12,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=990573.3333333334, ans=0.125 2023-12-23 06:11:13,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=990573.3333333334, ans=0.125 2023-12-23 06:11:16,034 INFO [train.py:886] (0/4) Epoch 32, batch 850, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. 
], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4888539.98 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:11:35,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990773.3333333334, ans=0.1 2023-12-23 06:11:39,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.49 vs. limit=12.0 2023-12-23 06:11:48,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=990840.0, ans=0.125 2023-12-23 06:12:07,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990973.3333333334, ans=0.1 2023-12-23 06:12:07,820 INFO [train.py:886] (0/4) Epoch 32, batch 900, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4904095.62 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:12:10,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=990973.3333333334, ans=0.125 2023-12-23 06:12:19,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=991040.0, ans=0.07 2023-12-23 06:12:21,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=991040.0, ans=0.0 2023-12-23 06:12:28,210 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.389e+01 3.537e+01 3.671e+01 4.365e+01, threshold=7.073e+01, percent-clipped=0.0 2023-12-23 06:12:50,654 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:12:58,829 INFO [train.py:886] (0/4) Epoch 32, batch 950, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4905928.13 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:13:07,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=991306.6666666666, ans=0.2 2023-12-23 06:13:20,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=991440.0, ans=0.025 2023-12-23 06:13:27,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0 2023-12-23 06:13:28,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=991440.0, ans=0.0 2023-12-23 06:13:31,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=991506.6666666666, ans=0.125 2023-12-23 06:13:32,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=991506.6666666666, ans=0.1 2023-12-23 06:13:50,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=991573.3333333334, ans=0.0 2023-12-23 06:13:51,882 INFO [train.py:886] (0/4) Epoch 32, batch 1000, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. 
], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4911299.15 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:14:05,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=991706.6666666666, ans=0.2 2023-12-23 06:14:05,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=991706.6666666666, ans=0.0 2023-12-23 06:14:08,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=991706.6666666666, ans=0.0 2023-12-23 06:14:08,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=991706.6666666666, ans=0.95 2023-12-23 06:14:11,544 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.978e+01 3.269e+01 3.394e+01 3.560e+01 3.959e+01, threshold=6.787e+01, percent-clipped=0.0 2023-12-23 06:14:19,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=991773.3333333334, ans=0.125 2023-12-23 06:14:19,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=991773.3333333334, ans=0.125 2023-12-23 06:14:20,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=991773.3333333334, ans=0.125 2023-12-23 06:14:25,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=991840.0, ans=0.0 2023-12-23 06:14:35,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=991906.6666666666, ans=0.125 2023-12-23 06:14:42,957 INFO [train.py:886] (0/4) Epoch 32, batch 1050, loss[loss=0.009996, audio_tagging_loss=0.009996, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4924075.40 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:14:45,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=991973.3333333334, ans=10.0 2023-12-23 06:14:49,573 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:14:56,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2023-12-23 06:15:12,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=992106.6666666666, ans=0.125 2023-12-23 06:15:14,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=992173.3333333334, ans=0.125 2023-12-23 06:15:14,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=992173.3333333334, ans=0.0 2023-12-23 06:15:33,860 INFO [train.py:886] (0/4) Epoch 32, batch 1100, loss[loss=0.01436, audio_tagging_loss=0.01436, over 25000.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4926704.16 frames. 
], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:15:35,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=992306.6666666666, ans=0.0 2023-12-23 06:15:45,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=992373.3333333334, ans=0.2 2023-12-23 06:15:54,836 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.843e+01 3.284e+01 3.426e+01 3.635e+01 4.027e+01, threshold=6.852e+01, percent-clipped=0.0 2023-12-23 06:16:26,210 INFO [train.py:886] (0/4) Epoch 32, batch 1150, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4940431.63 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:16:36,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=992706.6666666666, ans=0.125 2023-12-23 06:16:44,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=992706.6666666666, ans=15.0 2023-12-23 06:16:48,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=992773.3333333334, ans=0.0 2023-12-23 06:16:50,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=992773.3333333334, ans=0.0 2023-12-23 06:17:07,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=8.0 2023-12-23 06:17:14,656 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:17:17,329 INFO [train.py:886] (0/4) Epoch 32, batch 1200, loss[loss=0.01372, audio_tagging_loss=0.01372, over 24750.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4937475.44 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:17:32,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=993040.0, ans=0.0 2023-12-23 06:17:32,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=993040.0, ans=0.0 2023-12-23 06:17:39,160 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.986e+01 3.358e+01 3.522e+01 3.691e+01 4.259e+01, threshold=7.044e+01, percent-clipped=0.0 2023-12-23 06:17:56,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0 2023-12-23 06:18:04,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=993240.0, ans=0.1 2023-12-23 06:18:10,289 INFO [train.py:886] (0/4) Epoch 32, batch 1250, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4939314.90 frames. 
], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:18:43,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=993506.6666666666, ans=0.2 2023-12-23 06:18:48,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0 2023-12-23 06:18:50,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=993573.3333333334, ans=0.0 2023-12-23 06:19:01,453 INFO [train.py:886] (0/4) Epoch 32, batch 1300, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24750.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4936116.55 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:19:09,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=993640.0, ans=0.125 2023-12-23 06:19:22,692 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.042e+01 3.376e+01 3.531e+01 3.672e+01 4.244e+01, threshold=7.062e+01, percent-clipped=0.0 2023-12-23 06:19:31,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=993773.3333333334, ans=0.125 2023-12-23 06:19:31,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=993773.3333333334, ans=0.125 2023-12-23 06:19:39,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=993840.0, ans=0.125 2023-12-23 06:19:50,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=993906.6666666666, ans=0.125 2023-12-23 06:19:53,909 INFO [train.py:886] (0/4) Epoch 32, batch 1350, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4942706.61 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:20:16,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=994106.6666666666, ans=0.0 2023-12-23 06:20:17,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2023-12-23 06:20:27,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=994173.3333333334, ans=0.1 2023-12-23 06:20:30,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=994173.3333333334, ans=0.125 2023-12-23 06:20:33,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2023-12-23 06:20:40,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=994240.0, ans=0.125 2023-12-23 06:20:46,240 INFO [train.py:886] (0/4) Epoch 32, batch 1400, loss[loss=0.0118, audio_tagging_loss=0.0118, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4949433.98 frames. 
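
Each train.py:886 line above reports the current batch loss alongside tot_loss, an average over a frame count that grows from roughly 1.1M frames at batch 50 and then plateaus near 4.95M. That plateau is consistent with a frame-weighted running average under exponential forgetting: with about 25k frames per batch, a decay of 0.995 saturates at 25000 / 0.005 = 5M effective frames. A sketch under that assumption; the decay constant is a guess fitted to the plateau, not a value taken from train.py:

class RunningLoss:
    """Frame-weighted running loss with exponential forgetting."""

    def __init__(self, decay=0.995):
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed (effective) frame count

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
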
], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:20:47,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.83 vs. limit=22.5 2023-12-23 06:20:47,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.77 vs. limit=10.0 2023-12-23 06:20:48,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=994306.6666666666, ans=0.1 2023-12-23 06:20:52,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=994306.6666666666, ans=0.1 2023-12-23 06:21:06,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.56 vs. limit=10.0 2023-12-23 06:21:06,615 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.912e+01 3.298e+01 3.472e+01 3.566e+01 4.099e+01, threshold=6.943e+01, percent-clipped=0.0 2023-12-23 06:21:18,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=12.0 2023-12-23 06:21:23,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=994506.6666666666, ans=0.125 2023-12-23 06:21:29,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=994573.3333333334, ans=0.125 2023-12-23 06:21:38,016 INFO [train.py:886] (0/4) Epoch 32, batch 1450, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4947612.70 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:21:38,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=994640.0, ans=0.05 2023-12-23 06:21:39,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=994640.0, ans=0.2 2023-12-23 06:22:07,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=994773.3333333334, ans=0.0 2023-12-23 06:22:17,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=994840.0, ans=0.125 2023-12-23 06:22:18,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=994906.6666666666, ans=0.0 2023-12-23 06:22:26,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5 2023-12-23 06:22:30,195 INFO [train.py:886] (0/4) Epoch 32, batch 1500, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4951788.37 frames. 
], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:22:30,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=994973.3333333334, ans=0.2 2023-12-23 06:22:35,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=994973.3333333334, ans=0.125 2023-12-23 06:22:36,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=994973.3333333334, ans=0.0 2023-12-23 06:22:48,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=995040.0, ans=0.0 2023-12-23 06:22:50,505 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.293e+01 3.421e+01 3.550e+01 3.928e+01, threshold=6.843e+01, percent-clipped=0.0 2023-12-23 06:22:57,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=995106.6666666666, ans=0.2 2023-12-23 06:23:04,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=995173.3333333334, ans=0.125 2023-12-23 06:23:06,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=995173.3333333334, ans=0.125 2023-12-23 06:23:13,047 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:23:21,278 INFO [train.py:886] (0/4) Epoch 32, batch 1550, loss[loss=0.01357, audio_tagging_loss=0.01357, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4956253.92 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:23:21,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=995306.6666666666, ans=0.2 2023-12-23 06:23:27,578 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:23:32,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-12-23 06:23:35,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=995373.3333333334, ans=0.125 2023-12-23 06:23:50,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=995440.0, ans=0.2 2023-12-23 06:24:05,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=995573.3333333334, ans=0.125 2023-12-23 06:24:13,316 INFO [train.py:886] (0/4) Epoch 32, batch 1600, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4945132.13 frames. 
], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:24:20,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=995640.0, ans=0.95 2023-12-23 06:24:29,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=995706.6666666666, ans=0.125 2023-12-23 06:24:34,477 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.084e+01 3.339e+01 3.484e+01 3.655e+01 4.537e+01, threshold=6.968e+01, percent-clipped=0.0 2023-12-23 06:24:42,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=995773.3333333334, ans=0.0 2023-12-23 06:24:54,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=995906.6666666666, ans=0.125 2023-12-23 06:25:04,876 INFO [train.py:886] (0/4) Epoch 32, batch 1650, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4949174.69 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:25:05,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=995973.3333333334, ans=0.0 2023-12-23 06:25:13,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=996040.0, ans=0.125 2023-12-23 06:25:13,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=996040.0, ans=0.125 2023-12-23 06:25:42,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=12.0 2023-12-23 06:25:56,155 INFO [train.py:886] (0/4) Epoch 32, batch 1700, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4951673.23 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:26:09,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=996373.3333333334, ans=0.0 2023-12-23 06:26:13,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-12-23 06:26:16,298 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.316e+01 3.484e+01 3.622e+01 4.382e+01, threshold=6.969e+01, percent-clipped=0.0 2023-12-23 06:26:19,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=996440.0, ans=0.0 2023-12-23 06:26:32,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=996506.6666666666, ans=0.5 2023-12-23 06:26:46,893 INFO [train.py:886] (0/4) Epoch 32, batch 1750, loss[loss=0.01327, audio_tagging_loss=0.01327, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4953640.73 frames. 
], batch size: 100, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:26:50,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=996640.0, ans=0.2 2023-12-23 06:27:08,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=996773.3333333334, ans=0.035 2023-12-23 06:27:08,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=996773.3333333334, ans=0.1 2023-12-23 06:27:22,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=996840.0, ans=0.1 2023-12-23 06:27:29,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=996906.6666666666, ans=0.0 2023-12-23 06:27:33,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=996906.6666666666, ans=0.2 2023-12-23 06:27:34,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=996906.6666666666, ans=0.0 2023-12-23 06:27:40,116 INFO [train.py:886] (0/4) Epoch 32, batch 1800, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4956888.05 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:27:41,273 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:27:54,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=997040.0, ans=0.0 2023-12-23 06:27:59,177 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.314e+01 3.468e+01 3.623e+01 4.214e+01, threshold=6.936e+01, percent-clipped=0.0 2023-12-23 06:27:59,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=997106.6666666666, ans=0.125 2023-12-23 06:28:00,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=997106.6666666666, ans=0.0 2023-12-23 06:28:13,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=997173.3333333334, ans=0.0 2023-12-23 06:28:29,599 INFO [train.py:886] (0/4) Epoch 32, batch 1850, loss[loss=0.01422, audio_tagging_loss=0.01422, over 24750.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4958877.54 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:28:38,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=997306.6666666666, ans=0.125 2023-12-23 06:29:11,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0 2023-12-23 06:29:19,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=997573.3333333334, ans=0.125 2023-12-23 06:29:22,519 INFO [train.py:886] (0/4) Epoch 32, batch 1900, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4949422.18 frames. 
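
The scaling.py:1118 WithLoss entries above attach an auxiliary loss to intermediate tensors such as self_attn_weights and log its sum (0.000e+00 whenever the penalty was inactive on that batch). A common way to implement this is a pass-through autograd function that leaves the forward value untouched and injects a unit gradient into the auxiliary loss during backward; the sketch below follows that pattern but is not the actual scaling.py code:

import torch

class WithLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor):
        ctx.aux_shape = aux_loss.shape
        return x                  # identity: downstream sees x unchanged

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # main gradient passes through; a constant gradient of 1 flows into
        # the auxiliary loss, so whatever produced it is also minimized
        ones = torch.ones(ctx.aux_shape, dtype=grad_output.dtype,
                          device=grad_output.device)
        return grad_output, ones

def attach_loss(x, aux_loss):
    return WithLoss.apply(x, aux_loss)

# hypothetical usage: attn = attach_loss(attn, penalty(attn).sum()), after
# which the penalty's value can be logged as the loss-sum seen above
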
], batch size: 99, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:29:42,701 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.010e+01 3.383e+01 3.531e+01 3.681e+01 4.206e+01, threshold=7.062e+01, percent-clipped=0.0 2023-12-23 06:29:58,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=997840.0, ans=0.125 2023-12-23 06:29:58,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.32 vs. limit=15.0 2023-12-23 06:30:13,312 INFO [train.py:886] (0/4) Epoch 32, batch 1950, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4938974.15 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:30:23,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2023-12-23 06:30:52,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=998173.3333333334, ans=0.125 2023-12-23 06:31:03,196 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:31:04,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=998306.6666666666, ans=0.0 2023-12-23 06:31:04,961 INFO [train.py:886] (0/4) Epoch 32, batch 2000, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24750.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4945811.05 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:31:14,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=998373.3333333334, ans=0.0 2023-12-23 06:31:26,068 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.968e+01 3.338e+01 3.486e+01 3.657e+01 4.428e+01, threshold=6.972e+01, percent-clipped=0.0 2023-12-23 06:31:30,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-12-23 06:31:56,443 INFO [train.py:886] (0/4) Epoch 32, batch 2050, loss[loss=0.01011, audio_tagging_loss=0.01011, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4950213.40 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:32:00,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=998640.0, ans=15.0 2023-12-23 06:32:30,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=998840.0, ans=0.0 2023-12-23 06:32:32,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.92 vs. limit=10.0 2023-12-23 06:32:46,712 INFO [train.py:886] (0/4) Epoch 32, batch 2100, loss[loss=0.01312, audio_tagging_loss=0.01312, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4954612.42 frames. 
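
grad_scale, logged with every loss line, doubles from 32.0 to 64.0 around batch 2000 above: under mixed-precision training the loss is multiplied by this scale before backward, and the scale grows after a sustained run of overflow-free steps. A minimal sketch of that loop using PyTorch's stock GradScaler as a stand-in; the growth settings, the batch keys, and whether icefall uses this exact class are assumptions:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   growth_interval=2000)

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        # "inputs" / "targets" are hypothetical batch keys
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
    scaler.update()                 # grows or backs off the scale
    return loss.detach(), scaler.get_scale()   # the logged grad_scale
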
], batch size: 100, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:32:51,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=998973.3333333334, ans=0.2 2023-12-23 06:33:01,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=999040.0, ans=0.0 2023-12-23 06:33:07,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=999106.6666666666, ans=0.125 2023-12-23 06:33:08,551 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.022e+01 3.328e+01 3.487e+01 3.635e+01 4.227e+01, threshold=6.974e+01, percent-clipped=0.0 2023-12-23 06:33:30,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=999240.0, ans=0.1 2023-12-23 06:33:37,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=999240.0, ans=0.125 2023-12-23 06:33:39,871 INFO [train.py:886] (0/4) Epoch 32, batch 2150, loss[loss=0.01005, audio_tagging_loss=0.01005, over 24071.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4958181.62 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:33:40,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.62 vs. limit=22.5 2023-12-23 06:33:57,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=999373.3333333334, ans=0.2 2023-12-23 06:34:05,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=999440.0, ans=0.1 2023-12-23 06:34:31,148 INFO [train.py:886] (0/4) Epoch 32, batch 2200, loss[loss=0.01515, audio_tagging_loss=0.01515, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4951810.36 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:34:39,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=999640.0, ans=0.0 2023-12-23 06:34:39,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=999640.0, ans=0.125 2023-12-23 06:34:51,672 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.381e+01 3.492e+01 3.667e+01 4.618e+01, threshold=6.983e+01, percent-clipped=0.0 2023-12-23 06:34:57,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=999773.3333333334, ans=0.2 2023-12-23 06:35:09,226 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:35:14,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-12-23 06:35:22,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=999973.3333333334, ans=0.0 2023-12-23 06:35:22,728 INFO [train.py:886] (0/4) Epoch 32, batch 2250, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. 
], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4941670.00 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:35:24,824 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:35:49,649 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:35:57,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1000173.3333333334, ans=0.125 2023-12-23 06:35:58,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1000173.3333333334, ans=0.1 2023-12-23 06:36:15,249 INFO [train.py:886] (0/4) Epoch 32, batch 2300, loss[loss=0.01166, audio_tagging_loss=0.01166, over 24750.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4946450.48 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:36:24,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=6.0 2023-12-23 06:36:35,437 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.075e+01 3.334e+01 3.452e+01 3.612e+01 4.472e+01, threshold=6.904e+01, percent-clipped=0.0 2023-12-23 06:36:37,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2023-12-23 06:36:53,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1000506.6666666666, ans=0.0 2023-12-23 06:37:00,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1000573.3333333334, ans=0.2 2023-12-23 06:37:06,010 INFO [train.py:886] (0/4) Epoch 32, batch 2350, loss[loss=0.01083, audio_tagging_loss=0.01083, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4946680.05 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:37:08,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1000640.0, ans=0.1 2023-12-23 06:37:20,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1000706.6666666666, ans=0.0 2023-12-23 06:37:25,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1000706.6666666666, ans=0.1 2023-12-23 06:37:28,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1000773.3333333334, ans=0.0 2023-12-23 06:37:34,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1000773.3333333334, ans=0.125 2023-12-23 06:37:35,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. 
limit=15.0 2023-12-23 06:37:48,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1000906.6666666666, ans=0.125 2023-12-23 06:37:51,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1000906.6666666666, ans=0.0 2023-12-23 06:37:55,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1000906.6666666666, ans=0.125 2023-12-23 06:37:57,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1000973.3333333334, ans=0.07 2023-12-23 06:37:58,430 INFO [train.py:886] (0/4) Epoch 32, batch 2400, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4948047.09 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:38:01,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1000973.3333333334, ans=0.125 2023-12-23 06:38:03,405 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:38:05,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-12-23 06:38:19,327 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.045e+01 3.328e+01 3.460e+01 3.635e+01 4.169e+01, threshold=6.920e+01, percent-clipped=0.0 2023-12-23 06:38:19,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1001106.6666666666, ans=0.125 2023-12-23 06:38:31,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1001173.3333333334, ans=0.015 2023-12-23 06:38:47,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1001240.0, ans=0.125 2023-12-23 06:38:50,369 INFO [train.py:886] (0/4) Epoch 32, batch 2450, loss[loss=0.01095, audio_tagging_loss=0.01095, over 24050.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4949195.05 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:38:53,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=22.5 2023-12-23 06:39:11,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2023-12-23 06:39:35,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1001573.3333333334, ans=0.125 2023-12-23 06:39:41,479 INFO [train.py:886] (0/4) Epoch 32, batch 2500, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4944822.03 frames. 
], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:40:02,500 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.169e+01 3.349e+01 3.493e+01 3.683e+01 6.550e+01, threshold=6.987e+01, percent-clipped=0.0 2023-12-23 06:40:08,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1001773.3333333334, ans=0.2 2023-12-23 06:40:09,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1001773.3333333334, ans=0.1 2023-12-23 06:40:16,449 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:40:17,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1001840.0, ans=0.125 2023-12-23 06:40:20,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1001840.0, ans=10.0 2023-12-23 06:40:25,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2023-12-23 06:40:33,437 INFO [train.py:886] (0/4) Epoch 32, batch 2550, loss[loss=0.01298, audio_tagging_loss=0.01298, over 23260.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4939972.09 frames. ], batch size: 107, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:41:01,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1002106.6666666666, ans=0.1 2023-12-23 06:41:17,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1002240.0, ans=0.125 2023-12-23 06:41:25,726 INFO [train.py:886] (0/4) Epoch 32, batch 2600, loss[loss=0.01131, audio_tagging_loss=0.01131, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4941963.13 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:41:45,339 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.988e+01 3.350e+01 3.516e+01 3.667e+01 4.402e+01, threshold=7.033e+01, percent-clipped=0.0 2023-12-23 06:42:01,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1002506.6666666666, ans=0.125 2023-12-23 06:42:12,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1002573.3333333334, ans=0.1 2023-12-23 06:42:14,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.56 vs. limit=22.5 2023-12-23 06:42:16,566 INFO [train.py:886] (0/4) Epoch 32, batch 2650, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4943426.93 frames. 
2023-12-23 06:42:24,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1002640.0, ans=0.125 2023-12-23 06:42:53,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1002840.0, ans=0.0 2023-12-23 06:43:07,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.28 vs. limit=8.0 2023-12-23 06:43:10,030 INFO [train.py:886] (0/4) Epoch 32, batch 2700, loss[loss=0.01045, audio_tagging_loss=0.01045, over 24750.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4951583.60 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:43:16,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1002973.3333333334, ans=0.125 2023-12-23 06:43:20,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1003040.0, ans=0.5 2023-12-23 06:43:29,820 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+01 3.300e+01 3.397e+01 3.576e+01 4.184e+01, threshold=6.794e+01, percent-clipped=0.0 2023-12-23 06:43:32,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1003106.6666666666, ans=0.125 2023-12-23 06:43:34,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1003106.6666666666, ans=0.125 2023-12-23 06:43:38,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2023-12-23 06:43:39,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1003106.6666666666, ans=0.125 2023-12-23 06:43:54,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1003240.0, ans=0.125 2023-12-23 06:44:01,207 INFO [train.py:886] (0/4) Epoch 32, batch 2750, loss[loss=0.01292, audio_tagging_loss=0.01292, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4954076.29 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:44:18,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1003373.3333333334, ans=0.125 2023-12-23 06:44:29,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-12-23 06:44:32,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1003506.6666666666, ans=0.1 2023-12-23 06:44:53,707 INFO [train.py:886] (0/4) Epoch 32, batch 2800, loss[loss=0.0129, audio_tagging_loss=0.0129, over 24750.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4953945.71 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0
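The Whitening entries compare a per-module statistic (metric) against a limit, presumably the point past which a whitening penalty starts pushing the module's output covariance back toward isotropy. One plausible reading is an eigenvalue-dispersion measure of the channel covariance that equals 1.0 for perfectly "white" activations and grows as variance concentrates in a few directions. The formula below is an illustrative guess in that spirit, not necessarily the exact metric in scaling.py.

```python
# Illustrative guess at a statistic in the spirit of
# "Whitening: name=..., num_groups=..., num_channels=..., metric=... vs. limit=...".
# The dispersion formula is an assumption, not scaling.py's exact metric.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels) activations collected from one module.
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    metrics = []
    for g in range(num_groups):
        cov = x[:, g, :].t() @ x[:, g, :] / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        # mean(eig^2) / mean(eig)^2 is 1.0 iff all eigenvalues are equal,
        # i.e. the covariance is a multiple of the identity ("white").
        metrics.append(((eigs**2).mean() / eigs.mean() ** 2).item())
    return sum(metrics) / num_groups

print(whitening_metric(torch.randn(10000, 64)))  # near-white input -> close to 1.0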
2023-12-23 06:45:14,669 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.950e+01 3.316e+01 3.538e+01 3.680e+01 4.583e+01, threshold=7.076e+01, percent-clipped=0.0 2023-12-23 06:45:17,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1003773.3333333334, ans=0.0 2023-12-23 06:45:28,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1003840.0, ans=0.125 2023-12-23 06:45:32,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1003840.0, ans=0.125 2023-12-23 06:45:37,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2023-12-23 06:45:46,243 INFO [train.py:886] (0/4) Epoch 32, batch 2850, loss[loss=0.01394, audio_tagging_loss=0.01394, over 24750.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4947853.98 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:45:52,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1003973.3333333334, ans=0.1 2023-12-23 06:45:52,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1003973.3333333334, ans=0.1 2023-12-23 06:45:57,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1004040.0, ans=0.07 2023-12-23 06:46:07,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1004106.6666666666, ans=0.125 2023-12-23 06:46:11,702 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:46:15,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1004106.6666666666, ans=0.125 2023-12-23 06:46:21,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5 2023-12-23 06:46:27,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1004240.0, ans=0.125 2023-12-23 06:46:35,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1004240.0, ans=0.125 2023-12-23 06:46:37,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=15.0 2023-12-23 06:46:37,740 INFO [train.py:886] (0/4) Epoch 32, batch 2900, loss[loss=0.0119, audio_tagging_loss=0.0119, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4939257.82 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0
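Each train.py:886 line pairs a per-batch loss (loss[...], over that batch's frames) with a smoothed tot_loss; the fractional cumulative frame counts (e.g. over 4939257.82 frames) suggest an exponentially decayed running sum rather than a plain total. With num_events set to 527 (the AudioSet ontology), the per-batch criterion for a multi-label tagger is typically binary cross-entropy over logits. Both pieces are sketched below under those assumptions; the decay constant and the reduction are illustrative, not read from train.py. The validation entries further down (e.g. loss=0.03345, over 3737520.00 frames) presumably come from a separate full pass with an undecayed sum, which is why the validation frame count stays integral.

```python
# Sketch of the two quantities in each train.py:886 line, under stated
# assumptions: BCE-with-logits over 527 AudioSet event classes for the
# per-batch loss, and an exponentially decayed accumulator for tot_loss
# (the decay constant 0.99 here is illustrative).
import torch
import torch.nn.functional as F

def audio_tagging_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits, labels: (batch, 527); labels are multi-hot event targets.
    return F.binary_cross_entropy_with_logits(logits, labels, reduction="sum")

class RunningLoss:
    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0  # decayed frame count; goes fractional, as in the log

    def update(self, batch_loss_sum: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # the smoothed "tot_loss" that is logged
```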
2023-12-23 06:46:59,934 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.913e+01 3.284e+01 3.453e+01 3.590e+01 5.160e+01, threshold=6.905e+01, percent-clipped=0.0 2023-12-23 06:47:00,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1004440.0, ans=0.1 2023-12-23 06:47:04,990 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.626e-03 2023-12-23 06:47:11,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0 2023-12-23 06:47:25,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1004573.3333333334, ans=0.0 2023-12-23 06:47:30,608 INFO [train.py:886] (0/4) Epoch 32, batch 2950, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4944250.73 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:47:44,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1004706.6666666666, ans=0.125 2023-12-23 06:47:49,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1004773.3333333334, ans=0.1 2023-12-23 06:47:53,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1004773.3333333334, ans=0.0 2023-12-23 06:47:58,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1004773.3333333334, ans=0.125 2023-12-23 06:48:01,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1004840.0, ans=0.1 2023-12-23 06:48:02,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1004840.0, ans=0.125 2023-12-23 06:48:10,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1004906.6666666666, ans=0.125 2023-12-23 06:48:16,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1004906.6666666666, ans=0.2 2023-12-23 06:48:19,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1004973.3333333334, ans=0.0 2023-12-23 06:48:20,537 INFO [train.py:886] (0/4) Epoch 32, batch 3000, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4942114.86 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:48:20,539 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 06:48:41,461 INFO [train.py:917] (0/4) Epoch 32, validation: loss=0.03345, audio_tagging_loss=0.03345, over 3737520.00 frames. 2023-12-23 06:48:41,461 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 06:48:42,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.94 vs.
limit=12.0 2023-12-23 06:48:46,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1004973.3333333334, ans=0.07 2023-12-23 06:48:54,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1005040.0, ans=0.0 2023-12-23 06:48:58,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1005040.0, ans=0.0 2023-12-23 06:49:02,380 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.007e+01 3.356e+01 3.479e+01 3.651e+01 4.265e+01, threshold=6.959e+01, percent-clipped=0.0 2023-12-23 06:49:04,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1005106.6666666666, ans=0.5 2023-12-23 06:49:06,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.23 vs. limit=10.0 2023-12-23 06:49:07,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1005106.6666666666, ans=0.0 2023-12-23 06:49:14,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1005173.3333333334, ans=0.125 2023-12-23 06:49:26,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1005240.0, ans=0.125 2023-12-23 06:49:30,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1005240.0, ans=0.125 2023-12-23 06:49:33,528 INFO [train.py:886] (0/4) Epoch 32, batch 3050, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4945492.86 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:49:39,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1005306.6666666666, ans=0.2 2023-12-23 06:49:41,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2023-12-23 06:49:59,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1005440.0, ans=0.0 2023-12-23 06:50:11,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1005506.6666666666, ans=0.125 2023-12-23 06:50:13,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1005573.3333333334, ans=0.1 2023-12-23 06:50:24,655 INFO [train.py:886] (0/4) Epoch 32, batch 3100, loss[loss=0.01063, audio_tagging_loss=0.01063, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4951599.29 frames. 
], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:50:43,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1005706.6666666666, ans=0.0 2023-12-23 06:50:45,087 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.066e+01 3.353e+01 3.474e+01 3.650e+01 4.005e+01, threshold=6.948e+01, percent-clipped=0.0 2023-12-23 06:50:45,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.71 vs. limit=10.0 2023-12-23 06:50:49,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1005773.3333333334, ans=0.125 2023-12-23 06:50:49,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1005773.3333333334, ans=0.0 2023-12-23 06:50:54,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1005840.0, ans=0.5 2023-12-23 06:51:00,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1005840.0, ans=0.125 2023-12-23 06:51:08,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1005906.6666666666, ans=0.0 2023-12-23 06:51:12,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2023-12-23 06:51:16,240 INFO [train.py:886] (0/4) Epoch 32, batch 3150, loss[loss=0.01207, audio_tagging_loss=0.01207, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4948460.82 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:51:19,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1005973.3333333334, ans=0.125 2023-12-23 06:51:25,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1006040.0, ans=0.125 2023-12-23 06:51:27,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1006040.0, ans=0.125 2023-12-23 06:51:28,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1006040.0, ans=0.1 2023-12-23 06:51:49,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2023-12-23 06:51:49,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.52 vs. limit=12.0 2023-12-23 06:52:01,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1006240.0, ans=0.0 2023-12-23 06:52:04,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1006240.0, ans=0.125 2023-12-23 06:52:09,055 INFO [train.py:886] (0/4) Epoch 32, batch 3200, loss[loss=0.01253, audio_tagging_loss=0.01253, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4945865.37 frames. 
], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:52:14,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=1006306.6666666666, ans=0.2 2023-12-23 06:52:16,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1006306.6666666666, ans=0.125 2023-12-23 06:52:20,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1006373.3333333334, ans=0.1 2023-12-23 06:52:24,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5 2023-12-23 06:52:28,450 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.025e+01 3.282e+01 3.455e+01 3.590e+01 4.189e+01, threshold=6.910e+01, percent-clipped=0.0 2023-12-23 06:52:40,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-12-23 06:52:51,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.59 vs. limit=22.5 2023-12-23 06:52:54,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1006573.3333333334, ans=0.07 2023-12-23 06:52:54,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1006573.3333333334, ans=0.0 2023-12-23 06:53:00,205 INFO [train.py:886] (0/4) Epoch 32, batch 3250, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4949607.26 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:53:19,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1006706.6666666666, ans=0.0 2023-12-23 06:53:31,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1006840.0, ans=0.0 2023-12-23 06:53:42,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1006906.6666666666, ans=0.0 2023-12-23 06:53:52,663 INFO [train.py:886] (0/4) Epoch 32, batch 3300, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4950112.94 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:53:55,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-12-23 06:54:00,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=15.0 2023-12-23 06:54:06,133 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:54:06,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.85 vs. 
limit=22.5 2023-12-23 06:54:13,383 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.804e+01 3.291e+01 3.461e+01 3.674e+01 4.146e+01, threshold=6.923e+01, percent-clipped=0.0 2023-12-23 06:54:16,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1007106.6666666666, ans=22.5 2023-12-23 06:54:23,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1007173.3333333334, ans=0.0 2023-12-23 06:54:27,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.58 vs. limit=10.0 2023-12-23 06:54:44,515 INFO [train.py:886] (0/4) Epoch 32, batch 3350, loss[loss=0.01036, audio_tagging_loss=0.01036, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4947435.77 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:54:49,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1007306.6666666666, ans=0.125 2023-12-23 06:54:57,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1007373.3333333334, ans=0.125 2023-12-23 06:55:04,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1007440.0, ans=0.0 2023-12-23 06:55:28,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.90 vs. limit=15.0 2023-12-23 06:55:36,289 INFO [train.py:886] (0/4) Epoch 32, batch 3400, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4947443.80 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:55:57,397 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.946e+01 3.386e+01 3.545e+01 3.714e+01 5.112e+01, threshold=7.090e+01, percent-clipped=0.0 2023-12-23 06:56:06,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1007840.0, ans=0.125 2023-12-23 06:56:12,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1007840.0, ans=0.1 2023-12-23 06:56:24,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1007906.6666666666, ans=0.1 2023-12-23 06:56:28,832 INFO [train.py:886] (0/4) Epoch 32, batch 3450, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4942051.63 frames. 
], batch size: 99, lr: 3.35e-03, grad_scale: 64.0 2023-12-23 06:56:42,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1008040.0, ans=0.125 2023-12-23 06:56:57,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1008106.6666666666, ans=0.125 2023-12-23 06:57:11,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1008240.0, ans=0.5 2023-12-23 06:57:15,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2023-12-23 06:57:20,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0 2023-12-23 06:57:20,548 INFO [train.py:886] (0/4) Epoch 32, batch 3500, loss[loss=0.01101, audio_tagging_loss=0.01101, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4936349.46 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 64.0 2023-12-23 06:57:23,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2023-12-23 06:57:25,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1008306.6666666666, ans=10.0 2023-12-23 06:57:42,417 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.074e+01 3.340e+01 3.505e+01 3.678e+01 3.978e+01, threshold=7.011e+01, percent-clipped=0.0 2023-12-23 06:57:47,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1008440.0, ans=0.1 2023-12-23 06:57:47,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1008440.0, ans=0.125 2023-12-23 06:58:06,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1008573.3333333334, ans=0.125 2023-12-23 06:58:12,680 INFO [train.py:886] (0/4) Epoch 32, batch 3550, loss[loss=0.0116, audio_tagging_loss=0.0116, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4935013.28 frames. 
], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 06:58:16,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1008640.0, ans=0.0 2023-12-23 06:58:20,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1008640.0, ans=0.125 2023-12-23 06:58:32,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1008706.6666666666, ans=0.1 2023-12-23 06:58:38,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1008773.3333333334, ans=0.0 2023-12-23 06:58:48,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1008840.0, ans=0.0 2023-12-23 06:59:02,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0 2023-12-23 06:59:05,013 INFO [train.py:886] (0/4) Epoch 32, batch 3600, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4939189.26 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 06:59:15,647 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:59:25,496 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.038e+01 3.316e+01 3.432e+01 3.603e+01 4.151e+01, threshold=6.864e+01, percent-clipped=0.0 2023-12-23 06:59:48,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1009240.0, ans=0.1 2023-12-23 06:59:51,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-12-23 06:59:54,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1009240.0, ans=0.0 2023-12-23 06:59:55,906 INFO [train.py:886] (0/4) Epoch 32, batch 3650, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4942904.45 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 06:59:57,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1009306.6666666666, ans=0.0 2023-12-23 07:00:00,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1009306.6666666666, ans=0.125 2023-12-23 07:00:09,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=1009373.3333333334, ans=0.2 2023-12-23 07:00:11,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. limit=6.0 2023-12-23 07:00:17,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1009440.0, ans=0.1 2023-12-23 07:00:30,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. 
limit=15.0 2023-12-23 07:00:31,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1009506.6666666666, ans=0.04949747468305833 2023-12-23 07:00:44,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=12.0 2023-12-23 07:00:48,413 INFO [train.py:886] (0/4) Epoch 32, batch 3700, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4946313.19 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:00:52,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1009640.0, ans=0.125 2023-12-23 07:01:07,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1009706.6666666666, ans=0.0 2023-12-23 07:01:10,308 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.017e+01 3.377e+01 3.513e+01 3.629e+01 4.104e+01, threshold=7.025e+01, percent-clipped=0.0 2023-12-23 07:01:12,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1009773.3333333334, ans=0.125 2023-12-23 07:01:19,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.99 vs. limit=22.5 2023-12-23 07:01:39,868 INFO [train.py:886] (0/4) Epoch 32, batch 3750, loss[loss=0.01446, audio_tagging_loss=0.01446, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4946659.81 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:01:40,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1009973.3333333334, ans=0.2 2023-12-23 07:01:59,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1010106.6666666666, ans=0.0 2023-12-23 07:02:04,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1010106.6666666666, ans=0.05 2023-12-23 07:02:05,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1010106.6666666666, ans=0.2 2023-12-23 07:02:11,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1010173.3333333334, ans=0.125 2023-12-23 07:02:15,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1010173.3333333334, ans=0.04949747468305833 2023-12-23 07:02:19,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1010173.3333333334, ans=0.0 2023-12-23 07:02:30,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1010240.0, ans=0.0 2023-12-23 07:02:31,698 INFO [train.py:886] (0/4) Epoch 32, batch 3800, loss[loss=0.01508, audio_tagging_loss=0.01508, over 25000.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4946225.23 frames. 
], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:02:45,365 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.409e-02 2023-12-23 07:02:49,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1010373.3333333334, ans=0.1 2023-12-23 07:02:54,395 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.001e+01 3.373e+01 3.530e+01 3.741e+01 4.683e+01, threshold=7.061e+01, percent-clipped=0.0 2023-12-23 07:03:04,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0 2023-12-23 07:03:18,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1010573.3333333334, ans=0.125 2023-12-23 07:03:24,690 INFO [train.py:886] (0/4) Epoch 32, batch 3850, loss[loss=0.01125, audio_tagging_loss=0.01125, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4943507.67 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:03:37,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1010706.6666666666, ans=0.125 2023-12-23 07:03:53,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-12-23 07:03:54,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1010773.3333333334, ans=0.125 2023-12-23 07:04:00,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.31 vs. limit=15.0 2023-12-23 07:04:05,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.87 vs. limit=10.0 2023-12-23 07:04:06,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1010906.6666666666, ans=10.0 2023-12-23 07:04:14,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2023-12-23 07:04:15,170 INFO [train.py:886] (0/4) Epoch 32, batch 3900, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4944369.50 frames. 
], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:04:17,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1010973.3333333334, ans=0.125 2023-12-23 07:04:22,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1010973.3333333334, ans=0.125 2023-12-23 07:04:29,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1011040.0, ans=0.025 2023-12-23 07:04:36,353 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.332e+01 3.481e+01 3.670e+01 4.273e+01, threshold=6.961e+01, percent-clipped=0.0 2023-12-23 07:04:52,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1011173.3333333334, ans=0.125 2023-12-23 07:05:04,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=12.0 2023-12-23 07:05:06,153 INFO [train.py:886] (0/4) Epoch 32, batch 3950, loss[loss=0.01113, audio_tagging_loss=0.01113, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4950842.43 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:05:35,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1011440.0, ans=0.125 2023-12-23 07:05:58,512 INFO [train.py:886] (0/4) Epoch 32, batch 4000, loss[loss=0.009743, audio_tagging_loss=0.009743, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4945992.19 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:06:19,223 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.094e+01 3.350e+01 3.494e+01 3.630e+01 4.145e+01, threshold=6.988e+01, percent-clipped=0.0 2023-12-23 07:06:22,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1011773.3333333334, ans=0.125 2023-12-23 07:06:44,683 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:06:49,333 INFO [train.py:886] (0/4) Epoch 32, batch 4050, loss[loss=0.01307, audio_tagging_loss=0.01307, over 21617.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4946894.23 frames. ], batch size: 107, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:06:53,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1011973.3333333334, ans=0.1 2023-12-23 07:07:20,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0 2023-12-23 07:07:41,198 INFO [train.py:886] (0/4) Epoch 32, batch 4100, loss[loss=0.01171, audio_tagging_loss=0.01171, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4941890.49 frames. 
], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:07:41,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1012306.6666666666, ans=0.1 2023-12-23 07:07:44,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1012306.6666666666, ans=0.1 2023-12-23 07:08:02,285 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.038e+01 3.388e+01 3.515e+01 3.658e+01 4.060e+01, threshold=7.030e+01, percent-clipped=0.0 2023-12-23 07:08:26,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2023-12-23 07:08:32,029 INFO [train.py:886] (0/4) Epoch 32, batch 4150, loss[loss=0.01018, audio_tagging_loss=0.01018, over 22613.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4932645.10 frames. ], batch size: 107, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:08:36,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2023-12-23 07:09:08,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1012840.0, ans=0.0 2023-12-23 07:09:18,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1012906.6666666666, ans=0.125 2023-12-23 07:09:20,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1012906.6666666666, ans=0.2 2023-12-23 07:09:24,408 INFO [train.py:886] (0/4) Epoch 32, batch 4200, loss[loss=0.01308, audio_tagging_loss=0.01308, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4938678.02 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:09:47,513 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.997e+01 3.331e+01 3.526e+01 3.677e+01 4.548e+01, threshold=7.052e+01, percent-clipped=0.0 2023-12-23 07:09:49,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1013106.6666666666, ans=0.125 2023-12-23 07:10:09,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1013240.0, ans=0.125 2023-12-23 07:10:17,390 INFO [train.py:886] (0/4) Epoch 32, batch 4250, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4943960.14 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:10:21,218 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-152000.pt 2023-12-23 07:10:40,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=1013440.0, ans=0.02 2023-12-23 07:10:54,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=12.0 2023-12-23 07:11:10,500 INFO [train.py:886] (0/4) Epoch 32, batch 4300, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4951102.66 frames. 
], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:11:33,500 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.071e+01 3.371e+01 3.452e+01 3.602e+01 4.385e+01, threshold=6.904e+01, percent-clipped=0.0 2023-12-23 07:11:35,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1013773.3333333334, ans=0.125 2023-12-23 07:11:42,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1013840.0, ans=0.125 2023-12-23 07:11:54,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1013906.6666666666, ans=0.1 2023-12-23 07:12:00,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=22.5 2023-12-23 07:12:03,710 INFO [train.py:886] (0/4) Epoch 32, batch 4350, loss[loss=0.01074, audio_tagging_loss=0.01074, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4955641.88 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:12:11,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1013973.3333333334, ans=0.0 2023-12-23 07:12:41,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1014173.3333333334, ans=0.2 2023-12-23 07:12:52,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1014240.0, ans=0.125 2023-12-23 07:12:55,279 INFO [train.py:886] (0/4) Epoch 32, batch 4400, loss[loss=0.01029, audio_tagging_loss=0.01029, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4952358.24 frames. ], batch size: 99, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:13:03,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. 
limit=15.0 2023-12-23 07:13:09,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1014373.3333333334, ans=0.0 2023-12-23 07:13:16,349 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.989e+01 3.370e+01 3.596e+01 3.730e+01 4.489e+01, threshold=7.192e+01, percent-clipped=0.0 2023-12-23 07:13:30,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1014506.6666666666, ans=10.0 2023-12-23 07:13:34,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1014506.6666666666, ans=0.0 2023-12-23 07:13:34,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1014506.6666666666, ans=0.125 2023-12-23 07:13:39,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1014573.3333333334, ans=0.2 2023-12-23 07:13:39,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1014573.3333333334, ans=0.125 2023-12-23 07:13:46,508 INFO [train.py:886] (0/4) Epoch 32, batch 4450, loss[loss=0.01271, audio_tagging_loss=0.01271, over 24750.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4946823.54 frames. ], batch size: 99, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:13:46,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1014640.0, ans=0.1 2023-12-23 07:13:52,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1014640.0, ans=0.125 2023-12-23 07:13:55,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1014640.0, ans=0.0 2023-12-23 07:13:56,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1014706.6666666666, ans=0.125 2023-12-23 07:13:59,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1014706.6666666666, ans=0.025 2023-12-23 07:14:00,517 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:14:00,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1014706.6666666666, ans=0.0 2023-12-23 07:14:02,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1014706.6666666666, ans=0.0 2023-12-23 07:14:12,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1014773.3333333334, ans=0.125 2023-12-23 07:14:38,861 INFO [train.py:886] (0/4) Epoch 32, batch 4500, loss[loss=0.01084, audio_tagging_loss=0.01084, over 24750.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4945314.08 frames. 
], batch size: 99, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:14:44,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1014973.3333333334, ans=0.125 2023-12-23 07:14:49,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1015040.0, ans=15.0 2023-12-23 07:15:00,270 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.070e+01 3.361e+01 3.496e+01 3.729e+01 4.409e+01, threshold=6.991e+01, percent-clipped=0.0 2023-12-23 07:15:08,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1015106.6666666666, ans=0.2 2023-12-23 07:15:24,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1015240.0, ans=0.125 2023-12-23 07:15:30,776 INFO [train.py:886] (0/4) Epoch 32, batch 4550, loss[loss=0.009206, audio_tagging_loss=0.009206, over 24028.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4948272.44 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:15:33,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0 2023-12-23 07:15:43,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1015373.3333333334, ans=0.0 2023-12-23 07:15:46,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0 2023-12-23 07:16:03,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1015506.6666666666, ans=0.025 2023-12-23 07:16:15,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2023-12-23 07:16:23,684 INFO [train.py:886] (0/4) Epoch 32, batch 4600, loss[loss=0.01028, audio_tagging_loss=0.01028, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4945751.65 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:16:29,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1015640.0, ans=0.0 2023-12-23 07:16:42,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1015706.6666666666, ans=0.125 2023-12-23 07:16:45,514 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.990e+01 3.339e+01 3.480e+01 3.659e+01 4.110e+01, threshold=6.960e+01, percent-clipped=0.0 2023-12-23 07:16:49,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.65 vs. 
limit=22.5
2023-12-23 07:16:54,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1015840.0, ans=0.125
2023-12-23 07:16:55,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1015840.0, ans=0.125
2023-12-23 07:16:56,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1015840.0, ans=0.0
2023-12-23 07:16:58,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1015840.0, ans=0.2
2023-12-23 07:17:15,907 INFO [train.py:886] (0/4) Epoch 32, batch 4650, loss[loss=0.00979, audio_tagging_loss=0.00979, over 24061.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4950536.79 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0
2023-12-23 07:17:25,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1016040.0, ans=0.125
2023-12-23 07:17:32,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1016040.0, ans=0.0
2023-12-23 07:17:32,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1016040.0, ans=0.125
2023-12-23 07:17:34,868 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:17:37,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1016106.6666666666, ans=0.1
2023-12-23 07:17:39,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1016106.6666666666, ans=0.125
2023-12-23 07:17:58,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1016240.0, ans=0.125
2023-12-23 07:18:04,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1016240.0, ans=0.125
2023-12-23 07:18:06,521 INFO [train.py:886] (0/4) Epoch 32, batch 4700, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4945262.31 frames. ], batch size: 99, lr: 3.34e-03, grad_scale: 32.0
2023-12-23 07:18:11,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0
2023-12-23 07:18:14,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1016306.6666666666, ans=0.0
2023-12-23 07:18:26,131 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.128e+01 3.383e+01 3.517e+01 3.637e+01 4.392e+01, threshold=7.033e+01, percent-clipped=0.0
2023-12-23 07:18:35,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1016506.6666666666, ans=0.125
2023-12-23 07:18:51,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1016573.3333333334, ans=0.2
2023-12-23 07:18:53,553 INFO [train.py:886] (0/4) Epoch 32, batch 4750, loss[loss=0.01353, audio_tagging_loss=0.01353, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4942169.98 frames. ], batch size: 99, lr: 3.34e-03, grad_scale: 32.0
2023-12-23 07:18:55,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0
2023-12-23 07:19:05,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1016706.6666666666, ans=0.025
2023-12-23 07:19:09,276 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-32.pt
2023-12-23 07:19:28,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1016746.6666666666, ans=0.1
2023-12-23 07:19:29,347 INFO [train.py:886] (0/4) Epoch 33, batch 0, loss[loss=0.03335, audio_tagging_loss=0.03335, over 19956.00 frames. ], tot_loss[loss=0.03335, audio_tagging_loss=0.03335, over 19956.00 frames. ], batch size: 107, lr: 3.29e-03, grad_scale: 32.0
2023-12-23 07:19:29,348 INFO [train.py:909] (0/4) Computing validation loss
2023-12-23 07:19:50,858 INFO [train.py:917] (0/4) Epoch 33, validation: loss=0.03278, audio_tagging_loss=0.03278, over 3737520.00 frames.
2023-12-23 07:19:50,859 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-23 07:20:08,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1016813.3333333334, ans=0.125
2023-12-23 07:20:13,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1016880.0, ans=0.125
2023-12-23 07:20:20,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1016946.6666666666, ans=0.07
2023-12-23 07:20:31,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1017013.3333333334, ans=0.0
2023-12-23 07:20:39,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1017013.3333333334, ans=0.1
2023-12-23 07:20:41,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1017080.0, ans=0.0
2023-12-23 07:20:41,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0
2023-12-23 07:20:42,063 INFO [train.py:886] (0/4) Epoch 33, batch 50, loss[loss=0.01619, audio_tagging_loss=0.01619, over 25000.00 frames. ], tot_loss[loss=0.01971, audio_tagging_loss=0.01971, over 1114363.24 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0
2023-12-23 07:20:46,768 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.141e+01 3.573e+01 4.216e+01 4.740e+01 9.407e+01, threshold=8.432e+01, percent-clipped=7.0
2023-12-23 07:20:49,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1017080.0, ans=0.0
2023-12-23 07:21:00,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0
2023-12-23 07:21:04,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=1017213.3333333334, ans=12.0
2023-12-23 07:21:20,865 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0
2023-12-23 07:21:34,669 INFO [train.py:886] (0/4) Epoch 33, batch 100, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01718, audio_tagging_loss=0.01718, over 1964274.48 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0
2023-12-23 07:21:38,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1017413.3333333334, ans=0.125
2023-12-23 07:21:44,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1017480.0, ans=0.125
2023-12-23 07:21:44,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1017480.0, ans=0.125
2023-12-23 07:21:45,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1017480.0, ans=0.125
2023-12-23 07:21:48,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1017480.0, ans=0.0
2023-12-23 07:21:48,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1017480.0, ans=0.125
2023-12-23 07:21:51,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0
2023-12-23 07:21:57,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5
2023-12-23 07:22:18,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.74 vs. limit=15.0
2023-12-23 07:22:24,855 INFO [train.py:886] (0/4) Epoch 33, batch 150, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24750.00 frames. ], tot_loss[loss=0.01564, audio_tagging_loss=0.01564, over 2628493.65 frames. ], batch size: 99, lr: 3.29e-03, grad_scale: 32.0
2023-12-23 07:22:30,333 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.203e+01 3.570e+01 3.774e+01 4.009e+01 4.712e+01, threshold=7.548e+01, percent-clipped=0.0
2023-12-23 07:22:39,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1017813.3333333334, ans=0.1
2023-12-23 07:22:40,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1017813.3333333334, ans=0.0
2023-12-23 07:22:53,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1017880.0, ans=0.125
2023-12-23 07:23:09,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1018013.3333333334, ans=0.125
2023-12-23 07:23:16,426 INFO [train.py:886] (0/4) Epoch 33, batch 200, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 3147883.68 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0
2023-12-23 07:23:23,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1018080.0, ans=0.0
2023-12-23 07:23:24,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1018080.0, ans=0.125
2023-12-23 07:23:58,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1018346.6666666666, ans=0.125
2023-12-23 07:24:02,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1018346.6666666666, ans=0.125
2023-12-23 07:24:03,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1018346.6666666666, ans=0.125
2023-12-23 07:24:07,456 INFO [train.py:886] (0/4) Epoch 33, batch 250, loss[loss=0.01431, audio_tagging_loss=0.01431, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 3552031.61 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0
2023-12-23 07:24:12,231 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.107e+01 3.419e+01 3.522e+01 3.696e+01 4.416e+01, threshold=7.043e+01, percent-clipped=0.0
2023-12-23 07:24:15,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.31 vs. limit=15.0
2023-12-23 07:24:18,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.62 vs. limit=15.0
2023-12-23 07:24:20,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1018480.0, ans=0.125
2023-12-23 07:24:25,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1018480.0, ans=0.0
2023-12-23 07:24:29,354 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:24:37,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1018613.3333333334, ans=0.0
2023-12-23 07:24:40,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0
2023-12-23 07:24:42,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0
2023-12-23 07:24:45,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0
2023-12-23 07:24:58,546 INFO [train.py:886] (0/4) Epoch 33, batch 300, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 3859051.73 frames. ], batch size: 99, lr: 3.29e-03, grad_scale: 32.0
2023-12-23 07:25:04,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1018746.6666666666, ans=0.0
2023-12-23 07:25:06,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1018746.6666666666, ans=0.2
2023-12-23 07:25:06,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1018746.6666666666, ans=0.0
2023-12-23 07:25:12,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.70 vs. limit=10.0
2023-12-23 07:25:17,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1018813.3333333334, ans=0.1
2023-12-23 07:25:18,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1018880.0, ans=0.1
2023-12-23 07:25:25,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1018880.0, ans=0.125
2023-12-23 07:25:36,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1018946.6666666666, ans=0.125
2023-12-23 07:25:51,287 INFO [train.py:886] (0/4) Epoch 33, batch 350, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4098453.87 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:25:53,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1019080.0, ans=0.035
2023-12-23 07:25:56,050 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.021e+01 3.335e+01 3.546e+01 3.715e+01 4.310e+01, threshold=7.092e+01, percent-clipped=0.0
2023-12-23 07:25:59,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1019080.0, ans=0.125
2023-12-23 07:26:00,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1019146.6666666666, ans=0.1
2023-12-23 07:26:11,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5
2023-12-23 07:26:15,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.79 vs. limit=10.0
2023-12-23 07:26:23,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=8.0
2023-12-23 07:26:23,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1019280.0, ans=0.1
2023-12-23 07:26:33,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1019346.6666666666, ans=0.125
2023-12-23 07:26:42,790 INFO [train.py:886] (0/4) Epoch 33, batch 400, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24750.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4285065.71 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:26:47,533 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:26:53,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1019480.0, ans=0.125
2023-12-23 07:26:55,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1019480.0, ans=0.2
2023-12-23 07:27:01,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1019480.0, ans=0.125
2023-12-23 07:27:01,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1019480.0, ans=0.1
2023-12-23 07:27:02,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1019546.6666666666, ans=0.125
2023-12-23 07:27:25,992 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:27:29,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1019680.0, ans=0.0
2023-12-23 07:27:34,266 INFO [train.py:886] (0/4) Epoch 33, batch 450, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4432983.67 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:27:38,960 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.932e+01 3.272e+01 3.469e+01 3.632e+01 4.131e+01, threshold=6.938e+01, percent-clipped=0.0
2023-12-23 07:27:39,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1019746.6666666666, ans=0.1
2023-12-23 07:28:05,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1019946.6666666666, ans=0.125
2023-12-23 07:28:10,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1019946.6666666666, ans=0.2
2023-12-23 07:28:14,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1020013.3333333334, ans=0.95
2023-12-23 07:28:15,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0
2023-12-23 07:28:19,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1020013.3333333334, ans=0.1
2023-12-23 07:28:19,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5
2023-12-23 07:28:27,369 INFO [train.py:886] (0/4) Epoch 33, batch 500, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4553015.06 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:28:27,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1020080.0, ans=0.125
2023-12-23 07:29:03,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1020280.0, ans=0.125
2023-12-23 07:29:17,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0
2023-12-23 07:29:17,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1020413.3333333334, ans=0.125
2023-12-23 07:29:18,728 INFO [train.py:886] (0/4) Epoch 33, batch 550, loss[loss=0.009686, audio_tagging_loss=0.009686, over 25000.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4645300.89 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:29:23,355 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.066e+01 3.338e+01 3.495e+01 3.646e+01 4.151e+01, threshold=6.991e+01, percent-clipped=0.0
2023-12-23 07:29:33,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1020480.0, ans=0.125
2023-12-23 07:29:33,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5
2023-12-23 07:29:39,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1020546.6666666666, ans=0.0
2023-12-23 07:29:50,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.72 vs. limit=6.0
2023-12-23 07:30:11,207 INFO [train.py:886] (0/4) Epoch 33, batch 600, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4712569.88 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:30:12,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1020746.6666666666, ans=22.5
2023-12-23 07:30:16,121 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:30:26,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1020813.3333333334, ans=0.125
2023-12-23 07:30:31,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0
2023-12-23 07:30:40,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1020946.6666666666, ans=0.125
2023-12-23 07:30:54,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1021013.3333333334, ans=0.09899494936611666
2023-12-23 07:31:01,928 INFO [train.py:886] (0/4) Epoch 33, batch 650, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4762919.55 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:31:06,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.24 vs. limit=22.5
2023-12-23 07:31:07,457 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.031e+01 3.397e+01 3.533e+01 3.690e+01 3.984e+01, threshold=7.067e+01, percent-clipped=0.0
2023-12-23 07:31:10,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1021080.0, ans=0.125
2023-12-23 07:31:19,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1021146.6666666666, ans=0.125
2023-12-23 07:31:43,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1021346.6666666666, ans=0.125
2023-12-23 07:31:45,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1021346.6666666666, ans=0.125
2023-12-23 07:31:54,006 INFO [train.py:886] (0/4) Epoch 33, batch 700, loss[loss=0.01336, audio_tagging_loss=0.01336, over 24750.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4804110.47 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:31:54,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1021413.3333333334, ans=0.025
2023-12-23 07:31:55,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1021413.3333333334, ans=0.125
2023-12-23 07:32:00,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1021413.3333333334, ans=0.0
2023-12-23 07:32:06,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1021480.0, ans=0.05
2023-12-23 07:32:16,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1021546.6666666666, ans=0.125
2023-12-23 07:32:28,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1021613.3333333334, ans=0.09899494936611666
2023-12-23 07:32:46,951 INFO [train.py:886] (0/4) Epoch 33, batch 750, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4837927.67 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:32:49,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1021746.6666666666, ans=0.07
2023-12-23 07:32:51,660 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.077e+01 3.365e+01 3.500e+01 3.684e+01 4.096e+01, threshold=7.001e+01, percent-clipped=0.0
2023-12-23 07:33:10,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.41 vs. limit=12.0
2023-12-23 07:33:27,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022013.3333333334, ans=0.1
2023-12-23 07:33:28,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1022013.3333333334, ans=0.0
2023-12-23 07:33:35,591 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:33:37,151 INFO [train.py:886] (0/4) Epoch 33, batch 800, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4861576.06 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:33:53,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0
2023-12-23 07:34:08,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1022280.0, ans=0.0
2023-12-23 07:34:14,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1022280.0, ans=0.2
2023-12-23 07:34:19,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1022346.6666666666, ans=0.1
2023-12-23 07:34:30,563 INFO [train.py:886] (0/4) Epoch 33, batch 850, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4886458.77 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:34:35,227 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.933e+01 3.306e+01 3.423e+01 3.606e+01 5.967e+01, threshold=6.845e+01, percent-clipped=0.0
2023-12-23 07:34:46,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0
2023-12-23 07:35:01,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1022613.3333333334, ans=0.0
2023-12-23 07:35:05,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1022613.3333333334, ans=0.0
2023-12-23 07:35:20,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1022746.6666666666, ans=0.0
2023-12-23 07:35:21,360 INFO [train.py:886] (0/4) Epoch 33, batch 900, loss[loss=0.01048, audio_tagging_loss=0.01048, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4900986.90 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:35:27,041 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:35:27,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1022746.6666666666, ans=0.125
2023-12-23 07:35:31,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1022813.3333333334, ans=0.0
2023-12-23 07:35:41,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1022880.0, ans=0.0
2023-12-23 07:35:43,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1022880.0, ans=0.125
2023-12-23 07:36:05,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1023013.3333333334, ans=0.125
2023-12-23 07:36:11,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0
2023-12-23 07:36:12,626 INFO [train.py:886] (0/4) Epoch 33, batch 950, loss[loss=0.0149, audio_tagging_loss=0.0149, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4910215.59 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:36:17,321 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.098e+01 3.441e+01 3.582e+01 3.744e+01 4.324e+01, threshold=7.165e+01, percent-clipped=0.0
2023-12-23 07:36:43,673 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:36:49,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1023280.0, ans=0.05
2023-12-23 07:37:04,744 INFO [train.py:886] (0/4) Epoch 33, batch 1000, loss[loss=0.0102, audio_tagging_loss=0.0102, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4917494.91 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:37:17,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1023480.0, ans=0.125
2023-12-23 07:37:24,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1023546.6666666666, ans=0.0
2023-12-23 07:37:45,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.38 vs. limit=22.5
2023-12-23 07:37:55,588 INFO [train.py:886] (0/4) Epoch 33, batch 1050, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4923592.90 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:38:00,265 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.049e+01 3.330e+01 3.500e+01 3.696e+01 4.249e+01, threshold=7.000e+01, percent-clipped=0.0
2023-12-23 07:38:37,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1024013.3333333334, ans=0.125
2023-12-23 07:38:47,378 INFO [train.py:886] (0/4) Epoch 33, batch 1100, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4931725.86 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:39:00,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1024146.6666666666, ans=0.0
2023-12-23 07:39:16,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1024280.0, ans=0.125
2023-12-23 07:39:18,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024280.0, ans=0.1
2023-12-23 07:39:23,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1024280.0, ans=0.125
2023-12-23 07:39:32,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0
2023-12-23 07:39:36,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0
2023-12-23 07:39:37,332 INFO [train.py:886] (0/4) Epoch 33, batch 1150, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4938025.13 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:39:42,735 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.007e+01 3.357e+01 3.502e+01 3.673e+01 4.162e+01, threshold=7.004e+01, percent-clipped=0.0
2023-12-23 07:39:59,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1024546.6666666666, ans=0.0
2023-12-23 07:40:00,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1024546.6666666666, ans=0.125
2023-12-23 07:40:26,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1024680.0, ans=0.0
2023-12-23 07:40:27,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0
2023-12-23 07:40:28,348 INFO [train.py:886] (0/4) Epoch 33, batch 1200, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4945032.33 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:40:41,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1024813.3333333334, ans=0.125
2023-12-23 07:40:42,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1024813.3333333334, ans=0.0
2023-12-23 07:41:09,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1025013.3333333334, ans=0.125
2023-12-23 07:41:10,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1025013.3333333334, ans=0.125
2023-12-23 07:41:20,672 INFO [train.py:886] (0/4) Epoch 33, batch 1250, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4941793.01 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:41:23,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0
2023-12-23 07:41:25,271 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.073e+01 3.391e+01 3.513e+01 3.731e+01 4.516e+01, threshold=7.026e+01, percent-clipped=0.0
2023-12-23 07:41:43,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025213.3333333334, ans=0.1
2023-12-23 07:41:54,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025280.0, ans=0.1
2023-12-23 07:42:06,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1025346.6666666666, ans=0.05
2023-12-23 07:42:11,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1025413.3333333334, ans=0.0
2023-12-23 07:42:12,425 INFO [train.py:886] (0/4) Epoch 33, batch 1300, loss[loss=0.01264, audio_tagging_loss=0.01264, over 22733.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4934345.27 frames. ], batch size: 107, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:42:23,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1025480.0, ans=0.5
2023-12-23 07:42:31,440 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:42:48,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1025613.3333333334, ans=0.125
2023-12-23 07:42:57,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.11 vs. limit=22.5
2023-12-23 07:42:58,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1025680.0, ans=0.125
2023-12-23 07:43:04,399 INFO [train.py:886] (0/4) Epoch 33, batch 1350, loss[loss=0.01512, audio_tagging_loss=0.01512, over 24028.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4934858.14 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:43:09,133 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.898e+01 3.394e+01 3.555e+01 3.712e+01 4.283e+01, threshold=7.109e+01, percent-clipped=0.0
2023-12-23 07:43:43,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1025946.6666666666, ans=0.1
2023-12-23 07:43:53,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1026013.3333333334, ans=0.125
2023-12-23 07:43:57,039 INFO [train.py:886] (0/4) Epoch 33, batch 1400, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4935052.16 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:43:58,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1026080.0, ans=0.125
2023-12-23 07:43:59,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0
2023-12-23 07:44:24,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0
2023-12-23 07:44:47,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1026413.3333333334, ans=0.125
2023-12-23 07:44:48,410 INFO [train.py:886] (0/4) Epoch 33, batch 1450, loss[loss=0.01226, audio_tagging_loss=0.01226, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4943217.06 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:44:53,831 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.346e+01 3.476e+01 3.616e+01 4.118e+01, threshold=6.952e+01, percent-clipped=0.0
2023-12-23 07:44:56,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1026413.3333333334, ans=0.1
2023-12-23 07:45:26,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0
2023-12-23 07:45:29,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1026680.0, ans=0.125
2023-12-23 07:45:40,598 INFO [train.py:886] (0/4) Epoch 33, batch 1500, loss[loss=0.01179, audio_tagging_loss=0.01179, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4947591.84 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:46:14,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0
2023-12-23 07:46:18,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1026946.6666666666, ans=0.125
2023-12-23 07:46:25,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1027013.3333333334, ans=0.125
2023-12-23 07:46:29,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1027013.3333333334, ans=0.125
2023-12-23 07:46:31,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1027013.3333333334, ans=0.125
2023-12-23 07:46:32,820 INFO [train.py:886] (0/4) Epoch 33, batch 1550, loss[loss=0.01437, audio_tagging_loss=0.01437, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4942832.64 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:46:34,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1027080.0, ans=0.125
2023-12-23 07:46:38,189 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.078e+01 3.419e+01 3.555e+01 3.697e+01 4.231e+01, threshold=7.109e+01, percent-clipped=0.0
2023-12-23 07:46:38,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1027080.0, ans=0.025
2023-12-23 07:46:39,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.55 vs. limit=22.5
2023-12-23 07:46:42,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1027146.6666666666, ans=0.1
2023-12-23 07:46:48,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1027146.6666666666, ans=0.0
2023-12-23 07:47:00,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1027213.3333333334, ans=0.0
2023-12-23 07:47:13,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1027346.6666666666, ans=0.0
2023-12-23 07:47:14,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1027346.6666666666, ans=0.07
2023-12-23 07:47:15,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1027346.6666666666, ans=0.125
2023-12-23 07:47:17,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.24 vs. limit=10.0
2023-12-23 07:47:22,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1027346.6666666666, ans=0.0
2023-12-23 07:47:23,710 INFO [train.py:886] (0/4) Epoch 33, batch 1600, loss[loss=0.0151, audio_tagging_loss=0.0151, over 24750.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4943128.49 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:47:31,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1027413.3333333334, ans=0.125
2023-12-23 07:47:45,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1027546.6666666666, ans=0.1
2023-12-23 07:47:58,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0
2023-12-23 07:48:06,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1027680.0, ans=0.0
2023-12-23 07:48:16,979 INFO [train.py:886] (0/4) Epoch 33, batch 1650, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4945240.64 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:48:21,580 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.029e+01 3.394e+01 3.522e+01 3.682e+01 5.123e+01, threshold=7.045e+01, percent-clipped=0.0
2023-12-23 07:48:22,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1027746.6666666666, ans=0.125
2023-12-23 07:48:32,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1027813.3333333334, ans=0.125
2023-12-23 07:49:03,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1028013.3333333334, ans=0.1
2023-12-23 07:49:08,125 INFO [train.py:886] (0/4) Epoch 33, batch 1700, loss[loss=0.01173, audio_tagging_loss=0.01173, over 24031.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4948671.86 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:49:14,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1028080.0, ans=0.125
2023-12-23 07:49:21,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1028146.6666666666, ans=0.125
2023-12-23 07:49:30,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=15.0
2023-12-23 07:49:38,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1028280.0, ans=0.0
2023-12-23 07:49:59,790 INFO [train.py:886] (0/4) Epoch 33, batch 1750, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4946940.45 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:50:04,542 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.030e+01 3.329e+01 3.475e+01 3.627e+01 4.397e+01, threshold=6.950e+01, percent-clipped=0.0
2023-12-23 07:50:06,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1028413.3333333334, ans=0.125
2023-12-23 07:50:07,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1028413.3333333334, ans=0.125
2023-12-23 07:50:23,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1028546.6666666666, ans=0.95
2023-12-23 07:50:40,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1028680.0, ans=0.125
2023-12-23 07:50:49,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1028680.0, ans=0.1
2023-12-23 07:50:52,827 INFO [train.py:886] (0/4) Epoch 33, batch 1800, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4952338.86 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:50:55,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1028746.6666666666, ans=0.125
2023-12-23 07:51:00,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1028746.6666666666, ans=0.125
2023-12-23 07:51:03,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1028813.3333333334, ans=0.1
2023-12-23 07:51:29,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.18 vs. limit=22.5
2023-12-23 07:51:40,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1029013.3333333334, ans=0.0
2023-12-23 07:51:42,110 INFO [train.py:886] (0/4) Epoch 33, batch 1850, loss[loss=0.01495, audio_tagging_loss=0.01495, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4952515.43 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:51:43,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0
2023-12-23 07:51:46,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1029080.0, ans=0.2
2023-12-23 07:51:46,822 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.019e+01 3.396e+01 3.524e+01 3.649e+01 4.087e+01, threshold=7.047e+01, percent-clipped=0.0
2023-12-23 07:51:52,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1029146.6666666666, ans=0.0
2023-12-23 07:52:04,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1029213.3333333334, ans=0.125
2023-12-23 07:52:10,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1029213.3333333334, ans=0.2
2023-12-23 07:52:23,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1029346.6666666666, ans=0.125
2023-12-23 07:52:35,189 INFO [train.py:886] (0/4) Epoch 33, batch 1900, loss[loss=0.01288, audio_tagging_loss=0.01288, over 24750.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4941889.14 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:52:52,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1029480.0, ans=0.0
2023-12-23 07:53:26,784 INFO [train.py:886] (0/4) Epoch 33, batch 1950, loss[loss=0.01029, audio_tagging_loss=0.01029, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4942594.91 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:53:31,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1029746.6666666666, ans=0.0
2023-12-23 07:53:32,176 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.173e+01 3.392e+01 3.533e+01 3.748e+01 4.233e+01, threshold=7.067e+01, percent-clipped=0.0
2023-12-23 07:54:03,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1029946.6666666666, ans=0.125
2023-12-23 07:54:18,568 INFO [train.py:886] (0/4) Epoch 33, batch 2000, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4943286.41 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:54:24,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.80 vs. limit=22.5
2023-12-23 07:54:39,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1030213.3333333334, ans=0.0
2023-12-23 07:54:45,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1030213.3333333334, ans=0.125
2023-12-23 07:55:07,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1030346.6666666666, ans=0.0
2023-12-23 07:55:10,801 INFO [train.py:886] (0/4) Epoch 33, batch 2050, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4949091.09 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:55:16,254 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.917e+01 3.317e+01 3.445e+01 3.668e+01 4.108e+01, threshold=6.890e+01, percent-clipped=0.0
2023-12-23 07:55:18,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1030413.3333333334, ans=0.125
2023-12-23 07:55:20,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1030480.0, ans=0.1
2023-12-23 07:55:49,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0
2023-12-23 07:55:51,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1030680.0, ans=0.125
2023-12-23 07:56:01,868 INFO [train.py:886] (0/4) Epoch 33, batch 2100, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4953252.71 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:56:16,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1030813.3333333334, ans=0.125
2023-12-23 07:56:18,448 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:56:21,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1030813.3333333334, ans=0.125
2023-12-23 07:56:23,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1030880.0, ans=0.0
2023-12-23 07:56:38,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1030946.6666666666, ans=0.125
2023-12-23 07:56:50,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. limit=15.0
2023-12-23 07:56:52,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1031013.3333333334, ans=0.2
2023-12-23 07:56:54,955 INFO [train.py:886] (0/4) Epoch 33, batch 2150, loss[loss=0.01435, audio_tagging_loss=0.01435, over 24945.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4955325.76 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:56:59,623 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.872e+01 3.355e+01 3.526e+01 3.683e+01 4.612e+01, threshold=7.052e+01, percent-clipped=0.0
2023-12-23 07:57:06,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1031146.6666666666, ans=0.0
2023-12-23 07:57:07,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.86 vs. limit=8.0
2023-12-23 07:57:17,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1031213.3333333334, ans=0.1
2023-12-23 07:57:25,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1031280.0, ans=0.0
2023-12-23 07:57:32,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1031280.0, ans=0.0
2023-12-23 07:57:44,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1031346.6666666666, ans=0.0
2023-12-23 07:57:46,395 INFO [train.py:886] (0/4) Epoch 33, batch 2200, loss[loss=0.01305, audio_tagging_loss=0.01305, over 24750.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4951237.32 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:57:56,427 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=12.0
2023-12-23 07:58:16,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-12-23 07:58:22,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1031613.3333333334, ans=0.1
2023-12-23 07:58:34,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1031680.0, ans=0.125
2023-12-23 07:58:38,189 INFO [train.py:886] (0/4) Epoch 33, batch 2250, loss[loss=0.01207, audio_tagging_loss=0.01207, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4949294.47 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 07:58:42,939 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.057e+01 3.375e+01 3.511e+01 3.690e+01 4.571e+01, threshold=7.022e+01, percent-clipped=0.0
2023-12-23 07:58:51,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.83 vs. limit=10.0
2023-12-23 07:59:02,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1031880.0, ans=0.1
2023-12-23 07:59:17,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1031946.6666666666, ans=0.0
2023-12-23 07:59:18,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1031946.6666666666, ans=0.125
2023-12-23 07:59:26,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0
2023-12-23 07:59:28,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1032013.3333333334, ans=0.0
2023-12-23 07:59:30,933 INFO [train.py:886] (0/4) Epoch 33, batch 2300, loss[loss=0.01355, audio_tagging_loss=0.01355, over 25000.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4952358.21 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 07:59:34,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.81 vs. limit=22.5
2023-12-23 07:59:44,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1032146.6666666666, ans=0.025
2023-12-23 07:59:48,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1032146.6666666666, ans=0.125
2023-12-23 07:59:49,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1032146.6666666666, ans=0.2
2023-12-23 08:00:07,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1032280.0, ans=0.125
2023-12-23 08:00:17,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1032346.6666666666, ans=0.2
2023-12-23 08:00:23,000 INFO [train.py:886] (0/4) Epoch 33, batch 2350, loss[loss=0.009466, audio_tagging_loss=0.009466, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4947866.22 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:00:28,464 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.041e+01 3.321e+01 3.472e+01 3.677e+01 4.298e+01, threshold=6.945e+01, percent-clipped=0.0
2023-12-23 08:00:31,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1032413.3333333334, ans=10.0
2023-12-23 08:00:53,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1032613.3333333334, ans=0.0
2023-12-23 08:00:55,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1032613.3333333334, ans=0.125
2023-12-23 08:01:04,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=1032680.0, ans=15.0
2023-12-23 08:01:14,860 INFO [train.py:886] (0/4) Epoch 33, batch 2400, loss[loss=0.009695, audio_tagging_loss=0.009695, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4953208.49 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:01:17,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1032746.6666666666, ans=0.2
2023-12-23 08:01:19,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1032746.6666666666, ans=0.0
2023-12-23 08:01:33,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1032813.3333333334, ans=0.125
2023-12-23 08:01:34,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1032813.3333333334, ans=0.125
2023-12-23 08:02:02,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5
2023-12-23 08:02:07,472 INFO [train.py:886] (0/4) Epoch 33, batch 2450, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4959820.59 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:02:11,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1033080.0, ans=0.125
2023-12-23 08:02:12,815 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.321e+01 3.452e+01 3.648e+01 4.269e+01, threshold=6.903e+01, percent-clipped=0.0
2023-12-23 08:02:22,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1033146.6666666666, ans=0.125
2023-12-23 08:02:28,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1033213.3333333334, ans=0.2
2023-12-23 08:02:33,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1033213.3333333334, ans=0.05
2023-12-23 08:02:52,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1033346.6666666666, ans=0.125
2023-12-23 08:02:58,916 INFO [train.py:886] (0/4) Epoch 33, batch 2500, loss[loss=0.01135, audio_tagging_loss=0.01135, over 24750.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4959810.91 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:03:03,638 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:03:09,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1033480.0, ans=0.0
2023-12-23 08:03:30,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1033613.3333333334, ans=0.2
2023-12-23 08:03:42,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1033680.0, ans=0.125
2023-12-23 08:03:49,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1033680.0, ans=0.125
2023-12-23 08:03:51,008 INFO [train.py:886] (0/4) Epoch 33, batch 2550, loss[loss=0.01033, audio_tagging_loss=0.01033, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4956977.13 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:03:55,657 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.147e+01 3.396e+01 3.556e+01 3.745e+01 4.206e+01, threshold=7.112e+01, percent-clipped=0.0
2023-12-23 08:04:00,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1033813.3333333334, ans=0.5
2023-12-23 08:04:18,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1033880.0, ans=0.125
2023-12-23 08:04:25,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1033946.6666666666, ans=0.125
2023-12-23 08:04:29,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1033946.6666666666, ans=0.07
2023-12-23 08:04:40,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1034013.3333333334, ans=0.1
2023-12-23 08:04:43,683 INFO [train.py:886] (0/4) Epoch 33, batch 2600, loss[loss=0.01346, audio_tagging_loss=0.01346, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4956551.10 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:04:44,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1034080.0, ans=0.125
2023-12-23 08:04:53,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1034146.6666666666, ans=15.0
2023-12-23 08:05:12,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1034213.3333333334, ans=0.1
2023-12-23 08:05:34,821 INFO [train.py:886] (0/4) Epoch 33, batch 2650, loss[loss=0.01073, audio_tagging_loss=0.01073, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4956039.44 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:05:38,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1034413.3333333334, ans=0.0
2023-12-23 08:05:39,477 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.928e+01 3.366e+01 3.521e+01 3.683e+01 4.023e+01, threshold=7.042e+01, percent-clipped=0.0
2023-12-23 08:05:43,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1034413.3333333334, ans=0.125
2023-12-23 08:05:44,011 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:05:44,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1034480.0, ans=0.0
2023-12-23 08:05:46,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0
2023-12-23 08:05:53,195 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:05:56,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs.
limit=6.0 2023-12-23 08:06:20,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1034680.0, ans=0.125 2023-12-23 08:06:27,632 INFO [train.py:886] (0/4) Epoch 33, batch 2700, loss[loss=0.01187, audio_tagging_loss=0.01187, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4956861.15 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:07:08,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1035013.3333333334, ans=0.125 2023-12-23 08:07:17,210 INFO [train.py:886] (0/4) Epoch 33, batch 2750, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4962809.38 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 128.0 2023-12-23 08:07:17,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1035080.0, ans=0.125 2023-12-23 08:07:23,274 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.910e+01 3.350e+01 3.483e+01 3.677e+01 4.348e+01, threshold=6.966e+01, percent-clipped=0.0 2023-12-23 08:07:58,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1035346.6666666666, ans=0.2 2023-12-23 08:08:06,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2023-12-23 08:08:09,519 INFO [train.py:886] (0/4) Epoch 33, batch 2800, loss[loss=0.01185, audio_tagging_loss=0.01185, over 24750.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4956979.76 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:08:09,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1035413.3333333334, ans=0.1 2023-12-23 08:08:21,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.52 vs. limit=22.5 2023-12-23 08:09:01,185 INFO [train.py:886] (0/4) Epoch 33, batch 2850, loss[loss=0.009823, audio_tagging_loss=0.009823, over 24006.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4951889.47 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:09:02,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1035746.6666666666, ans=0.125 2023-12-23 08:09:02,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.82 vs. 
limit=22.5 2023-12-23 08:09:07,510 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.083e+01 3.407e+01 3.538e+01 3.728e+01 5.942e+01, threshold=7.077e+01, percent-clipped=0.0 2023-12-23 08:09:19,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1035813.3333333334, ans=10.0 2023-12-23 08:09:21,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=1035880.0, ans=0.1 2023-12-23 08:09:46,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1036013.3333333334, ans=0.125 2023-12-23 08:09:52,021 INFO [train.py:886] (0/4) Epoch 33, batch 2900, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4953937.66 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:09:53,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1036080.0, ans=0.125 2023-12-23 08:09:53,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.17 vs. limit=10.0 2023-12-23 08:10:45,080 INFO [train.py:886] (0/4) Epoch 33, batch 2950, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4954300.63 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:10:48,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0 2023-12-23 08:10:50,739 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.310e+01 3.454e+01 3.690e+01 4.834e+01, threshold=6.907e+01, percent-clipped=0.0 2023-12-23 08:11:03,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1036546.6666666666, ans=0.0 2023-12-23 08:11:12,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1036546.6666666666, ans=0.125 2023-12-23 08:11:25,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1036680.0, ans=0.2 2023-12-23 08:11:28,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.70 vs. limit=22.5 2023-12-23 08:11:35,883 INFO [train.py:886] (0/4) Epoch 33, batch 3000, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4951225.57 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:11:35,885 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 08:11:56,757 INFO [train.py:917] (0/4) Epoch 33, validation: loss=0.03378, audio_tagging_loss=0.03378, over 3737520.00 frames. 
2023-12-23 08:11:56,758 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 08:11:58,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1036746.6666666666, ans=0.1 2023-12-23 08:12:03,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1036746.6666666666, ans=0.2 2023-12-23 08:12:04,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1036746.6666666666, ans=0.0 2023-12-23 08:12:16,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1036813.3333333334, ans=0.125 2023-12-23 08:12:21,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0 2023-12-23 08:12:27,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0 2023-12-23 08:12:43,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1037013.3333333334, ans=0.0 2023-12-23 08:12:49,273 INFO [train.py:886] (0/4) Epoch 33, batch 3050, loss[loss=0.0115, audio_tagging_loss=0.0115, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4954266.94 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:12:54,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1037080.0, ans=0.1 2023-12-23 08:12:54,860 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.120e+01 3.381e+01 3.541e+01 3.697e+01 4.146e+01, threshold=7.081e+01, percent-clipped=0.0 2023-12-23 08:12:55,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1037080.0, ans=0.04949747468305833 2023-12-23 08:13:40,921 INFO [train.py:886] (0/4) Epoch 33, batch 3100, loss[loss=0.01188, audio_tagging_loss=0.01188, over 24750.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4954871.79 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:14:03,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1037546.6666666666, ans=0.125 2023-12-23 08:14:07,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1037546.6666666666, ans=0.1 2023-12-23 08:14:07,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1037546.6666666666, ans=0.125 2023-12-23 08:14:17,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1037613.3333333334, ans=0.2 2023-12-23 08:14:24,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1037680.0, ans=0.1 2023-12-23 08:14:32,359 INFO [train.py:886] (0/4) Epoch 33, batch 3150, loss[loss=0.01645, audio_tagging_loss=0.01645, over 24933.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4945097.51 frames. 
], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:14:37,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1037746.6666666666, ans=0.125 2023-12-23 08:14:38,067 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.211e+01 3.392e+01 3.546e+01 3.675e+01 4.446e+01, threshold=7.092e+01, percent-clipped=0.0 2023-12-23 08:15:00,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2023-12-23 08:15:24,439 INFO [train.py:886] (0/4) Epoch 33, batch 3200, loss[loss=0.01071, audio_tagging_loss=0.01071, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4944359.90 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:15:31,048 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:15:53,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1038213.3333333334, ans=0.0 2023-12-23 08:15:57,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1038280.0, ans=0.0 2023-12-23 08:16:10,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1038346.6666666666, ans=0.09899494936611666 2023-12-23 08:16:12,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1038346.6666666666, ans=0.1 2023-12-23 08:16:15,385 INFO [train.py:886] (0/4) Epoch 33, batch 3250, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4944957.90 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:16:19,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1038413.3333333334, ans=0.125 2023-12-23 08:16:22,358 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.103e+01 3.403e+01 3.565e+01 3.733e+01 4.507e+01, threshold=7.131e+01, percent-clipped=0.0 2023-12-23 08:16:30,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1038480.0, ans=0.125 2023-12-23 08:16:58,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=1038680.0, ans=0.1 2023-12-23 08:17:08,595 INFO [train.py:886] (0/4) Epoch 33, batch 3300, loss[loss=0.01373, audio_tagging_loss=0.01373, over 24750.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4951327.91 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:17:19,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1038813.3333333334, ans=0.0 2023-12-23 08:17:41,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-12-23 08:17:59,940 INFO [train.py:886] (0/4) Epoch 33, batch 3350, loss[loss=0.01086, audio_tagging_loss=0.01086, over 21603.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4948267.09 frames. 
], batch size: 107, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:18:04,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1039080.0, ans=0.0 2023-12-23 08:18:06,314 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.096e+01 3.385e+01 3.532e+01 3.687e+01 4.158e+01, threshold=7.063e+01, percent-clipped=0.0 2023-12-23 08:18:10,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2023-12-23 08:18:11,260 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:18:13,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1039146.6666666666, ans=0.0 2023-12-23 08:18:14,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1039146.6666666666, ans=0.125 2023-12-23 08:18:18,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1039146.6666666666, ans=0.0 2023-12-23 08:18:50,762 INFO [train.py:886] (0/4) Epoch 33, batch 3400, loss[loss=0.01301, audio_tagging_loss=0.01301, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4948924.89 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:19:06,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.29 vs. limit=22.5 2023-12-23 08:19:10,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1039546.6666666666, ans=0.125 2023-12-23 08:19:24,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-12-23 08:19:42,837 INFO [train.py:886] (0/4) Epoch 33, batch 3450, loss[loss=0.01443, audio_tagging_loss=0.01443, over 22624.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4948912.21 frames. ], batch size: 107, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:19:44,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1039746.6666666666, ans=0.125 2023-12-23 08:19:48,485 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.944e+01 3.448e+01 3.587e+01 3.703e+01 4.197e+01, threshold=7.175e+01, percent-clipped=0.0 2023-12-23 08:19:49,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.54 vs. limit=22.5 2023-12-23 08:19:55,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0 2023-12-23 08:20:04,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1039880.0, ans=0.0 2023-12-23 08:20:20,282 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-156000.pt 2023-12-23 08:20:36,031 INFO [train.py:886] (0/4) Epoch 33, batch 3500, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. 
], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4946414.13 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:20:38,216 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:20:44,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1040080.0, ans=0.2 2023-12-23 08:21:04,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1040213.3333333334, ans=0.125 2023-12-23 08:21:04,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1040213.3333333334, ans=0.2 2023-12-23 08:21:06,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1040280.0, ans=0.2 2023-12-23 08:21:06,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2023-12-23 08:21:13,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1040280.0, ans=0.1 2023-12-23 08:21:18,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1040346.6666666666, ans=0.025 2023-12-23 08:21:21,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1040346.6666666666, ans=0.1 2023-12-23 08:21:23,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1040346.6666666666, ans=0.1 2023-12-23 08:21:26,440 INFO [train.py:886] (0/4) Epoch 33, batch 3550, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4936424.47 frames. 
], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:21:26,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1040413.3333333334, ans=0.0 2023-12-23 08:21:32,830 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.940e+01 3.348e+01 3.483e+01 3.687e+01 4.217e+01, threshold=6.967e+01, percent-clipped=0.0 2023-12-23 08:21:34,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1040413.3333333334, ans=0.125 2023-12-23 08:21:36,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1040480.0, ans=0.0 2023-12-23 08:21:41,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1040480.0, ans=0.2 2023-12-23 08:21:54,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1040546.6666666666, ans=0.0 2023-12-23 08:21:56,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1040613.3333333334, ans=0.09899494936611666 2023-12-23 08:22:03,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1040613.3333333334, ans=0.2 2023-12-23 08:22:18,351 INFO [train.py:886] (0/4) Epoch 33, batch 3600, loss[loss=0.01147, audio_tagging_loss=0.01147, over 24918.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4942407.55 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:22:21,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1040746.6666666666, ans=0.125 2023-12-23 08:22:25,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1040746.6666666666, ans=0.0 2023-12-23 08:22:30,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1040813.3333333334, ans=0.0 2023-12-23 08:22:35,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1040813.3333333334, ans=0.2 2023-12-23 08:22:37,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1040880.0, ans=0.2 2023-12-23 08:22:38,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1040880.0, ans=0.0 2023-12-23 08:22:43,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2023-12-23 08:23:09,587 INFO [train.py:886] (0/4) Epoch 33, batch 3650, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4942124.27 frames. 
], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:23:15,867 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.932e+01 3.316e+01 3.480e+01 3.651e+01 4.543e+01, threshold=6.960e+01, percent-clipped=0.0 2023-12-23 08:23:16,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1041080.0, ans=0.125 2023-12-23 08:23:18,912 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:23:22,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1041146.6666666666, ans=0.125 2023-12-23 08:23:25,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1041146.6666666666, ans=0.125 2023-12-23 08:23:34,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1041213.3333333334, ans=0.125 2023-12-23 08:23:48,075 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.49 vs. limit=15.0 2023-12-23 08:24:01,230 INFO [train.py:886] (0/4) Epoch 33, batch 3700, loss[loss=0.01778, audio_tagging_loss=0.01778, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4947187.68 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:24:04,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1041413.3333333334, ans=0.0 2023-12-23 08:24:06,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1041413.3333333334, ans=0.07 2023-12-23 08:24:20,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1041546.6666666666, ans=0.0 2023-12-23 08:24:20,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1041546.6666666666, ans=0.0 2023-12-23 08:24:32,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1041613.3333333334, ans=0.1 2023-12-23 08:24:40,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1041613.3333333334, ans=0.2 2023-12-23 08:24:44,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1041680.0, ans=0.2 2023-12-23 08:24:52,535 INFO [train.py:886] (0/4) Epoch 33, batch 3750, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4949979.32 frames. 
], batch size: 99, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:24:59,663 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.113e+01 3.433e+01 3.584e+01 3.717e+01 4.082e+01, threshold=7.168e+01, percent-clipped=0.0 2023-12-23 08:25:00,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1041746.6666666666, ans=0.125 2023-12-23 08:25:03,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1041813.3333333334, ans=0.1 2023-12-23 08:25:04,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1041813.3333333334, ans=0.125 2023-12-23 08:25:04,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=22.5 2023-12-23 08:25:10,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1041813.3333333334, ans=0.125 2023-12-23 08:25:13,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1041880.0, ans=0.0 2023-12-23 08:25:38,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1042013.3333333334, ans=0.125 2023-12-23 08:25:40,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1042013.3333333334, ans=0.125 2023-12-23 08:25:44,407 INFO [train.py:886] (0/4) Epoch 33, batch 3800, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4950007.27 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:25:45,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1042080.0, ans=0.2 2023-12-23 08:26:08,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1042213.3333333334, ans=0.125 2023-12-23 08:26:10,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.12 vs. limit=22.5 2023-12-23 08:26:20,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1042280.0, ans=0.025 2023-12-23 08:26:27,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1042346.6666666666, ans=0.0 2023-12-23 08:26:36,535 INFO [train.py:886] (0/4) Epoch 33, batch 3850, loss[loss=0.01166, audio_tagging_loss=0.01166, over 23961.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4945678.01 frames. 
], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:26:42,210 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.174e+01 3.454e+01 3.600e+01 3.785e+01 4.455e+01, threshold=7.200e+01, percent-clipped=0.0 2023-12-23 08:27:03,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1042546.6666666666, ans=0.0 2023-12-23 08:27:09,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1042613.3333333334, ans=0.0 2023-12-23 08:27:16,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1042680.0, ans=0.015 2023-12-23 08:27:25,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1042680.0, ans=0.125 2023-12-23 08:27:26,759 INFO [train.py:886] (0/4) Epoch 33, batch 3900, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4949817.08 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:27:52,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1042880.0, ans=0.0 2023-12-23 08:27:57,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1042946.6666666666, ans=0.125 2023-12-23 08:28:04,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1042946.6666666666, ans=0.2 2023-12-23 08:28:05,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=1042946.6666666666, ans=0.02 2023-12-23 08:28:09,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1043013.3333333334, ans=0.125 2023-12-23 08:28:12,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1043013.3333333334, ans=0.125 2023-12-23 08:28:16,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1043013.3333333334, ans=0.0 2023-12-23 08:28:18,534 INFO [train.py:886] (0/4) Epoch 33, batch 3950, loss[loss=0.007746, audio_tagging_loss=0.007746, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4952271.86 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:28:24,296 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.369e+01 3.508e+01 3.685e+01 5.218e+01, threshold=7.015e+01, percent-clipped=0.0 2023-12-23 08:28:26,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=22.5 2023-12-23 08:28:29,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.97 vs. 
limit=22.5 2023-12-23 08:28:34,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1043146.6666666666, ans=0.125 2023-12-23 08:28:37,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1043146.6666666666, ans=0.025 2023-12-23 08:28:45,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1043213.3333333334, ans=0.125 2023-12-23 08:28:57,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1043280.0, ans=0.125 2023-12-23 08:28:59,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1043346.6666666666, ans=0.0 2023-12-23 08:29:06,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1043346.6666666666, ans=0.125 2023-12-23 08:29:09,808 INFO [train.py:886] (0/4) Epoch 33, batch 4000, loss[loss=0.01461, audio_tagging_loss=0.01461, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4957550.83 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:29:21,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1043480.0, ans=0.125 2023-12-23 08:29:41,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1043613.3333333334, ans=0.2 2023-12-23 08:29:57,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1043680.0, ans=0.125 2023-12-23 08:30:00,658 INFO [train.py:886] (0/4) Epoch 33, batch 4050, loss[loss=0.01362, audio_tagging_loss=0.01362, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4956457.54 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:30:06,405 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.985e+01 3.429e+01 3.579e+01 3.740e+01 4.198e+01, threshold=7.158e+01, percent-clipped=0.0 2023-12-23 08:30:20,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1043813.3333333334, ans=0.2 2023-12-23 08:30:52,101 INFO [train.py:886] (0/4) Epoch 33, batch 4100, loss[loss=0.0108, audio_tagging_loss=0.0108, over 24750.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4949390.40 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:31:00,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=15.0 2023-12-23 08:31:06,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.03 vs. 
limit=22.5 2023-12-23 08:31:21,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1044280.0, ans=0.1 2023-12-23 08:31:26,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1044280.0, ans=0.0 2023-12-23 08:31:29,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1044280.0, ans=0.0 2023-12-23 08:31:32,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2023-12-23 08:31:37,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1044346.6666666666, ans=0.125 2023-12-23 08:31:37,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1044346.6666666666, ans=0.125 2023-12-23 08:31:42,122 INFO [train.py:886] (0/4) Epoch 33, batch 4150, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4950426.96 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:31:48,511 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.092e+01 3.375e+01 3.544e+01 3.687e+01 4.379e+01, threshold=7.088e+01, percent-clipped=0.0 2023-12-23 08:31:54,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1044480.0, ans=0.1 2023-12-23 08:32:06,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.21 vs. limit=22.5 2023-12-23 08:32:13,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1044613.3333333334, ans=0.125 2023-12-23 08:32:14,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2023-12-23 08:32:25,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1044680.0, ans=0.125 2023-12-23 08:32:33,497 INFO [train.py:886] (0/4) Epoch 33, batch 4200, loss[loss=0.01123, audio_tagging_loss=0.01123, over 21465.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4946312.91 frames. ], batch size: 107, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:32:34,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1044746.6666666666, ans=0.04949747468305833 2023-12-23 08:32:37,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1044746.6666666666, ans=0.0 2023-12-23 08:32:44,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.08 vs. 
limit=22.5 2023-12-23 08:32:48,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1044813.3333333334, ans=0.125 2023-12-23 08:32:56,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2023-12-23 08:32:59,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1044880.0, ans=0.0 2023-12-23 08:33:03,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1044946.6666666666, ans=0.0 2023-12-23 08:33:15,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1045013.3333333334, ans=0.2 2023-12-23 08:33:25,244 INFO [train.py:886] (0/4) Epoch 33, batch 4250, loss[loss=0.008165, audio_tagging_loss=0.008165, over 24072.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4948740.73 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:33:31,639 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.010e+01 3.356e+01 3.487e+01 3.659e+01 4.243e+01, threshold=6.975e+01, percent-clipped=0.0 2023-12-23 08:33:46,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1045213.3333333334, ans=0.125 2023-12-23 08:34:06,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1045346.6666666666, ans=0.125 2023-12-23 08:34:16,215 INFO [train.py:886] (0/4) Epoch 33, batch 4300, loss[loss=0.01221, audio_tagging_loss=0.01221, over 24750.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4954542.17 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:34:18,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1045413.3333333334, ans=0.0 2023-12-23 08:34:25,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1045413.3333333334, ans=0.125 2023-12-23 08:34:36,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1045546.6666666666, ans=0.2 2023-12-23 08:34:49,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1045613.3333333334, ans=0.125 2023-12-23 08:34:58,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1045680.0, ans=0.0 2023-12-23 08:35:04,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1045680.0, ans=0.0 2023-12-23 08:35:05,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1045680.0, ans=0.2 2023-12-23 08:35:08,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1045746.6666666666, ans=0.0 2023-12-23 08:35:08,725 INFO [train.py:886] (0/4) Epoch 33, batch 4350, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4952341.11 frames. 
], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:35:10,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-12-23 08:35:15,035 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.109e+01 3.438e+01 3.558e+01 3.684e+01 4.485e+01, threshold=7.115e+01, percent-clipped=0.0 2023-12-23 08:35:16,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1045746.6666666666, ans=0.125 2023-12-23 08:35:25,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1045813.3333333334, ans=0.05 2023-12-23 08:36:01,110 INFO [train.py:886] (0/4) Epoch 33, batch 4400, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4945329.16 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:36:12,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1046146.6666666666, ans=0.125 2023-12-23 08:36:14,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1046146.6666666666, ans=0.125 2023-12-23 08:36:16,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1046146.6666666666, ans=0.0 2023-12-23 08:36:39,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-12-23 08:36:42,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1046346.6666666666, ans=0.0 2023-12-23 08:36:48,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.82 vs. limit=15.0 2023-12-23 08:36:52,634 INFO [train.py:886] (0/4) Epoch 33, batch 4450, loss[loss=0.0132, audio_tagging_loss=0.0132, over 24750.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4947968.71 frames. 
], batch size: 99, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:36:55,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1046413.3333333334, ans=0.125 2023-12-23 08:36:58,260 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.349e+01 3.519e+01 3.667e+01 4.264e+01, threshold=7.037e+01, percent-clipped=0.0 2023-12-23 08:37:19,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1046546.6666666666, ans=0.2 2023-12-23 08:37:19,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1046546.6666666666, ans=0.125 2023-12-23 08:37:22,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1046546.6666666666, ans=0.2 2023-12-23 08:37:37,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1046680.0, ans=0.125 2023-12-23 08:37:44,999 INFO [train.py:886] (0/4) Epoch 33, batch 4500, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4950856.92 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:38:16,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1046946.6666666666, ans=0.1 2023-12-23 08:38:19,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1046946.6666666666, ans=0.0 2023-12-23 08:38:35,942 INFO [train.py:886] (0/4) Epoch 33, batch 4550, loss[loss=0.01263, audio_tagging_loss=0.01263, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4958280.60 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:38:36,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-12-23 08:38:38,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1047080.0, ans=0.2 2023-12-23 08:38:43,052 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.099e+01 3.336e+01 3.508e+01 3.645e+01 4.432e+01, threshold=7.015e+01, percent-clipped=0.0 2023-12-23 08:38:44,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1047080.0, ans=0.1 2023-12-23 08:39:11,513 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:39:28,814 INFO [train.py:886] (0/4) Epoch 33, batch 4600, loss[loss=0.01067, audio_tagging_loss=0.01067, over 24073.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4954585.25 frames. 
], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:39:30,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1047413.3333333334, ans=0.1 2023-12-23 08:40:14,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1047680.0, ans=0.0 2023-12-23 08:40:21,223 INFO [train.py:886] (0/4) Epoch 33, batch 4650, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4954418.73 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:40:23,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1047746.6666666666, ans=0.125 2023-12-23 08:40:23,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1047746.6666666666, ans=0.2 2023-12-23 08:40:27,557 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.970e+01 3.430e+01 3.556e+01 3.733e+01 4.404e+01, threshold=7.113e+01, percent-clipped=0.0 2023-12-23 08:40:31,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1047813.3333333334, ans=0.0 2023-12-23 08:40:33,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1047813.3333333334, ans=0.125 2023-12-23 08:40:41,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1047880.0, ans=0.125 2023-12-23 08:40:43,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1047880.0, ans=0.2 2023-12-23 08:40:44,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1047880.0, ans=22.5 2023-12-23 08:41:08,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1048013.3333333334, ans=0.1 2023-12-23 08:41:10,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2023-12-23 08:41:12,176 INFO [train.py:886] (0/4) Epoch 33, batch 4700, loss[loss=0.01195, audio_tagging_loss=0.01195, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4956883.33 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:41:12,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1048080.0, ans=0.125 2023-12-23 08:41:14,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1048080.0, ans=0.125 2023-12-23 08:41:19,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1048080.0, ans=0.0 2023-12-23 08:41:34,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1048213.3333333334, ans=0.125 2023-12-23 08:41:36,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. 
limit=15.0 2023-12-23 08:41:51,478 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2023-12-23 08:41:59,658 INFO [train.py:886] (0/4) Epoch 33, batch 4750, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4948491.35 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:42:05,086 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.132e+01 3.425e+01 3.596e+01 3.749e+01 4.228e+01, threshold=7.192e+01, percent-clipped=0.0 2023-12-23 08:42:10,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.43 vs. limit=15.0 2023-12-23 08:42:14,735 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-33.pt 2023-12-23 08:42:35,316 INFO [train.py:886] (0/4) Epoch 34, batch 0, loss[loss=0.02568, audio_tagging_loss=0.02568, over 25000.00 frames. ], tot_loss[loss=0.02568, audio_tagging_loss=0.02568, over 25000.00 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0 2023-12-23 08:42:35,318 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 08:42:56,424 INFO [train.py:917] (0/4) Epoch 34, validation: loss=0.03363, audio_tagging_loss=0.03363, over 3737520.00 frames. 2023-12-23 08:42:56,424 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 08:42:58,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1048520.0, ans=0.0 2023-12-23 08:43:12,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1048586.6666666667, ans=0.125 2023-12-23 08:43:33,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1048720.0, ans=0.125 2023-12-23 08:43:45,941 INFO [train.py:886] (0/4) Epoch 34, batch 50, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01931, audio_tagging_loss=0.01931, over 1123211.26 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0 2023-12-23 08:44:00,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1048920.0, ans=0.125 2023-12-23 08:44:06,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1048986.6666666667, ans=0.2 2023-12-23 08:44:10,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1048986.6666666667, ans=0.125 2023-12-23 08:44:12,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1048986.6666666667, ans=0.1 2023-12-23 08:44:12,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1048986.6666666667, ans=0.1 2023-12-23 08:44:15,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.35 vs. 
limit=15.0 2023-12-23 08:44:21,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1049053.3333333333, ans=0.07 2023-12-23 08:44:22,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1049053.3333333333, ans=0.125 2023-12-23 08:44:25,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1049053.3333333333, ans=0.125 2023-12-23 08:44:28,826 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 4.024e+01 4.370e+01 4.886e+01 9.756e+01, threshold=8.739e+01, percent-clipped=6.0 2023-12-23 08:44:37,891 INFO [train.py:886] (0/4) Epoch 34, batch 100, loss[loss=0.01375, audio_tagging_loss=0.01375, over 25000.00 frames. ], tot_loss[loss=0.01661, audio_tagging_loss=0.01661, over 1973682.65 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0 2023-12-23 08:44:42,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1049186.6666666667, ans=0.0 2023-12-23 08:44:45,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2023-12-23 08:44:46,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1049253.3333333333, ans=0.125 2023-12-23 08:44:49,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1049253.3333333333, ans=0.2 2023-12-23 08:45:02,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0 2023-12-23 08:45:16,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2023-12-23 08:45:22,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-12-23 08:45:28,897 INFO [train.py:886] (0/4) Epoch 34, batch 150, loss[loss=0.01092, audio_tagging_loss=0.01092, over 21100.00 frames. ], tot_loss[loss=0.01529, audio_tagging_loss=0.01529, over 2635299.10 frames. ], batch size: 107, lr: 3.19e-03, grad_scale: 32.0 2023-12-23 08:46:06,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1049720.0, ans=0.125 2023-12-23 08:46:11,362 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.202e+01 3.487e+01 3.657e+01 3.856e+01 4.371e+01, threshold=7.314e+01, percent-clipped=0.0 2023-12-23 08:46:13,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1049786.6666666667, ans=0.2 2023-12-23 08:46:19,916 INFO [train.py:886] (0/4) Epoch 34, batch 200, loss[loss=0.012, audio_tagging_loss=0.012, over 25000.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 3151589.68 frames. 
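], batch size: 100, lr: 3.19e-03, grad_scale: 32.0

Note on reading the [train.py:886] lines: loss[...] is the loss on the current batch, while tot_loss[...] is a running average weighted by the number of frames seen recently, which is why its frame counter climbs through an epoch toward roughly five million frames and then plateaus. A minimal sketch of one way such a frame-weighted running loss could be maintained (the class name, the decay constant, and the exact update rule are illustrative assumptions, not code from train.py):

class RunningLoss:
    """Frame-weighted running loss with exponential forgetting (illustrative)."""

    def __init__(self, decay: float = 1.0 / 200):  # e.g. tied to a reset_interval of 200
        self.decay = decay
        self.loss_sum = 0.0  # frame-weighted sum of recent batch losses
        self.frames = 0.0    # effective number of frames represented

    def update(self, batch_loss: float, batch_frames: float) -> None:
        # Old statistics fade out geometrically; the new batch enters at full weight.
        self.loss_sum = self.loss_sum * (1.0 - self.decay) + batch_loss * batch_frames
        self.frames = self.frames * (1.0 - self.decay) + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.0123, 25000.0)
print(f"tot_loss={tracker.value:.5f}, over {tracker.frames:.2f} frames")

Under this assumed scheme, with a decay of 1/200 and batches of about 25000 frames, the effective frame count saturates near 25000 * 200 = 5,000,000, which is consistent with the "over ~4.95e6 frames" figures reported in the tot_loss entries here.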
2023-12-23 08:46:20,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1049853.3333333333, ans=0.0 2023-12-23 08:46:37,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.66 vs. limit=22.5 2023-12-23 08:46:40,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1049986.6666666667, ans=0.2 2023-12-23 08:46:49,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1050053.3333333333, ans=0.0 2023-12-23 08:47:00,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1050120.0, ans=0.125 2023-12-23 08:47:01,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1050120.0, ans=0.125 2023-12-23 08:47:09,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1050186.6666666667, ans=10.0 2023-12-23 08:47:10,933 INFO [train.py:886] (0/4) Epoch 34, batch 250, loss[loss=0.00945, audio_tagging_loss=0.00945, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 3551232.32 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0 2023-12-23 08:47:30,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1050320.0, ans=0.125 2023-12-23 08:47:51,926 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.410e+01 3.573e+01 3.673e+01 4.532e+01, threshold=7.147e+01, percent-clipped=0.0 2023-12-23 08:47:53,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1050453.3333333333, ans=0.035 2023-12-23 08:47:55,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1050453.3333333333, ans=0.1 2023-12-23 08:48:00,540 INFO [train.py:886] (0/4) Epoch 34, batch 300, loss[loss=0.01008, audio_tagging_loss=0.01008, over 24035.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 3858672.86 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0 2023-12-23 08:48:23,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1050653.3333333333, ans=0.125 2023-12-23 08:48:24,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.06 vs. limit=12.0 2023-12-23 08:48:34,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1050720.0, ans=0.1 2023-12-23 08:48:40,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1050786.6666666667, ans=0.125 2023-12-23 08:48:47,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2023-12-23 08:48:52,553 INFO [train.py:886] (0/4) Epoch 34, batch 350, loss[loss=0.01229, audio_tagging_loss=0.01229, over 24750.00 frames.
], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4093653.27 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 32.0 2023-12-23 08:48:54,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-12-23 08:49:21,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1050986.6666666667, ans=0.125 2023-12-23 08:49:34,320 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.391e+01 3.531e+01 3.690e+01 4.649e+01, threshold=7.063e+01, percent-clipped=0.0 2023-12-23 08:49:44,265 INFO [train.py:886] (0/4) Epoch 34, batch 400, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4279491.62 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 32.0 2023-12-23 08:49:47,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1051186.6666666667, ans=0.125 2023-12-23 08:50:05,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1051320.0, ans=0.0 2023-12-23 08:50:05,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1051320.0, ans=0.0 2023-12-23 08:50:07,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1051320.0, ans=0.0 2023-12-23 08:50:09,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1051320.0, ans=0.09899494936611666 2023-12-23 08:50:11,813 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2023-12-23 08:50:21,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1051386.6666666667, ans=0.2 2023-12-23 08:50:36,040 INFO [train.py:886] (0/4) Epoch 34, batch 450, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4431367.35 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:50:51,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1051586.6666666667, ans=0.1 2023-12-23 08:50:52,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1051586.6666666667, ans=0.125 2023-12-23 08:51:04,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1051653.3333333333, ans=0.125 2023-12-23 08:51:06,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.28 vs. 
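limit=12.0

Note on the recurring [optim.py:484] warnings: each one reports the min/25%/median/75%/max of recently observed gradient norms, and the printed threshold is consistently Clipping_scale times the median (e.g. 2.0 * 3.531e+01 = 7.062e+01, matching the threshold=7.063e+01 in the warning just above up to rounding); percent-clipped is the share of recent batches whose gradient norm exceeded that threshold. A minimal sketch of median-based gradient clipping under that reading (the history length and the class and method names are illustrative assumptions, not icefall's actual optimizer code):

import torch

class MedianGradClipper:
    """Clip gradients at clipping_scale * median of recent gradient norms (illustrative)."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 100):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms: list[float] = []  # recent total gradient norms

    def clip_(self, params) -> float:
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()  # total 2-norm over all parameters
        self.norms = (self.norms + [norm])[-self.history:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        if norm > threshold:  # such batches would count toward "percent-clipped"
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return threshold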
2023-12-23 08:51:18,110 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.014e+01 3.386e+01 3.481e+01 3.688e+01 4.054e+01, threshold=6.962e+01, percent-clipped=0.0 2023-12-23 08:51:25,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1051786.6666666667, ans=0.015 2023-12-23 08:51:28,810 INFO [train.py:886] (0/4) Epoch 34, batch 500, loss[loss=0.01539, audio_tagging_loss=0.01539, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4548722.61 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:51:39,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=12.0 2023-12-23 08:51:39,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-12-23 08:51:46,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1051920.0, ans=0.125 2023-12-23 08:51:48,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1051986.6666666667, ans=0.025 2023-12-23 08:52:13,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2023-12-23 08:52:19,584 INFO [train.py:886] (0/4) Epoch 34, batch 550, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4637370.80 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:52:27,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1052186.6666666667, ans=0.09899494936611666 2023-12-23 08:52:44,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1052320.0, ans=0.125 2023-12-23 08:52:48,573 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-12-23 08:52:55,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1052386.6666666667, ans=0.125 2023-12-23 08:53:00,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1052386.6666666667, ans=0.125 2023-12-23 08:53:03,275 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.111e+01 3.451e+01 3.642e+01 3.802e+01 4.281e+01, threshold=7.285e+01, percent-clipped=0.0 2023-12-23 08:53:05,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1052453.3333333333, ans=0.0 2023-12-23 08:53:12,562 INFO [train.py:886] (0/4) Epoch 34, batch 600, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4711062.25 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:53:20,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs.
limit=6.0 2023-12-23 08:53:23,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1052586.6666666667, ans=0.0 2023-12-23 08:53:48,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.66 vs. limit=10.0 2023-12-23 08:53:51,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1052720.0, ans=0.125 2023-12-23 08:54:03,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1052853.3333333333, ans=0.025 2023-12-23 08:54:04,413 INFO [train.py:886] (0/4) Epoch 34, batch 650, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4754357.27 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:54:07,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1052853.3333333333, ans=0.125 2023-12-23 08:54:13,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1052920.0, ans=0.1 2023-12-23 08:54:33,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1052986.6666666667, ans=0.2 2023-12-23 08:54:46,586 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.392e+01 3.565e+01 3.715e+01 5.032e+01, threshold=7.129e+01, percent-clipped=0.0 2023-12-23 08:54:55,078 INFO [train.py:886] (0/4) Epoch 34, batch 700, loss[loss=0.01297, audio_tagging_loss=0.01297, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4798348.02 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:55:03,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.20 vs. limit=22.5 2023-12-23 08:55:04,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2023-12-23 08:55:14,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1053253.3333333333, ans=0.2 2023-12-23 08:55:19,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1053320.0, ans=0.125 2023-12-23 08:55:19,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=12.0 2023-12-23 08:55:22,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1053320.0, ans=0.125 2023-12-23 08:55:29,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1053386.6666666667, ans=0.025 2023-12-23 08:55:31,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.99 vs. 
limit=15.0 2023-12-23 08:55:44,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1053453.3333333333, ans=0.1 2023-12-23 08:55:47,832 INFO [train.py:886] (0/4) Epoch 34, batch 750, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4833195.50 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:55:52,629 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.579e-03 2023-12-23 08:56:05,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1053586.6666666667, ans=0.125 2023-12-23 08:56:18,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1053720.0, ans=0.0 2023-12-23 08:56:30,774 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.054e+01 3.393e+01 3.521e+01 3.705e+01 4.133e+01, threshold=7.041e+01, percent-clipped=0.0 2023-12-23 08:56:40,016 INFO [train.py:886] (0/4) Epoch 34, batch 800, loss[loss=0.01006, audio_tagging_loss=0.01006, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4861063.18 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:56:54,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1053920.0, ans=0.1 2023-12-23 08:56:54,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1053920.0, ans=0.0 2023-12-23 08:57:15,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1054053.3333333333, ans=0.125 2023-12-23 08:57:32,068 INFO [train.py:886] (0/4) Epoch 34, batch 850, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4885190.42 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:57:38,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1054186.6666666667, ans=0.2 2023-12-23 08:57:58,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. limit=6.0 2023-12-23 08:58:02,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.85 vs. limit=22.5 2023-12-23 08:58:10,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.38 vs. limit=12.0 2023-12-23 08:58:12,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1054386.6666666667, ans=0.125 2023-12-23 08:58:14,559 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.443e+01 3.585e+01 3.750e+01 4.520e+01, threshold=7.170e+01, percent-clipped=0.0 2023-12-23 08:58:25,628 INFO [train.py:886] (0/4) Epoch 34, batch 900, loss[loss=0.01188, audio_tagging_loss=0.01188, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4897732.77 frames. 
], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:58:45,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1054653.3333333333, ans=0.125 2023-12-23 08:59:00,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1054720.0, ans=0.125 2023-12-23 08:59:07,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2023-12-23 08:59:16,998 INFO [train.py:886] (0/4) Epoch 34, batch 950, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4904705.30 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:59:30,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1054920.0, ans=0.2 2023-12-23 08:59:30,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1054920.0, ans=0.125 2023-12-23 08:59:32,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.55 vs. limit=15.0 2023-12-23 08:59:57,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1055053.3333333333, ans=0.0 2023-12-23 09:00:00,945 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.950e+01 3.447e+01 3.600e+01 3.803e+01 4.759e+01, threshold=7.201e+01, percent-clipped=0.0 2023-12-23 09:00:01,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1055120.0, ans=0.125 2023-12-23 09:00:06,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=6.0 2023-12-23 09:00:08,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1055186.6666666667, ans=0.125 2023-12-23 09:00:09,508 INFO [train.py:886] (0/4) Epoch 34, batch 1000, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4906703.15 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:00:18,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1055186.6666666667, ans=0.125 2023-12-23 09:00:30,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1055320.0, ans=0.2 2023-12-23 09:00:31,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=22.5 2023-12-23 09:00:45,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1055386.6666666667, ans=0.1 2023-12-23 09:00:45,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1055386.6666666667, ans=0.1 2023-12-23 09:01:02,060 INFO [train.py:886] (0/4) Epoch 34, batch 1050, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. 
], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4917337.50 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:01:21,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1055653.3333333333, ans=0.1 2023-12-23 09:01:30,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.53 vs. limit=12.0 2023-12-23 09:01:38,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1055720.0, ans=0.0 2023-12-23 09:01:44,522 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.130e+01 3.411e+01 3.556e+01 3.695e+01 4.710e+01, threshold=7.113e+01, percent-clipped=0.0 2023-12-23 09:01:53,104 INFO [train.py:886] (0/4) Epoch 34, batch 1100, loss[loss=0.01024, audio_tagging_loss=0.01024, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4922918.83 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:01:55,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1055853.3333333333, ans=0.125 2023-12-23 09:02:11,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1055920.0, ans=0.025 2023-12-23 09:02:36,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1056120.0, ans=0.125 2023-12-23 09:02:46,083 INFO [train.py:886] (0/4) Epoch 34, batch 1150, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4922251.76 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:02:49,128 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:03:27,719 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.380e+01 3.484e+01 3.660e+01 4.361e+01, threshold=6.968e+01, percent-clipped=0.0 2023-12-23 09:03:36,200 INFO [train.py:886] (0/4) Epoch 34, batch 1200, loss[loss=0.01295, audio_tagging_loss=0.01295, over 24750.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4938324.41 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:03:36,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1056520.0, ans=0.125 2023-12-23 09:03:40,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1056520.0, ans=0.2 2023-12-23 09:03:40,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1056520.0, ans=0.125 2023-12-23 09:03:57,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1056653.3333333333, ans=0.0 2023-12-23 09:04:27,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1056853.3333333333, ans=15.0 2023-12-23 09:04:28,020 INFO [train.py:886] (0/4) Epoch 34, batch 1250, loss[loss=0.01211, audio_tagging_loss=0.01211, over 22126.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4932445.24 frames. 
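], batch size: 107, lr: 3.18e-03, grad_scale: 32.0

Note on the [scaling.py:213] lines: each prints the current value (ans) of a ScheduledFloat hyperparameter (dropout rates, skip rates, balancer probabilities and limits) evaluated at the given batch_count, so these knobs drift over the course of training rather than staying fixed. A minimal sketch of a piecewise-linear schedule keyed on batch count (the breakpoints and the class internals below are made-up illustrations, not scaling.py's actual implementation):

class PiecewiseLinear:
    """A float that follows a piecewise-linear schedule over batch_count (illustrative)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs in increasing batch_count order.
        self.points = list(points)

    def __call__(self, batch_count: float) -> float:
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)  # linear interpolation
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint: hold the final value

# Example: a dropout rate decaying from 0.3 to 0.1 over the first 20000 batches.
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(0.0), dropout_p(10000.0), dropout_p(1047413.33))  # 0.3 0.2 0.1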
2023-12-23 09:04:42,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1056920.0, ans=0.125 2023-12-23 09:04:45,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1056920.0, ans=0.125 2023-12-23 09:04:48,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1056986.6666666667, ans=0.0 2023-12-23 09:05:08,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1057120.0, ans=0.1 2023-12-23 09:05:09,583 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.461e+01 3.581e+01 3.707e+01 4.566e+01, threshold=7.161e+01, percent-clipped=0.0 2023-12-23 09:05:19,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1057186.6666666667, ans=0.125 2023-12-23 09:05:20,295 INFO [train.py:886] (0/4) Epoch 34, batch 1300, loss[loss=0.01463, audio_tagging_loss=0.01463, over 24750.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4927400.39 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:05:20,489 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:05:30,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1057253.3333333333, ans=0.125 2023-12-23 09:05:30,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.62 vs. limit=22.5 2023-12-23 09:05:56,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1057386.6666666667, ans=0.1 2023-12-23 09:05:58,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1057386.6666666667, ans=0.05 2023-12-23 09:06:10,394 INFO [train.py:886] (0/4) Epoch 34, batch 1350, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4925738.73 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:06:19,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.86 vs.
limit=22.5 2023-12-23 09:06:20,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1057520.0, ans=0.125 2023-12-23 09:06:39,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1057653.3333333333, ans=0.0 2023-12-23 09:06:49,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1057720.0, ans=0.125 2023-12-23 09:06:52,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1057786.6666666667, ans=0.04949747468305833 2023-12-23 09:06:54,022 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.110e+01 3.373e+01 3.475e+01 3.645e+01 4.225e+01, threshold=6.949e+01, percent-clipped=0.0 2023-12-23 09:06:57,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1057786.6666666667, ans=0.2 2023-12-23 09:07:03,281 INFO [train.py:886] (0/4) Epoch 34, batch 1400, loss[loss=0.01055, audio_tagging_loss=0.01055, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4933357.96 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:07:08,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1057853.3333333333, ans=0.125 2023-12-23 09:07:10,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1057853.3333333333, ans=0.125 2023-12-23 09:07:20,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1057920.0, ans=10.0 2023-12-23 09:07:54,250 INFO [train.py:886] (0/4) Epoch 34, batch 1450, loss[loss=0.01369, audio_tagging_loss=0.01369, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4939584.98 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:07:56,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1058186.6666666667, ans=0.125 2023-12-23 09:08:32,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1058386.6666666667, ans=0.125 2023-12-23 09:08:37,984 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.058e+01 3.393e+01 3.506e+01 3.631e+01 4.312e+01, threshold=7.011e+01, percent-clipped=0.0 2023-12-23 09:08:42,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1058453.3333333333, ans=0.125 2023-12-23 09:08:46,586 INFO [train.py:886] (0/4) Epoch 34, batch 1500, loss[loss=0.01348, audio_tagging_loss=0.01348, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4939226.93 frames. 
], batch size: 100, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:08:56,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1058586.6666666667, ans=0.125 2023-12-23 09:08:56,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1058586.6666666667, ans=10.0 2023-12-23 09:08:56,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1058586.6666666667, ans=0.125 2023-12-23 09:09:04,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1058586.6666666667, ans=0.125 2023-12-23 09:09:14,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1058653.3333333333, ans=0.0 2023-12-23 09:09:18,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=16.01 vs. limit=15.0 2023-12-23 09:09:37,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1058853.3333333333, ans=0.125 2023-12-23 09:09:38,149 INFO [train.py:886] (0/4) Epoch 34, batch 1550, loss[loss=0.0109, audio_tagging_loss=0.0109, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4939041.23 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:09:42,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2023-12-23 09:09:47,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1058853.3333333333, ans=0.125 2023-12-23 09:09:53,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1058920.0, ans=0.0 2023-12-23 09:10:07,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1058986.6666666667, ans=0.0 2023-12-23 09:10:08,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1059053.3333333333, ans=0.125 2023-12-23 09:10:21,190 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.143e+01 3.468e+01 3.577e+01 3.734e+01 4.228e+01, threshold=7.153e+01, percent-clipped=0.0 2023-12-23 09:10:21,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059120.0, ans=0.1 2023-12-23 09:10:29,765 INFO [train.py:886] (0/4) Epoch 34, batch 1600, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4933961.25 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:10:56,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.00 vs. 
limit=15.0 2023-12-23 09:10:58,330 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:11:09,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1059386.6666666667, ans=0.1 2023-12-23 09:11:17,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1059453.3333333333, ans=6.0 2023-12-23 09:11:18,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1059453.3333333333, ans=0.1 2023-12-23 09:11:22,326 INFO [train.py:886] (0/4) Epoch 34, batch 1650, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4934689.67 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:11:33,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1059586.6666666667, ans=0.0 2023-12-23 09:11:39,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1059586.6666666667, ans=0.125 2023-12-23 09:11:39,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0 2023-12-23 09:11:41,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1059586.6666666667, ans=0.125 2023-12-23 09:11:42,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.93 vs. limit=15.0 2023-12-23 09:11:48,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.67 vs. limit=15.0 2023-12-23 09:11:51,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.74 vs. limit=10.0 2023-12-23 09:12:03,328 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.071e+01 3.325e+01 3.519e+01 3.722e+01 4.583e+01, threshold=7.038e+01, percent-clipped=0.0 2023-12-23 09:12:08,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.66 vs. limit=10.0 2023-12-23 09:12:13,264 INFO [train.py:886] (0/4) Epoch 34, batch 1700, loss[loss=0.0152, audio_tagging_loss=0.0152, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4941609.29 frames. 
], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:12:13,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1059853.3333333333, ans=0.2 2023-12-23 09:12:24,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1059920.0, ans=0.0 2023-12-23 09:12:26,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1059920.0, ans=0.125 2023-12-23 09:12:30,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1059920.0, ans=0.0 2023-12-23 09:12:35,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1059986.6666666667, ans=10.0 2023-12-23 09:12:39,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1059986.6666666667, ans=0.1 2023-12-23 09:12:49,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1060053.3333333333, ans=0.0 2023-12-23 09:13:05,125 INFO [train.py:886] (0/4) Epoch 34, batch 1750, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4946670.92 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:13:09,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2023-12-23 09:13:20,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0 2023-12-23 09:13:21,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5 2023-12-23 09:13:25,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0 2023-12-23 09:13:28,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1060320.0, ans=0.2 2023-12-23 09:13:29,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1060320.0, ans=0.1 2023-12-23 09:13:47,293 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.064e+01 3.371e+01 3.527e+01 3.701e+01 4.388e+01, threshold=7.054e+01, percent-clipped=0.0 2023-12-23 09:13:55,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1060520.0, ans=0.125 2023-12-23 09:13:57,073 INFO [train.py:886] (0/4) Epoch 34, batch 1800, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4950378.19 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:13:57,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.01 vs. 
limit=22.5 2023-12-23 09:14:11,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2023-12-23 09:14:27,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1060720.0, ans=0.5 2023-12-23 09:14:43,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1060786.6666666667, ans=0.1 2023-12-23 09:14:46,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1060786.6666666667, ans=0.1 2023-12-23 09:14:47,626 INFO [train.py:886] (0/4) Epoch 34, batch 1850, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4953566.79 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:14:48,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1060853.3333333333, ans=0.0 2023-12-23 09:14:49,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1060853.3333333333, ans=0.125 2023-12-23 09:14:58,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1060920.0, ans=0.09899494936611666 2023-12-23 09:14:59,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1060920.0, ans=0.07 2023-12-23 09:15:15,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1060986.6666666667, ans=0.125 2023-12-23 09:15:16,018 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.32 vs. limit=10.0 2023-12-23 09:15:16,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1060986.6666666667, ans=0.125 2023-12-23 09:15:30,276 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.414e+01 3.585e+01 3.770e+01 4.253e+01, threshold=7.170e+01, percent-clipped=0.0 2023-12-23 09:15:36,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-12-23 09:15:40,218 INFO [train.py:886] (0/4) Epoch 34, batch 1900, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4947310.79 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:15:48,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=1061186.6666666667, ans=12.0 2023-12-23 09:15:58,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1061253.3333333333, ans=0.125 2023-12-23 09:16:04,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.53 vs. 
limit=15.0 2023-12-23 09:16:18,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1061386.6666666667, ans=0.125 2023-12-23 09:16:23,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1061453.3333333333, ans=0.125 2023-12-23 09:16:31,200 INFO [train.py:886] (0/4) Epoch 34, batch 1950, loss[loss=0.01039, audio_tagging_loss=0.01039, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4947349.60 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:17:09,896 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:17:14,343 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.990e+01 3.384e+01 3.570e+01 3.708e+01 4.243e+01, threshold=7.140e+01, percent-clipped=0.0 2023-12-23 09:17:18,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1061786.6666666667, ans=0.2 2023-12-23 09:17:23,724 INFO [train.py:886] (0/4) Epoch 34, batch 2000, loss[loss=0.01149, audio_tagging_loss=0.01149, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4950423.38 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:17:24,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2023-12-23 09:17:24,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.09 vs. limit=15.0 2023-12-23 09:17:24,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1061853.3333333333, ans=0.125 2023-12-23 09:17:45,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2023-12-23 09:18:07,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1062120.0, ans=0.0 2023-12-23 09:18:16,316 INFO [train.py:886] (0/4) Epoch 34, batch 2050, loss[loss=0.01029, audio_tagging_loss=0.01029, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4949546.24 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:18:18,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.95 vs. 
limit=22.5 2023-12-23 09:18:20,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1062186.6666666667, ans=0.125 2023-12-23 09:18:24,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1062186.6666666667, ans=0.125 2023-12-23 09:18:47,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1062386.6666666667, ans=0.125 2023-12-23 09:18:49,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1062386.6666666667, ans=0.1 2023-12-23 09:18:58,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1062453.3333333333, ans=0.2 2023-12-23 09:18:58,857 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.394e+01 3.550e+01 3.730e+01 4.740e+01, threshold=7.100e+01, percent-clipped=0.0 2023-12-23 09:19:08,951 INFO [train.py:886] (0/4) Epoch 34, batch 2100, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4950121.80 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:19:10,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1062520.0, ans=0.5 2023-12-23 09:19:13,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1062520.0, ans=0.0 2023-12-23 09:19:21,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2023-12-23 09:19:28,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1062653.3333333333, ans=0.0 2023-12-23 09:19:37,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.26 vs. limit=22.5 2023-12-23 09:19:52,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1062786.6666666667, ans=0.1 2023-12-23 09:19:59,893 INFO [train.py:886] (0/4) Epoch 34, batch 2150, loss[loss=0.01419, audio_tagging_loss=0.01419, over 24750.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4955050.19 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:20:15,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1062920.0, ans=22.5 2023-12-23 09:20:24,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=12.0 2023-12-23 09:20:30,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.52 vs. 
limit=10.0 2023-12-23 09:20:42,093 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.148e+01 3.426e+01 3.569e+01 3.716e+01 4.443e+01, threshold=7.139e+01, percent-clipped=0.0 2023-12-23 09:20:42,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.71 vs. limit=10.0 2023-12-23 09:20:52,025 INFO [train.py:886] (0/4) Epoch 34, batch 2200, loss[loss=0.01411, audio_tagging_loss=0.01411, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4950275.54 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:20:54,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1063186.6666666667, ans=0.125 2023-12-23 09:20:55,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1063186.6666666667, ans=0.0 2023-12-23 09:21:06,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1063253.3333333333, ans=0.0 2023-12-23 09:21:08,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1063253.3333333333, ans=0.0 2023-12-23 09:21:32,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1063453.3333333333, ans=0.125 2023-12-23 09:21:43,779 INFO [train.py:886] (0/4) Epoch 34, batch 2250, loss[loss=0.01266, audio_tagging_loss=0.01266, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4947717.00 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:21:45,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=12.0 2023-12-23 09:21:53,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1063520.0, ans=0.0 2023-12-23 09:22:08,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1063653.3333333333, ans=0.1 2023-12-23 09:22:27,019 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.422e+01 3.558e+01 3.753e+01 4.731e+01, threshold=7.116e+01, percent-clipped=0.0 2023-12-23 09:22:32,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1063786.6666666667, ans=0.125 2023-12-23 09:22:36,329 INFO [train.py:886] (0/4) Epoch 34, batch 2300, loss[loss=0.01284, audio_tagging_loss=0.01284, over 23071.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4938864.65 frames. 
], batch size: 107, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:22:46,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1063920.0, ans=15.0 2023-12-23 09:23:10,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1064053.3333333333, ans=0.0 2023-12-23 09:23:11,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1064053.3333333333, ans=0.125 2023-12-23 09:23:11,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1064053.3333333333, ans=0.05 2023-12-23 09:23:14,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1064053.3333333333, ans=0.0 2023-12-23 09:23:29,144 INFO [train.py:886] (0/4) Epoch 34, batch 2350, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4947560.46 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:23:47,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1064253.3333333333, ans=0.09899494936611666 2023-12-23 09:24:00,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1064386.6666666667, ans=0.0 2023-12-23 09:24:06,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1064386.6666666667, ans=0.0 2023-12-23 09:24:11,371 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.992e+01 3.391e+01 3.504e+01 3.673e+01 5.436e+01, threshold=7.008e+01, percent-clipped=0.0 2023-12-23 09:24:12,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1064453.3333333333, ans=0.125 2023-12-23 09:24:19,900 INFO [train.py:886] (0/4) Epoch 34, batch 2400, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4949511.30 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:24:31,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1064586.6666666667, ans=0.125 2023-12-23 09:24:43,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1064653.3333333333, ans=0.125 2023-12-23 09:24:51,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1064720.0, ans=0.1 2023-12-23 09:24:58,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1064720.0, ans=0.025 2023-12-23 09:25:10,989 INFO [train.py:886] (0/4) Epoch 34, batch 2450, loss[loss=0.0109, audio_tagging_loss=0.0109, over 25000.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4953207.96 frames. 
], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:25:18,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1064853.3333333333, ans=0.07 2023-12-23 09:25:29,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1064986.6666666667, ans=0.0 2023-12-23 09:25:52,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1065120.0, ans=0.2 2023-12-23 09:25:52,954 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.052e+01 3.384e+01 3.531e+01 3.725e+01 4.723e+01, threshold=7.062e+01, percent-clipped=0.0 2023-12-23 09:26:01,413 INFO [train.py:886] (0/4) Epoch 34, batch 2500, loss[loss=0.01043, audio_tagging_loss=0.01043, over 22315.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4947657.07 frames. ], batch size: 107, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:26:02,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-12-23 09:26:28,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1065320.0, ans=0.2 2023-12-23 09:26:33,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0 2023-12-23 09:26:52,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.31 vs. limit=22.5 2023-12-23 09:26:54,401 INFO [train.py:886] (0/4) Epoch 34, batch 2550, loss[loss=0.01157, audio_tagging_loss=0.01157, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4941679.38 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:26:56,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1065520.0, ans=0.125 2023-12-23 09:27:23,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1065653.3333333333, ans=0.2 2023-12-23 09:27:32,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1065720.0, ans=0.125 2023-12-23 09:27:35,930 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.037e+01 3.394e+01 3.621e+01 3.808e+01 4.469e+01, threshold=7.242e+01, percent-clipped=0.0 2023-12-23 09:27:44,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1065786.6666666667, ans=0.125 2023-12-23 09:27:46,490 INFO [train.py:886] (0/4) Epoch 34, batch 2600, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4943200.28 frames. 
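
The scaling.py:1022 Whitening lines (values both under and over their limits appear above, e.g. metric=3.17 vs. limit=15.0 and metric=21.31 vs. limit=22.5) track how far a module's activations are from being 'white'. A natural reading of the metric, consistent with the logged values, is num_channels * trace(C @ C) / trace(C)**2 for the activation covariance C: it is exactly 1.0 when C is a multiple of the identity and grows as the spectrum becomes lopsided, and the Whiten modules are understood to apply a gradient penalty only when the metric exceeds the printed limit. A rough reconstruction of such a metric, not the exact scaling.py code:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (frames, channels) activations for one whitening group.
        # Returns d * trace(C^2) / trace(C)^2, which by Cauchy-Schwarz is
        # >= 1.0, with equality iff all covariance eigenvalues are equal.
        x = x - x.mean(dim=0, keepdim=True)
        c = (x.t() @ x) / x.shape[0]
        d = c.shape[0]
        return d * (c @ c).trace() / c.trace() ** 2

    # ~1.4 even for white noise: sampling error keeps it slightly above 1.
    print(whitening_metric(torch.randn(1000, 384)))
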
], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:27:46,772 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:27:52,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1065853.3333333333, ans=0.05 2023-12-23 09:28:00,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1065920.0, ans=0.09899494936611666 2023-12-23 09:28:37,553 INFO [train.py:886] (0/4) Epoch 34, batch 2650, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4942939.94 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:29:06,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1066320.0, ans=0.0 2023-12-23 09:29:20,769 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.338e+01 3.500e+01 3.671e+01 4.069e+01, threshold=7.000e+01, percent-clipped=0.0 2023-12-23 09:29:27,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=15.0 2023-12-23 09:29:28,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1066453.3333333333, ans=0.2 2023-12-23 09:29:30,358 INFO [train.py:886] (0/4) Epoch 34, batch 2700, loss[loss=0.009166, audio_tagging_loss=0.009166, over 24076.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4946542.07 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:29:31,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-12-23 09:29:32,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=6.0 2023-12-23 09:29:37,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1066520.0, ans=0.125 2023-12-23 09:29:43,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.95 vs. limit=22.5 2023-12-23 09:29:46,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1066586.6666666667, ans=0.0 2023-12-23 09:29:51,728 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-160000.pt 2023-12-23 09:29:54,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2023-12-23 09:29:56,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1066653.3333333333, ans=10.0 2023-12-23 09:30:08,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0 2023-12-23 09:30:22,355 INFO [train.py:886] (0/4) Epoch 34, batch 2750, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. 
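
Just after the batch-2700 records above, checkpoint.py:75 writes a periodic, batch-count-indexed save (checkpoint-160000.pt); epoch-level saves such as epoch-34.pt appear near the end of the epoch later in this log. A minimal sketch of what such a save plausibly contains; the field names here are assumptions, not taken from checkpoint.py, and the real helper also stores sampler, scaler and scheduler state:

    import torch

    def save_checkpoint(path, model, optimizer, epoch, batch_idx_train):
        # Persist enough state to resume training from this exact point
        # (hypothetical field names for illustration only).
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "epoch": epoch,
                "batch_idx_train": batch_idx_train,
            },
            path,
        )
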
], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4952555.87 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:30:23,492 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:30:28,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1066853.3333333333, ans=0.125 2023-12-23 09:30:52,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1067053.3333333333, ans=0.125 2023-12-23 09:31:02,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1067053.3333333333, ans=0.1 2023-12-23 09:31:03,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1067120.0, ans=0.125 2023-12-23 09:31:03,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-12-23 09:31:05,354 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.165e+01 3.431e+01 3.588e+01 3.825e+01 4.310e+01, threshold=7.176e+01, percent-clipped=0.0 2023-12-23 09:31:14,665 INFO [train.py:886] (0/4) Epoch 34, batch 2800, loss[loss=0.01005, audio_tagging_loss=0.01005, over 24750.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4951095.40 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:31:30,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1067253.3333333333, ans=0.09899494936611666 2023-12-23 09:31:40,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.98 vs. limit=15.0 2023-12-23 09:31:44,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1067320.0, ans=0.125 2023-12-23 09:31:55,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1067453.3333333333, ans=0.125 2023-12-23 09:31:57,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1067453.3333333333, ans=0.125 2023-12-23 09:31:58,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-12-23 09:32:07,320 INFO [train.py:886] (0/4) Epoch 34, batch 2850, loss[loss=0.01139, audio_tagging_loss=0.01139, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4944043.46 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:32:10,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1067520.0, ans=0.0 2023-12-23 09:32:12,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. 
limit=15.0 2023-12-23 09:32:18,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1067586.6666666667, ans=0.2 2023-12-23 09:32:49,735 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.095e+01 3.409e+01 3.533e+01 3.696e+01 4.223e+01, threshold=7.066e+01, percent-clipped=0.0 2023-12-23 09:32:58,208 INFO [train.py:886] (0/4) Epoch 34, batch 2900, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4942465.77 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:33:05,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.77 vs. limit=6.0 2023-12-23 09:33:19,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.47 vs. limit=22.5 2023-12-23 09:33:23,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1067986.6666666667, ans=0.0 2023-12-23 09:33:29,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2023-12-23 09:33:38,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1068053.3333333333, ans=0.1 2023-12-23 09:33:41,447 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.644e-03 2023-12-23 09:33:45,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1068120.0, ans=0.95 2023-12-23 09:33:48,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1068120.0, ans=0.2 2023-12-23 09:33:49,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1068186.6666666667, ans=0.125 2023-12-23 09:33:50,596 INFO [train.py:886] (0/4) Epoch 34, batch 2950, loss[loss=0.01491, audio_tagging_loss=0.01491, over 25000.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4942537.67 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:34:00,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1068253.3333333333, ans=0.125 2023-12-23 09:34:32,917 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.061e+01 3.362e+01 3.516e+01 3.709e+01 4.574e+01, threshold=7.032e+01, percent-clipped=0.0 2023-12-23 09:34:42,811 INFO [train.py:886] (0/4) Epoch 34, batch 3000, loss[loss=0.01039, audio_tagging_loss=0.01039, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4947043.44 frames. 
], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:34:42,813 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 09:35:01,839 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.6789, 2.4623, 2.6767, 2.1732, 3.8287, 3.3125, 3.9907, 2.3182], device='cuda:0') 2023-12-23 09:35:04,059 INFO [train.py:917] (0/4) Epoch 34, validation: loss=0.03414, audio_tagging_loss=0.03414, over 3737520.00 frames. 2023-12-23 09:35:04,059 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 09:35:10,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1068520.0, ans=0.1 2023-12-23 09:35:18,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1068586.6666666667, ans=0.125 2023-12-23 09:35:20,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2023-12-23 09:35:20,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1068586.6666666667, ans=0.2 2023-12-23 09:35:23,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1068653.3333333333, ans=0.125 2023-12-23 09:35:24,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1068653.3333333333, ans=10.0 2023-12-23 09:35:32,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1068720.0, ans=0.125 2023-12-23 09:35:33,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1068720.0, ans=0.2 2023-12-23 09:35:34,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1068720.0, ans=0.1 2023-12-23 09:35:50,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1068786.6666666667, ans=0.125 2023-12-23 09:35:51,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1068786.6666666667, ans=0.125 2023-12-23 09:35:54,413 INFO [train.py:886] (0/4) Epoch 34, batch 3050, loss[loss=0.009741, audio_tagging_loss=0.009741, over 25000.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4948313.11 frames. 
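
The validation pass above (train.py:909-918) pauses training, evaluates on a fixed 3,737,520-frame dev set, and dumps the entropy of selected attention-weight distributions as a diagnostic (zipformer.py:1858), one value per head (eight values in the tensor logged above): low entropy means a head attends almost deterministically, while values near log(num_keys) mean nearly uniform attention. A sketch of that diagnostic, assuming weights of shape (heads, queries, keys) with the per-query entropies averaged:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, num_queries, num_keys), each row summing to 1.
        # Returns one entropy per head, averaged over queries (shape and
        # averaging are assumptions about the logged diagnostic).
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
        return ent.mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 10, 50), dim=-1)
    print(attn_weights_entropy(attn))  # a bit under log(50) ~ 3.9 per head
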
], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:35:56,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1068853.3333333333, ans=0.0 2023-12-23 09:35:59,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1068853.3333333333, ans=0.0 2023-12-23 09:36:02,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1068853.3333333333, ans=0.02 2023-12-23 09:36:24,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1069053.3333333333, ans=0.125 2023-12-23 09:36:36,791 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.053e+01 3.409e+01 3.553e+01 3.721e+01 5.104e+01, threshold=7.106e+01, percent-clipped=0.0 2023-12-23 09:36:36,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1069120.0, ans=0.0 2023-12-23 09:36:46,699 INFO [train.py:886] (0/4) Epoch 34, batch 3100, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4951466.48 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:36:51,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.00 vs. limit=22.5 2023-12-23 09:36:59,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1069253.3333333333, ans=0.125 2023-12-23 09:37:07,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.04 vs. limit=22.5 2023-12-23 09:37:20,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1069386.6666666667, ans=0.05 2023-12-23 09:37:22,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1069386.6666666667, ans=0.125 2023-12-23 09:37:23,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1069386.6666666667, ans=0.0 2023-12-23 09:37:30,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1069453.3333333333, ans=0.125 2023-12-23 09:37:35,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1069453.3333333333, ans=0.1 2023-12-23 09:37:37,802 INFO [train.py:886] (0/4) Epoch 34, batch 3150, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4951091.39 frames. 
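
Each train.py:886 line pairs the current batch's loss with tot_loss, a frame-weighted running aggregate. Its quoted frame count hovers near 4.95M: with batches of roughly 25,000 frames, that is where an exponentially decayed accumulator with decay 1 - 1/200 settles (25,000 x 200 = 5M). A sketch under that assumption; the training code's actual tracker may differ in detail:

    class RunningLoss:
        # Frame-weighted running loss in the spirit of tot_loss above
        # (the decay constant is inferred from the ~5M-frame window seen
        # in the log, not read from the training code).
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames

    rl = RunningLoss()
    for _ in range(2000):
        loss = rl.update(0.012, 25000)
    print(loss, rl.frames)  # ~0.012 over ~5.0e6 frames, as in the log
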
], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:37:44,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1069520.0, ans=0.0 2023-12-23 09:37:52,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1069586.6666666667, ans=0.5 2023-12-23 09:37:53,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1069586.6666666667, ans=0.125 2023-12-23 09:38:06,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1069653.3333333333, ans=0.04949747468305833 2023-12-23 09:38:12,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.35 vs. limit=5.0 2023-12-23 09:38:21,535 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.098e+01 3.466e+01 3.603e+01 3.773e+01 5.785e+01, threshold=7.206e+01, percent-clipped=0.0 2023-12-23 09:38:30,590 INFO [train.py:886] (0/4) Epoch 34, batch 3200, loss[loss=0.01354, audio_tagging_loss=0.01354, over 24750.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4944835.48 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:38:49,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 2023-12-23 09:38:53,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1069986.6666666667, ans=0.0 2023-12-23 09:39:02,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1070053.3333333333, ans=0.125 2023-12-23 09:39:22,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1070186.6666666667, ans=0.2 2023-12-23 09:39:23,023 INFO [train.py:886] (0/4) Epoch 34, batch 3250, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4945516.83 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:39:29,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1070186.6666666667, ans=0.0 2023-12-23 09:39:31,427 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:39:38,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1070253.3333333333, ans=0.2 2023-12-23 09:39:40,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.87 vs. 
limit=15.0 2023-12-23 09:39:45,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1070320.0, ans=0.125 2023-12-23 09:39:55,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1070386.6666666667, ans=0.125 2023-12-23 09:40:05,133 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.908e+01 3.341e+01 3.558e+01 3.738e+01 4.204e+01, threshold=7.116e+01, percent-clipped=0.0 2023-12-23 09:40:13,617 INFO [train.py:886] (0/4) Epoch 34, batch 3300, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4941795.49 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:40:26,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1070586.6666666667, ans=0.07 2023-12-23 09:40:27,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1070586.6666666667, ans=0.04949747468305833 2023-12-23 09:40:57,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1070786.6666666667, ans=0.0 2023-12-23 09:40:59,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1070786.6666666667, ans=0.0 2023-12-23 09:41:05,507 INFO [train.py:886] (0/4) Epoch 34, batch 3350, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4943074.69 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:41:15,037 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:41:39,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071053.3333333333, ans=0.1 2023-12-23 09:41:47,081 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.108e+01 3.417e+01 3.570e+01 3.745e+01 4.410e+01, threshold=7.140e+01, percent-clipped=0.0 2023-12-23 09:41:50,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2023-12-23 09:41:55,622 INFO [train.py:886] (0/4) Epoch 34, batch 3400, loss[loss=0.01263, audio_tagging_loss=0.01263, over 25000.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4948656.03 frames. 
], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:41:58,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1071186.6666666667, ans=0.0 2023-12-23 09:42:03,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1071186.6666666667, ans=0.125 2023-12-23 09:42:06,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1071253.3333333333, ans=0.125 2023-12-23 09:42:31,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1071386.6666666667, ans=0.125 2023-12-23 09:42:36,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1071386.6666666667, ans=0.0 2023-12-23 09:42:39,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=15.0 2023-12-23 09:42:41,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1071453.3333333333, ans=0.1 2023-12-23 09:42:43,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1071453.3333333333, ans=0.125 2023-12-23 09:42:48,503 INFO [train.py:886] (0/4) Epoch 34, batch 3450, loss[loss=0.009795, audio_tagging_loss=0.009795, over 24750.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4941355.61 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:43:00,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1071586.6666666667, ans=0.125 2023-12-23 09:43:00,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1071586.6666666667, ans=0.1 2023-12-23 09:43:02,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1071586.6666666667, ans=0.125 2023-12-23 09:43:13,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1071653.3333333333, ans=0.125 2023-12-23 09:43:14,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1071653.3333333333, ans=0.0 2023-12-23 09:43:18,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071720.0, ans=0.1 2023-12-23 09:43:18,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071720.0, ans=0.1 2023-12-23 09:43:29,989 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.457e+01 3.606e+01 3.781e+01 4.224e+01, threshold=7.212e+01, percent-clipped=0.0 2023-12-23 09:43:40,606 INFO [train.py:886] (0/4) Epoch 34, batch 3500, loss[loss=0.01112, audio_tagging_loss=0.01112, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4932013.31 frames. 
], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:43:51,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1071920.0, ans=0.125 2023-12-23 09:43:55,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1071920.0, ans=0.125 2023-12-23 09:44:05,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.83 vs. limit=10.0 2023-12-23 09:44:20,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1072053.3333333333, ans=0.04949747468305833 2023-12-23 09:44:21,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1072120.0, ans=0.125 2023-12-23 09:44:24,415 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:44:25,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1072120.0, ans=0.1 2023-12-23 09:44:29,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1072120.0, ans=0.125 2023-12-23 09:44:31,678 INFO [train.py:886] (0/4) Epoch 34, batch 3550, loss[loss=0.01088, audio_tagging_loss=0.01088, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4936221.66 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:44:31,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1072186.6666666667, ans=0.1 2023-12-23 09:45:06,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1072386.6666666667, ans=0.1 2023-12-23 09:45:14,670 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.056e+01 3.412e+01 3.528e+01 3.703e+01 4.575e+01, threshold=7.057e+01, percent-clipped=0.0 2023-12-23 09:45:19,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2023-12-23 09:45:24,643 INFO [train.py:886] (0/4) Epoch 34, batch 3600, loss[loss=0.01188, audio_tagging_loss=0.01188, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4940250.44 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:45:25,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1072520.0, ans=0.125 2023-12-23 09:45:26,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.91 vs. 
limit=12.0 2023-12-23 09:45:39,968 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:46:02,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1072720.0, ans=0.1 2023-12-23 09:46:09,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1072786.6666666667, ans=0.125 2023-12-23 09:46:14,849 INFO [train.py:886] (0/4) Epoch 34, batch 3650, loss[loss=0.01369, audio_tagging_loss=0.01369, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4946338.35 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:46:15,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1072853.3333333333, ans=0.0 2023-12-23 09:46:16,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1072853.3333333333, ans=0.0 2023-12-23 09:46:35,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1072986.6666666667, ans=0.0 2023-12-23 09:46:51,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1073053.3333333333, ans=0.125 2023-12-23 09:46:52,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1073053.3333333333, ans=0.125 2023-12-23 09:46:57,542 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.017e+01 3.373e+01 3.534e+01 3.700e+01 5.262e+01, threshold=7.068e+01, percent-clipped=0.0 2023-12-23 09:47:01,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1073120.0, ans=0.1 2023-12-23 09:47:06,752 INFO [train.py:886] (0/4) Epoch 34, batch 3700, loss[loss=0.01132, audio_tagging_loss=0.01132, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4953165.30 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:47:08,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1073186.6666666667, ans=0.125 2023-12-23 09:47:36,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1073320.0, ans=0.125 2023-12-23 09:47:37,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1073386.6666666667, ans=0.1 2023-12-23 09:47:50,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1073453.3333333333, ans=0.125 2023-12-23 09:47:56,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=12.0 2023-12-23 09:47:58,691 INFO [train.py:886] (0/4) Epoch 34, batch 3750, loss[loss=0.0106, audio_tagging_loss=0.0106, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4956584.38 frames. 
], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:48:05,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1073520.0, ans=0.0 2023-12-23 09:48:12,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1073586.6666666667, ans=0.125 2023-12-23 09:48:16,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1073586.6666666667, ans=0.0 2023-12-23 09:48:25,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.27 vs. limit=10.0 2023-12-23 09:48:29,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1073720.0, ans=0.125 2023-12-23 09:48:30,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1073720.0, ans=0.125 2023-12-23 09:48:38,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1073720.0, ans=0.1 2023-12-23 09:48:42,083 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.136e+01 3.444e+01 3.639e+01 3.754e+01 4.905e+01, threshold=7.278e+01, percent-clipped=0.0 2023-12-23 09:48:50,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1073853.3333333333, ans=0.1 2023-12-23 09:48:50,766 INFO [train.py:886] (0/4) Epoch 34, batch 3800, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4951726.27 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:48:55,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2023-12-23 09:48:58,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1073853.3333333333, ans=0.1 2023-12-23 09:49:42,828 INFO [train.py:886] (0/4) Epoch 34, batch 3850, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4952732.09 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:49:52,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1074253.3333333333, ans=0.125 2023-12-23 09:49:53,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1074253.3333333333, ans=0.1 2023-12-23 09:49:55,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.50 vs. 
limit=10.0 2023-12-23 09:50:00,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1074253.3333333333, ans=0.125 2023-12-23 09:50:07,996 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:50:09,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1074320.0, ans=0.5 2023-12-23 09:50:23,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.65 vs. limit=15.0 2023-12-23 09:50:23,464 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.418e+01 3.601e+01 3.736e+01 4.189e+01, threshold=7.201e+01, percent-clipped=0.0 2023-12-23 09:50:32,680 INFO [train.py:886] (0/4) Epoch 34, batch 3900, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4957878.03 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:50:55,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1074653.3333333333, ans=0.025 2023-12-23 09:50:56,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1074653.3333333333, ans=0.125 2023-12-23 09:50:59,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1074653.3333333333, ans=0.04949747468305833 2023-12-23 09:51:10,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1074720.0, ans=0.2 2023-12-23 09:51:19,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1074786.6666666667, ans=0.0 2023-12-23 09:51:24,465 INFO [train.py:886] (0/4) Epoch 34, batch 3950, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4959695.76 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:52:07,664 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.354e+01 3.526e+01 3.698e+01 4.132e+01, threshold=7.051e+01, percent-clipped=0.0 2023-12-23 09:52:16,941 INFO [train.py:886] (0/4) Epoch 34, batch 4000, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4967404.33 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 128.0 2023-12-23 09:52:38,514 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:52:48,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. 
limit=15.0 2023-12-23 09:52:55,136 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:53:00,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1075453.3333333333, ans=0.125 2023-12-23 09:53:07,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1075520.0, ans=0.0 2023-12-23 09:53:08,047 INFO [train.py:886] (0/4) Epoch 34, batch 4050, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4963636.71 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:53:13,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1075520.0, ans=0.1 2023-12-23 09:53:14,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1075520.0, ans=0.0 2023-12-23 09:53:31,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2023-12-23 09:53:51,620 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.048e+01 3.432e+01 3.586e+01 3.724e+01 5.445e+01, threshold=7.172e+01, percent-clipped=0.0 2023-12-23 09:53:59,201 INFO [train.py:886] (0/4) Epoch 34, batch 4100, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4956850.70 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:53:59,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1075853.3333333333, ans=0.125 2023-12-23 09:54:05,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1075853.3333333333, ans=0.125 2023-12-23 09:54:09,549 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:54:12,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1075920.0, ans=0.5 2023-12-23 09:54:17,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1075920.0, ans=0.0 2023-12-23 09:54:25,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1075986.6666666667, ans=0.2 2023-12-23 09:54:31,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1076053.3333333333, ans=0.125 2023-12-23 09:54:40,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1076120.0, ans=0.0 2023-12-23 09:54:52,428 INFO [train.py:886] (0/4) Epoch 34, batch 4150, loss[loss=0.009584, audio_tagging_loss=0.009584, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4938992.09 frames. 
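
grad_scale in these lines is the fp16 loss scale. It sits at 64.0 for most of the epoch, doubles to 128.0 at batch 4000, is back at 64.0 by batch 4050, and reaches 32.0 by batch 4250: the signature of a scaler that grows after a run of overflow-free steps and halves whenever a step overflows. A minimal sketch of that rule (PyTorch's GradScaler defaults to growth 2.0 and backoff 0.5; the growth interval below is an assumption, not this run's setting):

    class LossScale:
        # Dynamic fp16 loss scaling: halve on overflow, double after a
        # run of clean steps (a sketch of torch.cuda.amp.GradScaler's
        # update rule, not the training loop's actual scaler object).
        def __init__(self, scale: float = 64.0, growth_interval: int = 2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def step(self, found_inf: bool) -> float:
            if found_inf:
                self.scale *= 0.5   # back off and skip this optimizer step
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps == self.growth_interval:
                    self.scale *= 2.0
                    self.good_steps = 0
            return self.scale

    s = LossScale()
    for inf in [False] * 2000 + [True, True]:
        scale = s.step(inf)
    print(scale)  # 64 -> 128 after 2000 clean steps -> 32 after two overflows
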
], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:54:52,617 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:55:01,428 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2023-12-23 09:55:13,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1076320.0, ans=0.1 2023-12-23 09:55:21,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1076320.0, ans=0.1 2023-12-23 09:55:32,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=15.0 2023-12-23 09:55:34,921 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.051e+01 3.411e+01 3.544e+01 3.753e+01 4.282e+01, threshold=7.087e+01, percent-clipped=0.0 2023-12-23 09:55:42,521 INFO [train.py:886] (0/4) Epoch 34, batch 4200, loss[loss=0.01291, audio_tagging_loss=0.01291, over 25000.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4939713.74 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:56:02,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0 2023-12-23 09:56:15,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1076720.0, ans=0.0 2023-12-23 09:56:17,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1076720.0, ans=0.1 2023-12-23 09:56:24,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1076786.6666666667, ans=0.125 2023-12-23 09:56:30,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.28 vs. limit=15.0 2023-12-23 09:56:35,484 INFO [train.py:886] (0/4) Epoch 34, batch 4250, loss[loss=0.01135, audio_tagging_loss=0.01135, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4944429.42 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:56:39,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. 
limit=6.0 2023-12-23 09:56:45,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1076920.0, ans=0.125 2023-12-23 09:56:49,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1076920.0, ans=0.1 2023-12-23 09:57:06,237 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:57:17,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1077120.0, ans=0.125 2023-12-23 09:57:18,332 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.979e+01 3.377e+01 3.552e+01 3.782e+01 4.205e+01, threshold=7.103e+01, percent-clipped=0.0 2023-12-23 09:57:21,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1077120.0, ans=0.0 2023-12-23 09:57:26,355 INFO [train.py:886] (0/4) Epoch 34, batch 4300, loss[loss=0.01053, audio_tagging_loss=0.01053, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4952859.82 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:57:29,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=12.0 2023-12-23 09:57:53,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1077320.0, ans=0.2 2023-12-23 09:58:00,081 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:58:07,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1077453.3333333333, ans=0.0 2023-12-23 09:58:11,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2023-12-23 09:58:11,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1077453.3333333333, ans=0.125 2023-12-23 09:58:17,341 INFO [train.py:886] (0/4) Epoch 34, batch 4350, loss[loss=0.01304, audio_tagging_loss=0.01304, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4950866.86 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:58:36,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1077586.6666666667, ans=0.125 2023-12-23 09:58:58,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.84 vs. 
limit=12.0 2023-12-23 09:59:00,727 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.247e+01 3.499e+01 3.632e+01 3.842e+01 4.825e+01, threshold=7.264e+01, percent-clipped=0.0 2023-12-23 09:59:00,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1077786.6666666667, ans=0.125 2023-12-23 09:59:02,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1077786.6666666667, ans=0.2 2023-12-23 09:59:09,498 INFO [train.py:886] (0/4) Epoch 34, batch 4400, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4943422.67 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:59:09,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1077853.3333333333, ans=0.07 2023-12-23 09:59:11,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=15.0 2023-12-23 09:59:23,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1077920.0, ans=0.125 2023-12-23 09:59:25,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1077920.0, ans=0.125 2023-12-23 09:59:31,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1077986.6666666667, ans=0.1 2023-12-23 09:59:42,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1078053.3333333333, ans=0.2 2023-12-23 09:59:46,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1078053.3333333333, ans=0.125 2023-12-23 09:59:53,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1078120.0, ans=0.125 2023-12-23 09:59:59,434 INFO [train.py:886] (0/4) Epoch 34, batch 4450, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4937975.87 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 10:00:00,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0 2023-12-23 10:00:18,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.27 vs. limit=6.0 2023-12-23 10:00:19,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1078320.0, ans=0.0 2023-12-23 10:00:37,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.25 vs. 
limit=22.5 2023-12-23 10:00:45,201 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.115e+01 3.445e+01 3.585e+01 3.809e+01 4.204e+01, threshold=7.171e+01, percent-clipped=0.0 2023-12-23 10:00:51,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1078520.0, ans=0.125 2023-12-23 10:00:51,838 INFO [train.py:886] (0/4) Epoch 34, batch 4500, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4938591.44 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:01:10,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1078586.6666666667, ans=0.2 2023-12-23 10:01:30,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.03 vs. limit=10.0 2023-12-23 10:01:43,525 INFO [train.py:886] (0/4) Epoch 34, batch 4550, loss[loss=0.009007, audio_tagging_loss=0.009007, over 25000.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4941888.62 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:01:47,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1078853.3333333333, ans=0.125 2023-12-23 10:02:09,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1078986.6666666667, ans=0.0 2023-12-23 10:02:21,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5 2023-12-23 10:02:28,640 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.071e+01 3.416e+01 3.558e+01 3.707e+01 4.537e+01, threshold=7.116e+01, percent-clipped=0.0 2023-12-23 10:02:35,205 INFO [train.py:886] (0/4) Epoch 34, batch 4600, loss[loss=0.01192, audio_tagging_loss=0.01192, over 24901.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4946555.01 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:02:35,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1079186.6666666667, ans=0.0 2023-12-23 10:02:36,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.46 vs. 
limit=22.5 2023-12-23 10:02:41,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1079186.6666666667, ans=0.1 2023-12-23 10:02:46,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1079253.3333333333, ans=0.125 2023-12-23 10:02:50,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1079253.3333333333, ans=0.0 2023-12-23 10:02:52,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1079253.3333333333, ans=0.1 2023-12-23 10:02:59,983 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:03:01,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1079320.0, ans=0.07 2023-12-23 10:03:27,343 INFO [train.py:886] (0/4) Epoch 34, batch 4650, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4952993.08 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:03:47,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1079653.3333333333, ans=10.0 2023-12-23 10:03:55,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2023-12-23 10:03:57,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1079720.0, ans=0.125 2023-12-23 10:04:10,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1079786.6666666667, ans=0.125 2023-12-23 10:04:11,227 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.081e+01 3.447e+01 3.567e+01 3.799e+01 4.284e+01, threshold=7.135e+01, percent-clipped=0.0 2023-12-23 10:04:13,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1079786.6666666667, ans=0.2 2023-12-23 10:04:17,748 INFO [train.py:886] (0/4) Epoch 34, batch 4700, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4945159.17 frames. ], batch size: 99, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:04:21,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1079853.3333333333, ans=0.0 2023-12-23 10:04:26,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-12-23 10:04:27,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.25 vs. 
limit=22.5 2023-12-23 10:04:35,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1079986.6666666667, ans=0.125 2023-12-23 10:04:57,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1080120.0, ans=0.1 2023-12-23 10:04:57,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1080120.0, ans=0.0 2023-12-23 10:05:00,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1080120.0, ans=0.125 2023-12-23 10:05:05,719 INFO [train.py:886] (0/4) Epoch 34, batch 4750, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4943796.68 frames. ], batch size: 99, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:05:05,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1080186.6666666667, ans=0.2 2023-12-23 10:05:08,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.63 vs. limit=22.5 2023-12-23 10:05:15,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1080253.3333333333, ans=0.0 2023-12-23 10:05:20,856 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-34.pt 2023-12-23 10:05:40,664 INFO [train.py:886] (0/4) Epoch 35, batch 0, loss[loss=0.02832, audio_tagging_loss=0.02832, over 25000.00 frames. ], tot_loss[loss=0.02832, audio_tagging_loss=0.02832, over 25000.00 frames. ], batch size: 100, lr: 3.10e-03, grad_scale: 32.0 2023-12-23 10:05:40,666 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 10:06:01,412 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3422, 3.4496, 4.1456, 3.9116], device='cuda:0') 2023-12-23 10:06:02,110 INFO [train.py:917] (0/4) Epoch 35, validation: loss=0.03353, audio_tagging_loss=0.03353, over 3737520.00 frames. 2023-12-23 10:06:02,110 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 10:06:29,558 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.161e+01 3.510e+01 3.765e+01 4.838e+01 9.519e+01, threshold=7.530e+01, percent-clipped=6.0 2023-12-23 10:06:31,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1080493.3333333333, ans=22.5 2023-12-23 10:06:36,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1080493.3333333333, ans=0.95 2023-12-23 10:06:52,668 INFO [train.py:886] (0/4) Epoch 35, batch 50, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01913, audio_tagging_loss=0.01913, over 1112657.02 frames. ], batch size: 100, lr: 3.10e-03, grad_scale: 32.0 2023-12-23 10:06:56,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. 
limit=10.0 2023-12-23 10:07:08,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1080693.3333333333, ans=0.0 2023-12-23 10:07:12,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1080693.3333333333, ans=0.125 2023-12-23 10:07:19,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2023-12-23 10:07:26,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. limit=10.0 2023-12-23 10:07:26,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1080826.6666666667, ans=0.0 2023-12-23 10:07:35,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2023-12-23 10:07:44,749 INFO [train.py:886] (0/4) Epoch 35, batch 100, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 1968151.08 frames. ], batch size: 100, lr: 3.10e-03, grad_scale: 32.0 2023-12-23 10:07:56,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1081026.6666666667, ans=0.2 2023-12-23 10:08:12,431 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.410e+01 3.823e+01 4.080e+01 4.340e+01 5.302e+01, threshold=8.159e+01, percent-clipped=0.0 2023-12-23 10:08:36,355 INFO [train.py:886] (0/4) Epoch 35, batch 150, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 2625915.00 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:08:49,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-12-23 10:08:54,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1081360.0, ans=0.1 2023-12-23 10:08:55,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1081360.0, ans=0.1 2023-12-23 10:08:56,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1081426.6666666667, ans=0.2 2023-12-23 10:08:58,389 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.42 vs. limit=10.0 2023-12-23 10:09:04,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1081426.6666666667, ans=0.125 2023-12-23 10:09:06,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1081493.3333333333, ans=0.125 2023-12-23 10:09:28,091 INFO [train.py:886] (0/4) Epoch 35, batch 200, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 3146721.86 frames. 
], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:09:33,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1081626.6666666667, ans=0.2 2023-12-23 10:09:38,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1081693.3333333333, ans=0.125 2023-12-23 10:09:46,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1081693.3333333333, ans=0.0 2023-12-23 10:09:55,677 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.529e+01 3.676e+01 3.871e+01 4.435e+01, threshold=7.352e+01, percent-clipped=0.0 2023-12-23 10:09:58,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.62 vs. limit=8.0 2023-12-23 10:10:20,452 INFO [train.py:886] (0/4) Epoch 35, batch 250, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 3548954.88 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:10:25,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1081960.0, ans=0.2 2023-12-23 10:10:32,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1082026.6666666667, ans=0.0 2023-12-23 10:10:33,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1082026.6666666667, ans=0.125 2023-12-23 10:10:35,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2023-12-23 10:10:43,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2023-12-23 10:10:50,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1082093.3333333333, ans=0.0 2023-12-23 10:10:52,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0 2023-12-23 10:10:54,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1082160.0, ans=0.1 2023-12-23 10:11:11,746 INFO [train.py:886] (0/4) Epoch 35, batch 300, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 3858077.29 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:11:40,255 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.154e+01 3.473e+01 3.613e+01 3.760e+01 4.806e+01, threshold=7.226e+01, percent-clipped=0.0 2023-12-23 10:11:44,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5 2023-12-23 10:12:00,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1082560.0, ans=0.125 2023-12-23 10:12:04,013 INFO [train.py:886] (0/4) Epoch 35, batch 350, loss[loss=0.01199, audio_tagging_loss=0.01199, over 24750.00 frames. 
], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4096777.04 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:12:23,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2023-12-23 10:12:31,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2023-12-23 10:12:42,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1082826.6666666667, ans=0.0 2023-12-23 10:12:57,046 INFO [train.py:886] (0/4) Epoch 35, batch 400, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4283760.37 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:13:03,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1082960.0, ans=0.07 2023-12-23 10:13:07,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2023-12-23 10:13:08,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1083026.6666666667, ans=0.125 2023-12-23 10:13:18,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1083093.3333333333, ans=0.2 2023-12-23 10:13:23,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1083093.3333333333, ans=0.125 2023-12-23 10:13:24,735 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.006e+01 3.393e+01 3.521e+01 3.659e+01 4.330e+01, threshold=7.042e+01, percent-clipped=0.0 2023-12-23 10:13:48,027 INFO [train.py:886] (0/4) Epoch 35, batch 450, loss[loss=0.01122, audio_tagging_loss=0.01122, over 22106.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4423656.68 frames. ], batch size: 107, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:14:00,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2023-12-23 10:14:36,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1083560.0, ans=0.1 2023-12-23 10:14:40,521 INFO [train.py:886] (0/4) Epoch 35, batch 500, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4541586.57 frames. 
], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:14:44,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1083626.6666666667, ans=0.125 2023-12-23 10:14:52,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1083693.3333333333, ans=0.0 2023-12-23 10:15:02,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1083760.0, ans=0.0 2023-12-23 10:15:09,074 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.163e+01 3.425e+01 3.572e+01 3.739e+01 4.112e+01, threshold=7.144e+01, percent-clipped=0.0 2023-12-23 10:15:19,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1083826.6666666667, ans=0.125 2023-12-23 10:15:23,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1083893.3333333333, ans=0.0 2023-12-23 10:15:25,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=22.5 2023-12-23 10:15:29,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1083893.3333333333, ans=0.125 2023-12-23 10:15:32,477 INFO [train.py:886] (0/4) Epoch 35, batch 550, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4638271.89 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:15:40,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1083960.0, ans=0.0 2023-12-23 10:15:45,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1084026.6666666667, ans=0.0 2023-12-23 10:15:45,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1084026.6666666667, ans=0.04949747468305833 2023-12-23 10:15:47,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1084026.6666666667, ans=0.0 2023-12-23 10:15:48,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5 2023-12-23 10:15:51,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2023-12-23 10:15:54,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1084093.3333333333, ans=0.1 2023-12-23 10:16:09,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1084160.0, ans=0.0 2023-12-23 10:16:09,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2023-12-23 10:16:24,364 INFO [train.py:886] (0/4) Epoch 35, batch 600, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4706498.77 frames. 
], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:16:24,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1084293.3333333333, ans=0.125 2023-12-23 10:16:49,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1084426.6666666667, ans=0.125 2023-12-23 10:16:52,608 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.983e+01 3.476e+01 3.624e+01 3.793e+01 4.486e+01, threshold=7.249e+01, percent-clipped=0.0 2023-12-23 10:17:16,795 INFO [train.py:886] (0/4) Epoch 35, batch 650, loss[loss=0.01267, audio_tagging_loss=0.01267, over 24750.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4756448.10 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:17:27,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1084693.3333333333, ans=0.1 2023-12-23 10:17:32,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5 2023-12-23 10:17:33,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1084693.3333333333, ans=0.1 2023-12-23 10:17:41,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1084760.0, ans=0.125 2023-12-23 10:17:47,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1084826.6666666667, ans=0.0 2023-12-23 10:17:49,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1084826.6666666667, ans=0.125 2023-12-23 10:18:03,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1084893.3333333333, ans=0.0 2023-12-23 10:18:05,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-12-23 10:18:06,872 INFO [train.py:886] (0/4) Epoch 35, batch 700, loss[loss=0.01302, audio_tagging_loss=0.01302, over 21908.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4799637.66 frames. ], batch size: 107, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:18:19,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1085026.6666666667, ans=0.125 2023-12-23 10:18:21,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1085026.6666666667, ans=0.125 2023-12-23 10:18:35,075 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.416e+01 3.588e+01 3.767e+01 4.947e+01, threshold=7.176e+01, percent-clipped=0.0 2023-12-23 10:18:40,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. 
limit=22.5 2023-12-23 10:18:58,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1085226.6666666667, ans=10.0 2023-12-23 10:18:59,684 INFO [train.py:886] (0/4) Epoch 35, batch 750, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4834865.24 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:19:05,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.28 vs. limit=22.5 2023-12-23 10:19:05,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2023-12-23 10:19:25,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=15.0 2023-12-23 10:19:43,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1085560.0, ans=0.125 2023-12-23 10:19:50,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.30 vs. limit=12.0 2023-12-23 10:19:51,855 INFO [train.py:886] (0/4) Epoch 35, batch 800, loss[loss=0.01035, audio_tagging_loss=0.01035, over 22167.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4861967.21 frames. ], batch size: 107, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:19:58,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1085626.6666666667, ans=0.0 2023-12-23 10:20:10,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1085760.0, ans=0.1 2023-12-23 10:20:18,777 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.132e+01 3.460e+01 3.638e+01 3.746e+01 4.332e+01, threshold=7.276e+01, percent-clipped=0.0 2023-12-23 10:20:42,800 INFO [train.py:886] (0/4) Epoch 35, batch 850, loss[loss=0.01056, audio_tagging_loss=0.01056, over 21242.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4875036.22 frames. ], batch size: 107, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:20:53,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1086026.6666666667, ans=0.125 2023-12-23 10:21:17,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1086160.0, ans=0.0 2023-12-23 10:21:35,514 INFO [train.py:886] (0/4) Epoch 35, batch 900, loss[loss=0.01613, audio_tagging_loss=0.01613, over 24944.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4891371.76 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:21:40,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1086293.3333333333, ans=0.2 2023-12-23 10:21:53,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1086426.6666666667, ans=0.1 2023-12-23 10:22:03,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.26 vs. 
limit=22.5 2023-12-23 10:22:03,190 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.210e+01 3.429e+01 3.563e+01 3.739e+01 4.144e+01, threshold=7.126e+01, percent-clipped=0.0 2023-12-23 10:22:06,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1086493.3333333333, ans=0.125 2023-12-23 10:22:12,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.69 vs. limit=22.5 2023-12-23 10:22:23,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1086560.0, ans=0.125 2023-12-23 10:22:25,575 INFO [train.py:886] (0/4) Epoch 35, batch 950, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4897637.59 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:22:40,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1086693.3333333333, ans=0.125 2023-12-23 10:22:55,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1086826.6666666667, ans=0.0 2023-12-23 10:23:09,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.75 vs. limit=22.5 2023-12-23 10:23:15,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1086893.3333333333, ans=0.0 2023-12-23 10:23:18,174 INFO [train.py:886] (0/4) Epoch 35, batch 1000, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4896164.69 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:23:35,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1087026.6666666667, ans=0.125 2023-12-23 10:23:38,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1087093.3333333333, ans=0.0 2023-12-23 10:23:46,652 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.127e+01 3.397e+01 3.527e+01 3.697e+01 4.160e+01, threshold=7.054e+01, percent-clipped=0.0 2023-12-23 10:23:58,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1087226.6666666667, ans=0.1 2023-12-23 10:24:10,458 INFO [train.py:886] (0/4) Epoch 35, batch 1050, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4908166.72 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:24:23,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1087360.0, ans=0.125 2023-12-23 10:24:30,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1087426.6666666667, ans=0.0 2023-12-23 10:24:57,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1087560.0, ans=0.125 2023-12-23 10:25:00,973 INFO [train.py:886] (0/4) Epoch 35, batch 1100, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24036.00 frames. 
], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4918081.84 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:25:01,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1087626.6666666667, ans=0.5 2023-12-23 10:25:22,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2023-12-23 10:25:29,193 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.089e+01 3.393e+01 3.590e+01 3.785e+01 4.427e+01, threshold=7.180e+01, percent-clipped=0.0 2023-12-23 10:25:29,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1087760.0, ans=0.125 2023-12-23 10:25:36,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1087826.6666666667, ans=0.0 2023-12-23 10:25:48,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1087893.3333333333, ans=0.125 2023-12-23 10:25:53,739 INFO [train.py:886] (0/4) Epoch 35, batch 1150, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4926708.16 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:26:10,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1088026.6666666667, ans=0.125 2023-12-23 10:26:13,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2023-12-23 10:26:40,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1088226.6666666667, ans=0.1 2023-12-23 10:26:40,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1088226.6666666667, ans=0.2 2023-12-23 10:26:42,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1088226.6666666667, ans=0.0 2023-12-23 10:26:44,941 INFO [train.py:886] (0/4) Epoch 35, batch 1200, loss[loss=0.01314, audio_tagging_loss=0.01314, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4939549.39 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:27:04,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1088360.0, ans=0.1 2023-12-23 10:27:12,514 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.487e+01 3.620e+01 3.766e+01 4.374e+01, threshold=7.240e+01, percent-clipped=0.0 2023-12-23 10:27:33,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-12-23 10:27:33,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0 2023-12-23 10:27:36,884 INFO [train.py:886] (0/4) Epoch 35, batch 1250, loss[loss=0.009835, audio_tagging_loss=0.009835, over 24750.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4940689.36 frames. 
], batch size: 99, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:27:39,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1088626.6666666667, ans=0.2 2023-12-23 10:27:41,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1088626.6666666667, ans=0.04949747468305833 2023-12-23 10:27:57,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0 2023-12-23 10:28:15,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1088826.6666666667, ans=0.2 2023-12-23 10:28:19,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.97 vs. limit=15.0 2023-12-23 10:28:27,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1088893.3333333333, ans=0.95 2023-12-23 10:28:29,142 INFO [train.py:886] (0/4) Epoch 35, batch 1300, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4940630.50 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:28:39,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1089026.6666666667, ans=0.0 2023-12-23 10:28:46,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1089026.6666666667, ans=0.125 2023-12-23 10:28:47,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2023-12-23 10:28:47,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1089093.3333333333, ans=0.07 2023-12-23 10:28:57,283 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.209e+01 3.442e+01 3.551e+01 3.705e+01 4.359e+01, threshold=7.103e+01, percent-clipped=0.0 2023-12-23 10:29:08,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1089160.0, ans=0.125 2023-12-23 10:29:13,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1089226.6666666667, ans=0.0 2023-12-23 10:29:19,924 INFO [train.py:886] (0/4) Epoch 35, batch 1350, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4934379.35 frames. 
], batch size: 99, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:29:23,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1089293.3333333333, ans=0.0 2023-12-23 10:29:38,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1089360.0, ans=0.125 2023-12-23 10:29:47,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1089426.6666666667, ans=0.125 2023-12-23 10:29:54,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1089493.3333333333, ans=0.125 2023-12-23 10:30:05,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1089560.0, ans=0.1 2023-12-23 10:30:06,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.35 vs. limit=10.0 2023-12-23 10:30:12,248 INFO [train.py:886] (0/4) Epoch 35, batch 1400, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4944248.06 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:30:12,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-12-23 10:30:24,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2023-12-23 10:30:39,921 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.952e+01 3.418e+01 3.570e+01 3.781e+01 4.202e+01, threshold=7.140e+01, percent-clipped=0.0 2023-12-23 10:30:44,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1089826.6666666667, ans=0.1 2023-12-23 10:30:45,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1089826.6666666667, ans=0.125 2023-12-23 10:30:56,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1089893.3333333333, ans=0.035 2023-12-23 10:31:04,673 INFO [train.py:886] (0/4) Epoch 35, batch 1450, loss[loss=0.01134, audio_tagging_loss=0.01134, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4943981.51 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:31:37,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1090160.0, ans=0.0 2023-12-23 10:31:45,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=12.0 2023-12-23 10:31:54,687 INFO [train.py:886] (0/4) Epoch 35, batch 1500, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4945495.62 frames. 
], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:32:03,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1090293.3333333333, ans=0.07 2023-12-23 10:32:04,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1090360.0, ans=0.125 2023-12-23 10:32:05,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1090360.0, ans=0.1 2023-12-23 10:32:13,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1090360.0, ans=0.025 2023-12-23 10:32:22,401 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.067e+01 3.460e+01 3.584e+01 3.712e+01 4.259e+01, threshold=7.168e+01, percent-clipped=0.0 2023-12-23 10:32:37,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1090560.0, ans=0.0 2023-12-23 10:32:40,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1090560.0, ans=0.025 2023-12-23 10:32:41,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2023-12-23 10:32:44,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2023-12-23 10:32:46,364 INFO [train.py:886] (0/4) Epoch 35, batch 1550, loss[loss=0.01061, audio_tagging_loss=0.01061, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4943712.61 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:32:48,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1090626.6666666667, ans=0.2 2023-12-23 10:33:03,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1090693.3333333333, ans=0.2 2023-12-23 10:33:04,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1090693.3333333333, ans=0.035 2023-12-23 10:33:15,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1090760.0, ans=0.125 2023-12-23 10:33:22,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1090826.6666666667, ans=0.2 2023-12-23 10:33:26,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. 
limit=15.0 2023-12-23 10:33:27,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1090893.3333333333, ans=0.125 2023-12-23 10:33:28,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1090893.3333333333, ans=0.125 2023-12-23 10:33:29,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1090893.3333333333, ans=0.1 2023-12-23 10:33:37,818 INFO [train.py:886] (0/4) Epoch 35, batch 1600, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4937670.80 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:33:43,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1090960.0, ans=0.2 2023-12-23 10:33:50,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1091026.6666666667, ans=0.125 2023-12-23 10:33:51,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2023-12-23 10:33:53,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1091026.6666666667, ans=0.0 2023-12-23 10:34:01,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1091093.3333333333, ans=0.125 2023-12-23 10:34:03,963 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.241e+01 3.514e+01 3.658e+01 3.797e+01 4.440e+01, threshold=7.316e+01, percent-clipped=0.0 2023-12-23 10:34:13,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1091160.0, ans=0.1 2023-12-23 10:34:27,909 INFO [train.py:886] (0/4) Epoch 35, batch 1650, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4934251.27 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:34:51,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1091426.6666666667, ans=0.125 2023-12-23 10:35:16,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1091560.0, ans=0.1 2023-12-23 10:35:19,577 INFO [train.py:886] (0/4) Epoch 35, batch 1700, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4935445.88 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:35:34,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.84 vs. limit=22.5 2023-12-23 10:35:35,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1091693.3333333333, ans=0.95 2023-12-23 10:35:37,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.04 vs. 
limit=12.0 2023-12-23 10:35:45,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1091760.0, ans=0.125 2023-12-23 10:35:46,550 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.407e+01 3.580e+01 3.752e+01 4.487e+01, threshold=7.159e+01, percent-clipped=0.0 2023-12-23 10:35:51,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1091826.6666666667, ans=0.125 2023-12-23 10:35:58,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.15 vs. limit=10.0 2023-12-23 10:36:05,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1091893.3333333333, ans=0.0 2023-12-23 10:36:07,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1091893.3333333333, ans=0.125 2023-12-23 10:36:07,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1091893.3333333333, ans=10.0 2023-12-23 10:36:09,043 INFO [train.py:886] (0/4) Epoch 35, batch 1750, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4938281.88 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:36:13,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1091960.0, ans=0.1 2023-12-23 10:36:24,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1092026.6666666667, ans=0.125 2023-12-23 10:36:30,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1092093.3333333333, ans=0.0 2023-12-23 10:36:50,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2023-12-23 10:36:58,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2023-12-23 10:37:01,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2023-12-23 10:37:01,892 INFO [train.py:886] (0/4) Epoch 35, batch 1800, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4943564.42 frames. 
], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:37:04,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1092293.3333333333, ans=0.1 2023-12-23 10:37:04,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1092293.3333333333, ans=0.05 2023-12-23 10:37:18,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1092360.0, ans=0.125 2023-12-23 10:37:29,523 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.056e+01 3.488e+01 3.614e+01 3.772e+01 4.751e+01, threshold=7.228e+01, percent-clipped=0.0 2023-12-23 10:37:51,339 INFO [train.py:886] (0/4) Epoch 35, batch 1850, loss[loss=0.01036, audio_tagging_loss=0.01036, over 20807.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4942830.26 frames. ], batch size: 107, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:38:08,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2023-12-23 10:38:10,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2023-12-23 10:38:13,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5 2023-12-23 10:38:30,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1092826.6666666667, ans=0.1 2023-12-23 10:38:34,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1092893.3333333333, ans=0.0 2023-12-23 10:38:42,691 INFO [train.py:886] (0/4) Epoch 35, batch 1900, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4942489.34 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:38:44,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-12-23 10:38:55,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1093026.6666666667, ans=0.1 2023-12-23 10:39:01,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1093026.6666666667, ans=0.125 2023-12-23 10:39:10,605 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.113e+01 3.449e+01 3.631e+01 3.772e+01 4.886e+01, threshold=7.262e+01, percent-clipped=0.0 2023-12-23 10:39:16,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.61 vs. limit=15.0 2023-12-23 10:39:19,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.73 vs. 
limit=22.5 2023-12-23 10:39:22,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1093226.6666666667, ans=0.0 2023-12-23 10:39:26,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1093226.6666666667, ans=0.0 2023-12-23 10:39:32,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1093226.6666666667, ans=0.025 2023-12-23 10:39:35,181 INFO [train.py:886] (0/4) Epoch 35, batch 1950, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4942578.65 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:39:37,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1093293.3333333333, ans=0.125 2023-12-23 10:39:40,130 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-164000.pt 2023-12-23 10:39:48,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1093360.0, ans=0.125 2023-12-23 10:39:49,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1093360.0, ans=0.0 2023-12-23 10:39:53,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0 2023-12-23 10:39:59,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1093426.6666666667, ans=0.0 2023-12-23 10:39:59,137 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:39:59,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1093426.6666666667, ans=0.1 2023-12-23 10:40:01,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1093426.6666666667, ans=0.1 2023-12-23 10:40:04,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1093426.6666666667, ans=0.0 2023-12-23 10:40:15,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1093493.3333333333, ans=0.125 2023-12-23 10:40:16,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=12.0 2023-12-23 10:40:27,284 INFO [train.py:886] (0/4) Epoch 35, batch 2000, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4947951.17 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:40:28,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1093626.6666666667, ans=0.1 2023-12-23 10:40:36,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.57 vs. 
limit=12.0 2023-12-23 10:40:52,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1093760.0, ans=0.125 2023-12-23 10:40:55,723 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.152e+01 3.397e+01 3.574e+01 3.710e+01 4.548e+01, threshold=7.148e+01, percent-clipped=0.0 2023-12-23 10:41:03,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1093826.6666666667, ans=0.125 2023-12-23 10:41:13,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1093893.3333333333, ans=0.125 2023-12-23 10:41:14,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1093893.3333333333, ans=0.125 2023-12-23 10:41:20,423 INFO [train.py:886] (0/4) Epoch 35, batch 2050, loss[loss=0.01121, audio_tagging_loss=0.01121, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4953986.57 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:41:40,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1094093.3333333333, ans=0.125 2023-12-23 10:41:42,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1094093.3333333333, ans=0.125 2023-12-23 10:41:45,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1094093.3333333333, ans=15.0 2023-12-23 10:41:51,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2023-12-23 10:41:54,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2023-12-23 10:41:59,504 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:42:02,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0 2023-12-23 10:42:07,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1094226.6666666667, ans=0.125 2023-12-23 10:42:10,519 INFO [train.py:886] (0/4) Epoch 35, batch 2100, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4959161.48 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:42:15,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1094293.3333333333, ans=0.1 2023-12-23 10:42:18,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2023-12-23 10:42:21,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.13 vs. 
limit=22.5 2023-12-23 10:42:23,680 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:42:25,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0 2023-12-23 10:42:37,893 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.441e+01 3.605e+01 3.842e+01 4.378e+01, threshold=7.210e+01, percent-clipped=0.0 2023-12-23 10:42:44,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2023-12-23 10:42:52,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1094560.0, ans=15.0 2023-12-23 10:43:02,010 INFO [train.py:886] (0/4) Epoch 35, batch 2150, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4958702.70 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:43:14,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=15.0 2023-12-23 10:43:19,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1094693.3333333333, ans=0.125 2023-12-23 10:43:19,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1094693.3333333333, ans=0.1 2023-12-23 10:43:32,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1094826.6666666667, ans=0.125 2023-12-23 10:43:36,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.81 vs. limit=22.5 2023-12-23 10:43:53,291 INFO [train.py:886] (0/4) Epoch 35, batch 2200, loss[loss=0.01511, audio_tagging_loss=0.01511, over 24951.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4955385.94 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:44:20,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1095093.3333333333, ans=0.125 2023-12-23 10:44:22,586 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.222e+01 3.461e+01 3.667e+01 3.781e+01 4.293e+01, threshold=7.335e+01, percent-clipped=0.0 2023-12-23 10:44:44,094 INFO [train.py:886] (0/4) Epoch 35, batch 2250, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4949989.59 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:45:01,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. 
limit=6.0 2023-12-23 10:45:09,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1095426.6666666667, ans=0.125 2023-12-23 10:45:34,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1095626.6666666667, ans=0.2 2023-12-23 10:45:35,340 INFO [train.py:886] (0/4) Epoch 35, batch 2300, loss[loss=0.01003, audio_tagging_loss=0.01003, over 25000.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4947206.88 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:45:45,127 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:45:53,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1095693.3333333333, ans=0.125 2023-12-23 10:45:57,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1095760.0, ans=0.125 2023-12-23 10:46:03,976 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.037e+01 3.412e+01 3.576e+01 3.677e+01 4.204e+01, threshold=7.151e+01, percent-clipped=0.0 2023-12-23 10:46:04,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1095760.0, ans=0.0 2023-12-23 10:46:15,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1095893.3333333333, ans=0.125 2023-12-23 10:46:17,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1095893.3333333333, ans=0.125 2023-12-23 10:46:27,857 INFO [train.py:886] (0/4) Epoch 35, batch 2350, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4949814.70 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:46:35,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1095960.0, ans=0.1 2023-12-23 10:46:42,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1096026.6666666667, ans=0.125 2023-12-23 10:47:11,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1096226.6666666667, ans=0.2 2023-12-23 10:47:18,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1096293.3333333333, ans=0.125 2023-12-23 10:47:19,163 INFO [train.py:886] (0/4) Epoch 35, batch 2400, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4953357.57 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:47:22,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1096293.3333333333, ans=0.125 2023-12-23 10:47:28,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. 
limit=15.0 2023-12-23 10:47:32,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1096360.0, ans=0.0 2023-12-23 10:47:41,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1096426.6666666667, ans=0.0 2023-12-23 10:47:47,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1096426.6666666667, ans=0.0 2023-12-23 10:47:48,608 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.451e+01 3.579e+01 3.689e+01 4.162e+01, threshold=7.158e+01, percent-clipped=0.0 2023-12-23 10:47:49,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1096493.3333333333, ans=0.0 2023-12-23 10:47:57,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1096493.3333333333, ans=0.2 2023-12-23 10:47:59,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=12.0 2023-12-23 10:48:05,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1096560.0, ans=0.5 2023-12-23 10:48:10,844 INFO [train.py:886] (0/4) Epoch 35, batch 2450, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4959112.98 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:48:12,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1096626.6666666667, ans=0.1 2023-12-23 10:48:16,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1096626.6666666667, ans=0.0 2023-12-23 10:48:18,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5 2023-12-23 10:48:20,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1096693.3333333333, ans=0.125 2023-12-23 10:48:24,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.27 vs. limit=22.5 2023-12-23 10:48:26,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1096693.3333333333, ans=0.125 2023-12-23 10:48:38,057 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.99 vs. limit=10.0 2023-12-23 10:48:41,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1096826.6666666667, ans=0.04949747468305833 2023-12-23 10:48:44,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0 2023-12-23 10:49:01,551 INFO [train.py:886] (0/4) Epoch 35, batch 2500, loss[loss=0.01074, audio_tagging_loss=0.01074, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4961411.00 frames. 
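
The optim.py:484 warnings that recur throughout this log report five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms plus the clipping threshold; in every warning above, the threshold is exactly Clipping_scale (2.0) times the median (e.g. 2.0 x 3.574e+01 = 7.148e+01). A minimal sketch of that style of adaptive clipping, assuming a hypothetical buffer `recent_norms` of past total-gradient norms; the real logic lives in icefall's optim.py and is not shown in this log:

    import torch

    def clip_by_recent_norms(params, recent_norms, clipping_scale=2.0):
        # Total gradient norm of this step, composed from per-tensor norms.
        grads = [p.grad for p in params if p.grad is not None]
        tot_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        # Quantiles of recently seen norms: min, 25%, median, 75%, max.
        q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # assumption: scale times the median
        if tot_norm > threshold:           # such steps count toward percent-clipped
            for g in grads:
                g.mul_(threshold / tot_norm)
        return tot_norm
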
], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:49:18,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1097026.6666666667, ans=0.09899494936611666 2023-12-23 10:49:22,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-12-23 10:49:30,649 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.928e+01 3.456e+01 3.603e+01 3.843e+01 4.868e+01, threshold=7.207e+01, percent-clipped=0.0 2023-12-23 10:49:33,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1097160.0, ans=0.125 2023-12-23 10:49:43,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1097226.6666666667, ans=0.1 2023-12-23 10:49:52,986 INFO [train.py:886] (0/4) Epoch 35, batch 2550, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4945739.34 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:50:21,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1097426.6666666667, ans=0.125 2023-12-23 10:50:29,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1097493.3333333333, ans=0.1 2023-12-23 10:50:46,618 INFO [train.py:886] (0/4) Epoch 35, batch 2600, loss[loss=0.01176, audio_tagging_loss=0.01176, over 25000.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4940226.13 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:50:52,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1097626.6666666667, ans=0.125 2023-12-23 10:50:53,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1097626.6666666667, ans=0.0 2023-12-23 10:51:02,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.08 vs. limit=12.0 2023-12-23 10:51:05,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.62 vs. limit=15.0 2023-12-23 10:51:05,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=12.0 2023-12-23 10:51:15,870 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.061e+01 3.460e+01 3.619e+01 3.732e+01 4.232e+01, threshold=7.237e+01, percent-clipped=0.0 2023-12-23 10:51:23,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1097826.6666666667, ans=0.0 2023-12-23 10:51:31,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.64 vs. 
limit=15.0 2023-12-23 10:51:34,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1097893.3333333333, ans=0.125 2023-12-23 10:51:37,641 INFO [train.py:886] (0/4) Epoch 35, batch 2650, loss[loss=0.00872, audio_tagging_loss=0.00872, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4944173.48 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:51:45,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1097960.0, ans=0.0 2023-12-23 10:51:52,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1098026.6666666667, ans=0.125 2023-12-23 10:51:53,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2023-12-23 10:52:04,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1098093.3333333333, ans=0.125 2023-12-23 10:52:28,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1098226.6666666667, ans=0.125 2023-12-23 10:52:28,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1098226.6666666667, ans=0.1 2023-12-23 10:52:29,776 INFO [train.py:886] (0/4) Epoch 35, batch 2700, loss[loss=0.01139, audio_tagging_loss=0.01139, over 23977.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4944738.43 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:52:35,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.33 vs. limit=15.0 2023-12-23 10:52:50,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1098426.6666666667, ans=0.05 2023-12-23 10:52:54,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1098426.6666666667, ans=0.125 2023-12-23 10:52:58,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2023-12-23 10:52:59,079 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.192e+01 3.447e+01 3.571e+01 3.720e+01 4.339e+01, threshold=7.142e+01, percent-clipped=0.0 2023-12-23 10:53:19,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1098560.0, ans=0.125 2023-12-23 10:53:21,984 INFO [train.py:886] (0/4) Epoch 35, batch 2750, loss[loss=0.009815, audio_tagging_loss=0.009815, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4949742.73 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:53:35,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=15.0 2023-12-23 10:53:38,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. 
limit=12.0 2023-12-23 10:53:39,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1098693.3333333333, ans=0.1 2023-12-23 10:53:49,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2023-12-23 10:53:53,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1098826.6666666667, ans=10.0 2023-12-23 10:53:59,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2023-12-23 10:54:10,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1098893.3333333333, ans=0.5 2023-12-23 10:54:11,631 INFO [train.py:886] (0/4) Epoch 35, batch 2800, loss[loss=0.01251, audio_tagging_loss=0.01251, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4949088.84 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:54:41,033 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.165e+01 3.455e+01 3.628e+01 3.845e+01 4.485e+01, threshold=7.256e+01, percent-clipped=0.0 2023-12-23 10:54:41,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1099093.3333333333, ans=0.0 2023-12-23 10:54:49,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1099160.0, ans=15.0 2023-12-23 10:54:52,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1099160.0, ans=0.125 2023-12-23 10:55:04,711 INFO [train.py:886] (0/4) Epoch 35, batch 2850, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4944596.02 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:55:11,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1099293.3333333333, ans=0.125 2023-12-23 10:55:18,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1099360.0, ans=0.0 2023-12-23 10:55:29,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1099426.6666666667, ans=0.04949747468305833 2023-12-23 10:55:44,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1099493.3333333333, ans=0.1 2023-12-23 10:55:48,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1099560.0, ans=0.1 2023-12-23 10:55:57,059 INFO [train.py:886] (0/4) Epoch 35, batch 2900, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4947488.07 frames. 
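
Most lines above come from scaling.py:213 and simply print the current value (`ans`) of a ScheduledFloat: a scalar hyperparameter (balancer probabilities, skip rates, dropout, whitening limits) that follows a piecewise-linear schedule in batch_count. By batch_count ≈ 1.09e6 nearly all of them have settled at their final values (0.125, 0.1, 0.0, 0.2, ...). A minimal sketch of such a schedule; the breakpoints below are illustrative assumptions, not the recipe's actual numbers:

    def scheduled_float(batch_count, points):
        # points: [(batch_count, value), ...] sorted by batch_count;
        # the value is held constant outside the first/last breakpoints.
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return points[-1][1]

    # e.g. a balancer prob that decays from 0.3 to 0.125 early in training
    prob = scheduled_float(1093760.0, [(0.0, 0.3), (20000.0, 0.125)])  # -> 0.125
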
], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:56:07,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1099693.3333333333, ans=0.125 2023-12-23 10:56:20,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1099760.0, ans=0.125 2023-12-23 10:56:24,421 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.115e+01 3.411e+01 3.569e+01 3.817e+01 4.301e+01, threshold=7.139e+01, percent-clipped=0.0 2023-12-23 10:56:39,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1099893.3333333333, ans=0.015 2023-12-23 10:56:48,110 INFO [train.py:886] (0/4) Epoch 35, batch 2950, loss[loss=0.01114, audio_tagging_loss=0.01114, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4945752.25 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:57:02,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-12-23 10:57:07,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=12.0 2023-12-23 10:57:17,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1100093.3333333333, ans=0.125 2023-12-23 10:57:22,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=12.0 2023-12-23 10:57:41,393 INFO [train.py:886] (0/4) Epoch 35, batch 3000, loss[loss=0.01133, audio_tagging_loss=0.01133, over 23978.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4950636.84 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:57:41,394 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 10:58:02,714 INFO [train.py:917] (0/4) Epoch 35, validation: loss=0.03345, audio_tagging_loss=0.03345, over 3737520.00 frames. 2023-12-23 10:58:02,715 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 10:58:03,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. limit=6.0 2023-12-23 10:58:09,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1100293.3333333333, ans=0.2 2023-12-23 10:58:27,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1100426.6666666667, ans=0.125 2023-12-23 10:58:30,593 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.128e+01 3.436e+01 3.631e+01 3.835e+01 4.770e+01, threshold=7.261e+01, percent-clipped=0.0 2023-12-23 10:58:36,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1100493.3333333333, ans=0.1 2023-12-23 10:58:52,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1100560.0, ans=0.125 2023-12-23 10:58:54,472 INFO [train.py:886] (0/4) Epoch 35, batch 3050, loss[loss=0.01125, audio_tagging_loss=0.01125, over 25000.00 frames. 
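
The batch-3000 record above is a validation pass: train.py periodically switches to the dev set, reports a frame-weighted loss over the same 3,737,520 frames each time (so the 0.03345 here is directly comparable to the 0.0339 at the start of epoch 36 further below), and logs peak GPU memory. A hedged sketch of that loop; the batch keys and the model returning (loss, num_frames) are assumed interfaces, not icefall's exact ones:

    import torch

    @torch.no_grad()
    def validate(model, dev_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in dev_loader:
            feats = batch["features"].to(device)     # assumed batch layout
            labels = batch["labels"].to(device)
            loss, num_frames = model(feats, labels)  # assumed interface
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4g}, "
              f"Maximum memory allocated so far is {mem_mb}MB")
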
], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4954316.39 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:59:34,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2023-12-23 10:59:37,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1100893.3333333333, ans=0.125 2023-12-23 10:59:45,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2023-12-23 10:59:45,920 INFO [train.py:886] (0/4) Epoch 35, batch 3100, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4950446.64 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:59:48,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1100960.0, ans=0.125 2023-12-23 11:00:15,064 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.234e+01 3.536e+01 3.677e+01 3.842e+01 4.191e+01, threshold=7.354e+01, percent-clipped=0.0 2023-12-23 11:00:17,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1101160.0, ans=0.1 2023-12-23 11:00:27,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2023-12-23 11:00:27,767 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2023-12-23 11:00:36,557 INFO [train.py:886] (0/4) Epoch 35, batch 3150, loss[loss=0.0127, audio_tagging_loss=0.0127, over 24750.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4946889.05 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:00:39,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1101293.3333333333, ans=0.0 2023-12-23 11:01:04,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1101426.6666666667, ans=0.125 2023-12-23 11:01:25,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2023-12-23 11:01:26,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1101560.0, ans=0.1 2023-12-23 11:01:28,645 INFO [train.py:886] (0/4) Epoch 35, batch 3200, loss[loss=0.01068, audio_tagging_loss=0.01068, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4945093.00 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:01:33,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. 
limit=15.0 2023-12-23 11:01:42,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1101693.3333333333, ans=0.125 2023-12-23 11:01:46,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1101693.3333333333, ans=0.125 2023-12-23 11:01:57,269 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.006e+01 3.428e+01 3.617e+01 3.805e+01 4.182e+01, threshold=7.234e+01, percent-clipped=0.0 2023-12-23 11:02:09,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1101893.3333333333, ans=0.1 2023-12-23 11:02:19,489 INFO [train.py:886] (0/4) Epoch 35, batch 3250, loss[loss=0.01072, audio_tagging_loss=0.01072, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4946586.54 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:02:22,687 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:02:26,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1101960.0, ans=0.125 2023-12-23 11:02:29,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0 2023-12-23 11:02:34,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2023-12-23 11:02:42,320 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.140e-03 2023-12-23 11:02:45,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1102093.3333333333, ans=0.1 2023-12-23 11:03:09,890 INFO [train.py:886] (0/4) Epoch 35, batch 3300, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4949772.22 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:03:13,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1102293.3333333333, ans=0.0 2023-12-23 11:03:23,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0 2023-12-23 11:03:39,474 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.083e+01 3.432e+01 3.586e+01 3.721e+01 4.248e+01, threshold=7.173e+01, percent-clipped=0.0 2023-12-23 11:03:51,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1102560.0, ans=0.1 2023-12-23 11:03:56,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1102560.0, ans=0.015 2023-12-23 11:04:02,289 INFO [train.py:886] (0/4) Epoch 35, batch 3350, loss[loss=0.01002, audio_tagging_loss=0.01002, over 22234.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4948971.93 frames. 
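
The scaling.py:1022 lines compare a whitening metric for a named activation against its limit; values below the limit (the common case above) mean the activation's covariance is close enough to white, while occasional excursions (e.g. metric=21.51 vs. limit=22.5) trigger a corrective gradient. A sketch of one standard way to score "whiteness", offered as an assumption about scaling.py's exact formula: the ratio mean(eig^2)/mean(eig)^2 of the covariance eigenvalues, which is 1.0 for a perfectly white signal and grows with anisotropy.

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (batch, dim) activations; score each channel group separately.
        batch, dim = x.shape
        xg = x.reshape(batch, num_groups, dim // num_groups)
        metrics = []
        for g in range(num_groups):
            f = xg[:, g, :] - xg[:, g, :].mean(dim=0)
            cov = (f.t() @ f) / batch
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
        return torch.stack(metrics).mean()

    x = torch.randn(100, 512) @ torch.randn(512, 512)  # deliberately non-white
    print(f"metric={whitening_metric(x).item():.2f} vs. limit=15.0")
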
], batch size: 107, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:04:07,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1102626.6666666667, ans=0.125 2023-12-23 11:04:09,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1102626.6666666667, ans=0.125 2023-12-23 11:04:10,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1102626.6666666667, ans=0.125 2023-12-23 11:04:13,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1102693.3333333333, ans=0.0 2023-12-23 11:04:27,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1102760.0, ans=0.0 2023-12-23 11:04:35,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1102826.6666666667, ans=0.125 2023-12-23 11:04:37,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1102826.6666666667, ans=0.1 2023-12-23 11:04:47,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2023-12-23 11:04:48,310 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:04:49,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2023-12-23 11:04:52,947 INFO [train.py:886] (0/4) Epoch 35, batch 3400, loss[loss=0.01124, audio_tagging_loss=0.01124, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4951253.95 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:05:11,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2023-12-23 11:05:22,235 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.114e+01 3.482e+01 3.648e+01 3.813e+01 4.164e+01, threshold=7.297e+01, percent-clipped=0.0 2023-12-23 11:05:38,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1103226.6666666667, ans=0.2 2023-12-23 11:05:45,972 INFO [train.py:886] (0/4) Epoch 35, batch 3450, loss[loss=0.009631, audio_tagging_loss=0.009631, over 24253.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4948553.04 frames. 
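
The scaling.py:1118 lines attach a small auxiliary loss directly to self-attention weights; the loss-sum is 0.000e+00 almost everywhere, going nonzero only when some weights drift out of bounds (the 5.140e-03 on layer 3.3 above). A sketch of the general mechanism, identity in the forward pass with an extra penalty gradient in backward; the specific penalty here (weights exceeding a magnitude cap) is a placeholder, not what scaling.py actually penalises:

    import torch

    class WithLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, scale):
            ctx.save_for_backward(x)
            ctx.scale = scale
            return x  # downstream layers see the weights unchanged

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            with torch.enable_grad():
                xd = x.detach().requires_grad_(True)
                aux = ctx.scale * torch.relu(xd.abs() - 1.0).sum()  # placeholder penalty
                (aux_grad,) = torch.autograd.grad(aux, xd)
            print(f"WithLoss: loss-sum={aux.item():.3e}")  # 0 while weights stay in bounds
            return grad_out + aux_grad, None

    # attn = WithLoss.apply(attn, 1.0)  # hypothetical call site
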
], batch size: 101, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:05:48,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1103293.3333333333, ans=0.09899494936611666 2023-12-23 11:05:49,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1103293.3333333333, ans=0.125 2023-12-23 11:06:02,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1103360.0, ans=0.125 2023-12-23 11:06:05,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1103360.0, ans=0.0 2023-12-23 11:06:07,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1103426.6666666667, ans=0.1 2023-12-23 11:06:08,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1103426.6666666667, ans=0.125 2023-12-23 11:06:21,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1103493.3333333333, ans=0.04949747468305833 2023-12-23 11:06:26,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1103560.0, ans=0.125 2023-12-23 11:06:37,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2023-12-23 11:06:38,340 INFO [train.py:886] (0/4) Epoch 35, batch 3500, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4942677.21 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:06:39,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1103626.6666666667, ans=0.125 2023-12-23 11:06:48,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1103693.3333333333, ans=0.2 2023-12-23 11:06:55,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1103693.3333333333, ans=0.0 2023-12-23 11:06:56,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2023-12-23 11:07:07,533 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.194e+01 3.520e+01 3.666e+01 3.849e+01 4.626e+01, threshold=7.332e+01, percent-clipped=0.0 2023-12-23 11:07:15,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1103826.6666666667, ans=0.07 2023-12-23 11:07:17,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.88 vs. 
limit=15.0 2023-12-23 11:07:22,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1103893.3333333333, ans=0.125 2023-12-23 11:07:22,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1103893.3333333333, ans=0.125 2023-12-23 11:07:25,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2023-12-23 11:07:29,125 INFO [train.py:886] (0/4) Epoch 35, batch 3550, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4944013.50 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:08:00,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1104160.0, ans=0.2 2023-12-23 11:08:02,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1104160.0, ans=0.0 2023-12-23 11:08:18,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1104226.6666666667, ans=0.125 2023-12-23 11:08:21,772 INFO [train.py:886] (0/4) Epoch 35, batch 3600, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4945955.11 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:08:31,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0 2023-12-23 11:08:44,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1104426.6666666667, ans=0.0 2023-12-23 11:08:51,253 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.469e+01 3.628e+01 3.807e+01 4.642e+01, threshold=7.257e+01, percent-clipped=0.0 2023-12-23 11:09:01,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1104493.3333333333, ans=0.125 2023-12-23 11:09:06,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1104560.0, ans=0.1 2023-12-23 11:09:14,213 INFO [train.py:886] (0/4) Epoch 35, batch 3650, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4947885.26 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:09:28,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1104693.3333333333, ans=0.1 2023-12-23 11:09:31,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1104693.3333333333, ans=0.0 2023-12-23 11:09:36,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2023-12-23 11:10:05,073 INFO [train.py:886] (0/4) Epoch 35, batch 3700, loss[loss=0.01062, audio_tagging_loss=0.01062, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4947034.98 frames. 
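
In the train.py:886 records, the per-batch loss fluctuates (≈0.009-0.016) while tot_loss barely moves (≈0.0119-0.0122): tot_loss is a frame-weighted running average, and the frame counts it reports hover near 4.95e6, which is what an exponentially decayed accumulator saturates to with 25,000-frame batches and a decay of about 0.995 (25000 / (1 - 0.995) = 5e6). A sketch under exactly that assumption:

    class RunningLoss:
        # Exponentially decayed, frame-weighted average of the training loss.
        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed sum of frames

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames, self.frames

    tracker = RunningLoss()
    tot_loss, frames = tracker.update(0.01121, 25000.0)
    # after many batches `frames` plateaus near 5e6, cf. the ~4.95e6 above
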
], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:10:06,254 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:10:06,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1104960.0, ans=0.125 2023-12-23 11:10:30,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1105093.3333333333, ans=0.125 2023-12-23 11:10:33,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1105093.3333333333, ans=0.0 2023-12-23 11:10:34,222 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.497e+01 3.613e+01 3.767e+01 4.191e+01, threshold=7.225e+01, percent-clipped=0.0 2023-12-23 11:10:44,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1105160.0, ans=0.125 2023-12-23 11:10:52,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1105226.6666666667, ans=0.125 2023-12-23 11:10:58,100 INFO [train.py:886] (0/4) Epoch 35, batch 3750, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4952917.53 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:11:27,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1105426.6666666667, ans=0.0 2023-12-23 11:11:49,016 INFO [train.py:886] (0/4) Epoch 35, batch 3800, loss[loss=0.009153, audio_tagging_loss=0.009153, over 24750.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4950032.28 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:12:17,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.65 vs. limit=15.0 2023-12-23 11:12:17,673 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.190e+01 3.510e+01 3.632e+01 3.780e+01 4.294e+01, threshold=7.263e+01, percent-clipped=0.0 2023-12-23 11:12:32,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1105893.3333333333, ans=0.125 2023-12-23 11:12:41,343 INFO [train.py:886] (0/4) Epoch 35, batch 3850, loss[loss=0.01229, audio_tagging_loss=0.01229, over 22480.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4949588.80 frames. 
], batch size: 107, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:12:41,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1105960.0, ans=0.1 2023-12-23 11:12:48,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1105960.0, ans=0.0 2023-12-23 11:13:11,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1106160.0, ans=0.125 2023-12-23 11:13:17,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1106160.0, ans=0.125 2023-12-23 11:13:33,054 INFO [train.py:886] (0/4) Epoch 35, batch 3900, loss[loss=0.01226, audio_tagging_loss=0.01226, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4952973.27 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:13:36,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1106293.3333333333, ans=0.0 2023-12-23 11:13:41,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1106360.0, ans=0.1 2023-12-23 11:13:44,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2023-12-23 11:14:01,026 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 3.458e+01 3.613e+01 3.736e+01 4.379e+01, threshold=7.225e+01, percent-clipped=0.0 2023-12-23 11:14:11,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2023-12-23 11:14:11,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.57 vs. limit=15.0 2023-12-23 11:14:22,786 INFO [train.py:886] (0/4) Epoch 35, batch 3950, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4956374.84 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:14:24,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1106626.6666666667, ans=0.2 2023-12-23 11:14:41,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1106693.3333333333, ans=0.0 2023-12-23 11:15:14,814 INFO [train.py:886] (0/4) Epoch 35, batch 4000, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4960968.81 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:15:17,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1106960.0, ans=0.0 2023-12-23 11:15:22,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1106960.0, ans=0.1 2023-12-23 11:15:25,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.08 vs. 
limit=12.0 2023-12-23 11:15:39,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2023-12-23 11:15:42,968 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.176e+01 3.470e+01 3.615e+01 3.743e+01 4.164e+01, threshold=7.230e+01, percent-clipped=0.0 2023-12-23 11:15:46,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1107160.0, ans=0.0 2023-12-23 11:16:03,887 INFO [train.py:886] (0/4) Epoch 35, batch 4050, loss[loss=0.01144, audio_tagging_loss=0.01144, over 24750.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4964949.29 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:16:13,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1107360.0, ans=0.125 2023-12-23 11:16:14,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1107360.0, ans=0.1 2023-12-23 11:16:27,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1107426.6666666667, ans=0.025 2023-12-23 11:16:39,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1107493.3333333333, ans=0.1 2023-12-23 11:16:54,001 INFO [train.py:886] (0/4) Epoch 35, batch 4100, loss[loss=0.0163, audio_tagging_loss=0.0163, over 24941.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4960214.05 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:16:55,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1107626.6666666667, ans=0.2 2023-12-23 11:17:02,774 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:17:09,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1107693.3333333333, ans=0.2 2023-12-23 11:17:12,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1107693.3333333333, ans=0.0 2023-12-23 11:17:23,016 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.082e+01 3.446e+01 3.615e+01 3.831e+01 4.582e+01, threshold=7.231e+01, percent-clipped=0.0 2023-12-23 11:17:35,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1107893.3333333333, ans=0.0 2023-12-23 11:17:46,692 INFO [train.py:886] (0/4) Epoch 35, batch 4150, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24750.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4954481.96 frames. 
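
The lr column decays glacially, from 3.08e-03 at the top of this stretch to 3.05e-03 near the end of the epoch, consistent with a schedule far past both its batch and epoch knees. A sketch of an Eden-style schedule of the kind icefall's Zipformer recipes use, with lr_batches=7500 and lr_epochs=3.5 taken from the run's hyperparameters; the exact formula, and any extra factors this run applies (e.g. duration-based scaling), are assumptions, so the curve's shape rather than the logged 3.0e-03 values is the point:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Decay in both batch index and fractional epoch; once batch >> lr_batches
        # and epoch >> lr_epochs the curve flattens, as seen in this log.
        b = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        e = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * b * e
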
], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:17:53,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1107960.0, ans=0.125 2023-12-23 11:18:02,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1108026.6666666667, ans=0.125 2023-12-23 11:18:04,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1108026.6666666667, ans=0.125 2023-12-23 11:18:14,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1108093.3333333333, ans=0.2 2023-12-23 11:18:14,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1108093.3333333333, ans=0.2 2023-12-23 11:18:30,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1108226.6666666667, ans=0.125 2023-12-23 11:18:36,268 INFO [train.py:886] (0/4) Epoch 35, batch 4200, loss[loss=0.01171, audio_tagging_loss=0.01171, over 24078.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4956804.55 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 64.0 2023-12-23 11:18:48,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1108360.0, ans=0.1 2023-12-23 11:18:48,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1108360.0, ans=0.125 2023-12-23 11:18:54,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1108360.0, ans=0.0 2023-12-23 11:19:00,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1108426.6666666667, ans=0.0 2023-12-23 11:19:04,620 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.094e+01 3.379e+01 3.552e+01 3.711e+01 4.184e+01, threshold=7.105e+01, percent-clipped=0.0 2023-12-23 11:19:06,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1108493.3333333333, ans=0.125 2023-12-23 11:19:22,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1108560.0, ans=0.1 2023-12-23 11:19:27,413 INFO [train.py:886] (0/4) Epoch 35, batch 4250, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4958548.67 frames. 
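
Note grad_scale flipping across this section: 64.0 early on, 32.0 from around batch 2200, and 64.0 again at batch 4200 just above. With use_fp16, the loss is multiplied by a dynamic scale before backward; the scaler halves the scale when it detects inf/nan gradients and doubles it again after a long overflow-free stretch. A sketch of the standard torch.cuda.amp step (the model's (feats, labels) interface is an assumption):

    import torch

    scaler = torch.cuda.amp.GradScaler()

    def train_step(model, optimizer, feats, labels):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():           # mixed fp16/fp32 forward
            loss = model(feats, labels)           # assumed interface
        scaler.scale(loss).backward()             # backward on the scaled loss
        scaler.step(optimizer)                    # unscales; skips step on inf/nan
        scaler.update()                           # halve on overflow, grow slowly otherwise
        return loss.detach(), scaler.get_scale()  # get_scale() is the logged grad_scale
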
], batch size: 100, lr: 3.06e-03, grad_scale: 64.0 2023-12-23 11:19:33,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1108626.6666666667, ans=0.2 2023-12-23 11:19:37,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1108693.3333333333, ans=0.125 2023-12-23 11:19:49,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1108760.0, ans=0.1 2023-12-23 11:19:59,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1108826.6666666667, ans=0.125 2023-12-23 11:20:04,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1108826.6666666667, ans=0.0 2023-12-23 11:20:08,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=22.5 2023-12-23 11:20:15,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1108893.3333333333, ans=0.125 2023-12-23 11:20:18,353 INFO [train.py:886] (0/4) Epoch 35, batch 4300, loss[loss=0.01114, audio_tagging_loss=0.01114, over 23987.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4953027.37 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 64.0 2023-12-23 11:20:33,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1109026.6666666667, ans=0.125 2023-12-23 11:20:47,000 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.212e+01 3.455e+01 3.593e+01 3.734e+01 4.513e+01, threshold=7.186e+01, percent-clipped=0.0 2023-12-23 11:20:55,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1109160.0, ans=0.2 2023-12-23 11:20:56,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2023-12-23 11:21:06,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.18 vs. limit=15.0 2023-12-23 11:21:10,812 INFO [train.py:886] (0/4) Epoch 35, batch 4350, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4950407.75 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 64.0 2023-12-23 11:21:27,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1109360.0, ans=0.1 2023-12-23 11:21:32,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1109426.6666666667, ans=0.1 2023-12-23 11:21:44,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.09 vs. 
limit=22.5 2023-12-23 11:21:48,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1109493.3333333333, ans=0.07 2023-12-23 11:22:03,355 INFO [train.py:886] (0/4) Epoch 35, batch 4400, loss[loss=0.0113, audio_tagging_loss=0.0113, over 24750.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4947934.47 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 64.0 2023-12-23 11:22:03,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1109626.6666666667, ans=0.125 2023-12-23 11:22:13,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=15.0 2023-12-23 11:22:16,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1109693.3333333333, ans=0.125 2023-12-23 11:22:32,612 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.551e+01 3.706e+01 3.864e+01 4.550e+01, threshold=7.411e+01, percent-clipped=0.0 2023-12-23 11:22:32,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1109760.0, ans=0.04949747468305833 2023-12-23 11:22:54,178 INFO [train.py:886] (0/4) Epoch 35, batch 4450, loss[loss=0.01487, audio_tagging_loss=0.01487, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4944807.08 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:23:00,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1109960.0, ans=0.125 2023-12-23 11:23:05,105 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:23:12,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-12-23 11:23:18,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1110093.3333333333, ans=0.2 2023-12-23 11:23:22,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1110093.3333333333, ans=0.0 2023-12-23 11:23:27,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1110160.0, ans=0.0 2023-12-23 11:23:39,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1110226.6666666667, ans=0.1 2023-12-23 11:23:44,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=12.0 2023-12-23 11:23:47,259 INFO [train.py:886] (0/4) Epoch 35, batch 4500, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4944026.46 frames. 
], batch size: 100, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:24:00,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1110360.0, ans=0.0 2023-12-23 11:24:02,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2023-12-23 11:24:13,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1110426.6666666667, ans=0.0 2023-12-23 11:24:16,593 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.136e+01 3.438e+01 3.620e+01 3.863e+01 4.550e+01, threshold=7.241e+01, percent-clipped=0.0 2023-12-23 11:24:24,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2023-12-23 11:24:30,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0 2023-12-23 11:24:39,139 INFO [train.py:886] (0/4) Epoch 35, batch 4550, loss[loss=0.009732, audio_tagging_loss=0.009732, over 25000.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4946690.70 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:24:44,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1110626.6666666667, ans=0.1 2023-12-23 11:24:52,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2023-12-23 11:25:10,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=22.5 2023-12-23 11:25:16,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2023-12-23 11:25:22,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1110893.3333333333, ans=0.125 2023-12-23 11:25:29,950 INFO [train.py:886] (0/4) Epoch 35, batch 4600, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4951550.83 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:25:43,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1111026.6666666667, ans=0.1 2023-12-23 11:25:59,324 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.428e+01 3.565e+01 3.717e+01 4.348e+01, threshold=7.130e+01, percent-clipped=0.0 2023-12-23 11:26:22,426 INFO [train.py:886] (0/4) Epoch 35, batch 4650, loss[loss=0.01363, audio_tagging_loss=0.01363, over 24935.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4955187.02 frames. 
], batch size: 100, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:26:41,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1111426.6666666667, ans=0.125 2023-12-23 11:26:51,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1111426.6666666667, ans=0.125 2023-12-23 11:26:58,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=1111493.3333333333, ans=12.0 2023-12-23 11:27:00,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2023-12-23 11:27:06,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1111560.0, ans=0.0 2023-12-23 11:27:13,082 INFO [train.py:886] (0/4) Epoch 35, batch 4700, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4956674.73 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:27:15,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2023-12-23 11:27:20,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1111626.6666666667, ans=0.125 2023-12-23 11:27:24,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2023-12-23 11:27:35,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1111760.0, ans=0.125 2023-12-23 11:27:39,775 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.244e+01 3.493e+01 3.654e+01 3.823e+01 4.545e+01, threshold=7.308e+01, percent-clipped=0.0 2023-12-23 11:27:48,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1111826.6666666667, ans=0.125 2023-12-23 11:27:55,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1111893.3333333333, ans=0.0 2023-12-23 11:28:00,257 INFO [train.py:886] (0/4) Epoch 35, batch 4750, loss[loss=0.01055, audio_tagging_loss=0.01055, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4953455.35 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:28:11,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-12-23 11:28:15,910 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-35.pt 2023-12-23 11:28:35,990 INFO [train.py:886] (0/4) Epoch 36, batch 0, loss[loss=0.02542, audio_tagging_loss=0.02542, over 25000.00 frames. ], tot_loss[loss=0.02542, audio_tagging_loss=0.02542, over 25000.00 frames. 
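[Editor's note] checkpoint.py:75 writes epoch-35.pt at the epoch boundary here, and later in this excerpt a periodic checkpoint-168000.pt; with save_every_n=4000 in the run config, the batch-indexed checkpoints land on multiples of 4000. A minimal sketch of that naming scheme (the saved fields are illustrative, not the recipe's exact schema):

    from pathlib import Path
    import torch

    def save_checkpoint(exp_dir, model, optimizer, epoch, batch_idx_train,
                        save_every_n=4000, end_of_epoch=False):
        if end_of_epoch:
            path = Path(exp_dir) / f"epoch-{epoch}.pt"          # epoch-35.pt
        elif batch_idx_train % save_every_n == 0:
            path = Path(exp_dir) / f"checkpoint-{batch_idx_train}.pt"
        else:
            return
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "epoch": epoch,
                    "batch_idx_train": batch_idx_train}, path)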
], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:28:35,991 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 11:28:52,883 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1730, 1.0032, 4.4297, 4.2692], device='cuda:0') 2023-12-23 11:28:56,824 INFO [train.py:917] (0/4) Epoch 36, validation: loss=0.0339, audio_tagging_loss=0.0339, over 3737520.00 frames. 2023-12-23 11:28:56,825 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 11:28:58,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0 2023-12-23 11:29:01,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1112066.6666666667, ans=0.0 2023-12-23 11:29:07,068 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:29:11,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1112133.3333333333, ans=0.125 2023-12-23 11:29:11,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=15.0 2023-12-23 11:29:48,248 INFO [train.py:886] (0/4) Epoch 36, batch 50, loss[loss=0.01447, audio_tagging_loss=0.01447, over 24042.00 frames. ], tot_loss[loss=0.01919, audio_tagging_loss=0.01919, over 1124184.79 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:29:53,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.05 vs. limit=12.0 2023-12-23 11:30:01,871 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.062e+01 3.818e+01 4.375e+01 4.992e+01 9.452e+01, threshold=8.751e+01, percent-clipped=8.0 2023-12-23 11:30:03,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1112466.6666666667, ans=0.125 2023-12-23 11:30:09,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1112533.3333333333, ans=0.125 2023-12-23 11:30:10,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1112533.3333333333, ans=0.05 2023-12-23 11:30:12,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1112533.3333333333, ans=0.05 2023-12-23 11:30:25,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0 2023-12-23 11:30:32,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1112666.6666666667, ans=0.1 2023-12-23 11:30:34,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1112666.6666666667, ans=0.1 2023-12-23 11:30:40,121 INFO [train.py:886] (0/4) Epoch 36, batch 100, loss[loss=0.01383, audio_tagging_loss=0.01383, over 25000.00 frames. ], tot_loss[loss=0.01652, audio_tagging_loss=0.01652, over 1978700.59 frames. 
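[Editor's note] The zipformer.py:1858 line prints attn_weights_entropy, one value per attention head of a selected layer — a diagnostic for how peaked the attention distributions are (a low value such as the 1.0032 above indicates nearly one-hot attention in that head). One way such a per-head entropy can be computed; the shape convention here is an assumption:

    import torch

    def attn_weights_entropy(attn_weights, eps=1e-20):
        """attn_weights: (num_heads, batch, tgt_len, src_len), rows sum to 1.
        Returns the mean entropy per head, in nats."""
        p = attn_weights.clamp(min=eps)
        ent = -(p * p.log()).sum(dim=-1)   # entropy of each attention row
        return ent.mean(dim=(1, 2))        # average over batch and positions

    w = torch.softmax(torch.randn(4, 2, 10, 10), dim=-1)
    print(attn_weights_entropy(w))  # 4 values, one per head, like the log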
], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:30:56,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1112800.0, ans=0.04949747468305833 2023-12-23 11:31:06,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1112866.6666666667, ans=0.125 2023-12-23 11:31:08,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1112866.6666666667, ans=0.125 2023-12-23 11:31:09,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1112933.3333333333, ans=0.125 2023-12-23 11:31:23,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1113000.0, ans=0.1 2023-12-23 11:31:30,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-12-23 11:31:31,070 INFO [train.py:886] (0/4) Epoch 36, batch 150, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 2632857.78 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:31:43,988 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.213e+01 3.709e+01 3.865e+01 4.019e+01 4.619e+01, threshold=7.729e+01, percent-clipped=0.0 2023-12-23 11:31:55,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1113200.0, ans=0.125 2023-12-23 11:32:21,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1113333.3333333333, ans=0.125 2023-12-23 11:32:22,757 INFO [train.py:886] (0/4) Epoch 36, batch 200, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 3150139.02 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:32:31,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1113400.0, ans=0.125 2023-12-23 11:33:04,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1113666.6666666667, ans=0.125 2023-12-23 11:33:15,340 INFO [train.py:886] (0/4) Epoch 36, batch 250, loss[loss=0.01536, audio_tagging_loss=0.01536, over 24941.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 3555090.96 frames. 
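[Editor's note] Every loss[...] in these train lines is a pure audio_tagging_loss. For an AudioSet tagger with num_events=527, such a loss is typically a multi-label binary cross-entropy between clip-level logits and multi-hot targets; a sketch under that assumption (the exact reduction is a guess):

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits, targets):
        """logits, targets: (batch, 527); targets are multi-hot in {0, 1}.
        Sum over the 527 classes, mean over the batch."""
        return F.binary_cross_entropy_with_logits(
            logits, targets.float(), reduction="sum") / logits.size(0)

    logits = torch.randn(8, 527)
    targets = (torch.rand(8, 527) > 0.99).float()  # sparse multi-hot labels
    print(audio_tagging_loss(logits, targets))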
], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:33:19,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1113733.3333333333, ans=0.0 2023-12-23 11:33:28,142 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.045e+01 3.532e+01 3.658e+01 3.837e+01 4.468e+01, threshold=7.316e+01, percent-clipped=0.0 2023-12-23 11:33:34,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1113866.6666666667, ans=0.125 2023-12-23 11:33:35,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1113866.6666666667, ans=0.0 2023-12-23 11:34:06,865 INFO [train.py:886] (0/4) Epoch 36, batch 300, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 3859869.67 frames. ], batch size: 99, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:34:13,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1114066.6666666667, ans=0.0 2023-12-23 11:34:13,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=12.0 2023-12-23 11:34:14,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1114066.6666666667, ans=0.1 2023-12-23 11:34:18,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.41 vs. limit=15.0 2023-12-23 11:34:25,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1114133.3333333333, ans=0.125 2023-12-23 11:34:43,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1114266.6666666667, ans=0.2 2023-12-23 11:34:44,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1114266.6666666667, ans=0.2 2023-12-23 11:34:46,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1114266.6666666667, ans=0.2 2023-12-23 11:34:58,115 INFO [train.py:886] (0/4) Epoch 36, batch 350, loss[loss=0.0124, audio_tagging_loss=0.0124, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4098809.67 frames. ], batch size: 99, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:35:09,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1114466.6666666667, ans=0.0 2023-12-23 11:35:12,702 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.147e+01 3.467e+01 3.650e+01 3.765e+01 4.145e+01, threshold=7.301e+01, percent-clipped=0.0 2023-12-23 11:35:25,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.51 vs. limit=15.0 2023-12-23 11:35:26,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1114533.3333333333, ans=0.2 2023-12-23 11:35:51,304 INFO [train.py:886] (0/4) Epoch 36, batch 400, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. 
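[Editor's note] Note that tot_loss is reported over a frame count (~4.9M once warmed up) that stays nearly constant from batch to batch instead of growing through the epoch — the signature of an exponentially decayed running sum of (loss x frames, frames) rather than a plain cumulative average. A sketch under that assumption; the decay constant is chosen only to match the order of magnitude:

    class RunningLoss:
        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed sum of frames

        def update(self, loss, num_frames):
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames

        @property
        def tot_loss(self):
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    for _ in range(10000):
        tracker.update(0.012, 25000)  # per-batch loss over ~25000 frames
    # frames plateaus near 25000 / (1 - decay) = 5e6, the same order
    # as the ~4.9M frame counts logged above
    print(tracker.tot_loss, tracker.frames)

This also explains the epoch-36 restart above: after batch 0 the tracker holds only one batch (tot_loss over 25000 frames), then the window refills over the next few hundred batches.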
], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4286169.38 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:36:09,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1114800.0, ans=0.125 2023-12-23 11:36:17,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1114866.6666666667, ans=0.0 2023-12-23 11:36:25,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1114933.3333333333, ans=0.1 2023-12-23 11:36:42,385 INFO [train.py:886] (0/4) Epoch 36, batch 450, loss[loss=0.01169, audio_tagging_loss=0.01169, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4439100.24 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:36:46,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1115066.6666666667, ans=0.125 2023-12-23 11:36:56,755 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.749e+01 3.426e+01 3.574e+01 3.756e+01 4.682e+01, threshold=7.147e+01, percent-clipped=0.0 2023-12-23 11:37:14,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1115266.6666666667, ans=0.2 2023-12-23 11:37:18,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1115266.6666666667, ans=0.05 2023-12-23 11:37:18,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1115266.6666666667, ans=0.0 2023-12-23 11:37:34,602 INFO [train.py:886] (0/4) Epoch 36, batch 500, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4554779.84 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:38:01,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=15.0 2023-12-23 11:38:11,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.73 vs. limit=15.0 2023-12-23 11:38:14,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2023-12-23 11:38:26,146 INFO [train.py:886] (0/4) Epoch 36, batch 550, loss[loss=0.01017, audio_tagging_loss=0.01017, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4642139.39 frames. 
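[Editor's note] The Whitening lines compare a per-module metric against a limit. A standard "whiteness" measure for a feature covariance C of dimension d is d * trace(C^2) / trace(C)^2, which equals 1 when C is proportional to the identity and grows as the eigenvalue spectrum becomes lopsided; a sketch of that measure (whether scaling.py uses exactly this form is an assumption):

    import torch

    def whitening_metric(x, num_groups=1):
        """x: (N, num_channels). Returns d * trace(C^2) / trace(C)^2,
        averaged over channel groups; 1.0 iff C is a multiple of I."""
        chans = x.shape[-1] // num_groups
        x = x.reshape(-1, num_groups, chans).transpose(0, 1)  # (G, N, c)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]  # (G, c, c)
        tr_c2 = (cov * cov).sum(dim=(1, 2))  # trace(C @ C), C symmetric
        tr_c = cov.diagonal(dim1=1, dim2=2).sum(dim=1)
        return (chans * tr_c2 / tr_c.clamp(min=1e-20) ** 2).mean()

    print(whitening_metric(torch.randn(1000, 256)))  # near 1 for white noise

Reading the log through that lens: metrics well under the limit (3.42 vs. 15.0) indicate already-white activations, while "21.45 vs. limit=22.5" entries show modules close to the point where a whitening penalty would engage.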
], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:38:34,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1115733.3333333333, ans=0.0 2023-12-23 11:38:39,184 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.207e+01 3.469e+01 3.649e+01 3.829e+01 4.187e+01, threshold=7.298e+01, percent-clipped=0.0 2023-12-23 11:38:45,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1115866.6666666667, ans=0.2 2023-12-23 11:38:52,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1115866.6666666667, ans=0.125 2023-12-23 11:39:05,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1115933.3333333333, ans=0.125 2023-12-23 11:39:17,357 INFO [train.py:886] (0/4) Epoch 36, batch 600, loss[loss=0.01022, audio_tagging_loss=0.01022, over 24750.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4713002.50 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:39:21,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1116066.6666666667, ans=0.2 2023-12-23 11:39:24,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=12.0 2023-12-23 11:39:31,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1116133.3333333333, ans=0.125 2023-12-23 11:39:37,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1116200.0, ans=0.05 2023-12-23 11:39:44,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1116200.0, ans=0.125 2023-12-23 11:39:58,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1116333.3333333333, ans=0.125 2023-12-23 11:40:06,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1116333.3333333333, ans=0.125 2023-12-23 11:40:08,960 INFO [train.py:886] (0/4) Epoch 36, batch 650, loss[loss=0.0137, audio_tagging_loss=0.0137, over 24025.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4754494.30 frames. 
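[Editor's note] Names like balancer.min_positive, balancer.max_abs, and balancer.prob suggest a module that nudges per-channel activation statistics — e.g. the fraction of positive values — back into a target range, applied with some probability. The real Balancer modifies gradients directly; this simplified, clearly hypothetical version expresses the same constraints as an auxiliary penalty:

    import torch

    def balancer_penalty(x, min_positive=0.05, max_positive=0.95, max_abs=5.0):
        """x: (N, num_channels). Penalize channels whose fraction of
        positive entries leaves [min_positive, max_positive], or whose
        mean |x| exceeds max_abs. A sigmoid gives a differentiable
        proxy for the proportion-positive."""
        prop_pos = torch.sigmoid(4.0 * x).mean(dim=0)   # soft fraction > 0
        below = (min_positive - prop_pos).clamp(min=0.0)
        above = (prop_pos - max_positive).clamp(min=0.0)
        too_big = (x.abs().mean(dim=0) - max_abs).clamp(min=0.0)
        return (below + above + too_big).sum()

    x = torch.randn(100, 256, requires_grad=True)
    balancer_penalty(x).backward()  # gradients push stats back into range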
], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:40:09,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1116400.0, ans=0.1 2023-12-23 11:40:12,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1116400.0, ans=0.0 2023-12-23 11:40:16,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1116400.0, ans=0.2 2023-12-23 11:40:19,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1116466.6666666667, ans=0.125 2023-12-23 11:40:21,171 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.182e+01 3.505e+01 3.653e+01 3.781e+01 4.331e+01, threshold=7.305e+01, percent-clipped=0.0 2023-12-23 11:40:37,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1116533.3333333333, ans=0.0 2023-12-23 11:40:41,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1116600.0, ans=0.125 2023-12-23 11:40:48,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1116600.0, ans=0.125 2023-12-23 11:40:50,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1116666.6666666667, ans=0.1 2023-12-23 11:40:54,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1116666.6666666667, ans=0.1 2023-12-23 11:41:00,103 INFO [train.py:886] (0/4) Epoch 36, batch 700, loss[loss=0.01011, audio_tagging_loss=0.01011, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4798389.67 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:41:01,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1116733.3333333333, ans=15.0 2023-12-23 11:41:25,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1116866.6666666667, ans=0.125 2023-12-23 11:41:45,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1117000.0, ans=0.1 2023-12-23 11:41:50,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1117000.0, ans=0.0 2023-12-23 11:41:52,368 INFO [train.py:886] (0/4) Epoch 36, batch 750, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4831466.30 frames. 
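[Editor's note] The various *_skip_rate schedules above (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate — all at their floor values by this point) read like per-sublayer stochastic-depth probabilities. A generic sketch of training-time sublayer skipping; the wrapper class is illustrative, not Zipformer's actual layer code:

    import torch
    import torch.nn as nn

    class SkippableSublayer(nn.Module):
        """Wraps a residual sublayer; during training the whole sublayer
        is skipped with probability skip_rate (stochastic depth)."""
        def __init__(self, sublayer, skip_rate=0.07):
            super().__init__()
            self.sublayer = sublayer
            self.skip_rate = skip_rate  # would be a ScheduledFloat in practice

        def forward(self, x):
            if self.training and torch.rand(()) < self.skip_rate:
                return x                   # bypass the sublayer this step
            return x + self.sublayer(x)    # normal residual path

    layer = SkippableSublayer(nn.Linear(256, 256))
    layer.train()
    y = layer(torch.randn(4, 256))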
], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:42:06,171 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.142e+01 3.455e+01 3.620e+01 3.726e+01 4.614e+01, threshold=7.241e+01, percent-clipped=0.0 2023-12-23 11:42:14,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1117200.0, ans=0.0 2023-12-23 11:42:20,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1117200.0, ans=0.2 2023-12-23 11:42:24,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1117266.6666666667, ans=0.1 2023-12-23 11:42:45,288 INFO [train.py:886] (0/4) Epoch 36, batch 800, loss[loss=0.008893, audio_tagging_loss=0.008893, over 24040.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4857320.56 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:42:53,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1117400.0, ans=0.0 2023-12-23 11:42:56,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1117466.6666666667, ans=0.025 2023-12-23 11:42:57,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1117466.6666666667, ans=0.125 2023-12-23 11:43:07,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1117533.3333333333, ans=0.125 2023-12-23 11:43:14,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1117533.3333333333, ans=0.125 2023-12-23 11:43:30,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=12.0 2023-12-23 11:43:36,793 INFO [train.py:886] (0/4) Epoch 36, batch 850, loss[loss=0.01458, audio_tagging_loss=0.01458, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4884920.69 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:43:37,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0 2023-12-23 11:43:44,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1117733.3333333333, ans=0.0 2023-12-23 11:43:50,409 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.033e+01 3.502e+01 3.619e+01 3.758e+01 4.758e+01, threshold=7.237e+01, percent-clipped=0.0 2023-12-23 11:43:53,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1117800.0, ans=0.0 2023-12-23 11:44:01,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.52 vs. limit=8.0 2023-12-23 11:44:02,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1117866.6666666667, ans=0.1 2023-12-23 11:44:29,654 INFO [train.py:886] (0/4) Epoch 36, batch 900, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. 
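[Editor's note] The run has use_fp16 enabled, and each train line logs grad_scale (32-64 across this excerpt): the loss-scaling factor of mixed-precision training, which the dynamic scaler halves on overflow and grows when stable. The standard PyTorch pattern that produces such a value:

    import torch

    model = torch.nn.Linear(80, 527).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=3e-3)
    scaler = torch.cuda.amp.GradScaler()  # maintains the dynamic grad_scale

    x = torch.randn(8, 80, device="cuda")
    y = torch.rand(8, 527, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            model(x), y)
    scaler.scale(loss).backward()   # scaled loss -> scaled grads
    scaler.step(opt)                # unscales, skips the step on inf/nan
    scaler.update()                 # grows or shrinks the scale
    print(scaler.get_scale())       # the number logged as grad_scale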
], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4896687.48 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:45:02,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1118266.6666666667, ans=0.2 2023-12-23 11:45:04,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1118266.6666666667, ans=0.125 2023-12-23 11:45:05,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5 2023-12-23 11:45:08,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1118266.6666666667, ans=0.2 2023-12-23 11:45:21,014 INFO [train.py:886] (0/4) Epoch 36, batch 950, loss[loss=0.01221, audio_tagging_loss=0.01221, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4902031.62 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:45:25,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1118400.0, ans=0.0 2023-12-23 11:45:34,597 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.217e+01 3.528e+01 3.631e+01 3.836e+01 4.993e+01, threshold=7.263e+01, percent-clipped=0.0 2023-12-23 11:46:12,681 INFO [train.py:886] (0/4) Epoch 36, batch 1000, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4909243.78 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:46:15,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1118733.3333333333, ans=0.5 2023-12-23 11:46:15,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1118733.3333333333, ans=0.125 2023-12-23 11:46:16,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1118733.3333333333, ans=0.125 2023-12-23 11:46:21,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1118733.3333333333, ans=0.1 2023-12-23 11:46:30,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1118800.0, ans=0.2 2023-12-23 11:46:37,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1118866.6666666667, ans=0.1 2023-12-23 11:46:37,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1118866.6666666667, ans=0.05 2023-12-23 11:46:39,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2023-12-23 11:47:04,986 INFO [train.py:886] (0/4) Epoch 36, batch 1050, loss[loss=0.01176, audio_tagging_loss=0.01176, over 21819.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4917020.32 frames. 
], batch size: 107, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:47:14,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1119133.3333333333, ans=0.1 2023-12-23 11:47:18,040 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.121e+01 3.507e+01 3.657e+01 3.818e+01 4.217e+01, threshold=7.313e+01, percent-clipped=0.0 2023-12-23 11:47:32,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-12-23 11:47:56,204 INFO [train.py:886] (0/4) Epoch 36, batch 1100, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4926593.95 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:48:10,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1119466.6666666667, ans=0.125 2023-12-23 11:48:21,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1119533.3333333333, ans=0.0 2023-12-23 11:48:22,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1119533.3333333333, ans=0.125 2023-12-23 11:48:24,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=15.0 2023-12-23 11:48:26,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0 2023-12-23 11:48:36,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1119600.0, ans=0.2 2023-12-23 11:48:46,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1119666.6666666667, ans=0.5 2023-12-23 11:48:48,631 INFO [train.py:886] (0/4) Epoch 36, batch 1150, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4937840.26 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:48:59,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1119800.0, ans=0.0 2023-12-23 11:49:00,921 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.437e+01 3.573e+01 3.725e+01 4.671e+01, threshold=7.145e+01, percent-clipped=0.0 2023-12-23 11:49:09,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. 
limit=6.0 2023-12-23 11:49:20,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1119933.3333333333, ans=0.0 2023-12-23 11:49:26,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1119933.3333333333, ans=0.0 2023-12-23 11:49:28,419 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-168000.pt 2023-12-23 11:49:35,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1120000.0, ans=0.05 2023-12-23 11:49:41,584 INFO [train.py:886] (0/4) Epoch 36, batch 1200, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4938849.24 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:49:58,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2023-12-23 11:50:04,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2023-12-23 11:50:06,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1120200.0, ans=0.025 2023-12-23 11:50:14,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1120266.6666666667, ans=0.0 2023-12-23 11:50:28,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1120333.3333333333, ans=0.125 2023-12-23 11:50:32,379 INFO [train.py:886] (0/4) Epoch 36, batch 1250, loss[loss=0.01066, audio_tagging_loss=0.01066, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4943503.35 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:50:45,960 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.245e+01 3.454e+01 3.601e+01 3.737e+01 4.840e+01, threshold=7.203e+01, percent-clipped=0.0 2023-12-23 11:50:55,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2023-12-23 11:51:11,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.04 vs. limit=15.0 2023-12-23 11:51:21,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1120666.6666666667, ans=0.125 2023-12-23 11:51:24,561 INFO [train.py:886] (0/4) Epoch 36, batch 1300, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4947351.19 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:51:40,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2023-12-23 11:51:51,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.19 vs. 
limit=15.0 2023-12-23 11:52:16,832 INFO [train.py:886] (0/4) Epoch 36, batch 1350, loss[loss=0.01154, audio_tagging_loss=0.01154, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4948998.97 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:52:29,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1121133.3333333333, ans=0.1 2023-12-23 11:52:29,816 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.021e+01 3.453e+01 3.612e+01 3.766e+01 4.357e+01, threshold=7.223e+01, percent-clipped=0.0 2023-12-23 11:52:31,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1121133.3333333333, ans=0.125 2023-12-23 11:52:39,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1121200.0, ans=0.0 2023-12-23 11:53:01,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1121333.3333333333, ans=0.0 2023-12-23 11:53:01,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1121333.3333333333, ans=0.125 2023-12-23 11:53:07,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2023-12-23 11:53:07,528 INFO [train.py:886] (0/4) Epoch 36, batch 1400, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4951117.17 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:53:12,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1121400.0, ans=0.0 2023-12-23 11:53:13,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1121400.0, ans=0.1 2023-12-23 11:53:33,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2023-12-23 11:53:58,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1121666.6666666667, ans=0.125 2023-12-23 11:53:59,926 INFO [train.py:886] (0/4) Epoch 36, batch 1450, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4950356.53 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:54:04,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1121733.3333333333, ans=0.0 2023-12-23 11:54:05,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1121733.3333333333, ans=0.0 2023-12-23 11:54:05,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. 
limit=6.0 2023-12-23 11:54:08,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1121733.3333333333, ans=0.2 2023-12-23 11:54:12,996 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.148e+01 3.437e+01 3.604e+01 3.782e+01 4.556e+01, threshold=7.209e+01, percent-clipped=0.0 2023-12-23 11:54:34,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1121933.3333333333, ans=0.0 2023-12-23 11:54:42,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1122000.0, ans=0.125 2023-12-23 11:54:46,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1122000.0, ans=0.1 2023-12-23 11:54:47,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1122000.0, ans=0.1 2023-12-23 11:54:50,914 INFO [train.py:886] (0/4) Epoch 36, batch 1500, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4950261.33 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:54:55,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-12-23 11:55:02,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1122133.3333333333, ans=0.1 2023-12-23 11:55:02,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0 2023-12-23 11:55:19,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1122200.0, ans=0.2 2023-12-23 11:55:22,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2023-12-23 11:55:41,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1122333.3333333333, ans=0.1 2023-12-23 11:55:42,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=12.0 2023-12-23 11:55:42,837 INFO [train.py:886] (0/4) Epoch 36, batch 1550, loss[loss=0.009326, audio_tagging_loss=0.009326, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4951457.18 frames. 
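[Editor's note] The learning rate drifts down slowly even within the epoch (3.06e-03 at the top of this excerpt to 2.99e-03 here), consistent with a schedule that decays in both batch count and epoch, in the style of icefall's Eden scheduler. A sketch of such decay factors — lr_batches=7500 and lr_epochs=3.5 come from the run config, but the logged lr evidently folds in additional factors, so this reproduces only the shape of the curve, not its exact values:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        """Inverse-power decay in both batch and epoch, Eden-style."""
        batch_f = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_f = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_f * epoch_f

    # ~1.1e-3 for batch=1.12e6, epoch=36: same slow decay, different scale
    print(eden_lr(0.045, batch=1_120_000, epoch=36))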
], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:55:43,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1122400.0, ans=0.0 2023-12-23 11:55:55,143 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.215e+01 3.501e+01 3.689e+01 3.879e+01 4.418e+01, threshold=7.378e+01, percent-clipped=0.0 2023-12-23 11:55:56,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1122466.6666666667, ans=0.125 2023-12-23 11:56:18,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1122600.0, ans=0.125 2023-12-23 11:56:34,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.03 vs. limit=6.0 2023-12-23 11:56:34,937 INFO [train.py:886] (0/4) Epoch 36, batch 1600, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4948533.55 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:56:38,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1122733.3333333333, ans=0.125 2023-12-23 11:56:48,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1122800.0, ans=0.125 2023-12-23 11:56:54,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.54 vs. limit=22.5 2023-12-23 11:56:57,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1122866.6666666667, ans=0.0 2023-12-23 11:56:58,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1122866.6666666667, ans=0.0 2023-12-23 11:56:58,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1122866.6666666667, ans=0.0 2023-12-23 11:57:01,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2023-12-23 11:57:11,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1122933.3333333333, ans=0.2 2023-12-23 11:57:24,969 INFO [train.py:886] (0/4) Epoch 36, batch 1650, loss[loss=0.01203, audio_tagging_loss=0.01203, over 25000.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4948763.79 frames. 
], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:57:25,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1123066.6666666667, ans=0.125 2023-12-23 11:57:32,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1123066.6666666667, ans=0.1 2023-12-23 11:57:38,146 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.059e+01 3.477e+01 3.648e+01 3.896e+01 4.999e+01, threshold=7.295e+01, percent-clipped=0.0 2023-12-23 11:57:50,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1123200.0, ans=0.0 2023-12-23 11:57:58,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2023-12-23 11:58:16,225 INFO [train.py:886] (0/4) Epoch 36, batch 1700, loss[loss=0.01144, audio_tagging_loss=0.01144, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4946402.43 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:58:16,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1123400.0, ans=0.125 2023-12-23 11:58:56,927 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0 2023-12-23 11:58:58,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1123666.6666666667, ans=0.0 2023-12-23 11:59:01,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1123666.6666666667, ans=0.0 2023-12-23 11:59:05,912 INFO [train.py:886] (0/4) Epoch 36, batch 1750, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4950533.89 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:59:11,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1123733.3333333333, ans=0.125 2023-12-23 11:59:20,314 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.178e+01 3.484e+01 3.617e+01 3.775e+01 4.286e+01, threshold=7.233e+01, percent-clipped=0.0 2023-12-23 11:59:38,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. 
limit=15.0 2023-12-23 11:59:41,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1123933.3333333333, ans=0.2 2023-12-23 11:59:48,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1124000.0, ans=0.0 2023-12-23 11:59:53,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1124000.0, ans=0.0 2023-12-23 11:59:55,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=1124000.0, ans=12.0 2023-12-23 11:59:56,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1124000.0, ans=0.2 2023-12-23 11:59:57,793 INFO [train.py:886] (0/4) Epoch 36, batch 1800, loss[loss=0.01038, audio_tagging_loss=0.01038, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4952171.99 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:59:58,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1124066.6666666667, ans=0.0 2023-12-23 12:00:19,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1124200.0, ans=0.07 2023-12-23 12:00:21,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2023-12-23 12:00:24,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1124200.0, ans=0.1 2023-12-23 12:00:27,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2023-12-23 12:00:30,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1124266.6666666667, ans=0.07 2023-12-23 12:00:36,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1124266.6666666667, ans=0.0 2023-12-23 12:00:38,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1124333.3333333333, ans=0.1 2023-12-23 12:00:40,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1124333.3333333333, ans=0.0 2023-12-23 12:00:48,657 INFO [train.py:886] (0/4) Epoch 36, batch 1850, loss[loss=0.01471, audio_tagging_loss=0.01471, over 24750.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4950206.24 frames. 
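[Editor's note] The bypass.scale_min and bypass_mid.scale_min entries (ans=0.2, matching scale_min values elsewhere in the log) suggest each layer's output is blended with its input through a learned per-channel bypass scale whose floor is scheduled. A hedged sketch of such a module; the initialization and exact blending are assumptions:

    import torch
    import torch.nn as nn

    class Bypass(nn.Module):
        """out = x_in + scale * (x_out - x_in), with the learned per-channel
        scale clamped to [scale_min, 1.0]; scale_min (0.2 here) would be a
        ScheduledFloat in practice."""
        def __init__(self, num_channels, scale_min=0.2):
            super().__init__()
            self.scale = nn.Parameter(torch.full((num_channels,), 0.5))
            self.scale_min = scale_min

        def forward(self, x_in, x_out):
            s = self.scale.clamp(min=self.scale_min, max=1.0)
            return x_in + s * (x_out - x_in)

    bypass = Bypass(256)
    x_in, x_out = torch.randn(4, 256), torch.randn(4, 256)
    print(bypass(x_in, x_out).shape)

Clamping the scale away from zero keeps every layer at least partly active, so no layer can be silently switched off during training.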
], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:01:02,418 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.520e+01 3.688e+01 3.897e+01 4.478e+01, threshold=7.376e+01, percent-clipped=0.0 2023-12-23 12:01:02,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1124466.6666666667, ans=0.2 2023-12-23 12:01:11,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1124533.3333333333, ans=0.1 2023-12-23 12:01:13,150 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:01:24,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1124600.0, ans=0.125 2023-12-23 12:01:38,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.63 vs. limit=15.0 2023-12-23 12:01:39,396 INFO [train.py:886] (0/4) Epoch 36, batch 1900, loss[loss=0.01004, audio_tagging_loss=0.01004, over 24750.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4945358.26 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:01:46,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2023-12-23 12:01:49,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1124800.0, ans=0.0 2023-12-23 12:02:00,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2023-12-23 12:02:02,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2023-12-23 12:02:05,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1124866.6666666667, ans=15.0 2023-12-23 12:02:05,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-12-23 12:02:07,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1124866.6666666667, ans=0.1 2023-12-23 12:02:07,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2023-12-23 12:02:10,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1124933.3333333333, ans=0.125 2023-12-23 12:02:16,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.34 vs. limit=22.5 2023-12-23 12:02:31,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1125066.6666666667, ans=0.1 2023-12-23 12:02:32,616 INFO [train.py:886] (0/4) Epoch 36, batch 1950, loss[loss=0.01083, audio_tagging_loss=0.01083, over 22666.00 frames. 
], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4945694.25 frames. ], batch size: 107, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:02:45,209 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.533e+01 3.647e+01 3.862e+01 4.201e+01, threshold=7.294e+01, percent-clipped=0.0 2023-12-23 12:02:51,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1125133.3333333333, ans=0.0 2023-12-23 12:02:58,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1125200.0, ans=0.125 2023-12-23 12:02:58,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2023-12-23 12:03:23,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1125400.0, ans=0.125 2023-12-23 12:03:24,579 INFO [train.py:886] (0/4) Epoch 36, batch 2000, loss[loss=0.01019, audio_tagging_loss=0.01019, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4942965.03 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:03:24,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1125400.0, ans=0.125 2023-12-23 12:03:43,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.08 vs. limit=15.0 2023-12-23 12:03:50,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1125533.3333333333, ans=0.2 2023-12-23 12:03:57,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0 2023-12-23 12:03:59,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1125600.0, ans=0.0 2023-12-23 12:04:00,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1125600.0, ans=0.125 2023-12-23 12:04:04,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1125666.6666666667, ans=0.0 2023-12-23 12:04:14,849 INFO [train.py:886] (0/4) Epoch 36, batch 2050, loss[loss=0.01285, audio_tagging_loss=0.01285, over 24750.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4947320.10 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:04:24,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. 
limit=10.0 2023-12-23 12:04:28,472 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.177e+01 3.473e+01 3.640e+01 3.754e+01 4.279e+01, threshold=7.281e+01, percent-clipped=0.0 2023-12-23 12:04:45,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1125933.3333333333, ans=0.125 2023-12-23 12:04:50,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1125933.3333333333, ans=10.0 2023-12-23 12:05:05,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.71 vs. limit=10.0 2023-12-23 12:05:06,210 INFO [train.py:886] (0/4) Epoch 36, batch 2100, loss[loss=0.01162, audio_tagging_loss=0.01162, over 22445.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4950293.32 frames. ], batch size: 107, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:05:11,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1126066.6666666667, ans=0.035 2023-12-23 12:05:17,573 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=12.0 2023-12-23 12:05:27,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1126200.0, ans=0.2 2023-12-23 12:05:50,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.75 vs. limit=22.5 2023-12-23 12:05:52,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1126333.3333333333, ans=0.0 2023-12-23 12:05:52,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1126333.3333333333, ans=0.125 2023-12-23 12:05:54,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1126333.3333333333, ans=0.1 2023-12-23 12:05:57,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2023-12-23 12:05:58,001 INFO [train.py:886] (0/4) Epoch 36, batch 2150, loss[loss=0.01466, audio_tagging_loss=0.01466, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4951875.69 frames. 
], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:05:58,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1126400.0, ans=0.025 2023-12-23 12:05:59,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1126400.0, ans=0.125 2023-12-23 12:06:08,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1126466.6666666667, ans=0.125 2023-12-23 12:06:11,653 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.510e+01 3.673e+01 3.806e+01 4.496e+01, threshold=7.347e+01, percent-clipped=0.0 2023-12-23 12:06:15,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1126466.6666666667, ans=0.2 2023-12-23 12:06:37,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1126600.0, ans=0.125 2023-12-23 12:06:38,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1126600.0, ans=0.1 2023-12-23 12:06:38,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1126600.0, ans=0.1 2023-12-23 12:06:42,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1126666.6666666667, ans=0.125 2023-12-23 12:06:45,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1126666.6666666667, ans=0.1 2023-12-23 12:06:50,293 INFO [train.py:886] (0/4) Epoch 36, batch 2200, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4944187.64 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:06:52,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1126733.3333333333, ans=0.2 2023-12-23 12:07:02,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1126800.0, ans=0.0 2023-12-23 12:07:06,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.12 vs. limit=10.0 2023-12-23 12:07:11,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1126866.6666666667, ans=0.125 2023-12-23 12:07:12,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1126866.6666666667, ans=0.0 2023-12-23 12:07:41,585 INFO [train.py:886] (0/4) Epoch 36, batch 2250, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4939505.39 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:07:50,332 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. 
limit=15.0 2023-12-23 12:07:54,579 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.130e+01 3.509e+01 3.634e+01 3.761e+01 4.553e+01, threshold=7.267e+01, percent-clipped=0.0 2023-12-23 12:08:07,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1127200.0, ans=0.125 2023-12-23 12:08:15,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1127266.6666666667, ans=0.1 2023-12-23 12:08:26,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1127333.3333333333, ans=0.0 2023-12-23 12:08:33,254 INFO [train.py:886] (0/4) Epoch 36, batch 2300, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4938216.98 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:08:38,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1127400.0, ans=0.125 2023-12-23 12:08:55,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1127533.3333333333, ans=10.0 2023-12-23 12:08:56,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1127533.3333333333, ans=0.07 2023-12-23 12:09:25,048 INFO [train.py:886] (0/4) Epoch 36, batch 2350, loss[loss=0.01069, audio_tagging_loss=0.01069, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4941930.41 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:09:39,020 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 3.388e+01 3.533e+01 3.734e+01 4.498e+01, threshold=7.065e+01, percent-clipped=0.0 2023-12-23 12:09:41,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1127800.0, ans=0.0 2023-12-23 12:10:08,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1128000.0, ans=0.125 2023-12-23 12:10:11,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1128000.0, ans=0.2 2023-12-23 12:10:17,013 INFO [train.py:886] (0/4) Epoch 36, batch 2400, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4950770.17 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:10:17,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1128066.6666666667, ans=0.125 2023-12-23 12:10:30,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=12.0 2023-12-23 12:10:39,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1128200.0, ans=0.0 2023-12-23 12:10:40,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1128200.0, ans=0.125 2023-12-23 12:11:00,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.09 vs. 
limit=15.0 2023-12-23 12:11:06,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2023-12-23 12:11:09,464 INFO [train.py:886] (0/4) Epoch 36, batch 2450, loss[loss=0.01278, audio_tagging_loss=0.01278, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4953115.88 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:11:09,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1128400.0, ans=0.0 2023-12-23 12:11:20,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2023-12-23 12:11:22,548 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.115e+01 3.491e+01 3.672e+01 3.810e+01 4.386e+01, threshold=7.343e+01, percent-clipped=0.0 2023-12-23 12:11:27,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1128466.6666666667, ans=0.0 2023-12-23 12:11:34,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=15.0 2023-12-23 12:11:41,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1128600.0, ans=0.125 2023-12-23 12:11:46,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2023-12-23 12:12:01,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=15.0 2023-12-23 12:12:02,006 INFO [train.py:886] (0/4) Epoch 36, batch 2500, loss[loss=0.01139, audio_tagging_loss=0.01139, over 21599.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4946225.69 frames. ], batch size: 107, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:12:08,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2023-12-23 12:12:32,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.25 vs. limit=10.0 2023-12-23 12:12:36,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1128933.3333333333, ans=0.125 2023-12-23 12:12:37,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.41 vs. limit=5.0 2023-12-23 12:12:38,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1128933.3333333333, ans=0.0 2023-12-23 12:12:47,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.83 vs. limit=22.5 2023-12-23 12:12:52,156 INFO [train.py:886] (0/4) Epoch 36, batch 2550, loss[loss=0.01166, audio_tagging_loss=0.01166, over 24750.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4945189.45 frames. 
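Most of the scaling.py:213 records are ScheduledFloat values: hyperparameters such as bypass.scale_min, conv_skip_rate, or a balancer's prob that are functions of batch_count rather than constants, which is why the same name keeps reappearing with a growing batch_count and a changing ans. A piecewise-linear schedule of this kind might look like the sketch below; the real ScheduledFloat in icefall's scaling.py carries extra machinery (defaults, arithmetic on schedules) that is omitted here:

```python
class ScheduledFloat:
    """A float hyperparameter interpolated piecewise-linearly in batch_count.

    Simplified sketch of the idea behind scaling.py's ScheduledFloat.
    """
    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# e.g. a skip rate that decays from 0.3 to 0.1 over the first 20k batches:
skip_rate = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(skip_rate(5000.0))  # 0.25
```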
], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:13:06,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.15 vs. limit=15.0 2023-12-23 12:13:06,437 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.215e+01 3.554e+01 3.683e+01 3.809e+01 4.296e+01, threshold=7.365e+01, percent-clipped=0.0 2023-12-23 12:13:08,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.70 vs. limit=10.0 2023-12-23 12:13:16,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1129200.0, ans=0.125 2023-12-23 12:13:19,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1129200.0, ans=0.125 2023-12-23 12:13:20,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1129200.0, ans=0.0 2023-12-23 12:13:20,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1129200.0, ans=0.125 2023-12-23 12:13:21,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2023-12-23 12:13:39,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1129333.3333333333, ans=0.1 2023-12-23 12:13:44,965 INFO [train.py:886] (0/4) Epoch 36, batch 2600, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4946294.95 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:13:51,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1129400.0, ans=0.125 2023-12-23 12:13:56,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=12.0 2023-12-23 12:14:01,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1129466.6666666667, ans=0.05 2023-12-23 12:14:09,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1129533.3333333333, ans=0.2 2023-12-23 12:14:17,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1129600.0, ans=0.0 2023-12-23 12:14:17,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=12.0 2023-12-23 12:14:21,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1129600.0, ans=0.125 2023-12-23 12:14:35,668 INFO [train.py:886] (0/4) Epoch 36, batch 2650, loss[loss=0.01515, audio_tagging_loss=0.01515, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4943582.49 frames. 
], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:14:36,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1129733.3333333333, ans=0.125 2023-12-23 12:14:36,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1129733.3333333333, ans=0.0 2023-12-23 12:14:40,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1129733.3333333333, ans=0.0 2023-12-23 12:14:48,681 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.201e+01 3.506e+01 3.658e+01 3.796e+01 4.295e+01, threshold=7.317e+01, percent-clipped=0.0 2023-12-23 12:14:56,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2023-12-23 12:15:04,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1129933.3333333333, ans=0.125 2023-12-23 12:15:11,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2023-12-23 12:15:26,042 INFO [train.py:886] (0/4) Epoch 36, batch 2700, loss[loss=0.01301, audio_tagging_loss=0.01301, over 21759.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4946285.09 frames. ], batch size: 107, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:15:26,333 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:15:43,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1130133.3333333333, ans=0.1 2023-12-23 12:15:45,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1130200.0, ans=0.125 2023-12-23 12:16:16,498 INFO [train.py:886] (0/4) Epoch 36, batch 2750, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4949493.88 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:16:25,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1130400.0, ans=0.125 2023-12-23 12:16:25,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1130466.6666666667, ans=0.0 2023-12-23 12:16:29,405 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.206e+01 3.461e+01 3.594e+01 3.787e+01 4.376e+01, threshold=7.188e+01, percent-clipped=0.0 2023-12-23 12:16:30,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1130466.6666666667, ans=0.1 2023-12-23 12:16:40,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1130533.3333333333, ans=0.125 2023-12-23 12:16:47,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1130600.0, ans=0.0 2023-12-23 12:16:52,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.96 vs. 
limit=22.5 2023-12-23 12:16:58,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1130666.6666666667, ans=0.0 2023-12-23 12:17:06,859 INFO [train.py:886] (0/4) Epoch 36, batch 2800, loss[loss=0.01134, audio_tagging_loss=0.01134, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4952105.74 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:17:22,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=15.0 2023-12-23 12:17:22,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2023-12-23 12:17:34,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1130866.6666666667, ans=0.1 2023-12-23 12:17:35,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1130866.6666666667, ans=0.125 2023-12-23 12:17:59,701 INFO [train.py:886] (0/4) Epoch 36, batch 2850, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4951439.16 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:18:04,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1131066.6666666667, ans=0.125 2023-12-23 12:18:06,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1131066.6666666667, ans=0.125 2023-12-23 12:18:11,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2023-12-23 12:18:11,934 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.234e+01 3.495e+01 3.630e+01 3.797e+01 4.361e+01, threshold=7.259e+01, percent-clipped=0.0 2023-12-23 12:18:25,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0 2023-12-23 12:18:29,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1131200.0, ans=0.125 2023-12-23 12:18:38,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1131266.6666666667, ans=0.0 2023-12-23 12:18:38,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1131266.6666666667, ans=0.05 2023-12-23 12:18:43,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2023-12-23 12:18:52,300 INFO [train.py:886] (0/4) Epoch 36, batch 2900, loss[loss=0.01134, audio_tagging_loss=0.01134, over 24750.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4943157.72 frames. 
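Each train.py:886 record carries two losses: loss[...] for the current batch and tot_loss[...], a frame-weighted running average whose "over N frames" field is the effective frame count behind the average. A sketch of that bookkeeping, with the exponential decay factor as an assumption (icefall's actual MetricsTracker is more general):

```python
class RunningLoss:
    """Frame-weighted running average, as in the tot_loss[...] field."""
    def __init__(self, decay: float = 0.99):
        self.decay = decay        # assumed forgetting factor
        self.loss_sum = 0.0       # decayed sum of loss * frames
        self.frames = 0.0         # decayed sum of frames

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.01134, 24750.0)  # numbers from the record above
print(f"tot_loss[loss={tracker.value:.4g}, over {tracker.frames:.2f} frames.]")
```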
], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:19:02,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1131466.6666666667, ans=0.2 2023-12-23 12:19:11,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1131533.3333333333, ans=0.0 2023-12-23 12:19:25,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2023-12-23 12:19:33,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1131666.6666666667, ans=0.0 2023-12-23 12:19:36,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1131666.6666666667, ans=0.125 2023-12-23 12:19:43,659 INFO [train.py:886] (0/4) Epoch 36, batch 2950, loss[loss=0.008881, audio_tagging_loss=0.008881, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4945594.15 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:19:43,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1131733.3333333333, ans=0.125 2023-12-23 12:19:44,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1131733.3333333333, ans=0.025 2023-12-23 12:19:51,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=22.5 2023-12-23 12:19:57,310 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.076e+01 3.440e+01 3.602e+01 3.791e+01 4.339e+01, threshold=7.205e+01, percent-clipped=0.0 2023-12-23 12:19:57,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1131800.0, ans=0.1 2023-12-23 12:20:22,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1131933.3333333333, ans=0.125 2023-12-23 12:20:36,167 INFO [train.py:886] (0/4) Epoch 36, batch 3000, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4951081.41 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:20:36,169 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 12:20:57,376 INFO [train.py:917] (0/4) Epoch 36, validation: loss=0.0342, audio_tagging_loss=0.0342, over 3737520.00 frames. 2023-12-23 12:20:57,377 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 12:21:06,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1132066.6666666667, ans=0.0 2023-12-23 12:21:11,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.09 vs. 
limit=15.0 2023-12-23 12:21:14,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1132133.3333333333, ans=0.125 2023-12-23 12:21:47,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1132333.3333333333, ans=0.125 2023-12-23 12:21:48,950 INFO [train.py:886] (0/4) Epoch 36, batch 3050, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4945839.06 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:21:52,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1132400.0, ans=0.125 2023-12-23 12:21:54,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=15.0 2023-12-23 12:22:02,617 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.482e+01 3.618e+01 3.774e+01 4.339e+01, threshold=7.235e+01, percent-clipped=0.0 2023-12-23 12:22:03,811 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:22:38,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-12-23 12:22:41,229 INFO [train.py:886] (0/4) Epoch 36, batch 3100, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4942802.63 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:22:47,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1132733.3333333333, ans=0.1 2023-12-23 12:22:50,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1132800.0, ans=0.125 2023-12-23 12:22:56,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1132800.0, ans=0.1 2023-12-23 12:22:57,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1132800.0, ans=0.025 2023-12-23 12:23:24,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1133000.0, ans=0.0 2023-12-23 12:23:29,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1133000.0, ans=0.05 2023-12-23 12:23:32,809 INFO [train.py:886] (0/4) Epoch 36, batch 3150, loss[loss=0.01144, audio_tagging_loss=0.01144, over 22096.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4943865.18 frames. ], batch size: 107, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:23:46,310 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.140e+01 3.564e+01 3.708e+01 3.854e+01 4.503e+01, threshold=7.417e+01, percent-clipped=0.0 2023-12-23 12:23:54,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. 
limit=15.0 2023-12-23 12:24:21,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1133333.3333333333, ans=0.0 2023-12-23 12:24:24,399 INFO [train.py:886] (0/4) Epoch 36, batch 3200, loss[loss=0.01179, audio_tagging_loss=0.01179, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4936940.83 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:24:31,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1133400.0, ans=0.07 2023-12-23 12:25:04,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2023-12-23 12:25:11,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1133666.6666666667, ans=0.95 2023-12-23 12:25:16,136 INFO [train.py:886] (0/4) Epoch 36, batch 3250, loss[loss=0.01097, audio_tagging_loss=0.01097, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4943519.26 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:25:29,202 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:25:29,900 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.117e+01 3.485e+01 3.604e+01 3.735e+01 4.433e+01, threshold=7.209e+01, percent-clipped=0.0 2023-12-23 12:25:44,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1133866.6666666667, ans=0.125 2023-12-23 12:25:45,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1133866.6666666667, ans=0.05 2023-12-23 12:26:07,785 INFO [train.py:886] (0/4) Epoch 36, batch 3300, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4947754.54 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:26:08,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=15.0 2023-12-23 12:26:17,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1134066.6666666667, ans=0.0 2023-12-23 12:26:18,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1134133.3333333333, ans=0.0 2023-12-23 12:26:23,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1134133.3333333333, ans=0.125 2023-12-23 12:26:31,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1134200.0, ans=0.125 2023-12-23 12:26:41,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=12.0 2023-12-23 12:27:00,478 INFO [train.py:886] (0/4) Epoch 36, batch 3350, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4953334.15 frames. 
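The validation records a little earlier (train.py:909-918 at batch 3000: "Computing validation loss", "validation: loss=0.0342, audio_tagging_loss=0.0342, over 3737520.00 frames.", "Maximum memory allocated so far is 14759MB") correspond to a no-grad pass over the dev loader followed by a peak-memory report. A hedged sketch of that pattern; the function, batch keys, and loss call are all illustrative, not icefall's actual signatures:

```python
import torch

def compute_validation_loss(model, valid_dl, device):
    """Run the dev set without gradients, then report a frame-weighted
    loss and the peak CUDA memory, as in the train.py:909-918 records."""
    model.eval()
    loss_sum, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            feats = batch["features"].to(device)   # (batch, time, feat) assumed
            labels = batch["labels"].to(device)
            loss = model(feats, labels)            # assumed per-batch mean loss
            n = feats.shape[0] * feats.shape[1]    # frames in this batch
            loss_sum += loss.item() * n
            frames += n
    model.train()
    peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"validation: loss={loss_sum / frames:.4g}, over {frames:.2f} frames.")
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```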
], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:27:09,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1134400.0, ans=0.0 2023-12-23 12:27:13,419 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.479e+01 3.642e+01 3.791e+01 4.255e+01, threshold=7.283e+01, percent-clipped=0.0 2023-12-23 12:27:15,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1134466.6666666667, ans=0.125 2023-12-23 12:27:18,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0 2023-12-23 12:27:20,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1134533.3333333333, ans=0.125 2023-12-23 12:27:23,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1134533.3333333333, ans=0.1 2023-12-23 12:27:37,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1134600.0, ans=0.0 2023-12-23 12:27:40,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1134600.0, ans=0.125 2023-12-23 12:27:53,064 INFO [train.py:886] (0/4) Epoch 36, batch 3400, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4958308.51 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:28:14,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1134866.6666666667, ans=0.125 2023-12-23 12:28:30,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1134933.3333333333, ans=0.125 2023-12-23 12:28:45,260 INFO [train.py:886] (0/4) Epoch 36, batch 3450, loss[loss=0.01215, audio_tagging_loss=0.01215, over 23977.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4953188.70 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:28:58,959 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.129e+01 3.599e+01 3.745e+01 3.958e+01 4.783e+01, threshold=7.490e+01, percent-clipped=0.0 2023-12-23 12:29:00,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.20 vs. limit=15.0 2023-12-23 12:29:01,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1135133.3333333333, ans=0.02 2023-12-23 12:29:02,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.37 vs. 
limit=12.0 2023-12-23 12:29:04,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1135200.0, ans=0.0 2023-12-23 12:29:13,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1135200.0, ans=0.1 2023-12-23 12:29:13,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1135200.0, ans=0.2 2023-12-23 12:29:22,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1135266.6666666667, ans=0.0 2023-12-23 12:29:26,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1135333.3333333333, ans=0.125 2023-12-23 12:29:27,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1135333.3333333333, ans=0.09899494936611666 2023-12-23 12:29:33,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1135333.3333333333, ans=0.1 2023-12-23 12:29:37,381 INFO [train.py:886] (0/4) Epoch 36, batch 3500, loss[loss=0.01104, audio_tagging_loss=0.01104, over 24750.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4948552.38 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:29:41,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2023-12-23 12:29:48,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1135466.6666666667, ans=0.0 2023-12-23 12:29:51,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2023-12-23 12:29:56,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1135533.3333333333, ans=0.125 2023-12-23 12:29:57,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1135533.3333333333, ans=0.0 2023-12-23 12:30:02,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1135533.3333333333, ans=0.0 2023-12-23 12:30:17,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1135666.6666666667, ans=0.0 2023-12-23 12:30:28,986 INFO [train.py:886] (0/4) Epoch 36, batch 3550, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4946225.10 frames. 
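Many of the scheduled names are *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate): with some probability a sub-module's contribution is dropped during training and the input passes straight through, a stochastic-depth-style regularizer. A simplified sketch of that bypass mechanism (the real Zipformer bypass also learns per-channel scales bounded by the scheduled scale_min values seen in this log):

```python
import torch
import torch.nn as nn

class BypassModule(nn.Module):
    """Wrap a sub-module so that, with probability skip_rate (training only),
    its residual contribution is dropped. Loose sketch of the idea."""
    def __init__(self, module: nn.Module, skip_rate: float = 0.1):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate  # in training this would be a ScheduledFloat

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return x                 # module skipped this step
        return x + self.module(x)    # residual contribution kept

layer = BypassModule(nn.Linear(256, 256), skip_rate=0.05)
y = layer(torch.randn(10, 256))
```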
], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:30:42,831 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.203e+01 3.480e+01 3.652e+01 3.812e+01 4.664e+01, threshold=7.305e+01, percent-clipped=0.0 2023-12-23 12:30:44,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1135800.0, ans=0.0 2023-12-23 12:30:46,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1135800.0, ans=0.125 2023-12-23 12:30:58,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.26 vs. limit=10.0 2023-12-23 12:31:06,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1135933.3333333333, ans=0.0 2023-12-23 12:31:15,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2023-12-23 12:31:21,214 INFO [train.py:886] (0/4) Epoch 36, batch 3600, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4949509.48 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:31:48,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1136200.0, ans=0.125 2023-12-23 12:31:54,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1136266.6666666667, ans=0.025 2023-12-23 12:31:58,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.18 vs. limit=22.5 2023-12-23 12:32:01,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1136266.6666666667, ans=0.125 2023-12-23 12:32:09,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1136333.3333333333, ans=10.0 2023-12-23 12:32:13,778 INFO [train.py:886] (0/4) Epoch 36, batch 3650, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4953852.44 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:32:17,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2023-12-23 12:32:26,860 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.117e+01 3.484e+01 3.633e+01 3.763e+01 4.234e+01, threshold=7.265e+01, percent-clipped=0.0 2023-12-23 12:32:48,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1136600.0, ans=0.2 2023-12-23 12:32:49,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1136600.0, ans=0.2 2023-12-23 12:32:49,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. 
limit=15.0 2023-12-23 12:32:56,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2023-12-23 12:33:04,725 INFO [train.py:886] (0/4) Epoch 36, batch 3700, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4957713.25 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:33:05,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1136733.3333333333, ans=0.125 2023-12-23 12:33:19,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1136800.0, ans=0.2 2023-12-23 12:33:28,463 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:33:31,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1136866.6666666667, ans=0.125 2023-12-23 12:33:35,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=12.0 2023-12-23 12:33:51,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1137000.0, ans=0.0 2023-12-23 12:33:57,644 INFO [train.py:886] (0/4) Epoch 36, batch 3750, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4961775.39 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:33:59,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.93 vs. limit=8.0 2023-12-23 12:34:07,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1137133.3333333333, ans=0.125 2023-12-23 12:34:09,922 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.235e+01 3.553e+01 3.740e+01 3.871e+01 4.273e+01, threshold=7.479e+01, percent-clipped=0.0 2023-12-23 12:34:14,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1137133.3333333333, ans=0.125 2023-12-23 12:34:15,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1137133.3333333333, ans=0.125 2023-12-23 12:34:15,599 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:34:17,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.50 vs. 
limit=12.0 2023-12-23 12:34:18,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1137200.0, ans=0.2 2023-12-23 12:34:20,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1137200.0, ans=0.0 2023-12-23 12:34:21,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1137200.0, ans=0.5 2023-12-23 12:34:22,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0 2023-12-23 12:34:23,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-12-23 12:34:23,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1137200.0, ans=0.2 2023-12-23 12:34:27,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1137200.0, ans=0.0 2023-12-23 12:34:39,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1137333.3333333333, ans=0.0 2023-12-23 12:34:40,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1137333.3333333333, ans=0.1 2023-12-23 12:34:42,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1137333.3333333333, ans=0.0 2023-12-23 12:34:49,188 INFO [train.py:886] (0/4) Epoch 36, batch 3800, loss[loss=0.01041, audio_tagging_loss=0.01041, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4954987.67 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:34:59,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1137466.6666666667, ans=0.1 2023-12-23 12:35:09,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1137533.3333333333, ans=0.0 2023-12-23 12:35:21,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1137600.0, ans=0.05 2023-12-23 12:35:40,314 INFO [train.py:886] (0/4) Epoch 36, batch 3850, loss[loss=0.01502, audio_tagging_loss=0.01502, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4950162.61 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:35:46,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=12.0 2023-12-23 12:35:50,018 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. 
limit=15.0 2023-12-23 12:35:54,878 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 3.476e+01 3.694e+01 3.926e+01 4.509e+01, threshold=7.388e+01, percent-clipped=0.0 2023-12-23 12:35:58,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1137800.0, ans=0.2 2023-12-23 12:36:13,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1137933.3333333333, ans=0.04949747468305833 2023-12-23 12:36:17,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1137933.3333333333, ans=0.5 2023-12-23 12:36:17,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1137933.3333333333, ans=0.1 2023-12-23 12:36:22,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1138000.0, ans=0.125 2023-12-23 12:36:29,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2023-12-23 12:36:33,079 INFO [train.py:886] (0/4) Epoch 36, batch 3900, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4948832.23 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:36:44,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1138133.3333333333, ans=0.0 2023-12-23 12:36:59,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1138200.0, ans=0.0 2023-12-23 12:36:59,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2023-12-23 12:37:04,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=12.0 2023-12-23 12:37:23,433 INFO [train.py:886] (0/4) Epoch 36, batch 3950, loss[loss=0.01043, audio_tagging_loss=0.01043, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4948288.12 frames. 
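The learning rate in these records decays very slowly: 2.99e-03 early in this stretch, 2.98e-03 from around batch 2650, and 2.97e-03 from around batch 3800. Zipformer recipes typically drive this with an Eden-style schedule that decays in both batch count and epoch; the formula below is reproduced from memory and the constants are typical recipe values, so treat it as an approximation rather than this run's exact configuration:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style learning rate as used in icefall Zipformer recipes
    (approximate; reproduced from memory)."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Deep into training both factors change very slowly, which is why the lr
# moves only from 2.99e-03 to 2.97e-03 over thousands of batches here.
print(eden_lr(0.045, batch=300_000, epoch=36.0))
```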
], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:37:23,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1138400.0, ans=0.0 2023-12-23 12:37:37,658 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.120e+01 3.438e+01 3.584e+01 3.728e+01 4.194e+01, threshold=7.169e+01, percent-clipped=0.0 2023-12-23 12:37:39,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1138466.6666666667, ans=0.125 2023-12-23 12:37:45,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1138533.3333333333, ans=0.125 2023-12-23 12:37:45,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1138533.3333333333, ans=0.1 2023-12-23 12:37:47,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1138533.3333333333, ans=0.125 2023-12-23 12:37:49,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1138533.3333333333, ans=0.1 2023-12-23 12:38:16,572 INFO [train.py:886] (0/4) Epoch 36, batch 4000, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4954097.86 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 128.0 2023-12-23 12:38:18,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1138733.3333333333, ans=0.0 2023-12-23 12:38:19,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1138733.3333333333, ans=0.0 2023-12-23 12:38:22,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1138733.3333333333, ans=0.2 2023-12-23 12:38:38,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1138866.6666666667, ans=0.0 2023-12-23 12:38:42,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.46 vs. limit=5.0 2023-12-23 12:38:45,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1138866.6666666667, ans=0.125 2023-12-23 12:38:46,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=15.0 2023-12-23 12:38:54,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.72 vs. limit=22.5 2023-12-23 12:39:02,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-12-23 12:39:07,332 INFO [train.py:886] (0/4) Epoch 36, batch 4050, loss[loss=0.01274, audio_tagging_loss=0.01274, over 24750.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4952818.19 frames. 
], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:39:22,656 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.270e+01 3.561e+01 3.698e+01 3.887e+01 4.580e+01, threshold=7.397e+01, percent-clipped=0.0 2023-12-23 12:39:28,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1139200.0, ans=0.2 2023-12-23 12:39:33,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1139200.0, ans=0.125 2023-12-23 12:39:36,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1139200.0, ans=0.05 2023-12-23 12:39:55,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1139333.3333333333, ans=0.0 2023-12-23 12:39:55,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1139333.3333333333, ans=0.2 2023-12-23 12:39:58,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1139400.0, ans=0.125 2023-12-23 12:39:59,475 INFO [train.py:886] (0/4) Epoch 36, batch 4100, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4948103.35 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:40:03,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2023-12-23 12:40:10,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1139466.6666666667, ans=0.125 2023-12-23 12:40:26,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1139533.3333333333, ans=0.0 2023-12-23 12:40:37,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.90 vs. limit=22.5 2023-12-23 12:40:45,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1139666.6666666667, ans=0.125 2023-12-23 12:40:52,655 INFO [train.py:886] (0/4) Epoch 36, batch 4150, loss[loss=0.009701, audio_tagging_loss=0.009701, over 23989.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4946257.39 frames. 
], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:40:53,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1139733.3333333333, ans=0.1 2023-12-23 12:41:05,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1139800.0, ans=0.125 2023-12-23 12:41:06,033 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.179e+01 3.544e+01 3.659e+01 3.852e+01 4.683e+01, threshold=7.319e+01, percent-clipped=0.0 2023-12-23 12:41:08,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1139800.0, ans=0.0 2023-12-23 12:41:38,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1140000.0, ans=0.04949747468305833 2023-12-23 12:41:41,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1140000.0, ans=0.1 2023-12-23 12:41:43,952 INFO [train.py:886] (0/4) Epoch 36, batch 4200, loss[loss=0.01229, audio_tagging_loss=0.01229, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4952610.36 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:41:52,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1140066.6666666667, ans=0.0 2023-12-23 12:41:53,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1140066.6666666667, ans=0.035 2023-12-23 12:42:09,218 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:42:30,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1140333.3333333333, ans=0.125 2023-12-23 12:42:36,323 INFO [train.py:886] (0/4) Epoch 36, batch 4250, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4954972.82 frames. 
], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:42:40,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1140400.0, ans=0.0 2023-12-23 12:42:46,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1140466.6666666667, ans=0.125 2023-12-23 12:42:50,260 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.180e+01 3.490e+01 3.625e+01 3.785e+01 4.316e+01, threshold=7.251e+01, percent-clipped=0.0 2023-12-23 12:42:51,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1140466.6666666667, ans=0.125 2023-12-23 12:42:52,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1140466.6666666667, ans=0.125 2023-12-23 12:42:54,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1140466.6666666667, ans=0.0 2023-12-23 12:43:05,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1140533.3333333333, ans=0.0 2023-12-23 12:43:12,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1140600.0, ans=0.125 2023-12-23 12:43:16,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2023-12-23 12:43:24,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1140666.6666666667, ans=0.0 2023-12-23 12:43:27,472 INFO [train.py:886] (0/4) Epoch 36, batch 4300, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4956571.02 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:43:29,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1140733.3333333333, ans=0.125 2023-12-23 12:43:30,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1140733.3333333333, ans=0.0 2023-12-23 12:43:39,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1140800.0, ans=0.2 2023-12-23 12:44:18,283 INFO [train.py:886] (0/4) Epoch 36, batch 4350, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4959321.16 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:44:32,757 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.183e+01 3.485e+01 3.639e+01 3.859e+01 4.692e+01, threshold=7.279e+01, percent-clipped=0.0 2023-12-23 12:44:33,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1141133.3333333333, ans=15.0 2023-12-23 12:44:45,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1141200.0, ans=0.125 2023-12-23 12:45:03,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.85 vs. 
limit=12.0 2023-12-23 12:45:09,866 INFO [train.py:886] (0/4) Epoch 36, batch 4400, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4952279.93 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:45:21,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1141466.6666666667, ans=0.1 2023-12-23 12:45:28,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1141466.6666666667, ans=0.0 2023-12-23 12:45:48,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1141600.0, ans=0.125 2023-12-23 12:45:49,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1141600.0, ans=0.0 2023-12-23 12:45:55,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1141666.6666666667, ans=0.0 2023-12-23 12:46:01,361 INFO [train.py:886] (0/4) Epoch 36, batch 4450, loss[loss=0.01033, audio_tagging_loss=0.01033, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4947172.96 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:46:07,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.85 vs. limit=15.0 2023-12-23 12:46:15,970 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.197e+01 3.605e+01 3.750e+01 3.906e+01 4.617e+01, threshold=7.499e+01, percent-clipped=0.0 2023-12-23 12:46:16,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1141800.0, ans=0.0 2023-12-23 12:46:35,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1141933.3333333333, ans=0.0 2023-12-23 12:46:53,747 INFO [train.py:886] (0/4) Epoch 36, batch 4500, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4946958.61 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:46:54,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1142066.6666666667, ans=0.125 2023-12-23 12:47:22,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1142200.0, ans=0.07 2023-12-23 12:47:33,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1142333.3333333333, ans=0.0 2023-12-23 12:47:40,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2023-12-23 12:47:43,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.51 vs. limit=12.0 2023-12-23 12:47:45,903 INFO [train.py:886] (0/4) Epoch 36, batch 4550, loss[loss=0.01079, audio_tagging_loss=0.01079, over 24750.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4951881.96 frames. 
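The many "ScheduledFloat: name=..., batch_count=..., ans=..." records report the current value (ans) of hyperparameters that are functions of the global batch count: dropout probabilities, skip rates, balancer limits and so on. A plausible minimal re-implementation, assuming a piecewise-linear schedule over (batch_count, value) knots; only the logged fields, not the schedule shapes, come from this log:

    def scheduled_float(batch_count, knots):
        # knots: ascending list of (batch_count, value) pairs
        if batch_count <= knots[0][0]:
            return knots[0][1]
        for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return knots[-1][1]

    # A hypothetical skip-rate decaying from 0.2 to 0.0 over the first 4000
    # batches has long since reached its endpoint by batch_count 1140400.0,
    # which is why so many entries above print ans=0.0:
    assert scheduled_float(1140400.0, [(0.0, 0.2), (4000.0, 0.0)]) == 0.0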
], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:47:47,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=22.5 2023-12-23 12:48:00,334 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.214e+01 3.540e+01 3.639e+01 3.809e+01 4.565e+01, threshold=7.278e+01, percent-clipped=0.0 2023-12-23 12:48:12,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1142533.3333333333, ans=0.125 2023-12-23 12:48:28,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1142666.6666666667, ans=0.125 2023-12-23 12:48:37,315 INFO [train.py:886] (0/4) Epoch 36, batch 4600, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4954203.54 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:48:37,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2023-12-23 12:48:40,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2023-12-23 12:49:23,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1143000.0, ans=0.125 2023-12-23 12:49:29,414 INFO [train.py:886] (0/4) Epoch 36, batch 4650, loss[loss=0.01272, audio_tagging_loss=0.01272, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4953006.94 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:49:31,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1143066.6666666667, ans=0.2 2023-12-23 12:49:33,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1143066.6666666667, ans=0.125 2023-12-23 12:49:38,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=22.5 2023-12-23 12:49:43,307 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.146e+01 3.508e+01 3.620e+01 3.811e+01 5.127e+01, threshold=7.240e+01, percent-clipped=0.0 2023-12-23 12:49:47,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1143133.3333333333, ans=0.07 2023-12-23 12:49:50,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1143200.0, ans=0.0 2023-12-23 12:49:55,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.02 vs. 
limit=12.0 2023-12-23 12:50:05,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1143266.6666666667, ans=10.0 2023-12-23 12:50:17,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1143333.3333333333, ans=0.125 2023-12-23 12:50:19,602 INFO [train.py:886] (0/4) Epoch 36, batch 4700, loss[loss=0.01132, audio_tagging_loss=0.01132, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4950881.90 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:50:24,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1143400.0, ans=0.2 2023-12-23 12:50:25,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=8.0 2023-12-23 12:50:26,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-12-23 12:50:33,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1143466.6666666667, ans=0.07 2023-12-23 12:50:34,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1143466.6666666667, ans=15.0 2023-12-23 12:50:57,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1143666.6666666667, ans=0.2 2023-12-23 12:51:06,986 INFO [train.py:886] (0/4) Epoch 36, batch 4750, loss[loss=0.01018, audio_tagging_loss=0.01018, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4950404.00 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:51:09,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2023-12-23 12:51:16,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1143800.0, ans=0.125 2023-12-23 12:51:19,671 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.214e+01 3.564e+01 3.739e+01 3.867e+01 4.563e+01, threshold=7.477e+01, percent-clipped=0.0 2023-12-23 12:51:22,289 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-36.pt 2023-12-23 12:51:42,459 INFO [train.py:886] (0/4) Epoch 37, batch 0, loss[loss=0.02328, audio_tagging_loss=0.02328, over 24063.00 frames. ], tot_loss[loss=0.02328, audio_tagging_loss=0.02328, over 24063.00 frames. ], batch size: 100, lr: 2.93e-03, grad_scale: 32.0 2023-12-23 12:51:42,461 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 12:51:52,507 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1104, 4.5289, 5.0578, 4.6528], device='cuda:0') 2023-12-23 12:52:03,036 INFO [train.py:917] (0/4) Epoch 37, validation: loss=0.03436, audio_tagging_loss=0.03436, over 3737520.00 frames. 
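Each epoch opens with the sequence seen at the epoch 36/37 boundary above: save epoch-36.pt, then at batch 0 run a full pass over the dev set (train.py:909/917) before any training step, dumping diagnostics such as the per-head attn_weights_entropy tensor along the way, and finally log peak GPU memory (train.py:918, next record). A condensed sketch of that validation step; model, valid_dl and the batch layout are illustrative assumptions:

    import torch

    def validate(model, valid_dl, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                feats = batch["features"].to(device)   # assumed batch layout
                loss, num_frames = model(feats)        # assumed: summed loss per batch
                tot_loss += loss.item()
                tot_frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 ** 2)
        return tot_loss / tot_frames, mem_mb

The validation loss=0.03436 above is this frame-normalized average over the 3737520-frame dev set.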
2023-12-23 12:52:03,037 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 12:52:04,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1143840.0, ans=0.2 2023-12-23 12:52:16,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1143906.6666666667, ans=0.125 2023-12-23 12:52:18,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1143906.6666666667, ans=0.0 2023-12-23 12:52:28,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2023-12-23 12:52:46,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1144106.6666666667, ans=0.125 2023-12-23 12:52:53,465 INFO [train.py:886] (0/4) Epoch 37, batch 50, loss[loss=0.01484, audio_tagging_loss=0.01484, over 25000.00 frames. ], tot_loss[loss=0.01865, audio_tagging_loss=0.01865, over 1117972.79 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:53:03,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1144240.0, ans=0.125 2023-12-23 12:53:04,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1144240.0, ans=0.025 2023-12-23 12:53:10,583 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:53:12,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1144240.0, ans=0.1 2023-12-23 12:53:22,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1144306.6666666667, ans=0.125 2023-12-23 12:53:26,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1144373.3333333333, ans=0.125 2023-12-23 12:53:31,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1144373.3333333333, ans=0.125 2023-12-23 12:53:39,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1144440.0, ans=0.0 2023-12-23 12:53:42,400 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.737e+01 4.184e+01 4.552e+01 5.178e+01 9.780e+01, threshold=9.104e+01, percent-clipped=7.0 2023-12-23 12:53:44,066 INFO [train.py:886] (0/4) Epoch 37, batch 100, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 1969220.77 frames. 
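The "Whitening: name=..., metric=X vs. limit=Y" records track how anisotropic each module's output covariance is against a scheduled limit. A plausible metric matching the ranges in this log (1.0 for perfectly whitened features, at most num_channels when all variance collapses onto one direction) is the eigenvalue ratio E[lambda^2] / E[lambda]^2 of the per-group channel covariance; this is an assumed reconstruction, not scaling.py itself:

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (num_frames, num_channels) activations from one module
        n, c = x.shape
        xg = x.reshape(n, num_groups, c // num_groups).permute(1, 0, 2)
        cov = xg.transpose(1, 2) @ xg / n        # one covariance per group
        eigs = torch.linalg.eigvalsh(cov)        # nonnegative eigenvalues
        return ((eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2).max()

When the metric exceeds its limit (as in "metric=12.02 vs. limit=12.0" a little above), the module applies a corrective whitening gradient; values well under the limit, like 2.85 vs. 12.0, need no correction.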
], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:53:46,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1144506.6666666667, ans=0.2 2023-12-23 12:53:53,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1144573.3333333333, ans=0.0 2023-12-23 12:53:58,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1144573.3333333333, ans=0.0 2023-12-23 12:54:00,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2023-12-23 12:54:29,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0 2023-12-23 12:54:34,850 INFO [train.py:886] (0/4) Epoch 37, batch 150, loss[loss=0.01455, audio_tagging_loss=0.01455, over 25000.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 2634404.34 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:54:36,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5 2023-12-23 12:54:42,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.87 vs. limit=22.5 2023-12-23 12:54:45,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.75 vs. limit=10.0 2023-12-23 12:55:05,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1145040.0, ans=0.125 2023-12-23 12:55:14,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2023-12-23 12:55:14,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1145106.6666666667, ans=0.125 2023-12-23 12:55:14,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1145106.6666666667, ans=0.125 2023-12-23 12:55:24,533 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.092e+01 3.546e+01 3.734e+01 3.978e+01 4.632e+01, threshold=7.469e+01, percent-clipped=0.0 2023-12-23 12:55:25,501 INFO [train.py:886] (0/4) Epoch 37, batch 200, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 3142694.28 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:55:26,746 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:55:57,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1145373.3333333333, ans=0.2 2023-12-23 12:56:16,873 INFO [train.py:886] (0/4) Epoch 37, batch 250, loss[loss=0.009306, audio_tagging_loss=0.009306, over 24019.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 3546128.17 frames. 
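The per-batch "loss[...]" and the running "tot_loss[...]" are different statistics: right after an epoch starts, the tot_loss frame count climbs quickly (1117972 at batch 50, 1969220 at batch 100, 2634404 at batch 150 above) and then saturates near five million, the signature of an exponentially decayed running sum rather than a plain epoch average. A sketch under that assumption:

    class RunningLoss:
        def __init__(self, horizon=200.0):         # decay horizon assumed; with
            self.decay = 1.0 - 1.0 / horizon       # ~25000-frame batches the
            self.loss_sum = 0.0                    # frame total saturates near
            self.frame_sum = 0.0                   # 5e6, matching this log

        def update(self, batch_loss_sum, batch_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
            self.frame_sum = self.decay * self.frame_sum + batch_frames
            return self.loss_sum / self.frame_sum  # printed as tot_loss[...]

This also explains why tot_loss equals the single-batch loss (0.02328) at epoch 37 batch 0 and then relaxes toward 0.012: a freshly reset tracker is dominated by its first few batches.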
], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:56:20,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1145506.6666666667, ans=0.125 2023-12-23 12:56:27,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1145573.3333333333, ans=0.125 2023-12-23 12:56:51,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1145706.6666666667, ans=0.0 2023-12-23 12:57:07,245 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.485e+01 3.632e+01 3.797e+01 5.071e+01, threshold=7.264e+01, percent-clipped=0.0 2023-12-23 12:57:08,194 INFO [train.py:886] (0/4) Epoch 37, batch 300, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 3854736.04 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:57:22,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=12.0 2023-12-23 12:57:24,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=12.0 2023-12-23 12:57:33,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1145973.3333333333, ans=0.0 2023-12-23 12:57:37,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1145973.3333333333, ans=0.2 2023-12-23 12:57:53,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1146106.6666666667, ans=0.1 2023-12-23 12:57:58,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1146173.3333333333, ans=0.1 2023-12-23 12:57:59,110 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=22.5 2023-12-23 12:57:59,459 INFO [train.py:886] (0/4) Epoch 37, batch 350, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4096426.01 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:58:00,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1146173.3333333333, ans=0.0 2023-12-23 12:58:05,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1146173.3333333333, ans=0.125 2023-12-23 12:58:09,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1146240.0, ans=0.125 2023-12-23 12:58:17,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1146240.0, ans=0.125 2023-12-23 12:58:32,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. 
limit=6.0 2023-12-23 12:58:33,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1146373.3333333333, ans=0.0 2023-12-23 12:58:45,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1146440.0, ans=0.0 2023-12-23 12:58:49,555 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.203e+01 3.540e+01 3.697e+01 3.881e+01 4.233e+01, threshold=7.395e+01, percent-clipped=0.0 2023-12-23 12:58:50,506 INFO [train.py:886] (0/4) Epoch 37, batch 400, loss[loss=0.01157, audio_tagging_loss=0.01157, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4282183.47 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:58:55,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1146506.6666666667, ans=0.0 2023-12-23 12:58:59,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1146573.3333333333, ans=0.0 2023-12-23 12:59:14,120 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-172000.pt 2023-12-23 12:59:20,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-12-23 12:59:25,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1146706.6666666667, ans=0.1 2023-12-23 12:59:30,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1146706.6666666667, ans=0.125 2023-12-23 12:59:39,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1146773.3333333333, ans=0.2 2023-12-23 12:59:43,032 INFO [train.py:886] (0/4) Epoch 37, batch 450, loss[loss=0.0101, audio_tagging_loss=0.0101, over 23945.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4433876.36 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:59:47,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1146840.0, ans=0.5 2023-12-23 12:59:58,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1146906.6666666667, ans=0.0 2023-12-23 12:59:59,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1146906.6666666667, ans=0.0 2023-12-23 13:00:05,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-12-23 13:00:13,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=15.0 2023-12-23 13:00:16,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2023-12-23 13:00:17,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.59 vs. 
limit=10.0 2023-12-23 13:00:34,448 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.542e+01 3.669e+01 3.831e+01 4.789e+01, threshold=7.338e+01, percent-clipped=0.0 2023-12-23 13:00:35,450 INFO [train.py:886] (0/4) Epoch 37, batch 500, loss[loss=0.01108, audio_tagging_loss=0.01108, over 25000.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4544072.35 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:00:36,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1147173.3333333333, ans=0.0 2023-12-23 13:00:38,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1147173.3333333333, ans=10.0 2023-12-23 13:00:58,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1147306.6666666667, ans=0.0 2023-12-23 13:01:05,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1147306.6666666667, ans=0.125 2023-12-23 13:01:15,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1147373.3333333333, ans=0.125 2023-12-23 13:01:16,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2023-12-23 13:01:23,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.95 vs. limit=22.5 2023-12-23 13:01:27,936 INFO [train.py:886] (0/4) Epoch 37, batch 550, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4639232.74 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:01:38,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1147573.3333333333, ans=0.125 2023-12-23 13:01:47,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1147640.0, ans=0.0 2023-12-23 13:01:56,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1147706.6666666667, ans=0.125 2023-12-23 13:02:17,065 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.550e+01 3.713e+01 3.833e+01 4.389e+01, threshold=7.427e+01, percent-clipped=0.0 2023-12-23 13:02:18,064 INFO [train.py:886] (0/4) Epoch 37, batch 600, loss[loss=0.01651, audio_tagging_loss=0.01651, over 24954.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4701909.54 frames. 
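Two kinds of checkpoint saves appear in this stretch: a per-epoch file ("Saving checkpoint to zipformer/exp_at_as_full/epoch-36.pt") and a batch-numbered file written on a fixed cadence of global batches ("Saving checkpoint to zipformer/exp_at_as_full/checkpoint-172000.pt"). A minimal sketch of that pattern; the contents of the saved dict and the save_every_n knob name are assumptions:

    import torch

    def save_checkpoint(path, model, optimizer, batch_idx_train):
        torch.save(
            {
                "model": model.state_dict(),           # assumed payload
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train,
            },
            path,
        )

    # per-epoch:  save_checkpoint(f"{exp_dir}/epoch-{epoch}.pt", ...)
    # per-batch:  if batch_idx_train % save_every_n == 0:
    #                 save_checkpoint(f"{exp_dir}/checkpoint-{batch_idx_train}.pt", ...)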
], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:02:18,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1147840.0, ans=0.2 2023-12-23 13:02:18,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1147840.0, ans=0.0 2023-12-23 13:02:24,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1147840.0, ans=0.1 2023-12-23 13:02:32,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1147906.6666666667, ans=0.0 2023-12-23 13:02:42,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1147973.3333333333, ans=0.1 2023-12-23 13:02:52,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.51 vs. limit=22.5 2023-12-23 13:02:57,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1148040.0, ans=0.0 2023-12-23 13:03:02,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.66 vs. limit=22.5 2023-12-23 13:03:10,619 INFO [train.py:886] (0/4) Epoch 37, batch 650, loss[loss=0.01478, audio_tagging_loss=0.01478, over 24947.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4756936.42 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:03:11,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1148173.3333333333, ans=10.0 2023-12-23 13:03:12,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.98 vs. limit=22.5 2023-12-23 13:03:34,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. limit=10.0 2023-12-23 13:03:46,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1148373.3333333333, ans=0.2 2023-12-23 13:03:52,435 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:03:52,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1148440.0, ans=0.125 2023-12-23 13:03:57,274 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:04:00,809 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.245e+01 3.542e+01 3.681e+01 3.830e+01 5.017e+01, threshold=7.361e+01, percent-clipped=0.0 2023-12-23 13:04:01,830 INFO [train.py:886] (0/4) Epoch 37, batch 700, loss[loss=0.01149, audio_tagging_loss=0.01149, over 24007.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4801539.24 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:04:12,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. 
limit=10.0 2023-12-23 13:04:12,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.63 vs. limit=15.0 2023-12-23 13:04:21,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-12-23 13:04:24,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1148640.0, ans=0.0 2023-12-23 13:04:25,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1148640.0, ans=0.125 2023-12-23 13:04:33,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1148706.6666666667, ans=0.125 2023-12-23 13:04:53,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1148840.0, ans=10.0 2023-12-23 13:04:53,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1148840.0, ans=0.125 2023-12-23 13:04:54,205 INFO [train.py:886] (0/4) Epoch 37, batch 750, loss[loss=0.01368, audio_tagging_loss=0.01368, over 25000.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4836403.41 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:04:55,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1148840.0, ans=0.125 2023-12-23 13:05:01,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1148840.0, ans=0.0 2023-12-23 13:05:01,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1148840.0, ans=0.1 2023-12-23 13:05:04,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2023-12-23 13:05:13,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1148906.6666666667, ans=0.125 2023-12-23 13:05:22,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1148973.3333333333, ans=0.0 2023-12-23 13:05:33,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.62 vs. limit=8.0 2023-12-23 13:05:38,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1149106.6666666667, ans=0.125 2023-12-23 13:05:45,670 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.160e+01 3.496e+01 3.641e+01 3.857e+01 4.376e+01, threshold=7.282e+01, percent-clipped=0.0 2023-12-23 13:05:46,705 INFO [train.py:886] (0/4) Epoch 37, batch 800, loss[loss=0.01258, audio_tagging_loss=0.01258, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4868444.32 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:05:54,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.09 vs. 
limit=10.0 2023-12-23 13:06:11,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0 2023-12-23 13:06:16,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1149306.6666666667, ans=0.5 2023-12-23 13:06:27,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1149440.0, ans=0.125 2023-12-23 13:06:32,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1149440.0, ans=0.125 2023-12-23 13:06:35,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1149440.0, ans=0.125 2023-12-23 13:06:38,683 INFO [train.py:886] (0/4) Epoch 37, batch 850, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4892284.59 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:06:38,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1149506.6666666667, ans=0.125 2023-12-23 13:06:39,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1149506.6666666667, ans=0.5 2023-12-23 13:06:43,823 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:06:46,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1149506.6666666667, ans=0.0 2023-12-23 13:07:28,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1149773.3333333333, ans=0.0 2023-12-23 13:07:29,568 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 3.523e+01 3.659e+01 3.806e+01 4.931e+01, threshold=7.318e+01, percent-clipped=0.0 2023-12-23 13:07:30,533 INFO [train.py:886] (0/4) Epoch 37, batch 900, loss[loss=0.01207, audio_tagging_loss=0.01207, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4909201.59 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:07:33,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1149840.0, ans=0.0 2023-12-23 13:07:52,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.65 vs. limit=10.0 2023-12-23 13:08:01,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1150040.0, ans=0.125 2023-12-23 13:08:08,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0 2023-12-23 13:08:23,566 INFO [train.py:886] (0/4) Epoch 37, batch 950, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4914455.79 frames. 
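A large share of the ScheduledFloat entries above configure balancer modules: min_positive (ans=0.025) reads as a floor on the fraction of positive activations per channel, max_abs (ans=10.0) as a cap on typical magnitudes, and prob (ans=0.125 almost everywhere) as the probability that the correction fires on a given step. A conceptual sketch of such a constraint written as a penalty; the real scaling.py balancer is understood to act on gradients directly, so this is an approximation:

    import torch

    def balancer_penalty(x, min_positive=0.025, max_abs=10.0):
        # x: (num_frames, num_channels); knob values mirror the log above
        frac_positive = (x > 0).float().mean(dim=0)
        too_few_positive = torch.relu(min_positive - frac_positive)
        too_large = torch.relu(x.abs().mean(dim=0) - max_abs)
        return too_few_positive.sum() + too_large.sum()

    # applied stochastically:
    # if torch.rand(()) < prob:      # prob=0.125 in most entries above
    #     loss = loss + balancer_penalty(hidden_activations)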
], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:08:30,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1150173.3333333333, ans=0.0 2023-12-23 13:08:44,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1150306.6666666667, ans=22.5 2023-12-23 13:09:00,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2023-12-23 13:09:14,549 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.245e+01 3.535e+01 3.650e+01 3.854e+01 4.325e+01, threshold=7.300e+01, percent-clipped=0.0 2023-12-23 13:09:15,527 INFO [train.py:886] (0/4) Epoch 37, batch 1000, loss[loss=0.009256, audio_tagging_loss=0.009256, over 24750.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4922351.52 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:09:41,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1150640.0, ans=0.125 2023-12-23 13:09:52,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1150706.6666666667, ans=0.125 2023-12-23 13:10:07,175 INFO [train.py:886] (0/4) Epoch 37, batch 1050, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4931720.34 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:10:12,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1150840.0, ans=0.0 2023-12-23 13:10:22,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1150906.6666666667, ans=0.125 2023-12-23 13:10:28,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1150973.3333333333, ans=0.125 2023-12-23 13:10:32,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1150973.3333333333, ans=0.125 2023-12-23 13:10:56,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1151106.6666666667, ans=0.0 2023-12-23 13:10:58,213 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.215e+01 3.461e+01 3.648e+01 3.824e+01 4.846e+01, threshold=7.297e+01, percent-clipped=0.0 2023-12-23 13:10:59,201 INFO [train.py:886] (0/4) Epoch 37, batch 1100, loss[loss=0.01036, audio_tagging_loss=0.01036, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4938826.92 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:11:08,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1151173.3333333333, ans=0.2 2023-12-23 13:11:39,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1151440.0, ans=0.2 2023-12-23 13:11:50,165 INFO [train.py:886] (0/4) Epoch 37, batch 1150, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4943978.42 frames. 
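The lr field decays on two clocks at once: it holds at 2.97e-03 through the tail of epoch 36, drops to 2.93e-03/2.92e-03 when epoch 37 begins, and reaches 2.91e-03 by batch 1250 below. One schedule with that qualitative shape, smooth in both the global batch index and the epoch, is the following; the exponents and the lr_batches/lr_epochs constants are illustrative assumptions, not values recovered from this log:

    def eden_style_lr(base_lr, batch, epoch, lr_batches, lr_epochs):
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # once batch >> lr_batches, batch_factor ~ (lr_batches / batch) ** 0.5, an
    # inverse-square-root decay; this is why consecutive epochs this late in
    # training differ only in the third significant digit of lr.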
], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:11:54,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1151506.6666666667, ans=0.0 2023-12-23 13:12:02,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1151573.3333333333, ans=0.0 2023-12-23 13:12:07,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1151573.3333333333, ans=0.2 2023-12-23 13:12:17,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2023-12-23 13:12:40,752 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.217e+01 3.496e+01 3.649e+01 3.812e+01 4.517e+01, threshold=7.299e+01, percent-clipped=0.0 2023-12-23 13:12:42,429 INFO [train.py:886] (0/4) Epoch 37, batch 1200, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4951180.76 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:13:02,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1151973.3333333333, ans=0.125 2023-12-23 13:13:04,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1151973.3333333333, ans=0.0 2023-12-23 13:13:09,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2023-12-23 13:13:16,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.89 vs. limit=10.0 2023-12-23 13:13:34,075 INFO [train.py:886] (0/4) Epoch 37, batch 1250, loss[loss=0.01181, audio_tagging_loss=0.01181, over 24750.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4944684.35 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:14:02,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1152306.6666666667, ans=0.125 2023-12-23 13:14:24,777 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.171e+01 3.559e+01 3.736e+01 3.887e+01 4.435e+01, threshold=7.472e+01, percent-clipped=0.0 2023-12-23 13:14:25,763 INFO [train.py:886] (0/4) Epoch 37, batch 1300, loss[loss=0.01178, audio_tagging_loss=0.01178, over 24055.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4939166.01 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:14:54,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1152640.0, ans=0.0 2023-12-23 13:15:13,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1152773.3333333333, ans=0.025 2023-12-23 13:15:13,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.55 vs. 
limit=22.5 2023-12-23 13:15:16,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1152773.3333333333, ans=0.125 2023-12-23 13:15:18,421 INFO [train.py:886] (0/4) Epoch 37, batch 1350, loss[loss=0.01181, audio_tagging_loss=0.01181, over 23994.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4946737.17 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:15:22,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2023-12-23 13:15:24,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1152840.0, ans=0.125 2023-12-23 13:15:29,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1152906.6666666667, ans=0.0 2023-12-23 13:15:32,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1152906.6666666667, ans=0.0 2023-12-23 13:15:57,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1153040.0, ans=0.0 2023-12-23 13:16:08,516 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.118e+01 3.523e+01 3.683e+01 3.854e+01 4.444e+01, threshold=7.366e+01, percent-clipped=0.0 2023-12-23 13:16:08,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1153173.3333333333, ans=0.125 2023-12-23 13:16:10,185 INFO [train.py:886] (0/4) Epoch 37, batch 1400, loss[loss=0.01009, audio_tagging_loss=0.01009, over 25000.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4949761.95 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:16:12,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1153173.3333333333, ans=0.125 2023-12-23 13:16:18,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1153173.3333333333, ans=0.1 2023-12-23 13:16:50,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1153440.0, ans=0.125 2023-12-23 13:16:58,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1153440.0, ans=0.125 2023-12-23 13:17:01,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0 2023-12-23 13:17:02,927 INFO [train.py:886] (0/4) Epoch 37, batch 1450, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4953618.31 frames. 
], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:17:05,075 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:17:06,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1153506.6666666667, ans=0.125 2023-12-23 13:17:12,621 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:17:49,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1153773.3333333333, ans=0.0 2023-12-23 13:17:51,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=15.0 2023-12-23 13:17:53,072 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.154e+01 3.474e+01 3.653e+01 3.847e+01 4.349e+01, threshold=7.306e+01, percent-clipped=0.0 2023-12-23 13:17:54,032 INFO [train.py:886] (0/4) Epoch 37, batch 1500, loss[loss=0.009267, audio_tagging_loss=0.009267, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4948198.19 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:17:56,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1153840.0, ans=0.0 2023-12-23 13:18:17,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1153973.3333333333, ans=0.2 2023-12-23 13:18:17,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0 2023-12-23 13:18:18,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1153973.3333333333, ans=0.125 2023-12-23 13:18:23,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2023-12-23 13:18:24,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1154040.0, ans=0.125 2023-12-23 13:18:37,381 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:18:46,637 INFO [train.py:886] (0/4) Epoch 37, batch 1550, loss[loss=0.01341, audio_tagging_loss=0.01341, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4952874.11 frames. 
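The "WithLoss: name=...self_attn_weights, loss-sum=..." records read naturally as an identity wrapper that carries an auxiliary penalty alongside a tensor, so a regularizer on the attention weights can be accumulated into the objective and logged without changing the forward values; loss-sum=0.000e+00 means the penalty is currently inactive, while the occasional nonzero value (loss-sum=8.509e-02 a little below) shows the constraint firing. A sketch under that reading; the wrapper name and plumbing are assumptions:

    def with_loss(x, aux_loss, aux_losses):
        # identity on x; aux_loss is collected into the training objective,
        # and the accumulated total is what the log prints as loss-sum
        aux_losses.append(aux_loss)
        return x

    # usage:
    # attn_weights = with_loss(attn_weights, penalty(attn_weights), aux_losses)
    # loss = main_loss + sum(aux_losses)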
], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:18:49,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1154173.3333333333, ans=0.125 2023-12-23 13:18:53,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1154173.3333333333, ans=0.1 2023-12-23 13:19:34,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1154440.0, ans=0.2 2023-12-23 13:19:37,585 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.596e+01 3.735e+01 3.939e+01 4.850e+01, threshold=7.470e+01, percent-clipped=0.0 2023-12-23 13:19:39,220 INFO [train.py:886] (0/4) Epoch 37, batch 1600, loss[loss=0.01132, audio_tagging_loss=0.01132, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4946387.42 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:19:47,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=12.0 2023-12-23 13:20:08,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1154706.6666666667, ans=0.125 2023-12-23 13:20:11,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1154706.6666666667, ans=0.0 2023-12-23 13:20:25,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1154773.3333333333, ans=0.2 2023-12-23 13:20:29,947 INFO [train.py:886] (0/4) Epoch 37, batch 1650, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4943742.44 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:20:37,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=15.0 2023-12-23 13:20:38,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.64 vs. 
limit=22.5 2023-12-23 13:20:42,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1154906.6666666667, ans=0.0 2023-12-23 13:20:42,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1154906.6666666667, ans=0.2 2023-12-23 13:20:47,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1154906.6666666667, ans=0.0 2023-12-23 13:21:06,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1155040.0, ans=0.125 2023-12-23 13:21:13,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1155106.6666666667, ans=0.125 2023-12-23 13:21:17,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1155106.6666666667, ans=0.125 2023-12-23 13:21:20,725 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.000e+01 3.491e+01 3.659e+01 3.859e+01 4.664e+01, threshold=7.317e+01, percent-clipped=0.0 2023-12-23 13:21:21,685 INFO [train.py:886] (0/4) Epoch 37, batch 1700, loss[loss=0.01297, audio_tagging_loss=0.01297, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4946705.48 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:21:27,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1155173.3333333333, ans=10.0 2023-12-23 13:21:39,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1155240.0, ans=0.0 2023-12-23 13:21:53,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1155373.3333333333, ans=0.0 2023-12-23 13:22:12,584 INFO [train.py:886] (0/4) Epoch 37, batch 1750, loss[loss=0.009979, audio_tagging_loss=0.009979, over 21968.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4946076.72 frames. ], batch size: 107, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:22:28,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1155573.3333333333, ans=0.1 2023-12-23 13:22:29,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1155573.3333333333, ans=0.125 2023-12-23 13:22:32,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1155640.0, ans=0.125 2023-12-23 13:22:37,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1155640.0, ans=0.0 2023-12-23 13:22:37,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.31 vs. 
limit=15.0 2023-12-23 13:22:49,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1155706.6666666667, ans=0.1 2023-12-23 13:23:02,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1155773.3333333333, ans=0.125 2023-12-23 13:23:03,626 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.050e+01 3.504e+01 3.681e+01 3.884e+01 4.385e+01, threshold=7.361e+01, percent-clipped=0.0 2023-12-23 13:23:04,632 INFO [train.py:886] (0/4) Epoch 37, batch 1800, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4949698.44 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:23:07,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1155840.0, ans=0.0 2023-12-23 13:23:10,243 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.509e-02 2023-12-23 13:23:12,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1155840.0, ans=0.0 2023-12-23 13:23:17,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1155906.6666666667, ans=0.125 2023-12-23 13:23:18,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1155906.6666666667, ans=0.125 2023-12-23 13:23:23,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1155906.6666666667, ans=0.125 2023-12-23 13:23:37,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1156040.0, ans=0.0 2023-12-23 13:23:47,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=12.0 2023-12-23 13:23:56,192 INFO [train.py:886] (0/4) Epoch 37, batch 1850, loss[loss=0.01389, audio_tagging_loss=0.01389, over 25000.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4953026.77 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:24:10,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1156240.0, ans=0.125 2023-12-23 13:24:14,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1156240.0, ans=0.125 2023-12-23 13:24:16,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1156306.6666666667, ans=0.125 2023-12-23 13:24:22,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1156306.6666666667, ans=0.125 2023-12-23 13:24:26,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.55 vs. 
limit=22.5 2023-12-23 13:24:46,826 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.290e+01 3.589e+01 3.735e+01 3.882e+01 5.249e+01, threshold=7.471e+01, percent-clipped=0.0 2023-12-23 13:24:47,857 INFO [train.py:886] (0/4) Epoch 37, batch 1900, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4950360.76 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:24:49,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1156506.6666666667, ans=0.0 2023-12-23 13:25:39,104 INFO [train.py:886] (0/4) Epoch 37, batch 1950, loss[loss=0.01063, audio_tagging_loss=0.01063, over 24750.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4950227.56 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:26:09,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1157040.0, ans=0.0 2023-12-23 13:26:16,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1157040.0, ans=0.05 2023-12-23 13:26:26,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1157106.6666666667, ans=0.125 2023-12-23 13:26:28,621 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.224e+01 3.487e+01 3.716e+01 3.886e+01 4.600e+01, threshold=7.432e+01, percent-clipped=0.0 2023-12-23 13:26:30,335 INFO [train.py:886] (0/4) Epoch 37, batch 2000, loss[loss=0.009514, audio_tagging_loss=0.009514, over 25000.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4951922.16 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:26:43,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1157240.0, ans=0.125 2023-12-23 13:26:47,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2023-12-23 13:26:54,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1157306.6666666667, ans=0.0 2023-12-23 13:26:59,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1157373.3333333333, ans=0.2 2023-12-23 13:27:00,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2023-12-23 13:27:05,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2023-12-23 13:27:15,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1157440.0, ans=0.125 2023-12-23 13:27:21,485 INFO [train.py:886] (0/4) Epoch 37, batch 2050, loss[loss=0.01469, audio_tagging_loss=0.01469, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4952114.28 frames. 
], batch size: 99, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:27:58,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1157706.6666666667, ans=0.0 2023-12-23 13:28:09,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1157773.3333333333, ans=0.05 2023-12-23 13:28:12,074 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.053e+01 3.452e+01 3.576e+01 3.817e+01 4.679e+01, threshold=7.151e+01, percent-clipped=0.0 2023-12-23 13:28:13,066 INFO [train.py:886] (0/4) Epoch 37, batch 2100, loss[loss=0.01096, audio_tagging_loss=0.01096, over 24750.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4960031.20 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:28:18,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1157840.0, ans=0.0 2023-12-23 13:28:21,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1157840.0, ans=0.125 2023-12-23 13:28:23,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-12-23 13:28:32,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1157973.3333333333, ans=0.035 2023-12-23 13:28:34,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1157973.3333333333, ans=0.1 2023-12-23 13:28:35,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1157973.3333333333, ans=0.2 2023-12-23 13:29:02,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1158106.6666666667, ans=0.0 2023-12-23 13:29:04,790 INFO [train.py:886] (0/4) Epoch 37, batch 2150, loss[loss=0.01533, audio_tagging_loss=0.01533, over 24948.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4955312.15 frames. 
], batch size: 100, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:29:05,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1158173.3333333333, ans=0.2 2023-12-23 13:29:05,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1158173.3333333333, ans=0.125 2023-12-23 13:29:11,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1158173.3333333333, ans=0.1 2023-12-23 13:29:12,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=1158173.3333333333, ans=0.1 2023-12-23 13:29:14,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1158240.0, ans=0.125 2023-12-23 13:29:19,735 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:29:30,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1158306.6666666667, ans=0.0 2023-12-23 13:29:37,412 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:29:46,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2023-12-23 13:29:55,696 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.073e+01 3.588e+01 3.736e+01 3.896e+01 4.550e+01, threshold=7.472e+01, percent-clipped=0.0 2023-12-23 13:29:56,689 INFO [train.py:886] (0/4) Epoch 37, batch 2200, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24028.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4949186.01 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:30:05,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-12-23 13:30:18,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1158640.0, ans=0.0 2023-12-23 13:30:21,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1158640.0, ans=0.125 2023-12-23 13:30:35,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.17 vs. limit=10.0 2023-12-23 13:30:49,045 INFO [train.py:886] (0/4) Epoch 37, batch 2250, loss[loss=0.01179, audio_tagging_loss=0.01179, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4950357.18 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:30:49,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.51 vs. 
limit=22.5 2023-12-23 13:31:02,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1158906.6666666667, ans=0.0 2023-12-23 13:31:34,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1159106.6666666667, ans=0.0 2023-12-23 13:31:38,834 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.103e+01 3.551e+01 3.685e+01 3.844e+01 4.557e+01, threshold=7.371e+01, percent-clipped=0.0 2023-12-23 13:31:40,554 INFO [train.py:886] (0/4) Epoch 37, batch 2300, loss[loss=0.01034, audio_tagging_loss=0.01034, over 25000.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4955279.23 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:32:01,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1159306.6666666667, ans=0.1 2023-12-23 13:32:02,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-12-23 13:32:08,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1159306.6666666667, ans=22.5 2023-12-23 13:32:08,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1159306.6666666667, ans=0.125 2023-12-23 13:32:08,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1159306.6666666667, ans=0.0 2023-12-23 13:32:16,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1159373.3333333333, ans=0.125 2023-12-23 13:32:31,780 INFO [train.py:886] (0/4) Epoch 37, batch 2350, loss[loss=0.008951, audio_tagging_loss=0.008951, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4955946.16 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:32:43,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1159573.3333333333, ans=0.125 2023-12-23 13:33:00,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=12.0 2023-12-23 13:33:22,665 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.090e+01 3.485e+01 3.642e+01 3.779e+01 4.463e+01, threshold=7.283e+01, percent-clipped=0.0 2023-12-23 13:33:23,705 INFO [train.py:886] (0/4) Epoch 37, batch 2400, loss[loss=0.01172, audio_tagging_loss=0.01172, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4958164.86 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:33:40,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1159906.6666666667, ans=0.125 2023-12-23 13:33:44,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.64 vs. 
limit=22.5 2023-12-23 13:33:46,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1159973.3333333333, ans=0.1 2023-12-23 13:33:54,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1160040.0, ans=0.125 2023-12-23 13:34:07,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1160106.6666666667, ans=0.125 2023-12-23 13:34:11,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1160106.6666666667, ans=0.0 2023-12-23 13:34:14,802 INFO [train.py:886] (0/4) Epoch 37, batch 2450, loss[loss=0.01047, audio_tagging_loss=0.01047, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4961744.51 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:34:26,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1160240.0, ans=0.125 2023-12-23 13:34:27,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.03 vs. limit=22.5 2023-12-23 13:34:34,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1160240.0, ans=0.1 2023-12-23 13:34:43,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2023-12-23 13:35:06,194 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.251e+01 3.588e+01 3.750e+01 3.881e+01 5.588e+01, threshold=7.500e+01, percent-clipped=0.0 2023-12-23 13:35:07,154 INFO [train.py:886] (0/4) Epoch 37, batch 2500, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4955157.09 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:35:08,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2023-12-23 13:35:13,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1160506.6666666667, ans=0.0 2023-12-23 13:35:15,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1160573.3333333333, ans=0.125 2023-12-23 13:35:21,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.82 vs. limit=22.5 2023-12-23 13:35:27,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2023-12-23 13:35:51,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1160773.3333333333, ans=0.1 2023-12-23 13:35:55,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1160773.3333333333, ans=0.1 2023-12-23 13:35:57,195 INFO [train.py:886] (0/4) Epoch 37, batch 2550, loss[loss=0.013, audio_tagging_loss=0.013, over 24750.00 frames. 
], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4950727.25 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:36:15,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1160906.6666666667, ans=0.2 2023-12-23 13:36:19,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1160973.3333333333, ans=0.125 2023-12-23 13:36:21,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.48 vs. limit=15.0 2023-12-23 13:36:48,130 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.252e+01 3.558e+01 3.722e+01 3.971e+01 4.498e+01, threshold=7.443e+01, percent-clipped=0.0 2023-12-23 13:36:49,155 INFO [train.py:886] (0/4) Epoch 37, batch 2600, loss[loss=0.01093, audio_tagging_loss=0.01093, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4948146.16 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:36:53,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1161173.3333333333, ans=0.1 2023-12-23 13:37:04,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1161240.0, ans=0.125 2023-12-23 13:37:24,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-12-23 13:37:28,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1161373.3333333333, ans=0.125 2023-12-23 13:37:38,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1161440.0, ans=0.0 2023-12-23 13:37:42,241 INFO [train.py:886] (0/4) Epoch 37, batch 2650, loss[loss=0.01166, audio_tagging_loss=0.01166, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4955046.36 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:37:51,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1161573.3333333333, ans=0.125 2023-12-23 13:37:53,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1161573.3333333333, ans=0.125 2023-12-23 13:37:58,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1161573.3333333333, ans=0.1 2023-12-23 13:38:16,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1161706.6666666667, ans=0.125 2023-12-23 13:38:28,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0 2023-12-23 13:38:31,344 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.155e+01 3.489e+01 3.610e+01 3.806e+01 4.521e+01, threshold=7.219e+01, percent-clipped=0.0 2023-12-23 13:38:32,317 INFO [train.py:886] (0/4) Epoch 37, batch 2700, loss[loss=0.01133, audio_tagging_loss=0.01133, over 25000.00 frames. 
], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4953749.44 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:38:38,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1161840.0, ans=0.0 2023-12-23 13:38:48,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1161906.6666666667, ans=0.1 2023-12-23 13:38:51,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.57 vs. limit=8.0 2023-12-23 13:39:25,528 INFO [train.py:886] (0/4) Epoch 37, batch 2750, loss[loss=0.01501, audio_tagging_loss=0.01501, over 24750.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4957128.62 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:39:41,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1162240.0, ans=0.0 2023-12-23 13:39:41,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0 2023-12-23 13:39:54,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1162306.6666666667, ans=0.1 2023-12-23 13:40:08,813 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.07 vs. limit=10.0 2023-12-23 13:40:15,310 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.145e+01 3.537e+01 3.670e+01 3.855e+01 4.195e+01, threshold=7.340e+01, percent-clipped=0.0 2023-12-23 13:40:16,313 INFO [train.py:886] (0/4) Epoch 37, batch 2800, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4951568.30 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:40:25,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1162506.6666666667, ans=0.0 2023-12-23 13:40:39,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1162640.0, ans=0.125 2023-12-23 13:40:43,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1162640.0, ans=0.2 2023-12-23 13:40:47,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0 2023-12-23 13:40:49,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0 2023-12-23 13:41:03,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2023-12-23 13:41:07,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1162773.3333333333, ans=0.0 2023-12-23 13:41:09,008 INFO [train.py:886] (0/4) Epoch 37, batch 2850, loss[loss=0.01272, audio_tagging_loss=0.01272, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4951331.28 frames. 
], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:41:16,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1162840.0, ans=0.125 2023-12-23 13:41:41,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1163040.0, ans=0.125 2023-12-23 13:41:56,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1163106.6666666667, ans=0.025 2023-12-23 13:42:00,272 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.241e+01 3.535e+01 3.716e+01 3.872e+01 4.379e+01, threshold=7.432e+01, percent-clipped=0.0 2023-12-23 13:42:00,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1163173.3333333333, ans=0.2 2023-12-23 13:42:01,222 INFO [train.py:886] (0/4) Epoch 37, batch 2900, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4950428.66 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:42:08,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1163173.3333333333, ans=0.0 2023-12-23 13:42:25,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1163306.6666666667, ans=0.125 2023-12-23 13:42:29,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1163306.6666666667, ans=0.1 2023-12-23 13:42:31,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1163373.3333333333, ans=0.125 2023-12-23 13:42:38,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1163373.3333333333, ans=0.0 2023-12-23 13:42:38,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1163373.3333333333, ans=0.125 2023-12-23 13:42:40,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1163373.3333333333, ans=0.0 2023-12-23 13:42:41,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1163440.0, ans=0.125 2023-12-23 13:42:42,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1163440.0, ans=0.125 2023-12-23 13:42:52,801 INFO [train.py:886] (0/4) Epoch 37, batch 2950, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24750.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4955384.35 frames. 
], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:42:56,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1163506.6666666667, ans=0.0 2023-12-23 13:43:13,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1163640.0, ans=0.0 2023-12-23 13:43:15,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1163640.0, ans=0.1 2023-12-23 13:43:19,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2023-12-23 13:43:25,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1163706.6666666667, ans=0.0 2023-12-23 13:43:29,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1163706.6666666667, ans=0.125 2023-12-23 13:43:37,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1163773.3333333333, ans=0.0 2023-12-23 13:43:43,101 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.214e+01 3.508e+01 3.692e+01 3.793e+01 4.892e+01, threshold=7.383e+01, percent-clipped=0.0 2023-12-23 13:43:44,818 INFO [train.py:886] (0/4) Epoch 37, batch 3000, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4955287.71 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:43:44,820 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 13:44:05,925 INFO [train.py:917] (0/4) Epoch 37, validation: loss=0.03402, audio_tagging_loss=0.03402, over 3737520.00 frames. 2023-12-23 13:44:05,926 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 13:44:35,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1163973.3333333333, ans=0.2 2023-12-23 13:44:53,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1164106.6666666667, ans=0.2 2023-12-23 13:44:57,803 INFO [train.py:886] (0/4) Epoch 37, batch 3050, loss[loss=0.0112, audio_tagging_loss=0.0112, over 22600.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4959890.55 frames. 
], batch size: 107, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:45:19,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1164306.6666666667, ans=0.125 2023-12-23 13:45:19,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1164306.6666666667, ans=0.0 2023-12-23 13:45:25,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1164306.6666666667, ans=0.0 2023-12-23 13:45:43,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1164440.0, ans=15.0 2023-12-23 13:45:48,485 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.196e+01 3.531e+01 3.694e+01 3.920e+01 4.438e+01, threshold=7.387e+01, percent-clipped=0.0 2023-12-23 13:45:49,440 INFO [train.py:886] (0/4) Epoch 37, batch 3100, loss[loss=0.0118, audio_tagging_loss=0.0118, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4961914.99 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:45:51,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1164506.6666666667, ans=0.0 2023-12-23 13:45:53,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=12.0 2023-12-23 13:46:03,265 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:46:06,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-12-23 13:46:14,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1164640.0, ans=0.0 2023-12-23 13:46:16,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=12.0 2023-12-23 13:46:21,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.44 vs. limit=22.5 2023-12-23 13:46:22,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=1164706.6666666667, ans=0.02 2023-12-23 13:46:36,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2023-12-23 13:46:41,321 INFO [train.py:886] (0/4) Epoch 37, batch 3150, loss[loss=0.009599, audio_tagging_loss=0.009599, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4952345.35 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:47:10,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-12-23 13:47:32,503 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.567e+01 3.724e+01 3.935e+01 4.508e+01, threshold=7.447e+01, percent-clipped=0.0 2023-12-23 13:47:33,498 INFO [train.py:886] (0/4) Epoch 37, batch 3200, loss[loss=0.01061, audio_tagging_loss=0.01061, over 25000.00 frames. 
], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4949668.22 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:47:41,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.45 vs. limit=22.5 2023-12-23 13:47:41,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1165173.3333333333, ans=0.1 2023-12-23 13:47:44,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1165240.0, ans=0.125 2023-12-23 13:48:01,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1165306.6666666667, ans=0.95 2023-12-23 13:48:02,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2023-12-23 13:48:04,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1165373.3333333333, ans=0.0 2023-12-23 13:48:05,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.90 vs. limit=10.0 2023-12-23 13:48:13,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1165373.3333333333, ans=0.125 2023-12-23 13:48:25,162 INFO [train.py:886] (0/4) Epoch 37, batch 3250, loss[loss=0.01068, audio_tagging_loss=0.01068, over 22226.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4943139.99 frames. ], batch size: 107, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:48:28,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1165506.6666666667, ans=0.2 2023-12-23 13:48:37,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1165573.3333333333, ans=0.0 2023-12-23 13:48:54,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1165640.0, ans=0.0 2023-12-23 13:49:06,391 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:49:12,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1165773.3333333333, ans=0.0 2023-12-23 13:49:15,388 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.154e+01 3.509e+01 3.653e+01 3.824e+01 4.316e+01, threshold=7.305e+01, percent-clipped=0.0 2023-12-23 13:49:16,373 INFO [train.py:886] (0/4) Epoch 37, batch 3300, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4950142.36 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:49:17,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. 
limit=15.0 2023-12-23 13:49:18,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1165840.0, ans=0.0 2023-12-23 13:49:31,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-12-23 13:49:36,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1165973.3333333333, ans=0.125 2023-12-23 13:49:36,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1165973.3333333333, ans=0.2 2023-12-23 13:49:45,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1165973.3333333333, ans=0.1 2023-12-23 13:49:58,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0 2023-12-23 13:50:00,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0 2023-12-23 13:50:01,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2023-12-23 13:50:07,852 INFO [train.py:886] (0/4) Epoch 37, batch 3350, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4949163.37 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:50:13,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1166173.3333333333, ans=0.0 2023-12-23 13:50:19,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1166240.0, ans=0.0 2023-12-23 13:50:42,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1166373.3333333333, ans=0.125 2023-12-23 13:50:42,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2023-12-23 13:50:58,582 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.110e+01 3.562e+01 3.721e+01 3.851e+01 4.497e+01, threshold=7.442e+01, percent-clipped=0.0 2023-12-23 13:51:00,227 INFO [train.py:886] (0/4) Epoch 37, batch 3400, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4955163.86 frames. 
], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:51:10,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1166573.3333333333, ans=0.2 2023-12-23 13:51:19,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1166640.0, ans=0.2 2023-12-23 13:51:25,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1166640.0, ans=0.125 2023-12-23 13:51:36,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1166706.6666666667, ans=0.0 2023-12-23 13:51:40,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.26 vs. limit=10.0 2023-12-23 13:51:43,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1166773.3333333333, ans=0.0 2023-12-23 13:51:45,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1166773.3333333333, ans=0.125 2023-12-23 13:51:50,556 INFO [train.py:886] (0/4) Epoch 37, batch 3450, loss[loss=0.01313, audio_tagging_loss=0.01313, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4951171.37 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:51:58,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1166840.0, ans=0.0 2023-12-23 13:52:09,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1166906.6666666667, ans=0.0 2023-12-23 13:52:18,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1166973.3333333333, ans=0.125 2023-12-23 13:52:37,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1167106.6666666667, ans=0.125 2023-12-23 13:52:41,674 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.580e+01 3.707e+01 3.907e+01 4.394e+01, threshold=7.413e+01, percent-clipped=0.0 2023-12-23 13:52:42,666 INFO [train.py:886] (0/4) Epoch 37, batch 3500, loss[loss=0.01053, audio_tagging_loss=0.01053, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4942579.05 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:52:57,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1167240.0, ans=0.125 2023-12-23 13:53:19,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1167373.3333333333, ans=0.1 2023-12-23 13:53:35,211 INFO [train.py:886] (0/4) Epoch 37, batch 3550, loss[loss=0.01253, audio_tagging_loss=0.01253, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4941675.18 frames. 
], batch size: 99, lr: 2.90e-03, grad_scale: 32.0 2023-12-23 13:53:36,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1167506.6666666667, ans=0.0 2023-12-23 13:53:36,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1167506.6666666667, ans=0.0 2023-12-23 13:53:51,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1167573.3333333333, ans=0.125 2023-12-23 13:53:56,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.38 vs. limit=10.0 2023-12-23 13:53:59,975 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:54:01,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1167640.0, ans=0.125 2023-12-23 13:54:26,590 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.024e+01 3.522e+01 3.659e+01 3.853e+01 4.426e+01, threshold=7.319e+01, percent-clipped=0.0 2023-12-23 13:54:26,614 INFO [train.py:886] (0/4) Epoch 37, batch 3600, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4946232.40 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 32.0 2023-12-23 13:54:27,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1167840.0, ans=0.125 2023-12-23 13:54:36,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1167906.6666666667, ans=0.1 2023-12-23 13:54:42,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1167906.6666666667, ans=0.125 2023-12-23 13:55:19,999 INFO [train.py:886] (0/4) Epoch 37, batch 3650, loss[loss=0.01512, audio_tagging_loss=0.01512, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4942272.38 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:55:27,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2023-12-23 13:55:32,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=15.0 2023-12-23 13:55:50,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1168373.3333333333, ans=0.05 2023-12-23 13:56:01,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1168440.0, ans=0.2 2023-12-23 13:56:11,235 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.231e+01 3.538e+01 3.706e+01 3.858e+01 4.224e+01, threshold=7.412e+01, percent-clipped=0.0 2023-12-23 13:56:11,270 INFO [train.py:886] (0/4) Epoch 37, batch 3700, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4949491.01 frames. 
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:56:17,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1168506.6666666667, ans=0.125 2023-12-23 13:56:23,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1168573.3333333333, ans=0.125 2023-12-23 13:56:26,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0 2023-12-23 13:56:47,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-12-23 13:56:57,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1168773.3333333333, ans=0.125 2023-12-23 13:57:02,251 INFO [train.py:886] (0/4) Epoch 37, batch 3750, loss[loss=0.01381, audio_tagging_loss=0.01381, over 22020.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4943445.24 frames. ], batch size: 107, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:57:02,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1168840.0, ans=0.0 2023-12-23 13:57:10,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1168840.0, ans=0.1 2023-12-23 13:57:54,656 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.286e+01 3.646e+01 3.784e+01 3.939e+01 4.535e+01, threshold=7.569e+01, percent-clipped=0.0 2023-12-23 13:57:54,680 INFO [train.py:886] (0/4) Epoch 37, batch 3800, loss[loss=0.009451, audio_tagging_loss=0.009451, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4942844.30 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:57:55,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1169173.3333333333, ans=0.2 2023-12-23 13:57:57,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1169173.3333333333, ans=0.125 2023-12-23 13:58:12,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0 2023-12-23 13:58:13,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1169306.6666666667, ans=0.0 2023-12-23 13:58:17,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1169306.6666666667, ans=0.2 2023-12-23 13:58:28,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1169373.3333333333, ans=0.1 2023-12-23 13:58:31,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1169373.3333333333, ans=0.125 2023-12-23 13:58:38,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. 
limit=22.5 2023-12-23 13:58:38,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1169440.0, ans=0.125 2023-12-23 13:58:38,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0 2023-12-23 13:58:40,744 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.83 vs. limit=15.0 2023-12-23 13:58:44,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1169440.0, ans=0.1 2023-12-23 13:58:45,934 INFO [train.py:886] (0/4) Epoch 37, batch 3850, loss[loss=0.009729, audio_tagging_loss=0.009729, over 24750.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4941022.90 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:58:46,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1169506.6666666667, ans=0.1 2023-12-23 13:58:55,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1169573.3333333333, ans=0.1 2023-12-23 13:59:06,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.55 vs. limit=22.5 2023-12-23 13:59:39,047 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.134e+01 3.574e+01 3.713e+01 3.863e+01 4.498e+01, threshold=7.426e+01, percent-clipped=0.0 2023-12-23 13:59:39,071 INFO [train.py:886] (0/4) Epoch 37, batch 3900, loss[loss=0.0101, audio_tagging_loss=0.0101, over 25000.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4942874.46 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:59:43,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1169840.0, ans=0.125 2023-12-23 13:59:55,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1169906.6666666667, ans=0.0 2023-12-23 14:00:07,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2023-12-23 14:00:12,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.81 vs. limit=22.5 2023-12-23 14:00:29,158 INFO [train.py:886] (0/4) Epoch 37, batch 3950, loss[loss=0.01158, audio_tagging_loss=0.01158, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4942882.78 frames. 
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:00:49,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1170306.6666666667, ans=0.1 2023-12-23 14:00:57,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1170306.6666666667, ans=0.1 2023-12-23 14:00:57,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1170306.6666666667, ans=0.125 2023-12-23 14:01:07,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1170373.3333333333, ans=0.125 2023-12-23 14:01:10,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.19 vs. limit=10.0 2023-12-23 14:01:10,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2023-12-23 14:01:12,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1170440.0, ans=0.125 2023-12-23 14:01:19,036 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:01:21,649 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.552e+01 3.697e+01 3.830e+01 4.529e+01, threshold=7.395e+01, percent-clipped=0.0 2023-12-23 14:01:21,673 INFO [train.py:886] (0/4) Epoch 37, batch 4000, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4954235.84 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:01:23,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1170506.6666666667, ans=0.2 2023-12-23 14:01:35,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2023-12-23 14:02:02,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1170773.3333333333, ans=0.0 2023-12-23 14:02:14,050 INFO [train.py:886] (0/4) Epoch 37, batch 4050, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4955778.01 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:02:53,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1171040.0, ans=0.0 2023-12-23 14:03:06,580 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.265e+01 3.643e+01 3.788e+01 3.914e+01 4.371e+01, threshold=7.577e+01, percent-clipped=0.0 2023-12-23 14:03:06,604 INFO [train.py:886] (0/4) Epoch 37, batch 4100, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4950623.21 frames. 
], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:03:07,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1171173.3333333333, ans=0.2 2023-12-23 14:03:11,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1171173.3333333333, ans=0.125 2023-12-23 14:03:18,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1171240.0, ans=6.0 2023-12-23 14:03:19,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1171240.0, ans=0.125 2023-12-23 14:03:25,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1171240.0, ans=0.2 2023-12-23 14:03:25,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.17 vs. limit=10.0 2023-12-23 14:03:29,743 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:03:40,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1171373.3333333333, ans=0.1 2023-12-23 14:03:53,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1171440.0, ans=0.0 2023-12-23 14:03:59,157 INFO [train.py:886] (0/4) Epoch 37, batch 4150, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4949088.89 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:04:02,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1171506.6666666667, ans=0.125 2023-12-23 14:04:11,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1171573.3333333333, ans=10.0 2023-12-23 14:04:36,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1171706.6666666667, ans=0.0 2023-12-23 14:04:50,303 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.521e+01 3.688e+01 3.874e+01 5.030e+01, threshold=7.375e+01, percent-clipped=0.0 2023-12-23 14:04:50,327 INFO [train.py:886] (0/4) Epoch 37, batch 4200, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4944109.86 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:05:08,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=12.0 2023-12-23 14:05:19,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1171973.3333333333, ans=0.035 2023-12-23 14:05:42,848 INFO [train.py:886] (0/4) Epoch 37, batch 4250, loss[loss=0.01152, audio_tagging_loss=0.01152, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4947880.12 frames. 
], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:06:11,464 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:06:13,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1172373.3333333333, ans=0.125 2023-12-23 14:06:16,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1172373.3333333333, ans=0.0 2023-12-23 14:06:17,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1172373.3333333333, ans=0.125 2023-12-23 14:06:34,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.64 vs. limit=22.5 2023-12-23 14:06:35,034 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.171e+01 3.544e+01 3.699e+01 3.834e+01 4.591e+01, threshold=7.397e+01, percent-clipped=0.0 2023-12-23 14:06:35,058 INFO [train.py:886] (0/4) Epoch 37, batch 4300, loss[loss=0.01129, audio_tagging_loss=0.01129, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4956814.75 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:06:41,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1172506.6666666667, ans=0.1 2023-12-23 14:06:46,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.95 vs. limit=10.0 2023-12-23 14:06:50,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1172573.3333333333, ans=0.0 2023-12-23 14:06:54,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1172640.0, ans=0.0 2023-12-23 14:06:56,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1172640.0, ans=0.1 2023-12-23 14:06:59,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1172640.0, ans=0.0 2023-12-23 14:07:09,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.48 vs. limit=6.0 2023-12-23 14:07:15,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1172773.3333333333, ans=0.1 2023-12-23 14:07:26,726 INFO [train.py:886] (0/4) Epoch 37, batch 4350, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4957940.67 frames. 
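The `Whitening: ... metric=... vs. limit=...` entries track how far a layer's activations are from having a "white" (identity-like) covariance: a metric of 1.0 means perfectly white, and larger values mean the covariance eigenvalues have spread out. The sketch below is a rough paraphrase of such a metric, not the exact `scaling.py` code.

```python
# Approximate whitening metric: equals 1.0 iff the channel covariance is a
# multiple of the identity, and grows as the eigenvalue spread increases.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one whitening group
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]          # channel covariance
    d = cov.shape[0]
    return ((cov ** 2).sum() * d / cov.diag().sum() ** 2).item()

x = torch.randn(1000, 384)                  # near-white toy features
print(whitening_metric(x))                  # ~1.4, far below a limit of 15.0
```

When the metric exceeds its limit (the limits themselves are often scheduled, hence values like 7.5, 15.0 or 22.5), the module applies a corrective gradient penalty. Most entries here sit comfortably under their limits, though occasionally one crosses, e.g. metric=22.64 vs. limit=22.5 above.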
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:07:27,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1172840.0, ans=0.125 2023-12-23 14:07:28,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1172840.0, ans=0.125 2023-12-23 14:07:33,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-12-23 14:07:35,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1172840.0, ans=0.05 2023-12-23 14:07:54,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1172973.3333333333, ans=10.0 2023-12-23 14:08:05,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1173040.0, ans=0.1 2023-12-23 14:08:08,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1173106.6666666667, ans=0.0 2023-12-23 14:08:18,190 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.237e+01 3.582e+01 3.726e+01 3.912e+01 4.794e+01, threshold=7.453e+01, percent-clipped=0.0 2023-12-23 14:08:18,213 INFO [train.py:886] (0/4) Epoch 37, batch 4400, loss[loss=0.01283, audio_tagging_loss=0.01283, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4949722.63 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:08:20,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1173173.3333333333, ans=0.0 2023-12-23 14:08:36,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1173240.0, ans=0.0 2023-12-23 14:08:43,006 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-176000.pt 2023-12-23 14:08:54,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=15.0 2023-12-23 14:09:13,013 INFO [train.py:886] (0/4) Epoch 37, batch 4450, loss[loss=0.01074, audio_tagging_loss=0.01074, over 22218.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4939365.01 frames. 
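The `Saving checkpoint to zipformer/exp_at_as_full/checkpoint-176000.pt` entry above is the batch-interval checkpoint: 176000 is a multiple of the configured `save_every_n=4000`, and with `keep_last_k=30` older numbered checkpoints get pruned. A hypothetical sketch of that bookkeeping follows; the function and file handling are assumptions, not icefall's exact code.

```python
# Hypothetical periodic checkpointing with pruning of old checkpoints.
from pathlib import Path
import torch

def maybe_save_checkpoint(model, exp_dir: Path, batch_idx_train: int,
                          save_every_n: int = 4000, keep_last_k: int = 30,
                          rank: int = 0) -> None:
    if rank != 0 or batch_idx_train % save_every_n != 0:
        return
    path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
    torch.save({"model": model.state_dict(),
                "batch_idx_train": batch_idx_train}, path)
    # Prune everything but the newest keep_last_k numbered checkpoints.
    numbered = sorted(exp_dir.glob("checkpoint-*.pt"),
                      key=lambda p: int(p.stem.split("-")[1]))
    for old in numbered[:-keep_last_k]:
        old.unlink()
```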
], batch size: 107, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:09:18,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1173506.6666666667, ans=0.125 2023-12-23 14:09:50,716 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.649e-03 2023-12-23 14:09:51,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1173706.6666666667, ans=0.125 2023-12-23 14:09:51,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1173706.6666666667, ans=10.0 2023-12-23 14:09:55,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1173773.3333333333, ans=0.125 2023-12-23 14:10:05,055 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.567e+01 3.732e+01 3.907e+01 4.248e+01, threshold=7.464e+01, percent-clipped=0.0 2023-12-23 14:10:05,080 INFO [train.py:886] (0/4) Epoch 37, batch 4500, loss[loss=0.01215, audio_tagging_loss=0.01215, over 24919.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4946609.92 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:10:09,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1173840.0, ans=0.1 2023-12-23 14:10:32,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2023-12-23 14:10:39,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1174040.0, ans=0.0 2023-12-23 14:10:56,692 INFO [train.py:886] (0/4) Epoch 37, batch 4550, loss[loss=0.00975, audio_tagging_loss=0.00975, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4953667.67 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:10:59,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1174173.3333333333, ans=0.125 2023-12-23 14:11:09,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1174240.0, ans=0.07 2023-12-23 14:11:27,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=15.0 2023-12-23 14:11:43,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1174440.0, ans=0.125 2023-12-23 14:11:47,705 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.143e+01 3.524e+01 3.710e+01 3.904e+01 4.910e+01, threshold=7.420e+01, percent-clipped=0.0 2023-12-23 14:11:47,730 INFO [train.py:886] (0/4) Epoch 37, batch 4600, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4954954.09 frames. 
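The periodic `optim.py` warnings report the quantiles (min / 25% / median / 75% / max) of recently observed gradient norms together with the active clipping threshold. The threshold is evidently `clipping_scale` times the median: in the warning above, 2.0 × 3.710e+01 = 7.420e+01, exactly the reported threshold, and `percent-clipped` appears to be the fraction of recent steps on which clipping actually fired. A toy version of that rule is sketched below; it is not icefall's `ScaledAdam`, which does this inside the optimizer.

```python
# Toy median-based gradient clipping: threshold = clipping_scale * median
# of recent overall grad norms, mirroring the logged quartile warnings.
import torch

def clip_by_median(params, recent_norms, clipping_scale=2.0, window=128):
    params = [p for p in params if p.grad is not None]
    norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
    recent_norms.append(norm.item())
    del recent_norms[:-window]                 # keep a sliding window
    threshold = clipping_scale * sorted(recent_norms)[len(recent_norms) // 2]
    if norm > threshold:
        for p in params:
            p.grad.mul_(threshold / norm)      # scale grads down to threshold
    return threshold
```

Because the threshold tracks the median of a window, it drifts only slowly (7.4e+01 give or take across this whole stretch), and `percent-clipped=0.0` shows training is well inside the stable regime.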
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:11:58,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1174573.3333333333, ans=0.0 2023-12-23 14:12:08,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1174640.0, ans=0.09899494936611666 2023-12-23 14:12:12,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1174640.0, ans=0.0 2023-12-23 14:12:29,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1174773.3333333333, ans=0.2 2023-12-23 14:12:29,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1174773.3333333333, ans=0.2 2023-12-23 14:12:34,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1174773.3333333333, ans=0.1 2023-12-23 14:12:36,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1174773.3333333333, ans=0.125 2023-12-23 14:12:37,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1174773.3333333333, ans=0.1 2023-12-23 14:12:40,252 INFO [train.py:886] (0/4) Epoch 37, batch 4650, loss[loss=0.009662, audio_tagging_loss=0.009662, over 25000.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4958294.07 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:12:40,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1174840.0, ans=0.2 2023-12-23 14:12:40,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1174840.0, ans=0.0 2023-12-23 14:13:00,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1174973.3333333333, ans=0.0 2023-12-23 14:13:10,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1174973.3333333333, ans=0.125 2023-12-23 14:13:31,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1175173.3333333333, ans=0.0 2023-12-23 14:13:31,793 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.187e+01 3.524e+01 3.714e+01 3.861e+01 4.972e+01, threshold=7.428e+01, percent-clipped=0.0 2023-12-23 14:13:31,817 INFO [train.py:886] (0/4) Epoch 37, batch 4700, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4951595.85 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:13:43,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0 2023-12-23 14:14:01,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. 
limit=15.0 2023-12-23 14:14:02,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1175373.3333333333, ans=0.1 2023-12-23 14:14:05,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1175373.3333333333, ans=0.1 2023-12-23 14:14:13,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2023-12-23 14:14:14,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1175440.0, ans=0.125 2023-12-23 14:14:18,215 INFO [train.py:886] (0/4) Epoch 37, batch 4750, loss[loss=0.01251, audio_tagging_loss=0.01251, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4949129.00 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:14:20,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1175506.6666666667, ans=0.0 2023-12-23 14:14:21,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1175506.6666666667, ans=0.125 2023-12-23 14:14:33,876 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-37.pt 2023-12-23 14:14:52,799 INFO [train.py:886] (0/4) Epoch 38, batch 0, loss[loss=0.02541, audio_tagging_loss=0.02541, over 25000.00 frames. ], tot_loss[loss=0.02541, audio_tagging_loss=0.02541, over 25000.00 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:14:52,800 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 14:15:14,008 INFO [train.py:917] (0/4) Epoch 38, validation: loss=0.03366, audio_tagging_loss=0.03366, over 3737520.00 frames. 2023-12-23 14:15:14,008 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 14:15:17,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1175613.3333333333, ans=0.125 2023-12-23 14:15:22,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1175613.3333333333, ans=0.125 2023-12-23 14:15:23,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.48 vs. limit=22.5 2023-12-23 14:15:48,236 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.219e+01 3.729e+01 3.996e+01 5.182e+01 1.024e+02, threshold=7.991e+01, percent-clipped=5.0 2023-12-23 14:15:56,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1175880.0, ans=0.125 2023-12-23 14:16:06,241 INFO [train.py:886] (0/4) Epoch 38, batch 50, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.01874, audio_tagging_loss=0.01874, over 1118688.97 frames. 
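This stretch crosses an epoch boundary: `epoch-37.pt` is written, then epoch 38 opens with a single-batch loss of 0.02541 and a fresh validation pass (loss=0.03366). The fractional frame counts ("over 1118688.97 frames") reveal that `tot_loss` is not a plain epoch average but an exponentially decayed running sum that restarts each epoch, which is why it reads 0.01874 at batch 50 and drifts back toward ~0.012 as the window fills. With `reset_interval=200`, the decay per batch appears to be 1 - 1/200: the steady-state frame count is then about 25000 × 200 ≈ 5e6 (matching the "over ~4.95e6 frames" entries of epoch 37), and 50 batches into a fresh epoch it is 25000 × 200 × (1 - 0.995**50) ≈ 1.11e6, close to the logged value. A toy reconstruction, with assumed per-batch numbers:

```python
# Toy reconstruction of the decayed running statistics behind tot_loss.
decay = 1 - 1 / 200                     # assumed: reset_interval = 200
tot_loss_sum, tot_frames = 0.0, 0.0
for _ in range(50):                     # 50 batches into epoch 38
    batch_frames = 25000.0              # ~100 cuts, ~250 frames per cut
    batch_loss_sum = 0.0187 * batch_frames
    tot_loss_sum = tot_loss_sum * decay + batch_loss_sum
    tot_frames = tot_frames * decay + batch_frames
print(round(tot_frames))                # ~1108456, cf. "over 1118688.97 frames"
print(tot_loss_sum / tot_frames)        # the reported tot_loss value
```

The remaining ~1% gap comes from per-batch frame counts varying between roughly 22k and 25k rather than being constant.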
], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:16:07,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1175946.6666666667, ans=0.125 2023-12-23 14:16:14,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1175946.6666666667, ans=0.0 2023-12-23 14:16:24,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1176013.3333333333, ans=0.0 2023-12-23 14:16:47,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1176213.3333333333, ans=0.1 2023-12-23 14:16:50,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1176213.3333333333, ans=0.125 2023-12-23 14:16:55,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1176213.3333333333, ans=0.0 2023-12-23 14:16:57,335 INFO [train.py:886] (0/4) Epoch 38, batch 100, loss[loss=0.01424, audio_tagging_loss=0.01424, over 25000.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 1968369.88 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:17:01,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2023-12-23 14:17:05,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1176280.0, ans=0.125 2023-12-23 14:17:06,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1176280.0, ans=0.0 2023-12-23 14:17:13,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1176346.6666666667, ans=0.1 2023-12-23 14:17:31,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1176480.0, ans=0.125 2023-12-23 14:17:32,539 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.959e+01 4.145e+01 4.396e+01 5.235e+01, threshold=8.289e+01, percent-clipped=0.0 2023-12-23 14:17:49,836 INFO [train.py:886] (0/4) Epoch 38, batch 150, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 2631646.79 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:17:53,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1176613.3333333333, ans=0.025 2023-12-23 14:18:11,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1176746.6666666667, ans=0.125 2023-12-23 14:18:20,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1176813.3333333333, ans=0.125 2023-12-23 14:18:32,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1176880.0, ans=0.125 2023-12-23 14:18:37,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. 
limit=15.0 2023-12-23 14:18:41,134 INFO [train.py:886] (0/4) Epoch 38, batch 200, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 3141835.51 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:19:15,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1177146.6666666667, ans=0.2 2023-12-23 14:19:16,764 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.167e+01 3.567e+01 3.766e+01 3.950e+01 4.355e+01, threshold=7.532e+01, percent-clipped=0.0 2023-12-23 14:19:17,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0 2023-12-23 14:19:32,663 INFO [train.py:886] (0/4) Epoch 38, batch 250, loss[loss=0.01573, audio_tagging_loss=0.01573, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 3549303.34 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:19:52,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1177346.6666666667, ans=0.125 2023-12-23 14:19:56,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1177413.3333333333, ans=0.1 2023-12-23 14:20:03,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1177480.0, ans=0.0 2023-12-23 14:20:24,855 INFO [train.py:886] (0/4) Epoch 38, batch 300, loss[loss=0.01148, audio_tagging_loss=0.01148, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 3857025.38 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:20:44,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1177746.6666666667, ans=0.125 2023-12-23 14:20:59,864 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.270e+01 3.565e+01 3.753e+01 3.870e+01 4.486e+01, threshold=7.506e+01, percent-clipped=0.0 2023-12-23 14:21:15,783 INFO [train.py:886] (0/4) Epoch 38, batch 350, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4092858.14 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:21:35,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1178013.3333333333, ans=0.125 2023-12-23 14:21:41,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1178080.0, ans=0.125 2023-12-23 14:21:44,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1178080.0, ans=0.125 2023-12-23 14:21:51,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1178146.6666666667, ans=0.025 2023-12-23 14:22:08,443 INFO [train.py:886] (0/4) Epoch 38, batch 400, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4286085.66 frames. 
], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:22:16,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2023-12-23 14:22:25,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1178346.6666666667, ans=0.1 2023-12-23 14:22:36,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1178413.3333333333, ans=0.125 2023-12-23 14:22:43,732 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.271e+01 3.551e+01 3.719e+01 3.910e+01 4.441e+01, threshold=7.437e+01, percent-clipped=0.0 2023-12-23 14:22:50,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2023-12-23 14:22:53,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1178546.6666666667, ans=0.125 2023-12-23 14:22:58,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1178546.6666666667, ans=0.125 2023-12-23 14:23:01,092 INFO [train.py:886] (0/4) Epoch 38, batch 450, loss[loss=0.01098, audio_tagging_loss=0.01098, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4432080.10 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:23:19,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1178680.0, ans=0.0 2023-12-23 14:23:23,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1178746.6666666667, ans=0.0 2023-12-23 14:23:32,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. limit=10.0 2023-12-23 14:23:36,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1178813.3333333333, ans=0.125 2023-12-23 14:23:39,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1178813.3333333333, ans=0.2 2023-12-23 14:23:40,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1178813.3333333333, ans=0.2 2023-12-23 14:23:52,214 INFO [train.py:886] (0/4) Epoch 38, batch 500, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4550502.19 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:23:57,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1178946.6666666667, ans=0.5 2023-12-23 14:24:15,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.59 vs. 
limit=12.0 2023-12-23 14:24:23,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1179146.6666666667, ans=0.125 2023-12-23 14:24:27,113 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.290e+01 3.549e+01 3.713e+01 3.859e+01 4.188e+01, threshold=7.427e+01, percent-clipped=0.0 2023-12-23 14:24:33,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1179213.3333333333, ans=0.2 2023-12-23 14:24:44,561 INFO [train.py:886] (0/4) Epoch 38, batch 550, loss[loss=0.01278, audio_tagging_loss=0.01278, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4643270.63 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:24:47,301 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:24:58,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1179346.6666666667, ans=0.1 2023-12-23 14:25:00,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2023-12-23 14:25:01,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-12-23 14:25:06,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1179413.3333333333, ans=0.0 2023-12-23 14:25:22,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1179480.0, ans=0.0 2023-12-23 14:25:25,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1179546.6666666667, ans=0.0 2023-12-23 14:25:26,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0 2023-12-23 14:25:32,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1179546.6666666667, ans=0.1 2023-12-23 14:25:36,031 INFO [train.py:886] (0/4) Epoch 38, batch 600, loss[loss=0.01435, audio_tagging_loss=0.01435, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4707054.94 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:25:38,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2023-12-23 14:25:55,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1179680.0, ans=0.125 2023-12-23 14:25:55,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. 
limit=6.0 2023-12-23 14:25:56,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1179746.6666666667, ans=0.125 2023-12-23 14:26:11,723 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.162e+01 3.607e+01 3.767e+01 3.907e+01 4.380e+01, threshold=7.534e+01, percent-clipped=0.0 2023-12-23 14:26:18,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1179880.0, ans=0.04949747468305833 2023-12-23 14:26:21,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1179880.0, ans=0.5 2023-12-23 14:26:23,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1179880.0, ans=0.125 2023-12-23 14:26:28,537 INFO [train.py:886] (0/4) Epoch 38, batch 650, loss[loss=0.009334, audio_tagging_loss=0.009334, over 21577.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4753571.83 frames. ], batch size: 107, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:26:42,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.79 vs. limit=10.0 2023-12-23 14:26:44,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1180013.3333333333, ans=0.1 2023-12-23 14:26:44,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2023-12-23 14:26:49,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1180080.0, ans=0.125 2023-12-23 14:26:58,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1180080.0, ans=0.0 2023-12-23 14:27:00,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1180146.6666666667, ans=0.035 2023-12-23 14:27:02,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2023-12-23 14:27:17,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=22.5 2023-12-23 14:27:20,562 INFO [train.py:886] (0/4) Epoch 38, batch 700, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4794407.33 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:27:32,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1180346.6666666667, ans=0.05 2023-12-23 14:27:55,670 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.550e+01 3.700e+01 3.852e+01 4.697e+01, threshold=7.400e+01, percent-clipped=0.0 2023-12-23 14:28:00,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1180546.6666666667, ans=0.125 2023-12-23 14:28:11,524 INFO [train.py:886] (0/4) Epoch 38, batch 750, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. 
], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4830112.80 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:28:19,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1180613.3333333333, ans=0.0 2023-12-23 14:28:37,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1180746.6666666667, ans=0.0 2023-12-23 14:28:37,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2023-12-23 14:28:50,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1180813.3333333333, ans=0.0 2023-12-23 14:29:02,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-12-23 14:29:04,142 INFO [train.py:886] (0/4) Epoch 38, batch 800, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4857733.91 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:29:05,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.91 vs. limit=6.0 2023-12-23 14:29:07,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1180946.6666666667, ans=0.0 2023-12-23 14:29:09,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1180946.6666666667, ans=10.0 2023-12-23 14:29:12,611 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.536e-02 2023-12-23 14:29:31,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1181080.0, ans=0.125 2023-12-23 14:29:32,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2023-12-23 14:29:39,326 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.509e+01 3.712e+01 3.867e+01 4.445e+01, threshold=7.423e+01, percent-clipped=0.0 2023-12-23 14:29:44,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1181213.3333333333, ans=0.125 2023-12-23 14:29:46,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2023-12-23 14:29:53,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1181213.3333333333, ans=0.125 2023-12-23 14:29:54,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. limit=6.0 2023-12-23 14:29:55,988 INFO [train.py:886] (0/4) Epoch 38, batch 850, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4884869.59 frames. 
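Note the `grad_scale` column stepping from 32.0 to 64.0 at this point. With `use_fp16=True`, this is standard dynamic loss scaling: after a long run of steps with no inf/NaN gradients the scale doubles, and it halves on overflow. Whether this run uses PyTorch's `GradScaler` or icefall's own variant, the mechanics are the same idea; a generic sketch with PyTorch's scaler, with illustrative (not this run's) settings:

```python
# Generic fp16 dynamic loss scaling with PyTorch's GradScaler.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)
# Per training step, roughly:
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)   # compute_loss is hypothetical
#   scaler.scale(loss).backward()           # backprop the scaled loss
#   scaler.step(optimizer)                  # unscales; skips step on inf/NaN
#   scaler.update()                         # doubles the scale after enough
#                                           # clean steps: 32.0 -> 64.0
```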
], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:30:07,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1181346.6666666667, ans=0.125 2023-12-23 14:30:13,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1181346.6666666667, ans=0.125 2023-12-23 14:30:15,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.96 vs. limit=15.0 2023-12-23 14:30:15,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1181413.3333333333, ans=0.125 2023-12-23 14:30:20,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.80 vs. limit=22.5 2023-12-23 14:30:38,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1181546.6666666667, ans=0.1 2023-12-23 14:30:40,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1181546.6666666667, ans=0.07 2023-12-23 14:30:47,864 INFO [train.py:886] (0/4) Epoch 38, batch 900, loss[loss=0.01044, audio_tagging_loss=0.01044, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4898464.67 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:30:48,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1181613.3333333333, ans=0.0 2023-12-23 14:30:50,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1181613.3333333333, ans=0.0 2023-12-23 14:30:56,408 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:30:59,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1181680.0, ans=0.0 2023-12-23 14:31:11,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.85 vs. limit=22.5 2023-12-23 14:31:23,124 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.323e+01 3.602e+01 3.689e+01 3.922e+01 4.328e+01, threshold=7.378e+01, percent-clipped=0.0 2023-12-23 14:31:24,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1181813.3333333333, ans=0.125 2023-12-23 14:31:25,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1181813.3333333333, ans=0.0 2023-12-23 14:31:28,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2023-12-23 14:31:39,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=12.0 2023-12-23 14:31:41,097 INFO [train.py:886] (0/4) Epoch 38, batch 950, loss[loss=0.0122, audio_tagging_loss=0.0122, over 24750.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4907298.16 frames. 
], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:31:50,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1182013.3333333333, ans=0.0 2023-12-23 14:32:09,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2023-12-23 14:32:14,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1182146.6666666667, ans=0.125 2023-12-23 14:32:18,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2023-12-23 14:32:32,447 INFO [train.py:886] (0/4) Epoch 38, batch 1000, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4920518.70 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:32:56,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1182413.3333333333, ans=0.125 2023-12-23 14:32:57,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1182413.3333333333, ans=0.125 2023-12-23 14:33:06,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1182480.0, ans=0.2 2023-12-23 14:33:07,751 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.182e+01 3.581e+01 3.724e+01 3.913e+01 4.572e+01, threshold=7.448e+01, percent-clipped=0.0 2023-12-23 14:33:19,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1182546.6666666667, ans=0.125 2023-12-23 14:33:21,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-12-23 14:33:21,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1182546.6666666667, ans=0.125 2023-12-23 14:33:24,482 INFO [train.py:886] (0/4) Epoch 38, batch 1050, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4928730.53 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:33:35,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1182680.0, ans=0.0 2023-12-23 14:33:45,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1182746.6666666667, ans=0.0 2023-12-23 14:33:45,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1182746.6666666667, ans=0.125 2023-12-23 14:34:00,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1182813.3333333333, ans=0.125 2023-12-23 14:34:16,313 INFO [train.py:886] (0/4) Epoch 38, batch 1100, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4930235.40 frames. 
], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:34:32,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. limit=6.0 2023-12-23 14:34:47,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1183146.6666666667, ans=0.125 2023-12-23 14:34:52,079 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.109e+01 3.541e+01 3.703e+01 3.876e+01 4.507e+01, threshold=7.407e+01, percent-clipped=0.0 2023-12-23 14:35:04,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2023-12-23 14:35:07,327 INFO [train.py:886] (0/4) Epoch 38, batch 1150, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24048.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4942400.12 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:35:38,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1183480.0, ans=0.125 2023-12-23 14:35:49,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-12-23 14:35:55,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1183546.6666666667, ans=0.1 2023-12-23 14:35:59,438 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:36:00,183 INFO [train.py:886] (0/4) Epoch 38, batch 1200, loss[loss=0.01477, audio_tagging_loss=0.01477, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4947517.16 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:36:24,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2023-12-23 14:36:35,475 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.439e+01 3.644e+01 3.844e+01 4.002e+01 4.470e+01, threshold=7.688e+01, percent-clipped=0.0 2023-12-23 14:36:35,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1183813.3333333333, ans=0.125 2023-12-23 14:36:40,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1183880.0, ans=0.0 2023-12-23 14:36:46,149 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:36:46,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1183880.0, ans=0.07 2023-12-23 14:36:51,364 INFO [train.py:886] (0/4) Epoch 38, batch 1250, loss[loss=0.01233, audio_tagging_loss=0.01233, over 21655.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4944319.20 frames. 
], batch size: 107, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:36:53,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1183946.6666666667, ans=0.1 2023-12-23 14:37:06,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1184013.3333333333, ans=0.0 2023-12-23 14:37:08,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1184013.3333333333, ans=0.1 2023-12-23 14:37:14,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1184080.0, ans=0.0 2023-12-23 14:37:14,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=12.0 2023-12-23 14:37:16,511 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:37:34,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0 2023-12-23 14:37:40,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1184213.3333333333, ans=0.125 2023-12-23 14:37:43,630 INFO [train.py:886] (0/4) Epoch 38, batch 1300, loss[loss=0.009452, audio_tagging_loss=0.009452, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4941373.76 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:37:49,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1184280.0, ans=0.05 2023-12-23 14:38:02,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-12-23 14:38:12,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1184413.3333333333, ans=0.2 2023-12-23 14:38:18,639 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.243e+01 3.655e+01 3.794e+01 4.002e+01 4.408e+01, threshold=7.587e+01, percent-clipped=0.0 2023-12-23 14:38:19,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1184480.0, ans=0.0 2023-12-23 14:38:29,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1184546.6666666667, ans=0.04949747468305833 2023-12-23 14:38:34,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.40 vs. limit=22.5 2023-12-23 14:38:35,821 INFO [train.py:886] (0/4) Epoch 38, batch 1350, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4945417.25 frames. 
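The `WithLoss: name=...self_attn_weights, loss-sum=...` entries report an auxiliary penalty attached directly to the attention-weight tensors; it reads 0.000e+00 on most entries here and occasionally a small positive value (e.g. 1.536e-02 earlier), so the penalty is rarely active at this stage. The specific penalty is not shown in the log, so the sketch below only illustrates the general mechanism: adding a side gradient to an intermediate tensor without changing its forward value, via a custom autograd function.

```python
# Generic "identity in forward, extra gradient in backward" attachment.
# The particular penalty (abs-value threshold of 5.0) is a made-up example.
import torch

class WithAuxLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, grad_scale: float):
        ctx.save_for_backward(x)
        ctx.grad_scale = grad_scale
        return x                                    # forward is the identity

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Extra gradient nudging out-of-range values back toward zero.
        aux = ctx.grad_scale * torch.sign(x) * (x.abs() > 5.0)
        return grad_out + aux, None

attn = (torch.randn(4, 16) * 3).requires_grad_()
y = WithAuxLoss.apply(attn, 1e-3)                   # same values as attn
y.sum().backward()                                  # grads include the nudge
```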
], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:38:41,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1184613.3333333333, ans=0.1 2023-12-23 14:38:42,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1184613.3333333333, ans=0.09899494936611666 2023-12-23 14:38:51,355 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:39:01,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1184746.6666666667, ans=0.125 2023-12-23 14:39:18,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1184880.0, ans=0.0 2023-12-23 14:39:21,716 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:39:23,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0 2023-12-23 14:39:26,140 INFO [train.py:886] (0/4) Epoch 38, batch 1400, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4942972.85 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:39:27,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1184946.6666666667, ans=0.0 2023-12-23 14:39:44,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=12.0 2023-12-23 14:39:48,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1185080.0, ans=0.0 2023-12-23 14:39:50,463 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2023-12-23 14:39:52,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1185080.0, ans=0.07 2023-12-23 14:39:54,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1185080.0, ans=0.125 2023-12-23 14:40:01,442 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.205e+01 3.485e+01 3.664e+01 3.812e+01 4.482e+01, threshold=7.328e+01, percent-clipped=0.0 2023-12-23 14:40:01,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1185146.6666666667, ans=0.125 2023-12-23 14:40:19,455 INFO [train.py:886] (0/4) Epoch 38, batch 1450, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4946218.45 frames. 
], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:40:53,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1185480.0, ans=0.05 2023-12-23 14:40:53,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1185480.0, ans=0.125 2023-12-23 14:41:00,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1185546.6666666667, ans=0.0 2023-12-23 14:41:05,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1185546.6666666667, ans=0.2 2023-12-23 14:41:11,594 INFO [train.py:886] (0/4) Epoch 38, batch 1500, loss[loss=0.01173, audio_tagging_loss=0.01173, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4953394.92 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:41:12,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.62 vs. limit=15.0 2023-12-23 14:41:15,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1185613.3333333333, ans=0.035 2023-12-23 14:41:15,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2023-12-23 14:41:19,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1185613.3333333333, ans=0.95 2023-12-23 14:41:30,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.47 vs. limit=12.0 2023-12-23 14:41:38,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1185746.6666666667, ans=0.1 2023-12-23 14:41:42,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1185813.3333333333, ans=0.125 2023-12-23 14:41:47,044 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.260e+01 3.559e+01 3.727e+01 3.866e+01 4.257e+01, threshold=7.454e+01, percent-clipped=0.0 2023-12-23 14:42:01,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-12-23 14:42:03,332 INFO [train.py:886] (0/4) Epoch 38, batch 1550, loss[loss=0.01532, audio_tagging_loss=0.01532, over 24944.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4952215.30 frames. 
], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:42:08,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1185946.6666666667, ans=0.125 2023-12-23 14:42:08,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1185946.6666666667, ans=0.025 2023-12-23 14:42:19,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1186013.3333333333, ans=0.0 2023-12-23 14:42:20,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1186013.3333333333, ans=0.0 2023-12-23 14:42:22,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1186013.3333333333, ans=0.2 2023-12-23 14:42:23,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.26 vs. limit=10.0 2023-12-23 14:42:29,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1186080.0, ans=0.0 2023-12-23 14:42:30,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1186080.0, ans=10.0 2023-12-23 14:42:32,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1186080.0, ans=0.0 2023-12-23 14:42:54,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1186213.3333333333, ans=0.0 2023-12-23 14:42:55,889 INFO [train.py:886] (0/4) Epoch 38, batch 1600, loss[loss=0.01384, audio_tagging_loss=0.01384, over 24750.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4945375.88 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:43:05,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.99 vs. limit=22.5 2023-12-23 14:43:09,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.61 vs. limit=22.5 2023-12-23 14:43:19,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=15.0 2023-12-23 14:43:30,932 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.275e+01 3.656e+01 3.741e+01 3.949e+01 4.800e+01, threshold=7.482e+01, percent-clipped=0.0 2023-12-23 14:43:43,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2023-12-23 14:43:46,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2023-12-23 14:43:47,043 INFO [train.py:886] (0/4) Epoch 38, batch 1650, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4943087.16 frames. 
2023-12-23 14:43:50,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0
2023-12-23 14:44:08,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1186746.6666666667, ans=0.125
2023-12-23 14:44:40,022 INFO [train.py:886] (0/4) Epoch 38, batch 1700, loss[loss=0.01269, audio_tagging_loss=0.01269, over 22176.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4937738.04 frames. ], batch size: 107, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:44:59,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1187013.3333333333, ans=0.07
2023-12-23 14:45:07,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=15.0
2023-12-23 14:45:08,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1187080.0, ans=0.0
2023-12-23 14:45:15,255 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.164e+01 3.542e+01 3.699e+01 3.849e+01 4.948e+01, threshold=7.398e+01, percent-clipped=0.0
2023-12-23 14:45:22,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1187213.3333333333, ans=0.0
2023-12-23 14:45:26,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1187213.3333333333, ans=0.125
2023-12-23 14:45:27,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0
2023-12-23 14:45:32,636 INFO [train.py:886] (0/4) Epoch 38, batch 1750, loss[loss=0.01046, audio_tagging_loss=0.01046, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4945934.58 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:45:41,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1187346.6666666667, ans=0.0
2023-12-23 14:45:43,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0
2023-12-23 14:46:04,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1187480.0, ans=0.0
2023-12-23 14:46:06,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1187480.0, ans=0.0
2023-12-23 14:46:12,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1187546.6666666667, ans=0.125
2023-12-23 14:46:22,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1187613.3333333333, ans=0.125
2023-12-23 14:46:23,373 INFO [train.py:886] (0/4) Epoch 38, batch 1800, loss[loss=0.01151, audio_tagging_loss=0.01151, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4952489.29 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:46:27,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1187613.3333333333, ans=0.0
2023-12-23 14:46:28,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1187613.3333333333, ans=0.0
2023-12-23 14:46:41,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1187680.0, ans=0.125
2023-12-23 14:46:42,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1187680.0, ans=0.125
2023-12-23 14:46:48,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1187746.6666666667, ans=0.2
2023-12-23 14:46:52,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1187746.6666666667, ans=0.125
2023-12-23 14:46:56,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0
2023-12-23 14:46:58,332 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.277e+01 3.639e+01 3.766e+01 3.892e+01 4.518e+01, threshold=7.532e+01, percent-clipped=0.0
2023-12-23 14:47:07,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1187880.0, ans=0.2
2023-12-23 14:47:11,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1187880.0, ans=0.125
2023-12-23 14:47:15,567 INFO [train.py:886] (0/4) Epoch 38, batch 1850, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24941.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4950298.08 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:47:15,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1187946.6666666667, ans=0.0
2023-12-23 14:47:16,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1187946.6666666667, ans=0.2
2023-12-23 14:47:36,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1188080.0, ans=0.2
2023-12-23 14:47:45,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1188080.0, ans=0.1
2023-12-23 14:47:51,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1188146.6666666667, ans=0.2
2023-12-23 14:47:53,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1188146.6666666667, ans=0.125
2023-12-23 14:48:06,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1188280.0, ans=0.025
2023-12-23 14:48:07,144 INFO [train.py:886] (0/4) Epoch 38, batch 1900, loss[loss=0.01086, audio_tagging_loss=0.01086, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4944301.41 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:48:11,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1188280.0, ans=0.125
2023-12-23 14:48:16,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1188280.0, ans=0.125
2023-12-23 14:48:18,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1188346.6666666667, ans=0.0
2023-12-23 14:48:24,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.44 vs. limit=22.5
2023-12-23 14:48:43,886 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.314e+01 3.569e+01 3.756e+01 3.902e+01 4.536e+01, threshold=7.513e+01, percent-clipped=0.0
2023-12-23 14:48:46,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1188480.0, ans=0.1
2023-12-23 14:48:59,025 INFO [train.py:886] (0/4) Epoch 38, batch 1950, loss[loss=0.01019, audio_tagging_loss=0.01019, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4938582.60 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:49:04,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0
2023-12-23 14:49:16,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=22.5
2023-12-23 14:49:26,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0
2023-12-23 14:49:27,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1188746.6666666667, ans=0.0
2023-12-23 14:49:29,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=15.0
2023-12-23 14:49:51,564 INFO [train.py:886] (0/4) Epoch 38, batch 2000, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4940987.87 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:49:53,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0
2023-12-23 14:50:23,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1189146.6666666667, ans=0.125
2023-12-23 14:50:23,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=22.30 vs. limit=22.5
2023-12-23 14:50:26,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0
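Most of the INFO lines from [scaling.py:213] simply report the current value (`ans`) of a ScheduledFloat hyperparameter at the current `batch_count`; the same parameter name reappears with new values as training advances. A batch-count-keyed piecewise-linear schedule is enough to reproduce that behaviour. The sketch below is illustrative: the class name and breakpoints are made up, not the schedule of any particular parameter in this log.

```python
class PiecewiseLinearSchedule:
    """Float hyperparameter interpolated linearly over batch_count.

    Sketch of a ScheduledFloat-style schedule; the breakpoints are
    invented for illustration.
    """

    def __init__(self, *points):
        # points: (batch_count, value) pairs; sorted by batch_count.
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)
        return pts[-1][1]

# e.g. a skip-rate that decays from 0.5 to 0.0 over the first 20k batches:
skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (4000.0, 0.05), (20000.0, 0.0))
print(skip_rate(1185480.0))  # -> 0.0: far past the last breakpoint
```

This matches what the log shows this late in training: most skip rates have decayed to their final values (often 0.0), so the printed `ans` barely moves between reports.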
2023-12-23 14:50:26,860 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.948e+01 3.554e+01 3.702e+01 3.907e+01 4.356e+01, threshold=7.404e+01, percent-clipped=0.0
2023-12-23 14:50:31,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1189213.3333333333, ans=0.2
2023-12-23 14:50:43,015 INFO [train.py:886] (0/4) Epoch 38, batch 2050, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4946085.31 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:50:53,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1189346.6666666667, ans=0.0
2023-12-23 14:50:54,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1189346.6666666667, ans=0.125
2023-12-23 14:51:03,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1189413.3333333333, ans=0.125
2023-12-23 14:51:06,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1189413.3333333333, ans=0.0
2023-12-23 14:51:35,116 INFO [train.py:886] (0/4) Epoch 38, batch 2100, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4954713.43 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:51:36,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1189613.3333333333, ans=0.2
2023-12-23 14:51:36,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1189613.3333333333, ans=0.125
2023-12-23 14:51:37,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1189613.3333333333, ans=0.125
2023-12-23 14:51:43,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1189680.0, ans=0.0
2023-12-23 14:51:59,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1189746.6666666667, ans=0.04949747468305833
2023-12-23 14:52:00,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1189746.6666666667, ans=10.0
2023-12-23 14:52:09,628 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.222e+01 3.562e+01 3.709e+01 3.873e+01 4.397e+01, threshold=7.419e+01, percent-clipped=0.0
2023-12-23 14:52:25,599 INFO [train.py:886] (0/4) Epoch 38, batch 2150, loss[loss=0.009223, audio_tagging_loss=0.009223, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4959675.83 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:52:49,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.93 vs. limit=22.5
2023-12-23 14:53:07,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.48 vs. limit=15.0
2023-12-23 14:53:17,050 INFO [train.py:886] (0/4) Epoch 38, batch 2200, loss[loss=0.01356, audio_tagging_loss=0.01356, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4947290.21 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:53:19,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1190280.0, ans=0.1
2023-12-23 14:53:26,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1190280.0, ans=0.125
2023-12-23 14:53:27,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1190346.6666666667, ans=0.1
2023-12-23 14:53:48,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1190480.0, ans=0.0
2023-12-23 14:53:51,992 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.641e+01 3.770e+01 3.908e+01 4.334e+01, threshold=7.540e+01, percent-clipped=0.0
2023-12-23 14:54:09,915 INFO [train.py:886] (0/4) Epoch 38, batch 2250, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4944546.88 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:54:20,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1190680.0, ans=0.125
2023-12-23 14:54:20,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1190680.0, ans=0.125
2023-12-23 14:54:27,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1190680.0, ans=0.0
2023-12-23 14:54:30,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0
2023-12-23 14:54:35,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1190746.6666666667, ans=0.1
2023-12-23 14:54:35,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1190746.6666666667, ans=0.125
2023-12-23 14:54:46,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1190813.3333333333, ans=0.125
2023-12-23 14:55:00,347 INFO [train.py:886] (0/4) Epoch 38, batch 2300, loss[loss=0.01321, audio_tagging_loss=0.01321, over 25000.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4950924.38 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:55:00,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1190946.6666666667, ans=0.1
2023-12-23 14:55:11,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1191013.3333333333, ans=0.1
2023-12-23 14:55:33,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0
2023-12-23 14:55:35,744 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.145e+01 3.555e+01 3.723e+01 3.863e+01 4.649e+01, threshold=7.446e+01, percent-clipped=0.0
2023-12-23 14:55:42,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1191213.3333333333, ans=0.125
2023-12-23 14:55:52,321 INFO [train.py:886] (0/4) Epoch 38, batch 2350, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4953631.23 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:56:01,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1191346.6666666667, ans=0.125
2023-12-23 14:56:04,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=15.0
2023-12-23 14:56:12,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1191346.6666666667, ans=0.125
2023-12-23 14:56:20,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1191413.3333333333, ans=0.1
2023-12-23 14:56:25,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1191480.0, ans=0.1
2023-12-23 14:56:44,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1191613.3333333333, ans=0.125
2023-12-23 14:56:44,876 INFO [train.py:886] (0/4) Epoch 38, batch 2400, loss[loss=0.01203, audio_tagging_loss=0.01203, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4950459.98 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:56:52,592 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 14:56:58,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1191680.0, ans=0.0
2023-12-23 14:57:00,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1191680.0, ans=0.0
2023-12-23 14:57:20,630 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.189e+01 3.485e+01 3.668e+01 3.843e+01 4.329e+01, threshold=7.336e+01, percent-clipped=0.0
2023-12-23 14:57:24,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1191813.3333333333, ans=0.125
2023-12-23 14:57:35,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1191946.6666666667, ans=0.2
2023-12-23 14:57:36,114 INFO [train.py:886] (0/4) Epoch 38, batch 2450, loss[loss=0.01369, audio_tagging_loss=0.01369, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4948184.29 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
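The Whitening lines from [scaling.py:1022] compare a per-module `metric` against a `limit`, flagging activations whose covariance is far from white. One reading consistent with the logged values is the eigenvalue-spread ratio E[lambda^2] / (E[lambda])^2 of the feature covariance: it equals 1 for perfectly whitened features and can grow up to num_channels when a single direction dominates, which fits the range of metrics seen here. This is a plausible interpretation sketched below, not a verified transcription of the scaling.py code; the function name is hypothetical.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Eigenvalue-spread ratio of the covariance of x (shape [N, C]).

    Returns E[lambda^2] / (E[lambda])^2, computed via traces so no
    eigendecomposition is needed: 1.0 iff the covariance is a multiple
    of the identity, up to C when one direction dominates. Assumed
    definition for illustration, not the actual icefall metric.
    """
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]            # [C, C] covariance
    c = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()   # trace(cov) / C    = E[lambda]
    mean_eig_sq = (cov * cov).sum() / c     # trace(cov@cov) / C = E[lambda^2]
    return (mean_eig_sq / mean_eig**2).item()

x = torch.randn(1000, 256)
print(whitening_metric(x))  # close to 1 for white features, plus sampling noise
```

Under this reading, a record like "metric=21.93 vs. limit=22.5" marks a module whose activations are just inside the allowed spread; the whitening penalty would only engage once the metric exceeds the limit.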
2023-12-23 14:57:52,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1192013.3333333333, ans=0.04949747468305833
2023-12-23 14:58:08,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1192146.6666666667, ans=0.2
2023-12-23 14:58:13,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1192146.6666666667, ans=0.2
2023-12-23 14:58:13,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1192146.6666666667, ans=0.2
2023-12-23 14:58:24,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1192213.3333333333, ans=0.0
2023-12-23 14:58:25,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1192213.3333333333, ans=0.0
2023-12-23 14:58:29,277 INFO [train.py:886] (0/4) Epoch 38, batch 2500, loss[loss=0.01128, audio_tagging_loss=0.01128, over 21687.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4938140.07 frames. ], batch size: 107, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:58:47,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1192346.6666666667, ans=0.125
2023-12-23 14:59:04,276 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.151e+01 3.642e+01 3.821e+01 3.975e+01 4.588e+01, threshold=7.642e+01, percent-clipped=0.0
2023-12-23 14:59:19,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1192613.3333333333, ans=0.0
2023-12-23 14:59:20,245 INFO [train.py:886] (0/4) Epoch 38, batch 2550, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4940754.88 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:59:22,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1192613.3333333333, ans=0.2
2023-12-23 14:59:23,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1192613.3333333333, ans=0.125
2023-12-23 14:59:24,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1192613.3333333333, ans=0.125
2023-12-23 14:59:35,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1192680.0, ans=0.1
2023-12-23 14:59:45,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1192746.6666666667, ans=0.05
2023-12-23 15:00:12,991 INFO [train.py:886] (0/4) Epoch 38, batch 2600, loss[loss=0.0116, audio_tagging_loss=0.0116, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4937016.57 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 15:00:21,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1192946.6666666667, ans=0.125
2023-12-23 15:00:21,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1192946.6666666667, ans=0.0
2023-12-23 15:00:28,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1193013.3333333333, ans=10.0
2023-12-23 15:00:34,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1193080.0, ans=0.035
2023-12-23 15:00:47,977 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.256e+01 3.573e+01 3.736e+01 3.903e+01 5.523e+01, threshold=7.472e+01, percent-clipped=0.0
2023-12-23 15:00:52,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1193146.6666666667, ans=0.0
2023-12-23 15:01:05,288 INFO [train.py:886] (0/4) Epoch 38, batch 2650, loss[loss=0.01074, audio_tagging_loss=0.01074, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4938273.42 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 15:01:10,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1193280.0, ans=0.125
2023-12-23 15:01:16,816 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:01:19,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1193346.6666666667, ans=0.125
2023-12-23 15:01:23,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0
2023-12-23 15:01:39,660 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:01:56,559 INFO [train.py:886] (0/4) Epoch 38, batch 2700, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4946350.29 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 15:01:58,744 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.97 vs. limit=15.0
2023-12-23 15:02:12,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=15.0
2023-12-23 15:02:14,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0
2023-12-23 15:02:21,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1193746.6666666667, ans=0.0
2023-12-23 15:02:25,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5
2023-12-23 15:02:32,429 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.252e+01 3.575e+01 3.684e+01 3.818e+01 4.506e+01, threshold=7.367e+01, percent-clipped=0.0
2023-12-23 15:02:42,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1193880.0, ans=0.125
2023-12-23 15:02:45,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1193880.0, ans=0.0
2023-12-23 15:02:46,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1193880.0, ans=0.125
2023-12-23 15:02:49,122 INFO [train.py:886] (0/4) Epoch 38, batch 2750, loss[loss=0.01036, audio_tagging_loss=0.01036, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4943387.70 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:02:53,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1193946.6666666667, ans=0.0
2023-12-23 15:03:14,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0
2023-12-23 15:03:39,936 INFO [train.py:886] (0/4) Epoch 38, batch 2800, loss[loss=0.01279, audio_tagging_loss=0.01279, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4947184.87 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:03:57,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1194346.6666666667, ans=0.2
2023-12-23 15:04:05,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1194413.3333333333, ans=0.0
2023-12-23 15:04:12,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0
2023-12-23 15:04:15,984 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.366e+01 3.677e+01 3.839e+01 3.970e+01 4.490e+01, threshold=7.678e+01, percent-clipped=0.0
2023-12-23 15:04:19,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1194480.0, ans=0.1
2023-12-23 15:04:22,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1194546.6666666667, ans=0.0
2023-12-23 15:04:26,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1194546.6666666667, ans=0.07
2023-12-23 15:04:30,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0
2023-12-23 15:04:31,023 INFO [train.py:886] (0/4) Epoch 38, batch 2850, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4949501.50 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
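Each [train.py:886] summary pairs the current batch's loss with tot_loss, a frames-weighted running average whose effective window hovers around 5M frames (about 200 batches of 25000 frames) throughout this section. A small sketch of such a tracker is below; the decay constant is an assumption chosen to reproduce that window, not a value read out of train.py.

```python
class RunningLoss:
    """Exponentially decayed, frames-weighted average of the training loss.

    Sketch only: the 200-batch decay constant is assumed to match the
    ~5e6-frame effective window seen in the log.
    """

    def __init__(self, decay_batches: float = 200.0):
        self.alpha = 1.0 - 1.0 / decay_batches
        self.loss_sum = 0.0  # decayed sum of loss * frames
        self.frames = 0.0    # decayed effective frame count

    def update(self, loss: float, frames: float) -> None:
        self.loss_sum = self.alpha * self.loss_sum + loss * frames
        self.frames = self.alpha * self.frames + frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / self.frames

tracker = RunningLoss()
for _ in range(2000):
    tracker.update(loss=0.0117, frames=25000.0)
print(f"tot_loss={tracker.tot_loss:.5f}, over {tracker.frames:.2f} frames")
# steady state: frames -> 25000 * 200 = 5.0e6, comparable to the ~4.95e6 above
```

This also explains why tot_loss barely moves (0.0116-0.0120) while individual batch losses swing between roughly 0.009 and 0.015: each batch contributes only about 1/200 of the average.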
2023-12-23 15:04:31,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1194613.3333333333, ans=0.125
2023-12-23 15:04:32,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=15.0
2023-12-23 15:04:34,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1194613.3333333333, ans=0.1
2023-12-23 15:04:34,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1194613.3333333333, ans=0.125
2023-12-23 15:04:38,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1194613.3333333333, ans=0.0
2023-12-23 15:04:40,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1194680.0, ans=0.0
2023-12-23 15:04:52,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1194746.6666666667, ans=0.125
2023-12-23 15:04:55,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1194746.6666666667, ans=0.0
2023-12-23 15:05:14,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1194880.0, ans=0.125
2023-12-23 15:05:18,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1194880.0, ans=0.1
2023-12-23 15:05:19,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1194880.0, ans=0.125
2023-12-23 15:05:23,367 INFO [train.py:886] (0/4) Epoch 38, batch 2900, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4950361.86 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:05:30,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1194946.6666666667, ans=0.0
2023-12-23 15:05:30,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1194946.6666666667, ans=0.0
2023-12-23 15:05:30,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.74 vs. limit=22.5
2023-12-23 15:05:30,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.26 vs. limit=15.0
2023-12-23 15:05:32,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1195013.3333333333, ans=0.125
2023-12-23 15:05:37,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1195013.3333333333, ans=0.125
2023-12-23 15:05:51,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0
2023-12-23 15:05:53,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1195146.6666666667, ans=0.04949747468305833
2023-12-23 15:05:58,809 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.116e+01 3.532e+01 3.702e+01 3.923e+01 4.475e+01, threshold=7.405e+01, percent-clipped=0.0
2023-12-23 15:06:14,378 INFO [train.py:886] (0/4) Epoch 38, batch 2950, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4948600.26 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:06:34,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1195413.3333333333, ans=0.2
2023-12-23 15:06:34,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.42 vs. limit=15.0
2023-12-23 15:06:36,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0
2023-12-23 15:06:47,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=15.0
2023-12-23 15:06:48,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1195480.0, ans=0.125
2023-12-23 15:06:50,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1195480.0, ans=0.125
2023-12-23 15:06:55,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1195546.6666666667, ans=0.0
2023-12-23 15:07:04,608 INFO [train.py:886] (0/4) Epoch 38, batch 3000, loss[loss=0.009429, audio_tagging_loss=0.009429, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4949172.27 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:07:04,610 INFO [train.py:909] (0/4) Computing validation loss
2023-12-23 15:07:14,591 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6336, 3.0631, 4.1906, 3.8392], device='cuda:0')
2023-12-23 15:07:19,000 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7527, 5.9238, 5.3522, 5.6379], device='cuda:0')
2023-12-23 15:07:25,413 INFO [train.py:917] (0/4) Epoch 38, validation: loss=0.03488, audio_tagging_loss=0.03488, over 3737520.00 frames.
2023-12-23 15:07:25,414 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-23 15:07:34,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1195680.0, ans=0.0
2023-12-23 15:08:01,317 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.200e+01 3.533e+01 3.703e+01 3.872e+01 4.647e+01, threshold=7.406e+01, percent-clipped=0.0
2023-12-23 15:08:16,280 INFO [train.py:886] (0/4) Epoch 38, batch 3050, loss[loss=0.0115, audio_tagging_loss=0.0115, over 25000.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4954055.56 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
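During the batch-3000 validation pass above, [zipformer.py:1858] prints attn_weights_entropy per attention module, apparently one value per head. A sketch of such a diagnostic follows: the mean Shannon entropy of each head's attention distribution. The shape convention and reduction are assumptions made for this sketch.

```python
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1.0e-20) -> torch.Tensor:
    """Mean Shannon entropy of attention distributions, per head.

    attn: [num_heads, num_queries, num_keys], rows summing to 1.
    Assumed layout for illustration; not the exact zipformer.py code.
    """
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # [num_heads, num_queries]
    return ent.mean(dim=-1)                         # one entropy per head

attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)
print(attn_weights_entropy(attn))  # a bit below log(100) ~ 4.6 for diffuse weights
```

Entropies near log(sequence length) indicate near-uniform attention, while small values indicate sharply focused heads; the logged tensors (roughly 3.1 to 5.9) span both regimes.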
2023-12-23 15:08:20,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1195946.6666666667, ans=0.0
2023-12-23 15:08:23,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0
2023-12-23 15:08:31,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1196013.3333333333, ans=0.07
2023-12-23 15:08:34,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1196013.3333333333, ans=0.1
2023-12-23 15:08:47,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1196146.6666666667, ans=0.125
2023-12-23 15:08:49,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1196146.6666666667, ans=0.0
2023-12-23 15:08:51,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1196146.6666666667, ans=0.1
2023-12-23 15:09:01,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1196213.3333333333, ans=0.125
2023-12-23 15:09:08,025 INFO [train.py:886] (0/4) Epoch 38, batch 3100, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4954441.06 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:09:21,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1196346.6666666667, ans=0.2
2023-12-23 15:09:22,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1196346.6666666667, ans=0.0
2023-12-23 15:09:43,190 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.051e+01 3.615e+01 3.756e+01 3.923e+01 4.234e+01, threshold=7.513e+01, percent-clipped=0.0
2023-12-23 15:09:50,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1196546.6666666667, ans=0.125
2023-12-23 15:09:53,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0
2023-12-23 15:10:00,193 INFO [train.py:886] (0/4) Epoch 38, batch 3150, loss[loss=0.01409, audio_tagging_loss=0.01409, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4953186.09 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:10:05,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1196613.3333333333, ans=0.125
2023-12-23 15:10:14,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1196680.0, ans=0.125
2023-12-23 15:10:19,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1196746.6666666667, ans=0.0
2023-12-23 15:10:41,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1196880.0, ans=0.125
2023-12-23 15:10:44,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1196880.0, ans=0.125
2023-12-23 15:10:44,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1196880.0, ans=0.125
2023-12-23 15:10:49,008 INFO [train.py:886] (0/4) Epoch 38, batch 3200, loss[loss=0.009534, audio_tagging_loss=0.009534, over 24750.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4950724.97 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:10:58,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1197013.3333333333, ans=0.1
2023-12-23 15:11:01,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.12 vs. limit=15.0
2023-12-23 15:11:06,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.80 vs. limit=22.5
2023-12-23 15:11:10,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1197080.0, ans=22.5
2023-12-23 15:11:11,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1197080.0, ans=0.2
2023-12-23 15:11:23,714 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.251e+01 3.566e+01 3.766e+01 3.939e+01 4.413e+01, threshold=7.532e+01, percent-clipped=0.0
2023-12-23 15:11:40,197 INFO [train.py:886] (0/4) Epoch 38, batch 3250, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4951684.92 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:12:18,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1197480.0, ans=0.125
2023-12-23 15:12:30,371 INFO [train.py:886] (0/4) Epoch 38, batch 3300, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4951635.99 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:12:48,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1197680.0, ans=0.125
2023-12-23 15:13:07,166 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.537e+01 3.698e+01 3.881e+01 4.434e+01, threshold=7.396e+01, percent-clipped=0.0
2023-12-23 15:13:13,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1197880.0, ans=0.125
2023-12-23 15:13:21,920 INFO [train.py:886] (0/4) Epoch 38, batch 3350, loss[loss=0.01055, audio_tagging_loss=0.01055, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4955186.29 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:13:23,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1197946.6666666667, ans=0.125
2023-12-23 15:13:37,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1198013.3333333333, ans=0.125
2023-12-23 15:13:40,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=15.0
2023-12-23 15:13:41,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0
2023-12-23 15:13:46,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1198080.0, ans=0.125
2023-12-23 15:13:53,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1198146.6666666667, ans=0.0
2023-12-23 15:14:12,754 INFO [train.py:886] (0/4) Epoch 38, batch 3400, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4963231.26 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:14:16,057 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:14:22,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1198346.6666666667, ans=0.125
2023-12-23 15:14:26,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.84 vs. limit=10.0
2023-12-23 15:14:29,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0
2023-12-23 15:14:30,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1198346.6666666667, ans=0.125
2023-12-23 15:14:34,360 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.19 vs. limit=22.5
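The grad_scale value in the batch summaries is the current loss scale of fp16 mixed-precision training; it sits at 64.0 through this stretch and drops to 32.0 around batch 3600 below, which is the standard behaviour of a dynamic gradient scaler after it sees an overflowing step. A minimal sketch of the usual torch.cuda.amp pattern follows; the model, optimizer, and loss here are placeholders, not the recipe's actual modules.

```python
import torch

model = torch.nn.Linear(80, 527).cuda()  # placeholder; 527 = AudioSet event count
optimizer = torch.optim.AdamW(model.parameters())
scaler = torch.cuda.amp.GradScaler(init_scale=64.0)

def train_step(features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        logits = model(features)
        # multi-label audio tagging: binary cross-entropy per event class
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skips the optimizer step if grads contain inf/nan
    scaler.update()         # on overflow multiplies the scale by 0.5: 64.0 -> 32.0
    return loss.detach(), scaler.get_scale()
```

Because the scale multiplies the loss before backward and is divided back out at unscale time, a scale change does not alter the effective gradients, only their fp16 dynamic range.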
2023-12-23 15:14:37,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1198413.3333333333, ans=0.125
2023-12-23 15:14:38,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1198413.3333333333, ans=0.1
2023-12-23 15:14:48,199 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.386e+01 3.674e+01 3.823e+01 3.991e+01 4.333e+01, threshold=7.645e+01, percent-clipped=0.0
2023-12-23 15:14:50,353 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:14:58,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1198546.6666666667, ans=0.125
2023-12-23 15:14:59,833 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:15:00,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1198546.6666666667, ans=0.0
2023-12-23 15:15:02,398 INFO [train.py:886] (0/4) Epoch 38, batch 3450, loss[loss=0.009943, audio_tagging_loss=0.009943, over 22047.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4959009.76 frames. ], batch size: 107, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:15:05,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1198613.3333333333, ans=0.0
2023-12-23 15:15:11,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0
2023-12-23 15:15:26,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1198746.6666666667, ans=0.2
2023-12-23 15:15:29,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1198746.6666666667, ans=0.125
2023-12-23 15:15:54,829 INFO [train.py:886] (0/4) Epoch 38, batch 3500, loss[loss=0.01237, audio_tagging_loss=0.01237, over 24750.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4953237.41 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:16:07,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0
2023-12-23 15:16:10,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1199013.3333333333, ans=0.125
2023-12-23 15:16:20,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1199080.0, ans=0.09899494936611666
2023-12-23 15:16:25,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1199146.6666666667, ans=0.0
2023-12-23 15:16:30,765 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.551e+01 3.681e+01 3.822e+01 4.366e+01, threshold=7.362e+01, percent-clipped=0.0
2023-12-23 15:16:36,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0
2023-12-23 15:16:45,621 INFO [train.py:886] (0/4) Epoch 38, batch 3550, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4953587.34 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:16:58,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1199346.6666666667, ans=0.02
2023-12-23 15:16:59,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1199346.6666666667, ans=0.125
2023-12-23 15:17:14,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0
2023-12-23 15:17:15,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1199480.0, ans=0.125
2023-12-23 15:17:16,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1199480.0, ans=0.125
2023-12-23 15:17:22,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1199480.0, ans=0.125
2023-12-23 15:17:27,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0
2023-12-23 15:17:35,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0
2023-12-23 15:17:37,475 INFO [train.py:886] (0/4) Epoch 38, batch 3600, loss[loss=0.01112, audio_tagging_loss=0.01112, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4952908.00 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:18:14,460 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.519e+01 3.755e+01 3.929e+01 4.573e+01, threshold=7.510e+01, percent-clipped=0.0
2023-12-23 15:18:16,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1199813.3333333333, ans=0.0
2023-12-23 15:18:21,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1199880.0, ans=0.125
2023-12-23 15:18:29,752 INFO [train.py:886] (0/4) Epoch 38, batch 3650, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4952774.28 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:18:35,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0
2023-12-23 15:18:36,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1199946.6666666667, ans=0.0
2023-12-23 15:18:37,294 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-180000.pt
2023-12-23 15:18:53,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1200080.0, ans=0.125
2023-12-23 15:18:57,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5
2023-12-23 15:19:02,461 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:19:22,997 INFO [train.py:886] (0/4) Epoch 38, batch 3700, loss[loss=0.01097, audio_tagging_loss=0.01097, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4958225.16 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:19:33,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1200346.6666666667, ans=0.125
2023-12-23 15:19:51,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1200413.3333333333, ans=0.0
2023-12-23 15:20:00,557 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.223e+01 3.534e+01 3.731e+01 3.943e+01 4.301e+01, threshold=7.462e+01, percent-clipped=0.0
2023-12-23 15:20:15,137 INFO [train.py:886] (0/4) Epoch 38, batch 3750, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4954391.93 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:20:31,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1200680.0, ans=0.0
2023-12-23 15:20:33,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1200680.0, ans=0.1
2023-12-23 15:20:36,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1200746.6666666667, ans=0.1
2023-12-23 15:21:06,193 INFO [train.py:886] (0/4) Epoch 38, batch 3800, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24750.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4948236.81 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:21:07,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1200946.6666666667, ans=0.125
2023-12-23 15:21:22,182 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:21:36,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0
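The [checkpoint.py:75] record above writes a mid-epoch checkpoint named after the global batch index (checkpoint-180000.pt). A sketch of save-every-N-batches logic follows; the interval and the exact fields in the checkpoint dict are assumptions, not the actual checkpoint.py contents.

```python
from pathlib import Path
import torch

SAVE_EVERY_N = 4000  # assumed interval; 180000 is a multiple of it

def maybe_save_checkpoint(batch_idx_train: int, model, optimizer, exp_dir: Path) -> None:
    """Write exp_dir/checkpoint-<batch_idx>.pt every SAVE_EVERY_N batches.

    Hypothetical helper: the dict fields are illustrative, not the exact
    state icefall's checkpoint.py serializes.
    """
    if batch_idx_train == 0 or batch_idx_train % SAVE_EVERY_N != 0:
        return
    path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
    torch.save(
        {
            "batch_idx_train": batch_idx_train,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        path,
    )
```

Keeping the batch index in both the filename and the payload lets a resumed run continue the batch counter (and every batch_count-keyed schedule) exactly where the log left off.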
2023-12-23 15:21:43,258 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.611e+01 3.778e+01 3.951e+01 5.109e+01, threshold=7.557e+01, percent-clipped=0.0
2023-12-23 15:21:55,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1201213.3333333333, ans=0.125
2023-12-23 15:21:57,330 INFO [train.py:886] (0/4) Epoch 38, batch 3850, loss[loss=0.009748, audio_tagging_loss=0.009748, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4946016.14 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:22:19,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1201413.3333333333, ans=0.125
2023-12-23 15:22:48,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1201546.6666666667, ans=0.0
2023-12-23 15:22:48,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1201546.6666666667, ans=0.0
2023-12-23 15:22:49,999 INFO [train.py:886] (0/4) Epoch 38, batch 3900, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4947119.72 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:22:51,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1201613.3333333333, ans=0.04949747468305833
2023-12-23 15:23:24,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1201813.3333333333, ans=0.125
2023-12-23 15:23:26,945 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.160e+01 3.573e+01 3.725e+01 3.871e+01 4.594e+01, threshold=7.451e+01, percent-clipped=0.0
2023-12-23 15:23:29,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1201813.3333333333, ans=0.125
2023-12-23 15:23:31,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0
2023-12-23 15:23:34,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1201880.0, ans=0.95
2023-12-23 15:23:41,548 INFO [train.py:886] (0/4) Epoch 38, batch 3950, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4947204.37 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:23:54,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1202013.3333333333, ans=0.0
2023-12-23 15:24:10,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1202080.0, ans=0.0
2023-12-23 15:24:16,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1202146.6666666667, ans=0.125
2023-12-23 15:24:22,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1202213.3333333333, ans=0.125
2023-12-23 15:24:33,208 INFO [train.py:886] (0/4) Epoch 38, batch 4000, loss[loss=0.01263, audio_tagging_loss=0.01263, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4953388.90 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:24:35,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1202280.0, ans=0.125
2023-12-23 15:24:40,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1202280.0, ans=0.0
2023-12-23 15:24:45,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1202346.6666666667, ans=0.0
2023-12-23 15:24:48,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5
2023-12-23 15:24:59,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1202413.3333333333, ans=0.0
2023-12-23 15:25:07,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1202480.0, ans=0.125
2023-12-23 15:25:10,931 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.209e+01 3.581e+01 3.725e+01 3.897e+01 4.371e+01, threshold=7.451e+01, percent-clipped=0.0
2023-12-23 15:25:19,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1202546.6666666667, ans=0.0
2023-12-23 15:25:26,268 INFO [train.py:886] (0/4) Epoch 38, batch 4050, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4956369.87 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:25:29,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1202613.3333333333, ans=0.0
2023-12-23 15:25:47,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1202746.6666666667, ans=0.125
2023-12-23 15:26:05,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1202813.3333333333, ans=0.1
2023-12-23 15:26:15,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0
2023-12-23 15:26:16,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1202946.6666666667, ans=0.125
2023-12-23 15:26:16,818 INFO [train.py:886] (0/4) Epoch 38, batch 4100, loss[loss=0.01051, audio_tagging_loss=0.01051, over 24750.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4952496.20 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:26:39,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0
2023-12-23 15:26:53,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1203146.6666666667, ans=0.2
2023-12-23 15:26:53,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1203146.6666666667, ans=0.1
2023-12-23 15:26:54,190 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.608e+01 3.845e+01 3.998e+01 4.535e+01, threshold=7.691e+01, percent-clipped=0.0
2023-12-23 15:27:02,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1203213.3333333333, ans=0.125
2023-12-23 15:27:06,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1203213.3333333333, ans=0.1
2023-12-23 15:27:09,030 INFO [train.py:886] (0/4) Epoch 38, batch 4150, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4947051.05 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:27:10,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1203280.0, ans=0.125
2023-12-23 15:27:11,192 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:27:11,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1203280.0, ans=0.0
2023-12-23 15:28:00,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0
2023-12-23 15:28:01,529 INFO [train.py:886] (0/4) Epoch 38, batch 4200, loss[loss=0.01173, audio_tagging_loss=0.01173, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4943100.88 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:28:10,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1203680.0, ans=0.125
2023-12-23 15:28:25,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1203746.6666666667, ans=0.125
2023-12-23 15:28:28,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1203746.6666666667, ans=0.125
2023-12-23 15:28:37,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.96 vs.
limit=15.0 2023-12-23 15:28:39,421 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.592e+01 3.742e+01 3.876e+01 4.221e+01, threshold=7.484e+01, percent-clipped=0.0 2023-12-23 15:28:40,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1203813.3333333333, ans=0.0 2023-12-23 15:28:42,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1203880.0, ans=0.125 2023-12-23 15:28:52,644 INFO [train.py:886] (0/4) Epoch 38, batch 4250, loss[loss=0.01038, audio_tagging_loss=0.01038, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4944932.29 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:28:52,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1203946.6666666667, ans=0.2 2023-12-23 15:29:08,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.77 vs. limit=15.0 2023-12-23 15:29:14,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1204080.0, ans=15.0 2023-12-23 15:29:23,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1204146.6666666667, ans=0.07 2023-12-23 15:29:26,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1204146.6666666667, ans=0.125 2023-12-23 15:29:27,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1204146.6666666667, ans=15.0 2023-12-23 15:29:45,925 INFO [train.py:886] (0/4) Epoch 38, batch 4300, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4947140.48 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:29:47,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. 
limit=12.0 2023-12-23 15:29:49,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1204280.0, ans=0.2 2023-12-23 15:29:50,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1204280.0, ans=0.125 2023-12-23 15:29:54,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1204346.6666666667, ans=0.1 2023-12-23 15:29:58,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1204346.6666666667, ans=0.0 2023-12-23 15:30:21,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1204480.0, ans=0.125 2023-12-23 15:30:22,707 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.297e+01 3.587e+01 3.741e+01 3.975e+01 4.489e+01, threshold=7.482e+01, percent-clipped=0.0 2023-12-23 15:30:27,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1204546.6666666667, ans=0.1 2023-12-23 15:30:34,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1204546.6666666667, ans=0.125 2023-12-23 15:30:35,952 INFO [train.py:886] (0/4) Epoch 38, batch 4350, loss[loss=0.01082, audio_tagging_loss=0.01082, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4944937.34 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:30:36,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.97 vs. limit=22.5 2023-12-23 15:30:38,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2023-12-23 15:30:45,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1204613.3333333333, ans=10.0 2023-12-23 15:30:47,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1204680.0, ans=0.0 2023-12-23 15:30:48,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.25 vs. 
limit=15.0 2023-12-23 15:30:57,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1204746.6666666667, ans=0.0 2023-12-23 15:31:02,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1204746.6666666667, ans=0.125 2023-12-23 15:31:03,436 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:31:14,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1204813.3333333333, ans=0.1 2023-12-23 15:31:20,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1204880.0, ans=0.125 2023-12-23 15:31:28,703 INFO [train.py:886] (0/4) Epoch 38, batch 4400, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4945619.42 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:31:40,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1205013.3333333333, ans=0.125 2023-12-23 15:31:40,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1205013.3333333333, ans=0.125 2023-12-23 15:32:01,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1205146.6666666667, ans=0.025 2023-12-23 15:32:05,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1205146.6666666667, ans=0.125 2023-12-23 15:32:05,843 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.267e+01 3.555e+01 3.761e+01 3.984e+01 4.470e+01, threshold=7.522e+01, percent-clipped=0.0 2023-12-23 15:32:21,250 INFO [train.py:886] (0/4) Epoch 38, batch 4450, loss[loss=0.01089, audio_tagging_loss=0.01089, over 25000.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4937235.15 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:32:46,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1205413.3333333333, ans=0.0 2023-12-23 15:32:53,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1205480.0, ans=0.125 2023-12-23 15:32:53,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1205480.0, ans=0.125 2023-12-23 15:32:54,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1205480.0, ans=0.125 2023-12-23 15:32:58,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1205480.0, ans=0.1 2023-12-23 15:33:11,830 INFO [train.py:886] (0/4) Epoch 38, batch 4500, loss[loss=0.01152, audio_tagging_loss=0.01152, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4933148.50 frames. 
], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:33:12,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1205613.3333333333, ans=0.125 2023-12-23 15:33:27,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1205680.0, ans=0.125 2023-12-23 15:33:36,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2023-12-23 15:33:41,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1205746.6666666667, ans=0.125 2023-12-23 15:33:49,547 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.168e+01 3.607e+01 3.784e+01 3.969e+01 5.476e+01, threshold=7.568e+01, percent-clipped=0.0 2023-12-23 15:33:53,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1205880.0, ans=0.125 2023-12-23 15:33:53,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1205880.0, ans=0.1 2023-12-23 15:33:54,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1205880.0, ans=0.2 2023-12-23 15:34:04,992 INFO [train.py:886] (0/4) Epoch 38, batch 4550, loss[loss=0.0101, audio_tagging_loss=0.0101, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4936046.23 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:34:18,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2023-12-23 15:34:24,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2023-12-23 15:34:25,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1206080.0, ans=0.125 2023-12-23 15:34:30,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.70 vs. limit=8.0 2023-12-23 15:34:31,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1206080.0, ans=0.0 2023-12-23 15:34:32,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1206080.0, ans=0.2 2023-12-23 15:34:55,728 INFO [train.py:886] (0/4) Epoch 38, batch 4600, loss[loss=0.01457, audio_tagging_loss=0.01457, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4940676.09 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:35:03,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1206280.0, ans=0.0 2023-12-23 15:35:06,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.60 vs. 
limit=12.0 2023-12-23 15:35:06,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1206346.6666666667, ans=0.125 2023-12-23 15:35:07,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1206346.6666666667, ans=0.125 2023-12-23 15:35:17,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1206413.3333333333, ans=0.1 2023-12-23 15:35:29,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1206480.0, ans=0.125 2023-12-23 15:35:30,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-12-23 15:35:32,886 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.218e+01 3.570e+01 3.726e+01 3.915e+01 4.554e+01, threshold=7.452e+01, percent-clipped=0.0 2023-12-23 15:35:34,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1206480.0, ans=0.125 2023-12-23 15:35:39,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2023-12-23 15:35:46,156 INFO [train.py:886] (0/4) Epoch 38, batch 4650, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4938104.01 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:35:49,353 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.078e-02 2023-12-23 15:35:51,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1206613.3333333333, ans=0.04949747468305833 2023-12-23 15:35:52,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1206613.3333333333, ans=0.125 2023-12-23 15:35:56,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1206680.0, ans=0.125 2023-12-23 15:36:07,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1206746.6666666667, ans=0.0 2023-12-23 15:36:10,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1206746.6666666667, ans=0.125 2023-12-23 15:36:16,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1206813.3333333333, ans=0.0 2023-12-23 15:36:36,151 INFO [train.py:886] (0/4) Epoch 38, batch 4700, loss[loss=0.01679, audio_tagging_loss=0.01679, over 24944.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4942220.19 frames. 
], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:36:39,119 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.349e-02 2023-12-23 15:36:39,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1206946.6666666667, ans=0.2 2023-12-23 15:36:58,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.71 vs. limit=22.5 2023-12-23 15:37:00,030 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:37:07,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1207146.6666666667, ans=0.1 2023-12-23 15:37:10,711 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.408e+01 3.692e+01 3.810e+01 4.007e+01 4.373e+01, threshold=7.619e+01, percent-clipped=0.0 2023-12-23 15:37:23,440 INFO [train.py:886] (0/4) Epoch 38, batch 4750, loss[loss=0.01065, audio_tagging_loss=0.01065, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4940476.60 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:37:28,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0 2023-12-23 15:37:34,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1207346.6666666667, ans=0.125 2023-12-23 15:37:38,544 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-38.pt 2023-12-23 15:37:57,298 INFO [train.py:886] (0/4) Epoch 39, batch 0, loss[loss=0.02633, audio_tagging_loss=0.02633, over 25000.00 frames. ], tot_loss[loss=0.02633, audio_tagging_loss=0.02633, over 25000.00 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:37:57,299 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 15:38:17,202 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6802, 2.7627, 3.5993, 3.7269], device='cuda:0') 2023-12-23 15:38:17,958 INFO [train.py:917] (0/4) Epoch 39, validation: loss=0.03421, audio_tagging_loss=0.03421, over 3737520.00 frames. 2023-12-23 15:38:17,958 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 15:38:30,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1207453.3333333333, ans=0.125 2023-12-23 15:38:49,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1207586.6666666667, ans=0.2 2023-12-23 15:38:53,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1207586.6666666667, ans=0.0 2023-12-23 15:38:53,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1207586.6666666667, ans=0.125 2023-12-23 15:38:54,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. 
limit=6.0 2023-12-23 15:38:55,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1207586.6666666667, ans=0.125 2023-12-23 15:39:00,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1207653.3333333333, ans=0.2 2023-12-23 15:39:10,795 INFO [train.py:886] (0/4) Epoch 39, batch 50, loss[loss=0.01807, audio_tagging_loss=0.01807, over 25000.00 frames. ], tot_loss[loss=0.01872, audio_tagging_loss=0.01872, over 1104729.50 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:39:13,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2023-12-23 15:39:18,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1207720.0, ans=0.125 2023-12-23 15:39:30,443 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.430e+01 4.003e+01 4.538e+01 5.178e+01 1.091e+02, threshold=9.075e+01, percent-clipped=8.0 2023-12-23 15:39:59,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1207986.6666666667, ans=0.1 2023-12-23 15:40:01,767 INFO [train.py:886] (0/4) Epoch 39, batch 100, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 1967185.27 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:40:12,810 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:40:18,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2023-12-23 15:40:27,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1208186.6666666667, ans=0.125 2023-12-23 15:40:53,365 INFO [train.py:886] (0/4) Epoch 39, batch 150, loss[loss=0.01403, audio_tagging_loss=0.01403, over 25000.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 2627500.16 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:40:55,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1208386.6666666667, ans=0.125 2023-12-23 15:40:57,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1208386.6666666667, ans=0.0 2023-12-23 15:41:00,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1208386.6666666667, ans=0.0 2023-12-23 15:41:14,332 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.383e+01 3.761e+01 3.990e+01 4.223e+01 5.067e+01, threshold=7.980e+01, percent-clipped=0.0 2023-12-23 15:41:28,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1208586.6666666667, ans=10.0 2023-12-23 15:41:29,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. 
limit=15.0 2023-12-23 15:41:44,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1208720.0, ans=0.0 2023-12-23 15:41:45,771 INFO [train.py:886] (0/4) Epoch 39, batch 200, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 3138021.42 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:41:52,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2023-12-23 15:42:18,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1208920.0, ans=0.125 2023-12-23 15:42:32,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.28 vs. limit=22.5 2023-12-23 15:42:36,433 INFO [train.py:886] (0/4) Epoch 39, batch 250, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24036.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 3547816.93 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:42:42,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1209053.3333333333, ans=0.95 2023-12-23 15:42:50,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1209120.0, ans=0.0 2023-12-23 15:42:51,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.05 vs. limit=10.0 2023-12-23 15:42:51,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1209120.0, ans=0.2 2023-12-23 15:42:57,483 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.324e+01 3.633e+01 3.790e+01 3.994e+01 4.386e+01, threshold=7.580e+01, percent-clipped=0.0 2023-12-23 15:42:58,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1209186.6666666667, ans=0.125 2023-12-23 15:43:02,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1209186.6666666667, ans=0.0 2023-12-23 15:43:16,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1209253.3333333333, ans=0.125 2023-12-23 15:43:17,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0 2023-12-23 15:43:24,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=1209320.0, ans=0.02 2023-12-23 15:43:27,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1209386.6666666667, ans=0.125 2023-12-23 15:43:28,120 INFO [train.py:886] (0/4) Epoch 39, batch 300, loss[loss=0.01664, audio_tagging_loss=0.01664, over 24750.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 3854570.28 frames. 
], batch size: 99, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:43:30,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1209386.6666666667, ans=0.2 2023-12-23 15:44:06,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1209586.6666666667, ans=0.2 2023-12-23 15:44:16,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1209653.3333333333, ans=0.0 2023-12-23 15:44:17,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1209653.3333333333, ans=0.125 2023-12-23 15:44:19,128 INFO [train.py:886] (0/4) Epoch 39, batch 350, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4096022.91 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:44:20,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1209720.0, ans=0.0 2023-12-23 15:44:22,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1209720.0, ans=0.1 2023-12-23 15:44:26,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1209720.0, ans=0.0 2023-12-23 15:44:27,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1209720.0, ans=0.2 2023-12-23 15:44:39,348 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.294e+01 3.641e+01 3.787e+01 3.947e+01 4.798e+01, threshold=7.575e+01, percent-clipped=0.0 2023-12-23 15:44:43,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0 2023-12-23 15:44:55,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1209920.0, ans=0.025 2023-12-23 15:44:56,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1209920.0, ans=0.1 2023-12-23 15:45:02,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1209986.6666666667, ans=0.1 2023-12-23 15:45:09,623 INFO [train.py:886] (0/4) Epoch 39, batch 400, loss[loss=0.0106, audio_tagging_loss=0.0106, over 22627.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4278939.08 frames. 
], batch size: 107, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:45:11,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1210053.3333333333, ans=0.05 2023-12-23 15:45:13,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1210053.3333333333, ans=0.2 2023-12-23 15:45:22,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1210120.0, ans=0.0 2023-12-23 15:45:37,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1210186.6666666667, ans=0.0 2023-12-23 15:46:00,629 INFO [train.py:886] (0/4) Epoch 39, batch 450, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24918.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4426751.61 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:46:06,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1210386.6666666667, ans=0.125 2023-12-23 15:46:07,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1210386.6666666667, ans=0.125 2023-12-23 15:46:17,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2023-12-23 15:46:20,292 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.320e+01 3.614e+01 3.726e+01 3.946e+01 4.381e+01, threshold=7.452e+01, percent-clipped=0.0 2023-12-23 15:46:37,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2023-12-23 15:46:51,855 INFO [train.py:886] (0/4) Epoch 39, batch 500, loss[loss=0.01126, audio_tagging_loss=0.01126, over 25000.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4544887.90 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:47:11,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1210853.3333333333, ans=0.125 2023-12-23 15:47:25,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2023-12-23 15:47:39,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.70 vs. limit=15.0 2023-12-23 15:47:40,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2023-12-23 15:47:43,609 INFO [train.py:886] (0/4) Epoch 39, batch 550, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4639233.37 frames. 
], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:47:47,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1211053.3333333333, ans=0.125 2023-12-23 15:48:03,901 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.249e+01 3.629e+01 3.792e+01 3.921e+01 4.570e+01, threshold=7.585e+01, percent-clipped=0.0 2023-12-23 15:48:35,003 INFO [train.py:886] (0/4) Epoch 39, batch 600, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4711890.88 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:48:35,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2023-12-23 15:48:37,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1211386.6666666667, ans=0.125 2023-12-23 15:48:45,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1211453.3333333333, ans=0.125 2023-12-23 15:48:50,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1211453.3333333333, ans=0.125 2023-12-23 15:48:56,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1211520.0, ans=0.125 2023-12-23 15:49:14,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1211586.6666666667, ans=0.1 2023-12-23 15:49:19,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1211653.3333333333, ans=0.1 2023-12-23 15:49:25,994 INFO [train.py:886] (0/4) Epoch 39, batch 650, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4762848.76 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:49:38,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1211786.6666666667, ans=0.0 2023-12-23 15:49:43,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1211786.6666666667, ans=0.125 2023-12-23 15:49:47,856 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.262e+01 3.662e+01 3.873e+01 3.983e+01 4.612e+01, threshold=7.746e+01, percent-clipped=0.0 2023-12-23 15:50:04,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2023-12-23 15:50:09,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1211986.6666666667, ans=0.125 2023-12-23 15:50:16,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2023-12-23 15:50:19,306 INFO [train.py:886] (0/4) Epoch 39, batch 700, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4804246.73 frames. 
], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:50:27,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1212053.3333333333, ans=0.0 2023-12-23 15:50:43,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1212186.6666666667, ans=0.0 2023-12-23 15:50:50,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1212253.3333333333, ans=0.2 2023-12-23 15:50:52,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=8.0 2023-12-23 15:51:01,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1212320.0, ans=0.0 2023-12-23 15:51:10,665 INFO [train.py:886] (0/4) Epoch 39, batch 750, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4836849.84 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:51:13,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2023-12-23 15:51:20,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1212453.3333333333, ans=0.2 2023-12-23 15:51:31,117 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.158e+01 3.606e+01 3.797e+01 3.983e+01 4.805e+01, threshold=7.593e+01, percent-clipped=0.0 2023-12-23 15:51:55,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1212653.3333333333, ans=0.2 2023-12-23 15:52:00,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1212653.3333333333, ans=0.125 2023-12-23 15:52:02,554 INFO [train.py:886] (0/4) Epoch 39, batch 800, loss[loss=0.01272, audio_tagging_loss=0.01272, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4864181.73 frames. 
], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:52:06,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1212720.0, ans=0.125 2023-12-23 15:52:12,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1212786.6666666667, ans=0.125 2023-12-23 15:52:17,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1212786.6666666667, ans=0.125 2023-12-23 15:52:24,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1212853.3333333333, ans=0.125 2023-12-23 15:52:27,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1212853.3333333333, ans=0.125 2023-12-23 15:52:28,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1212853.3333333333, ans=0.125 2023-12-23 15:52:33,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1212920.0, ans=0.125 2023-12-23 15:52:45,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1212986.6666666667, ans=0.125 2023-12-23 15:52:50,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=22.5 2023-12-23 15:52:52,159 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:52:53,717 INFO [train.py:886] (0/4) Epoch 39, batch 850, loss[loss=0.01452, audio_tagging_loss=0.01452, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4884774.81 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 64.0 2023-12-23 15:53:14,221 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.254e+01 3.592e+01 3.779e+01 3.968e+01 4.434e+01, threshold=7.558e+01, percent-clipped=0.0 2023-12-23 15:53:37,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1213320.0, ans=0.125 2023-12-23 15:53:45,639 INFO [train.py:886] (0/4) Epoch 39, batch 900, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4903099.13 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 64.0 2023-12-23 15:53:48,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1213386.6666666667, ans=0.0 2023-12-23 15:54:00,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2023-12-23 15:54:06,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1213520.0, ans=0.125 2023-12-23 15:54:11,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1213520.0, ans=0.0 2023-12-23 15:54:11,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.24 vs. 
limit=15.0 2023-12-23 15:54:12,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.69 vs. limit=10.0 2023-12-23 15:54:14,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1213520.0, ans=0.125 2023-12-23 15:54:15,059 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:54:20,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1213586.6666666667, ans=0.1 2023-12-23 15:54:25,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1213586.6666666667, ans=0.025 2023-12-23 15:54:38,401 INFO [train.py:886] (0/4) Epoch 39, batch 950, loss[loss=0.01065, audio_tagging_loss=0.01065, over 24003.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4909011.70 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:54:40,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1213720.0, ans=0.1 2023-12-23 15:54:44,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1213720.0, ans=0.125 2023-12-23 15:54:47,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1213786.6666666667, ans=0.2 2023-12-23 15:54:57,967 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.310e+01 3.605e+01 3.794e+01 3.970e+01 4.761e+01, threshold=7.588e+01, percent-clipped=0.0 2023-12-23 15:55:03,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1213853.3333333333, ans=0.125 2023-12-23 15:55:07,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1213853.3333333333, ans=0.125 2023-12-23 15:55:12,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2023-12-23 15:55:23,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1213986.6666666667, ans=0.1 2023-12-23 15:55:29,698 INFO [train.py:886] (0/4) Epoch 39, batch 1000, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4917959.44 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:55:49,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1214120.0, ans=0.125 2023-12-23 15:56:09,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1214253.3333333333, ans=0.09899494936611666 2023-12-23 15:56:21,326 INFO [train.py:886] (0/4) Epoch 39, batch 1050, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4927600.63 frames. 
], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:56:24,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1214386.6666666667, ans=0.0 2023-12-23 15:56:42,530 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.325e+01 3.673e+01 3.795e+01 3.964e+01 4.762e+01, threshold=7.590e+01, percent-clipped=0.0 2023-12-23 15:56:42,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1214520.0, ans=0.015 2023-12-23 15:56:47,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1214520.0, ans=0.2 2023-12-23 15:56:54,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1214586.6666666667, ans=0.125 2023-12-23 15:57:05,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.11 vs. limit=22.5 2023-12-23 15:57:05,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1214653.3333333333, ans=0.125 2023-12-23 15:57:13,364 INFO [train.py:886] (0/4) Epoch 39, batch 1100, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4935554.40 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:57:17,823 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2023-12-23 15:57:37,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1214853.3333333333, ans=0.125 2023-12-23 15:57:59,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1214986.6666666667, ans=0.125 2023-12-23 15:58:03,989 INFO [train.py:886] (0/4) Epoch 39, batch 1150, loss[loss=0.01017, audio_tagging_loss=0.01017, over 22238.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4939609.90 frames. ], batch size: 107, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:58:06,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1215053.3333333333, ans=0.1 2023-12-23 15:58:10,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1215053.3333333333, ans=0.125 2023-12-23 15:58:25,640 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 3.585e+01 3.703e+01 3.926e+01 4.731e+01, threshold=7.406e+01, percent-clipped=0.0 2023-12-23 15:58:31,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1215186.6666666667, ans=0.035 2023-12-23 15:58:32,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.58 vs. 
limit=15.0 2023-12-23 15:58:35,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1215253.3333333333, ans=0.07 2023-12-23 15:58:36,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0 2023-12-23 15:58:44,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1215320.0, ans=0.0 2023-12-23 15:58:50,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1215320.0, ans=0.0 2023-12-23 15:58:54,208 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:58:56,851 INFO [train.py:886] (0/4) Epoch 39, batch 1200, loss[loss=0.01066, audio_tagging_loss=0.01066, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4951503.85 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:58:58,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1215386.6666666667, ans=0.1 2023-12-23 15:59:23,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1215520.0, ans=0.2 2023-12-23 15:59:40,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1215653.3333333333, ans=0.125 2023-12-23 15:59:45,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1215653.3333333333, ans=0.125 2023-12-23 15:59:47,987 INFO [train.py:886] (0/4) Epoch 39, batch 1250, loss[loss=0.01578, audio_tagging_loss=0.01578, over 24949.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4950332.28 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:59:51,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1215720.0, ans=0.125 2023-12-23 15:59:55,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1215720.0, ans=0.125 2023-12-23 16:00:04,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1215786.6666666667, ans=0.2 2023-12-23 16:00:07,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1215786.6666666667, ans=0.125 2023-12-23 16:00:08,922 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.341e+01 3.597e+01 3.795e+01 3.980e+01 4.718e+01, threshold=7.591e+01, percent-clipped=0.0 2023-12-23 16:00:12,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.86 vs. limit=12.0 2023-12-23 16:00:23,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-12-23 16:00:32,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. 
limit=15.0 2023-12-23 16:00:35,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1215986.6666666667, ans=0.09899494936611666 2023-12-23 16:00:40,162 INFO [train.py:886] (0/4) Epoch 39, batch 1300, loss[loss=0.0106, audio_tagging_loss=0.0106, over 24750.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4951736.91 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:00:41,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1216053.3333333333, ans=0.125 2023-12-23 16:00:47,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1216053.3333333333, ans=0.125 2023-12-23 16:00:48,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1216053.3333333333, ans=0.0 2023-12-23 16:00:55,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1216120.0, ans=0.125 2023-12-23 16:00:57,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1216120.0, ans=0.125 2023-12-23 16:00:57,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1216120.0, ans=0.0 2023-12-23 16:01:11,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1216253.3333333333, ans=0.0 2023-12-23 16:01:14,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=12.0 2023-12-23 16:01:17,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2023-12-23 16:01:29,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1216320.0, ans=0.125 2023-12-23 16:01:32,451 INFO [train.py:886] (0/4) Epoch 39, batch 1350, loss[loss=0.01042, audio_tagging_loss=0.01042, over 25000.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4950468.13 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:01:40,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1216386.6666666667, ans=0.125 2023-12-23 16:01:52,793 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.226e+01 3.613e+01 3.759e+01 3.931e+01 4.440e+01, threshold=7.518e+01, percent-clipped=0.0 2023-12-23 16:02:01,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1216520.0, ans=0.0 2023-12-23 16:02:12,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1216586.6666666667, ans=0.1 2023-12-23 16:02:14,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-12-23 16:02:24,097 INFO [train.py:886] (0/4) Epoch 39, batch 1400, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. 
], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4954466.48 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:02:54,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1216920.0, ans=0.5 2023-12-23 16:03:03,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=15.0 2023-12-23 16:03:16,309 INFO [train.py:886] (0/4) Epoch 39, batch 1450, loss[loss=0.01041, audio_tagging_loss=0.01041, over 24750.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4955234.00 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:03:36,565 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.177e+01 3.585e+01 3.717e+01 3.896e+01 4.835e+01, threshold=7.434e+01, percent-clipped=0.0 2023-12-23 16:03:46,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1217253.3333333333, ans=0.035 2023-12-23 16:04:06,455 INFO [train.py:886] (0/4) Epoch 39, batch 1500, loss[loss=0.01125, audio_tagging_loss=0.01125, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4960335.13 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:04:15,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=12.0 2023-12-23 16:04:27,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1217520.0, ans=0.125 2023-12-23 16:04:33,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1217520.0, ans=0.2 2023-12-23 16:04:57,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1217720.0, ans=0.125 2023-12-23 16:04:57,993 INFO [train.py:886] (0/4) Epoch 39, batch 1550, loss[loss=0.01175, audio_tagging_loss=0.01175, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4958722.91 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:05:08,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=15.0 2023-12-23 16:05:18,863 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 3.671e+01 3.823e+01 4.043e+01 4.664e+01, threshold=7.647e+01, percent-clipped=0.0 2023-12-23 16:05:21,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1217853.3333333333, ans=0.125 2023-12-23 16:05:30,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.39 vs. limit=15.0 2023-12-23 16:05:42,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1217986.6666666667, ans=0.0 2023-12-23 16:05:49,418 INFO [train.py:886] (0/4) Epoch 39, batch 1600, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4953034.47 frames. 
], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:06:08,018 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:06:12,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.46 vs. limit=12.0 2023-12-23 16:06:18,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.35 vs. limit=22.5 2023-12-23 16:06:40,798 INFO [train.py:886] (0/4) Epoch 39, batch 1650, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4950422.80 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:06:45,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1218386.6666666667, ans=0.0 2023-12-23 16:06:49,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1218386.6666666667, ans=0.2 2023-12-23 16:06:50,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2023-12-23 16:06:51,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2023-12-23 16:06:52,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1218453.3333333333, ans=0.0 2023-12-23 16:07:00,993 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.195e+01 3.628e+01 3.774e+01 3.923e+01 5.343e+01, threshold=7.548e+01, percent-clipped=0.0 2023-12-23 16:07:03,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1218520.0, ans=0.125 2023-12-23 16:07:06,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-12-23 16:07:21,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=1218653.3333333333, ans=0.1 2023-12-23 16:07:31,236 INFO [train.py:886] (0/4) Epoch 39, batch 1700, loss[loss=0.01032, audio_tagging_loss=0.01032, over 22116.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4945682.82 frames. ], batch size: 107, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:07:53,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=12.0 2023-12-23 16:08:21,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1218986.6666666667, ans=0.0 2023-12-23 16:08:23,666 INFO [train.py:886] (0/4) Epoch 39, batch 1750, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4952941.82 frames. 
], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:08:26,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1219053.3333333333, ans=0.125 2023-12-23 16:08:26,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1219053.3333333333, ans=0.09899494936611666 2023-12-23 16:08:31,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1219053.3333333333, ans=0.125 2023-12-23 16:08:32,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1219120.0, ans=0.2 2023-12-23 16:08:43,369 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.314e+01 3.573e+01 3.705e+01 3.928e+01 4.407e+01, threshold=7.410e+01, percent-clipped=0.0 2023-12-23 16:09:04,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2023-12-23 16:09:13,948 INFO [train.py:886] (0/4) Epoch 39, batch 1800, loss[loss=0.01289, audio_tagging_loss=0.01289, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4954374.80 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:09:34,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1219520.0, ans=0.09899494936611666 2023-12-23 16:09:38,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0 2023-12-23 16:09:39,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1219520.0, ans=0.0 2023-12-23 16:10:05,618 INFO [train.py:886] (0/4) Epoch 39, batch 1850, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4959243.37 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:10:20,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=22.5 2023-12-23 16:10:22,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.31 vs. limit=22.5 2023-12-23 16:10:25,962 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.296e+01 3.679e+01 3.832e+01 4.036e+01 5.101e+01, threshold=7.663e+01, percent-clipped=0.0 2023-12-23 16:10:29,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1219853.3333333333, ans=0.1 2023-12-23 16:10:57,215 INFO [train.py:886] (0/4) Epoch 39, batch 1900, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4945531.11 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
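The ScheduledFloat lines record module hyperparameters (skip rates, balancer probabilities, dropout) whose printed value ans is a function of the global batch_count, so behaviours such as layer bypass and regularization anneal as training progresses. A minimal sketch of a batch-count-keyed schedule, assuming piecewise-linear interpolation between breakpoints; the class name echoes the log, but the implementation and the breakpoints below are invented for illustration:

```python
# Sketch of a float whose value is scheduled on the global batch count,
# assuming piecewise-linear interpolation between (batch_count, value)
# breakpoints. Illustrative, not the actual scaling.py implementation.
class ScheduledFloat:
    def __init__(self, *points, default=0.0):
        self.points = sorted(points)   # (batch_count, value) pairs
        self.default = default
        self.batch_count = 0           # updated by the training loop

    def __float__(self):
        p = self.points
        if not p:
            return self.default
        if self.batch_count <= p[0][0]:
            return float(p[0][1])
        if self.batch_count >= p[-1][0]:
            return float(p[-1][1])
        for (x0, y0), (x1, y1) in zip(p[:-1], p[1:]):
            if x0 <= self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return float(y0 + t * (y1 - y0))
        return self.default

# e.g. a skip rate decaying early in training and then held constant
# (breakpoints are made up):
skip_rate = ScheduledFloat((0, 0.5), (4000, 0.05), (16000, 0.035))
skip_rate.batch_count = 1215186
print(float(skip_rate))  # -> 0.035, as in the bypass.skip_rate lines
```

This would explain why, deep into epoch 39, most of the scheduled values printed here are flat: the batch count is far past the last breakpoint of every schedule.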
2023-12-23 16:11:03,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1220053.3333333333, ans=0.125 2023-12-23 16:11:23,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1220186.6666666667, ans=0.125 2023-12-23 16:11:46,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.59 vs. limit=6.0 2023-12-23 16:11:47,543 INFO [train.py:886] (0/4) Epoch 39, batch 1950, loss[loss=0.01201, audio_tagging_loss=0.01201, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4944549.30 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:11:48,628 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:12:07,896 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.588e+01 3.763e+01 3.930e+01 4.449e+01, threshold=7.526e+01, percent-clipped=0.0 2023-12-23 16:12:10,980 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:12:11,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1220520.0, ans=0.0 2023-12-23 16:12:35,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1220653.3333333333, ans=0.0 2023-12-23 16:12:35,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1220653.3333333333, ans=0.0 2023-12-23 16:12:38,736 INFO [train.py:886] (0/4) Epoch 39, batch 2000, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4952908.05 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:12:43,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1220720.0, ans=0.125 2023-12-23 16:12:53,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1220786.6666666667, ans=0.1 2023-12-23 16:13:00,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=12.0 2023-12-23 16:13:09,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1220920.0, ans=0.125 2023-12-23 16:13:17,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2023-12-23 16:13:29,330 INFO [train.py:886] (0/4) Epoch 39, batch 2050, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4953859.14 frames.
], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:13:29,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1221053.3333333333, ans=0.0 2023-12-23 16:13:47,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1221120.0, ans=0.07 2023-12-23 16:13:48,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2023-12-23 16:13:51,338 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.148e+01 3.589e+01 3.729e+01 3.908e+01 4.611e+01, threshold=7.458e+01, percent-clipped=0.0 2023-12-23 16:13:54,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1221186.6666666667, ans=0.0 2023-12-23 16:14:01,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1221253.3333333333, ans=0.125 2023-12-23 16:14:23,172 INFO [train.py:886] (0/4) Epoch 39, batch 2100, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4957427.60 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:14:27,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1221386.6666666667, ans=0.2 2023-12-23 16:14:47,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1221520.0, ans=0.1 2023-12-23 16:14:53,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1221586.6666666667, ans=0.95 2023-12-23 16:15:05,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.58 vs. limit=12.0 2023-12-23 16:15:06,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-12-23 16:15:07,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1221653.3333333333, ans=0.0 2023-12-23 16:15:14,105 INFO [train.py:886] (0/4) Epoch 39, batch 2150, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24750.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4956737.80 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:15:18,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1221720.0, ans=0.125 2023-12-23 16:15:18,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1221720.0, ans=0.125 2023-12-23 16:15:25,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1221786.6666666667, ans=0.025 2023-12-23 16:15:27,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. 
limit=15.0 2023-12-23 16:15:30,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1221786.6666666667, ans=0.125 2023-12-23 16:15:33,844 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.319e+01 3.644e+01 3.753e+01 3.947e+01 5.073e+01, threshold=7.506e+01, percent-clipped=0.0 2023-12-23 16:15:55,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1221986.6666666667, ans=0.0 2023-12-23 16:15:56,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1221986.6666666667, ans=0.09899494936611666 2023-12-23 16:16:03,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1222053.3333333333, ans=0.07 2023-12-23 16:16:04,607 INFO [train.py:886] (0/4) Epoch 39, batch 2200, loss[loss=0.009447, audio_tagging_loss=0.009447, over 24073.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4951335.36 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:16:12,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1222053.3333333333, ans=0.2 2023-12-23 16:16:32,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1222186.6666666667, ans=0.1 2023-12-23 16:16:36,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1222253.3333333333, ans=0.04949747468305833 2023-12-23 16:16:56,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1222386.6666666667, ans=0.0 2023-12-23 16:16:57,397 INFO [train.py:886] (0/4) Epoch 39, batch 2250, loss[loss=0.01027, audio_tagging_loss=0.01027, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4948044.30 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:17:07,193 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:17:07,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=12.0 2023-12-23 16:17:11,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1222453.3333333333, ans=0.0 2023-12-23 16:17:13,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2023-12-23 16:17:16,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0 2023-12-23 16:17:17,746 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.644e+01 3.804e+01 3.965e+01 5.338e+01, threshold=7.608e+01, percent-clipped=0.0 2023-12-23 16:17:30,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1222586.6666666667, ans=0.2 2023-12-23 16:17:31,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0
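Each Whitening line compares a measured "whiteness" statistic of a module's activations against that module's limit (for example metric=3.24 vs. limit=10.0 just above); such a metric is minimal when the per-group channel covariance is a multiple of the identity. The exact statistic is not shown in the log. The sketch below uses one standard proxy, d * tr(C^2) / tr(C)^2, which equals 1.0 for perfectly white features and approaches the group size d as the covariance collapses toward rank one; this form is an assumption, not the actual scaling.py code:

```python
# Assumed whiteness proxy: d * trace(C^2) / trace(C)^2 per channel group,
# averaged over groups. Equals 1.0 iff the covariance C is a scaled
# identity. Illustrative only; scaling.py's metric may differ.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into groups
    n, c = x.shape
    d = c // num_groups
    x = x.reshape(n, num_groups, d).permute(1, 0, 2)  # (groups, frames, d)
    x = x - x.mean(dim=1, keepdim=True)               # zero-mean per group
    cov = x.transpose(1, 2) @ x / n                   # (groups, d, d)
    tr = cov.diagonal(dim1=1, dim2=2).sum(dim=1)      # trace(C)
    tr2 = (cov ** 2).sum(dim=(1, 2))                  # trace(C @ C), C symmetric
    return (d * tr2 / (tr ** 2 + 1e-20)).mean().item()

# Nearly-white activations give a metric close to 1, far below a
# limit like 10.0:
x = torch.randn(1000, 192)
print(whitening_metric(x, num_groups=1))
```

Read this way, entries such as metric=11.64 vs. limit=12.0 flag modules whose activation covariance is drifting close to the point where a corrective constraint would engage.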
2023-12-23 16:17:40,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1222653.3333333333, ans=0.125 2023-12-23 16:17:44,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1222653.3333333333, ans=0.1 2023-12-23 16:17:48,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1222720.0, ans=0.07 2023-12-23 16:17:49,023 INFO [train.py:886] (0/4) Epoch 39, batch 2300, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4949113.28 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:18:08,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2023-12-23 16:18:15,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1222853.3333333333, ans=0.2 2023-12-23 16:18:21,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1222920.0, ans=0.0 2023-12-23 16:18:23,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-12-23 16:18:26,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1222920.0, ans=0.0 2023-12-23 16:18:41,198 INFO [train.py:886] (0/4) Epoch 39, batch 2350, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4946907.39 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:18:44,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1223053.3333333333, ans=0.0 2023-12-23 16:18:46,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1223053.3333333333, ans=0.125 2023-12-23 16:18:55,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1223120.0, ans=0.2 2023-12-23 16:18:56,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1223120.0, ans=0.0 2023-12-23 16:19:02,462 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.209e+01 3.576e+01 3.750e+01 3.912e+01 4.537e+01, threshold=7.499e+01, percent-clipped=0.0 2023-12-23 16:19:10,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1223186.6666666667, ans=0.125 2023-12-23 16:19:15,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1223253.3333333333, ans=0.0 2023-12-23 16:19:22,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs.
limit=15.0 2023-12-23 16:19:24,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1223320.0, ans=0.025 2023-12-23 16:19:32,836 INFO [train.py:886] (0/4) Epoch 39, batch 2400, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24750.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4954097.89 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:19:37,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1223386.6666666667, ans=0.125 2023-12-23 16:19:45,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1223453.3333333333, ans=0.2 2023-12-23 16:19:56,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1223520.0, ans=0.125 2023-12-23 16:19:56,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1223520.0, ans=0.125 2023-12-23 16:19:56,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1223520.0, ans=0.0 2023-12-23 16:20:00,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1223520.0, ans=0.1 2023-12-23 16:20:05,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1223586.6666666667, ans=0.125 2023-12-23 16:20:08,048 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:20:17,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1223653.3333333333, ans=0.0 2023-12-23 16:20:24,847 INFO [train.py:886] (0/4) Epoch 39, batch 2450, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4958221.52 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:20:25,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=12.0 2023-12-23 16:20:45,903 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.280e+01 3.674e+01 3.800e+01 3.952e+01 4.172e+01, threshold=7.601e+01, percent-clipped=0.0 2023-12-23 16:21:01,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1223920.0, ans=0.1 2023-12-23 16:21:02,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.53 vs. 
limit=22.5 2023-12-23 16:21:06,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1223986.6666666667, ans=0.2 2023-12-23 16:21:06,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1223986.6666666667, ans=0.0 2023-12-23 16:21:09,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1223986.6666666667, ans=0.0 2023-12-23 16:21:13,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1223986.6666666667, ans=0.125 2023-12-23 16:21:17,302 INFO [train.py:886] (0/4) Epoch 39, batch 2500, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4952837.68 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:21:18,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1224053.3333333333, ans=0.125 2023-12-23 16:21:26,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2023-12-23 16:21:37,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1224186.6666666667, ans=0.125 2023-12-23 16:21:41,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1224186.6666666667, ans=0.1 2023-12-23 16:21:54,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1224253.3333333333, ans=0.125 2023-12-23 16:22:09,652 INFO [train.py:886] (0/4) Epoch 39, batch 2550, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4946704.61 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:22:20,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1224453.3333333333, ans=0.125 2023-12-23 16:22:30,076 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.292e+01 3.640e+01 3.880e+01 4.066e+01 5.144e+01, threshold=7.760e+01, percent-clipped=0.0 2023-12-23 16:22:42,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1224586.6666666667, ans=0.125 2023-12-23 16:22:43,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1224586.6666666667, ans=0.125 2023-12-23 16:22:51,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1224653.3333333333, ans=0.125 2023-12-23 16:23:01,522 INFO [train.py:886] (0/4) Epoch 39, batch 2600, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4947278.29 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:23:09,946 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:23:12,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1224786.6666666667, ans=0.2 2023-12-23 16:23:16,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-12-23 16:23:21,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=15.0 2023-12-23 16:23:35,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-12-23 16:23:42,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1224986.6666666667, ans=0.125 2023-12-23 16:23:44,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1224986.6666666667, ans=0.125 2023-12-23 16:23:54,138 INFO [train.py:886] (0/4) Epoch 39, batch 2650, loss[loss=0.01066, audio_tagging_loss=0.01066, over 24022.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4940468.04 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:23:59,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1225053.3333333333, ans=0.125 2023-12-23 16:24:07,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1225120.0, ans=0.125 2023-12-23 16:24:11,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1225120.0, ans=0.2 2023-12-23 16:24:12,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1225120.0, ans=0.0 2023-12-23 16:24:14,557 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.356e+01 3.623e+01 3.755e+01 3.962e+01 4.665e+01, threshold=7.509e+01, percent-clipped=0.0 2023-12-23 16:24:18,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.06 vs. limit=12.0 2023-12-23 16:24:23,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1225186.6666666667, ans=0.1 2023-12-23 16:24:44,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0 2023-12-23 16:24:46,261 INFO [train.py:886] (0/4) Epoch 39, batch 2700, loss[loss=0.01342, audio_tagging_loss=0.01342, over 25000.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4940259.84 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:24:48,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1225386.6666666667, ans=0.0 2023-12-23 16:24:52,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1225386.6666666667, ans=0.1 2023-12-23 16:24:55,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1225453.3333333333, ans=0.2 2023-12-23 16:25:03,799 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:25:25,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0 2023-12-23 16:25:26,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1225653.3333333333, ans=0.035 2023-12-23 16:25:37,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1225720.0, ans=0.05 2023-12-23 16:25:37,941 INFO [train.py:886] (0/4) Epoch 39, batch 2750, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4947570.37 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:25:58,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1225853.3333333333, ans=0.125 2023-12-23 16:25:59,157 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.274e+01 3.579e+01 3.766e+01 3.936e+01 4.564e+01, threshold=7.531e+01, percent-clipped=0.0 2023-12-23 16:26:00,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1225853.3333333333, ans=0.125 2023-12-23 16:26:01,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.29 vs. limit=15.0 2023-12-23 16:26:30,267 INFO [train.py:886] (0/4) Epoch 39, batch 2800, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4952201.49 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:26:32,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5 2023-12-23 16:26:38,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1226053.3333333333, ans=0.1 2023-12-23 16:26:42,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.31 vs. 
limit=22.5 2023-12-23 16:26:48,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1226120.0, ans=0.04949747468305833 2023-12-23 16:26:49,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1226186.6666666667, ans=0.0 2023-12-23 16:26:54,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1226186.6666666667, ans=0.125 2023-12-23 16:26:58,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1226186.6666666667, ans=0.125 2023-12-23 16:27:06,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1226253.3333333333, ans=0.125 2023-12-23 16:27:10,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1226320.0, ans=0.125 2023-12-23 16:27:20,917 INFO [train.py:886] (0/4) Epoch 39, batch 2850, loss[loss=0.01129, audio_tagging_loss=0.01129, over 24750.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4943883.12 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:27:25,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1226386.6666666667, ans=0.0 2023-12-23 16:27:26,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1226386.6666666667, ans=0.5 2023-12-23 16:27:43,714 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.319e+01 3.655e+01 3.774e+01 3.936e+01 6.681e+01, threshold=7.549e+01, percent-clipped=0.0 2023-12-23 16:27:44,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1226520.0, ans=0.125 2023-12-23 16:28:04,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1226653.3333333333, ans=0.0 2023-12-23 16:28:05,221 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-184000.pt 2023-12-23 16:28:12,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1226653.3333333333, ans=0.0 2023-12-23 16:28:14,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1226653.3333333333, ans=0.0 2023-12-23 16:28:16,150 INFO [train.py:886] (0/4) Epoch 39, batch 2900, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4941530.71 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0
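The checkpoint.py:75 line above writes a checkpoint named by the global batch index (checkpoint-184000.pt) in the middle of the epoch, alongside the per-epoch checkpoints. A hedged sketch of that policy, saving model and optimizer state every fixed number of batches; the helper name is illustrative, and the interval of 4000 is only inferred from 184000 being a multiple of it, not read from the log:

```python
# Hypothetical batch-indexed checkpointing consistent with the
# "Saving checkpoint to .../checkpoint-184000.pt" line. Not the
# actual checkpoint.py code.
from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: Path, save_every_n: int = 4000):
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    filename = exp_dir / f"checkpoint-{batch_idx_train}.pt"
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        filename,
    )

# e.g. maybe_save_checkpoint(model, optimizer, 184000,
#                            Path("zipformer/exp_at_as_full"))
# would produce the filename logged above.
```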
2023-12-23 16:28:23,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1226720.0, ans=0.125 2023-12-23 16:28:37,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1226853.3333333333, ans=0.0 2023-12-23 16:28:45,729 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:28:46,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1226920.0, ans=0.0 2023-12-23 16:28:52,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1226920.0, ans=0.125 2023-12-23 16:28:53,149 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:29:08,285 INFO [train.py:886] (0/4) Epoch 39, batch 2950, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4938866.72 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:29:11,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1227053.3333333333, ans=0.125 2023-12-23 16:29:28,873 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.270e+01 3.645e+01 3.777e+01 3.932e+01 4.663e+01, threshold=7.553e+01, percent-clipped=0.0 2023-12-23 16:29:30,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1227186.6666666667, ans=0.0 2023-12-23 16:29:34,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1227186.6666666667, ans=0.125 2023-12-23 16:29:42,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1227253.3333333333, ans=0.0 2023-12-23 16:29:53,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1227320.0, ans=0.2 2023-12-23 16:29:55,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.19 vs. limit=22.5 2023-12-23 16:29:58,849 INFO [train.py:886] (0/4) Epoch 39, batch 3000, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4948081.72 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:29:58,851 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 16:30:19,979 INFO [train.py:917] (0/4) Epoch 39, validation: loss=0.03462, audio_tagging_loss=0.03462, over 3737520.00 frames. 2023-12-23 16:30:19,980 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 16:30:34,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1227453.3333333333, ans=0.5 2023-12-23 16:30:55,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1227586.6666666667, ans=0.0 2023-12-23 16:31:10,892 INFO [train.py:886] (0/4) Epoch 39, batch 3050, loss[loss=0.0124, audio_tagging_loss=0.0124, over 25000.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4956933.09 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0
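At batch 3000 the train.py:909/917/918 lines interleave a full validation pass with training and then report the peak CUDA memory seen so far. A minimal sketch of that loop, assuming a dataloader yielding feature/label batches and a model returning a summed loss plus a frame count; every name in it is illustrative rather than the actual train.py interface:

```python
# Hedged sketch of the periodic validation pass behind the
# "Computing validation loss" / "Maximum memory allocated" lines.
import torch

def compute_validation_loss(model, valid_dl, device) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            feats = batch["features"].to(device)    # (N, T, 80) fbank feats
            labels = batch["labels"].to(device)     # multi-hot event targets
            loss, num_frames = model(feats, labels) # assumed interface
            tot_loss += loss.item()
            tot_frames += num_frames
    model.train()
    return tot_loss / max(tot_frames, 1)

# After logging the loss, peak memory can be reported the way the
# log does:
# mb = torch.cuda.max_memory_allocated() // 2**20
# logging.info(f"Maximum memory allocated so far is {mb}MB")
```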
2023-12-23 16:31:11,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1227720.0, ans=0.125 2023-12-23 16:31:18,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1227720.0, ans=0.1 2023-12-23 16:31:20,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1227720.0, ans=0.1 2023-12-23 16:31:21,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1227786.6666666667, ans=0.2 2023-12-23 16:31:28,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1227786.6666666667, ans=0.1 2023-12-23 16:31:29,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1227786.6666666667, ans=0.125 2023-12-23 16:31:32,790 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.327e+01 3.613e+01 3.797e+01 3.917e+01 4.495e+01, threshold=7.595e+01, percent-clipped=0.0 2023-12-23 16:31:37,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1227853.3333333333, ans=0.125 2023-12-23 16:31:47,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1227920.0, ans=0.125 2023-12-23 16:31:58,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1227986.6666666667, ans=0.125 2023-12-23 16:32:01,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1227986.6666666667, ans=0.0 2023-12-23 16:32:02,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1228053.3333333333, ans=0.1 2023-12-23 16:32:03,069 INFO [train.py:886] (0/4) Epoch 39, batch 3100, loss[loss=0.009212, audio_tagging_loss=0.009212, over 25000.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4957676.92 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:32:03,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0 2023-12-23 16:32:12,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.94 vs. limit=10.0 2023-12-23 16:32:15,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2023-12-23 16:32:24,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs.
limit=15.0 2023-12-23 16:32:32,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1228186.6666666667, ans=0.2 2023-12-23 16:32:44,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-12-23 16:32:55,425 INFO [train.py:886] (0/4) Epoch 39, batch 3150, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24947.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4952084.32 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:33:05,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1228453.3333333333, ans=0.125 2023-12-23 16:33:16,707 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.313e+01 3.715e+01 3.835e+01 3.978e+01 4.506e+01, threshold=7.670e+01, percent-clipped=0.0 2023-12-23 16:33:25,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1228520.0, ans=0.1 2023-12-23 16:33:46,376 INFO [train.py:886] (0/4) Epoch 39, batch 3200, loss[loss=0.009485, audio_tagging_loss=0.009485, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4946168.88 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:33:47,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2023-12-23 16:33:52,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.23 vs. limit=15.0 2023-12-23 16:33:59,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=15.0 2023-12-23 16:34:09,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1228853.3333333333, ans=0.125 2023-12-23 16:34:18,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-12-23 16:34:18,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=12.0 2023-12-23 16:34:20,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1228920.0, ans=0.125 2023-12-23 16:34:25,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-12-23 16:34:39,438 INFO [train.py:886] (0/4) Epoch 39, batch 3250, loss[loss=0.01102, audio_tagging_loss=0.01102, over 24750.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4940289.57 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:34:48,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.01 vs. 
limit=15.0 2023-12-23 16:34:49,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1229120.0, ans=0.125 2023-12-23 16:35:00,578 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.582e+01 3.732e+01 3.928e+01 4.508e+01, threshold=7.464e+01, percent-clipped=0.0 2023-12-23 16:35:03,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1229186.6666666667, ans=0.1 2023-12-23 16:35:16,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1229253.3333333333, ans=0.125 2023-12-23 16:35:18,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1229253.3333333333, ans=0.0 2023-12-23 16:35:18,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2023-12-23 16:35:31,247 INFO [train.py:886] (0/4) Epoch 39, batch 3300, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4947494.87 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:35:48,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1229453.3333333333, ans=0.2 2023-12-23 16:35:50,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1229453.3333333333, ans=0.125 2023-12-23 16:35:53,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=1229520.0, ans=0.1 2023-12-23 16:35:58,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1229520.0, ans=0.0 2023-12-23 16:35:59,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2023-12-23 16:36:06,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2023-12-23 16:36:15,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1229653.3333333333, ans=0.125 2023-12-23 16:36:17,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1229653.3333333333, ans=0.125 2023-12-23 16:36:22,426 INFO [train.py:886] (0/4) Epoch 39, batch 3350, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4953774.99 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:36:33,528 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:36:45,134 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.391e+01 3.649e+01 3.789e+01 3.930e+01 4.813e+01, threshold=7.578e+01, percent-clipped=0.0 2023-12-23 16:37:03,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.05 vs. 
limit=12.0 2023-12-23 16:37:13,996 INFO [train.py:886] (0/4) Epoch 39, batch 3400, loss[loss=0.01543, audio_tagging_loss=0.01543, over 24945.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4959818.12 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:37:20,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1230053.3333333333, ans=0.0 2023-12-23 16:37:32,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.48 vs. limit=10.0 2023-12-23 16:37:38,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2023-12-23 16:37:48,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1230253.3333333333, ans=0.1 2023-12-23 16:37:52,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1230253.3333333333, ans=0.0 2023-12-23 16:38:06,189 INFO [train.py:886] (0/4) Epoch 39, batch 3450, loss[loss=0.01104, audio_tagging_loss=0.01104, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4954027.10 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:38:28,018 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.315e+01 3.698e+01 3.845e+01 3.983e+01 4.520e+01, threshold=7.691e+01, percent-clipped=0.0 2023-12-23 16:38:38,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=1230586.6666666667, ans=0.02 2023-12-23 16:38:42,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=1230586.6666666667, ans=0.1 2023-12-23 16:38:42,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1230586.6666666667, ans=0.125 2023-12-23 16:38:58,262 INFO [train.py:886] (0/4) Epoch 39, batch 3500, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4949961.76 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:39:01,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1230720.0, ans=0.1 2023-12-23 16:39:13,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1230786.6666666667, ans=0.0 2023-12-23 16:39:15,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1230786.6666666667, ans=0.09899494936611666 2023-12-23 16:39:25,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2023-12-23 16:39:49,920 INFO [train.py:886] (0/4) Epoch 39, batch 3550, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24927.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4949826.06 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:40:11,392 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.585e+01 3.771e+01 3.949e+01 4.246e+01, threshold=7.542e+01, percent-clipped=0.0 2023-12-23 16:40:25,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.10 vs. limit=22.5 2023-12-23 16:40:41,480 INFO [train.py:886] (0/4) Epoch 39, batch 3600, loss[loss=0.009388, audio_tagging_loss=0.009388, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4954271.52 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:40:41,704 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:40:41,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1231386.6666666667, ans=0.0 2023-12-23 16:40:42,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1231386.6666666667, ans=0.0 2023-12-23 16:40:49,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.15 vs. limit=10.0 2023-12-23 16:40:57,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1231453.3333333333, ans=0.2 2023-12-23 16:41:03,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1231520.0, ans=0.1 2023-12-23 16:41:05,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.74 vs. limit=15.0 2023-12-23 16:41:14,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1231586.6666666667, ans=0.0 2023-12-23 16:41:15,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1231586.6666666667, ans=6.0 2023-12-23 16:41:33,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=12.0 2023-12-23 16:41:34,300 INFO [train.py:886] (0/4) Epoch 39, batch 3650, loss[loss=0.01078, audio_tagging_loss=0.01078, over 25000.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4953913.04 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 64.0 2023-12-23 16:41:34,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1231720.0, ans=0.2 2023-12-23 16:41:56,215 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.259e+01 3.628e+01 3.795e+01 4.011e+01 5.130e+01, threshold=7.590e+01, percent-clipped=0.0 2023-12-23 16:42:13,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1231920.0, ans=0.125 2023-12-23 16:42:15,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1231986.6666666667, ans=0.1 2023-12-23 16:42:26,722 INFO [train.py:886] (0/4) Epoch 39, batch 3700, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. 
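
The scaling.py:213 lines print ScheduledFloat values (dropout probabilities, skip rates, min/max limits) as functions of batch_count; conceptually each is a piecewise-linear schedule over training batches. A stripped-down sketch of that behaviour follows; the breakpoints are invented for illustration, and the real class additionally supports defaults and arithmetic on schedules:

    def scheduled_float(batch_count, points):
        # points: (batch, value) pairs sorted by batch; linear in between,
        # clamped to the end values outside the range.
        if batch_count <= points[0][0]:
            return points[0][1]
        for (b0, v0), (b1, v1) in zip(points, points[1:]):
            if batch_count <= b1:
                frac = (batch_count - b0) / (b1 - b0)
                return v0 + frac * (v1 - v0)
        return points[-1][1]

    # A dropout_p decaying from 0.3 to 0.1 over the first 20k batches would
    # read ans=0.1 this late in training, as in the log lines above:
    print(scheduled_float(1229186.6666666667, [(0.0, 0.3), (20000.0, 0.1)]))
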
], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4950425.25 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 64.0 2023-12-23 16:42:27,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1232053.3333333333, ans=0.2 2023-12-23 16:42:40,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1232120.0, ans=0.125 2023-12-23 16:42:48,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1232186.6666666667, ans=0.125 2023-12-23 16:42:51,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2023-12-23 16:42:53,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.38 vs. limit=22.5 2023-12-23 16:42:53,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1232186.6666666667, ans=0.95 2023-12-23 16:42:58,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1232253.3333333333, ans=0.035 2023-12-23 16:43:10,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1232320.0, ans=0.125 2023-12-23 16:43:17,084 INFO [train.py:886] (0/4) Epoch 39, batch 3750, loss[loss=0.01374, audio_tagging_loss=0.01374, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4947285.81 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 64.0 2023-12-23 16:43:17,217 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:43:20,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-12-23 16:43:20,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.06 vs. limit=22.5 2023-12-23 16:43:33,707 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2023-12-23 16:43:39,547 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.207e+01 3.636e+01 3.779e+01 3.931e+01 4.643e+01, threshold=7.558e+01, percent-clipped=0.0 2023-12-23 16:43:46,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1232520.0, ans=0.125 2023-12-23 16:44:00,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1232653.3333333333, ans=0.125 2023-12-23 16:44:10,057 INFO [train.py:886] (0/4) Epoch 39, batch 3800, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4943682.48 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 64.0 2023-12-23 16:44:26,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.41 vs. 
limit=22.5 2023-12-23 16:44:33,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1232853.3333333333, ans=0.1 2023-12-23 16:44:33,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1232853.3333333333, ans=0.0 2023-12-23 16:44:36,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0 2023-12-23 16:44:45,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1232920.0, ans=0.2 2023-12-23 16:45:01,447 INFO [train.py:886] (0/4) Epoch 39, batch 3850, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4940086.55 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:45:07,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1233053.3333333333, ans=0.2 2023-12-23 16:45:08,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1233053.3333333333, ans=0.125 2023-12-23 16:45:17,372 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:45:23,678 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.220e+01 3.590e+01 3.789e+01 3.957e+01 4.562e+01, threshold=7.578e+01, percent-clipped=0.0 2023-12-23 16:45:23,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1233186.6666666667, ans=0.0 2023-12-23 16:45:28,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2023-12-23 16:45:49,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0 2023-12-23 16:45:52,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1233386.6666666667, ans=0.125 2023-12-23 16:45:53,258 INFO [train.py:886] (0/4) Epoch 39, batch 3900, loss[loss=0.01263, audio_tagging_loss=0.01263, over 24923.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4943937.92 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:46:02,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1233453.3333333333, ans=0.0 2023-12-23 16:46:43,915 INFO [train.py:886] (0/4) Epoch 39, batch 3950, loss[loss=0.009847, audio_tagging_loss=0.009847, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4946436.85 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:46:52,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1233720.0, ans=0.0 2023-12-23 16:46:57,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.31 vs. 
limit=22.5 2023-12-23 16:46:58,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1233786.6666666667, ans=0.125 2023-12-23 16:47:05,591 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.227e+01 3.602e+01 3.745e+01 4.012e+01 4.573e+01, threshold=7.490e+01, percent-clipped=0.0 2023-12-23 16:47:09,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1233853.3333333333, ans=0.125 2023-12-23 16:47:14,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1233920.0, ans=0.0 2023-12-23 16:47:34,892 INFO [train.py:886] (0/4) Epoch 39, batch 4000, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4951749.89 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:47:51,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1234120.0, ans=0.1 2023-12-23 16:48:21,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.13 vs. limit=22.5 2023-12-23 16:48:27,939 INFO [train.py:886] (0/4) Epoch 39, batch 4050, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4954068.25 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:48:50,245 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.639e+01 3.792e+01 4.036e+01 4.478e+01, threshold=7.585e+01, percent-clipped=0.0 2023-12-23 16:49:00,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1234586.6666666667, ans=0.125 2023-12-23 16:49:18,318 INFO [train.py:886] (0/4) Epoch 39, batch 4100, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24750.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4946736.23 frames. 
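
The scaling.py:1022 Whitening lines compare a per-module statistic against a limit (6.0, 10.0, 12.0, 15.0, 22.5 above); large values mean the module's activations have a covariance far from isotropic, and a corrective penalty applies once the limit is exceeded. A rough, self-contained illustration of such a metric, assuming a simple covariance-based definition (icefall's exact formula differs, e.g. in how num_groups partitions the channels):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # ~1.0 when x is 'white' (identity covariance); grows toward the
        # channel count as energy concentrates in fewer directions.
        x = x.reshape(-1, x.shape[-1])          # (frames, channels)
        cov = (x.T @ x) / x.shape[0]            # channel covariance
        mean_diag = cov.diagonal().mean()
        return (cov ** 2).mean() * cov.shape[0] / (mean_diag ** 2 + 1e-20)

    print(whitening_metric(torch.randn(100000, 64)))               # close to 1.0
    print(whitening_metric(torch.randn(100000, 1).repeat(1, 64)))  # ~64: fully correlated
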
], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:49:21,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1234720.0, ans=0.125 2023-12-23 16:49:21,798 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:49:32,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1234786.6666666667, ans=0.1 2023-12-23 16:49:33,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1234786.6666666667, ans=0.125 2023-12-23 16:49:35,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1234786.6666666667, ans=0.125 2023-12-23 16:49:43,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1234853.3333333333, ans=0.125 2023-12-23 16:49:44,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1234853.3333333333, ans=0.2 2023-12-23 16:49:50,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1234920.0, ans=0.125 2023-12-23 16:50:02,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1234986.6666666667, ans=0.0 2023-12-23 16:50:06,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1234986.6666666667, ans=0.1 2023-12-23 16:50:08,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1234986.6666666667, ans=0.125 2023-12-23 16:50:10,278 INFO [train.py:886] (0/4) Epoch 39, batch 4150, loss[loss=0.007397, audio_tagging_loss=0.007397, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4940259.24 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:50:33,717 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.167e+01 3.682e+01 3.809e+01 3.972e+01 4.566e+01, threshold=7.618e+01, percent-clipped=0.0 2023-12-23 16:50:34,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1235186.6666666667, ans=0.125 2023-12-23 16:50:40,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1235253.3333333333, ans=0.125 2023-12-23 16:50:49,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1235253.3333333333, ans=0.125 2023-12-23 16:50:55,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.52 vs. limit=22.5 2023-12-23 16:51:02,391 INFO [train.py:886] (0/4) Epoch 39, batch 4200, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4941171.34 frames. 
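
The grad_scale column drops from 64.0 to 32.0 at batch 3850 above and stays there; with fp16 training enabled this is standard dynamic loss-scaling behaviour, where the scale is halved after a step with inf/nan gradients and grown back slowly otherwise. A minimal loss-scaled step with PyTorch's stock GradScaler, shown purely to illustrate the mechanism (icefall drives the same machinery from inside its own training loop):

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(80, 527).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.045)
    scaler = torch.cuda.amp.GradScaler(init_scale=64.0, enabled=(device == "cuda"))

    x = torch.randn(8, 80, device=device)
    y = torch.rand(8, 527, device=device)
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), y)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(opt)               # unscales grads; skips the step on inf/nan
    scaler.update()                # halves the scale on overflow, else grows it
    print(scaler.get_scale())      # 64.0 if nothing overflowed; 32.0 after one overflow
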
], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:51:02,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1235386.6666666667, ans=0.125 2023-12-23 16:51:12,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1235453.3333333333, ans=0.0 2023-12-23 16:51:20,554 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:51:23,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1235520.0, ans=0.0 2023-12-23 16:51:34,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. limit=6.0 2023-12-23 16:51:50,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1235653.3333333333, ans=0.2 2023-12-23 16:51:51,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1235653.3333333333, ans=0.125 2023-12-23 16:51:54,128 INFO [train.py:886] (0/4) Epoch 39, batch 4250, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4951372.99 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:51:56,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1235720.0, ans=0.0 2023-12-23 16:51:59,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.13 vs. limit=22.5 2023-12-23 16:52:06,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1235786.6666666667, ans=0.0 2023-12-23 16:52:17,121 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.199e+01 3.604e+01 3.815e+01 3.941e+01 4.499e+01, threshold=7.630e+01, percent-clipped=0.0 2023-12-23 16:52:29,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1235920.0, ans=0.125 2023-12-23 16:52:30,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1235920.0, ans=0.95 2023-12-23 16:52:38,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1235986.6666666667, ans=0.1 2023-12-23 16:52:46,826 INFO [train.py:886] (0/4) Epoch 39, batch 4300, loss[loss=0.01454, audio_tagging_loss=0.01454, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4949480.70 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:52:58,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.56 vs. 
limit=22.5 2023-12-23 16:53:00,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1236120.0, ans=0.125 2023-12-23 16:53:06,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1236186.6666666667, ans=0.1 2023-12-23 16:53:11,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1236186.6666666667, ans=0.0 2023-12-23 16:53:19,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1236253.3333333333, ans=0.0 2023-12-23 16:53:23,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1236253.3333333333, ans=0.125 2023-12-23 16:53:25,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=12.0 2023-12-23 16:53:34,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1236320.0, ans=0.0 2023-12-23 16:53:37,802 INFO [train.py:886] (0/4) Epoch 39, batch 4350, loss[loss=0.0136, audio_tagging_loss=0.0136, over 24750.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4951644.77 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:53:43,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1236386.6666666667, ans=0.0 2023-12-23 16:54:00,706 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.228e+01 3.613e+01 3.861e+01 4.059e+01 4.961e+01, threshold=7.722e+01, percent-clipped=0.0 2023-12-23 16:54:13,573 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:54:15,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1236586.6666666667, ans=0.95 2023-12-23 16:54:29,096 INFO [train.py:886] (0/4) Epoch 39, batch 4400, loss[loss=0.01452, audio_tagging_loss=0.01452, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4949644.74 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:54:36,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2023-12-23 16:54:47,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1236786.6666666667, ans=0.2 2023-12-23 16:54:52,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1236853.3333333333, ans=0.125 2023-12-23 16:55:02,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1236920.0, ans=0.0 2023-12-23 16:55:13,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.42 vs. 
limit=22.5 2023-12-23 16:55:13,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1236986.6666666667, ans=0.125 2023-12-23 16:55:14,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1236986.6666666667, ans=0.0 2023-12-23 16:55:20,788 INFO [train.py:886] (0/4) Epoch 39, batch 4450, loss[loss=0.0125, audio_tagging_loss=0.0125, over 24750.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4945568.21 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:55:31,764 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:55:37,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1237120.0, ans=0.07 2023-12-23 16:55:44,360 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.339e+01 3.658e+01 3.824e+01 3.990e+01 4.644e+01, threshold=7.648e+01, percent-clipped=0.0 2023-12-23 16:55:45,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1237186.6666666667, ans=0.0 2023-12-23 16:55:48,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1237186.6666666667, ans=0.125 2023-12-23 16:55:55,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1237253.3333333333, ans=0.0 2023-12-23 16:55:55,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1237253.3333333333, ans=0.125 2023-12-23 16:56:13,236 INFO [train.py:886] (0/4) Epoch 39, batch 4500, loss[loss=0.01291, audio_tagging_loss=0.01291, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4949926.09 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:56:13,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1237386.6666666667, ans=0.1 2023-12-23 16:56:26,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1237453.3333333333, ans=0.2 2023-12-23 16:56:35,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1237520.0, ans=0.125 2023-12-23 16:56:36,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2023-12-23 16:56:40,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1237520.0, ans=0.1 2023-12-23 16:56:42,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=12.0 2023-12-23 16:56:49,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-12-23 16:57:05,492 INFO [train.py:886] (0/4) Epoch 39, batch 4550, loss[loss=0.01164, audio_tagging_loss=0.01164, over 25000.00 frames. 
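
The scaling.py:1118 WithLoss lines report an auxiliary penalty attached to the self-attention weights; loss-sum=0.000e+00 means the weights stayed inside the permitted range over that interval, while a small positive sum indicates the penalty briefly activated. A toy hinge-style version of such a penalty, assuming a plain magnitude limit (the actual module constrains the attention weights differently):

    import torch

    def attn_weight_penalty(attn_weights: torch.Tensor, limit: float = 25.0):
        # Sum of how far |weights| poke above the limit; added to the main
        # loss, and exactly 0.0 whenever the weights are in range.
        return (attn_weights.abs() - limit).clamp(min=0.0).sum()

    w = torch.randn(4, 8, 8)               # well-behaved weights
    print(attn_weight_penalty(w).item())   # -> 0.0, like most lines here
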
], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4951215.36 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:57:06,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1237720.0, ans=0.0 2023-12-23 16:57:18,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1237786.6666666667, ans=0.125 2023-12-23 16:57:20,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.66 vs. limit=10.0 2023-12-23 16:57:23,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1237786.6666666667, ans=0.1 2023-12-23 16:57:27,607 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.279e+01 3.606e+01 3.763e+01 4.003e+01 4.650e+01, threshold=7.525e+01, percent-clipped=0.0 2023-12-23 16:57:30,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1237853.3333333333, ans=0.0 2023-12-23 16:57:34,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1237853.3333333333, ans=0.04949747468305833 2023-12-23 16:57:47,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1237986.6666666667, ans=0.0 2023-12-23 16:57:56,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1238053.3333333333, ans=0.09899494936611666 2023-12-23 16:57:56,798 INFO [train.py:886] (0/4) Epoch 39, batch 4600, loss[loss=0.01053, audio_tagging_loss=0.01053, over 25000.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4954911.61 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:58:02,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1238053.3333333333, ans=0.0 2023-12-23 16:58:10,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1238120.0, ans=0.125 2023-12-23 16:58:16,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1238186.6666666667, ans=0.125 2023-12-23 16:58:23,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1238186.6666666667, ans=0.95 2023-12-23 16:58:24,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.57 vs. limit=10.0 2023-12-23 16:58:24,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1238186.6666666667, ans=0.1 2023-12-23 16:58:36,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1238253.3333333333, ans=6.0 2023-12-23 16:58:48,795 INFO [train.py:886] (0/4) Epoch 39, batch 4650, loss[loss=0.01488, audio_tagging_loss=0.01488, over 24938.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4957716.67 frames. 
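
Throughout this run, loss and audio_tagging_loss coincide because tagging is the only training objective. The magnitudes are consistent with a multi-label binary cross-entropy over the 527 AudioSet event classes with mostly-zero targets; a hedged sketch of such a criterion (the names are illustrative, not the recipe's exact code):

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # logits, targets: (num_cuts, num_events=527); targets are multi-hot.
        return F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")

    logits = torch.randn(100, 527) - 4.0               # mostly-confident negatives
    targets = (torch.rand(100, 527) < 0.01).float()    # sparse event labels
    print(audio_tagging_loss(logits, targets).item())  # small, as in the log
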
], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:59:02,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1238453.3333333333, ans=0.125 2023-12-23 16:59:04,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=15.0 2023-12-23 16:59:09,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.50 vs. limit=15.0 2023-12-23 16:59:11,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2023-12-23 16:59:11,643 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.377e+01 3.683e+01 3.896e+01 4.120e+01 5.056e+01, threshold=7.792e+01, percent-clipped=0.0 2023-12-23 16:59:26,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1238586.6666666667, ans=0.125 2023-12-23 16:59:40,161 INFO [train.py:886] (0/4) Epoch 39, batch 4700, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4948380.64 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:59:42,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1238720.0, ans=0.1 2023-12-23 17:00:05,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1238853.3333333333, ans=0.125 2023-12-23 17:00:05,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.54 vs. limit=22.5 2023-12-23 17:00:07,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1238920.0, ans=0.0 2023-12-23 17:00:07,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0 2023-12-23 17:00:27,130 INFO [train.py:886] (0/4) Epoch 39, batch 4750, loss[loss=0.00957, audio_tagging_loss=0.00957, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4946032.52 frames. 
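
The tot_loss[... over N frames ] figures are a frame-weighted running average: each batch folds (loss x frames) into decayed accumulators, which is why the frame count saturates near ~4.95M mid-epoch and why the figure restarts from a single batch's value at each epoch boundary (visible just below: 0.02824 at epoch 40, batch 0, then 0.01846, 0.01602, ... as batches accumulate). A small sketch of that bookkeeping, with a hypothetical decay constant; icefall's MetricsTracker differs in detail:

    class RunningLoss:
        # Frame-weighted, exponentially decayed running average.
        def __init__(self, decay=0.995):
            self.decay, self.loss_sum, self.frames = decay, 0.0, 0.0

        def update(self, loss, num_frames):
            self.loss_sum = self.loss_sum * self.decay + loss * num_frames
            self.frames = self.frames * self.decay + num_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    for _ in range(2000):                        # frames saturate at 25000/(1-decay)
        tracker.update(0.0115, 25000)
    print(tracker.value, round(tracker.frames))  # -> 0.0115, ~5.0e6
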
], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 17:00:28,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1239053.3333333333, ans=0.2 2023-12-23 17:00:34,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1239053.3333333333, ans=0.125 2023-12-23 17:00:36,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1239120.0, ans=0.1 2023-12-23 17:00:40,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1239120.0, ans=0.1 2023-12-23 17:00:42,595 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-39.pt 2023-12-23 17:01:01,906 INFO [train.py:886] (0/4) Epoch 40, batch 0, loss[loss=0.02824, audio_tagging_loss=0.02824, over 21349.00 frames. ], tot_loss[loss=0.02824, audio_tagging_loss=0.02824, over 21349.00 frames. ], batch size: 107, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:01:01,908 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 17:01:23,300 INFO [train.py:917] (0/4) Epoch 40, validation: loss=0.03439, audio_tagging_loss=0.03439, over 3737520.00 frames. 2023-12-23 17:01:23,300 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 17:01:25,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1239160.0, ans=0.1 2023-12-23 17:01:28,897 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.718e+01 3.892e+01 4.077e+01 1.138e+02, threshold=7.784e+01, percent-clipped=4.0 2023-12-23 17:01:41,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1239226.6666666667, ans=0.2 2023-12-23 17:01:43,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.65 vs. limit=12.0 2023-12-23 17:01:48,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1239293.3333333333, ans=0.125 2023-12-23 17:02:14,204 INFO [train.py:886] (0/4) Epoch 40, batch 50, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.01846, audio_tagging_loss=0.01846, over 1116669.71 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:02:19,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1239493.3333333333, ans=0.125 2023-12-23 17:02:23,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1239560.0, ans=0.2 2023-12-23 17:02:38,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-12-23 17:02:43,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1239626.6666666667, ans=0.0 2023-12-23 17:03:06,254 INFO [train.py:886] (0/4) Epoch 40, batch 100, loss[loss=0.01152, audio_tagging_loss=0.01152, over 25000.00 frames. ], tot_loss[loss=0.01602, audio_tagging_loss=0.01602, over 1971703.92 frames. 
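
At the epoch boundary above, checkpoint.py writes zipformer/exp_at_as_full/epoch-39.pt before epoch 40 opens with a validation pass. A bare-bones version of such an epoch checkpoint, noting that icefall's real helper also persists the sampler state, the GradScaler, and the averaged-model state:

    import torch

    def save_epoch_checkpoint(path, model, optimizer, epoch, batch_idx_train):
        # Minimal sketch: bundle what is needed to resume training.
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "epoch": epoch,
                "batch_idx_train": batch_idx_train,
            },
            path,
        )

    model = torch.nn.Linear(4, 4)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    save_epoch_checkpoint("epoch-39.pt", model, opt, epoch=39, batch_idx_train=1239160)
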
], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:03:07,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=15.0 2023-12-23 17:03:11,823 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.790e+01 4.300e+01 4.589e+01 5.007e+01 8.087e+01, threshold=9.178e+01, percent-clipped=4.0 2023-12-23 17:03:13,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1239826.6666666667, ans=0.125 2023-12-23 17:03:13,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1239826.6666666667, ans=0.125 2023-12-23 17:03:21,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1239893.3333333333, ans=0.125 2023-12-23 17:03:43,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.10 vs. limit=22.5 2023-12-23 17:03:50,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1240093.3333333333, ans=0.09899494936611666 2023-12-23 17:03:54,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1240093.3333333333, ans=0.0 2023-12-23 17:03:56,507 INFO [train.py:886] (0/4) Epoch 40, batch 150, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 2638202.49 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:04:02,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1240160.0, ans=0.1 2023-12-23 17:04:16,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0 2023-12-23 17:04:27,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1240360.0, ans=0.0 2023-12-23 17:04:30,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1240360.0, ans=0.2 2023-12-23 17:04:32,493 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:04:48,683 INFO [train.py:886] (0/4) Epoch 40, batch 200, loss[loss=0.01056, audio_tagging_loss=0.01056, over 24750.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 3153855.40 frames. 
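
The displayed lr steps down from 2.75e-03 through 2.74e-03 to 2.70e-03 as batches accumulate and epoch 40 begins, matching a scheduler that decays smoothly in both batch count and epoch, in the spirit of icefall's Eden scheduler (base_lr 0.045, lr_batches 7500, lr_epochs 3.5 in the header). The sketch below shows only the general shape; the logged values also reflect warmup and duration-scaled batch counts, so it will not reproduce them exactly:

    def eden_like_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Decays as batch^-0.5 and epoch^-1 asymptotically, flat early on.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.5
        return base_lr * batch_factor * epoch_factor

    print(eden_like_lr(0.045, batch=1_239_160, epoch=40))
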
], batch size: 99, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:04:55,102 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.318e+01 3.719e+01 3.873e+01 4.042e+01 6.291e+01, threshold=7.746e+01, percent-clipped=0.0 2023-12-23 17:04:56,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1240493.3333333333, ans=0.07 2023-12-23 17:04:59,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1240560.0, ans=0.0 2023-12-23 17:05:04,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1240560.0, ans=0.125 2023-12-23 17:05:07,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1240626.6666666667, ans=0.125 2023-12-23 17:05:12,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1240626.6666666667, ans=0.0 2023-12-23 17:05:19,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. limit=6.0 2023-12-23 17:05:35,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1240760.0, ans=0.125 2023-12-23 17:05:36,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1240760.0, ans=0.125 2023-12-23 17:05:39,533 INFO [train.py:886] (0/4) Epoch 40, batch 250, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 3553178.11 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:05:43,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.51 vs. limit=10.0 2023-12-23 17:05:44,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.92 vs. limit=15.0 2023-12-23 17:05:46,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.38 vs. limit=5.0 2023-12-23 17:05:52,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1240893.3333333333, ans=0.125 2023-12-23 17:05:56,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1240893.3333333333, ans=0.1 2023-12-23 17:06:00,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. 
limit=15.0 2023-12-23 17:06:15,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1241026.6666666667, ans=0.0 2023-12-23 17:06:26,084 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:06:30,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1241093.3333333333, ans=0.1 2023-12-23 17:06:32,185 INFO [train.py:886] (0/4) Epoch 40, batch 300, loss[loss=0.01364, audio_tagging_loss=0.01364, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 3852623.67 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:06:35,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1241160.0, ans=0.1 2023-12-23 17:06:37,791 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.703e+01 3.886e+01 3.999e+01 4.717e+01, threshold=7.771e+01, percent-clipped=0.0 2023-12-23 17:06:39,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1241160.0, ans=0.1 2023-12-23 17:06:49,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1241226.6666666667, ans=0.0 2023-12-23 17:07:17,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2023-12-23 17:07:23,924 INFO [train.py:886] (0/4) Epoch 40, batch 350, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4093133.66 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:07:36,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1241560.0, ans=0.125 2023-12-23 17:07:41,588 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.020e-02 2023-12-23 17:07:47,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1241626.6666666667, ans=0.125 2023-12-23 17:07:54,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1241693.3333333333, ans=0.0 2023-12-23 17:07:54,312 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:08:03,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1241693.3333333333, ans=0.0 2023-12-23 17:08:14,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1241826.6666666667, ans=0.0 2023-12-23 17:08:15,528 INFO [train.py:886] (0/4) Epoch 40, batch 400, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4278881.56 frames. 
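
The fixed pairing of batch size 100 with 25000.00 frames implies 250 encoder frames per cut, which checks out for 10-second AudioSet clips if the fbank features run at the usual 100 frames per second (an assumption; the header only gives feature_dim 80) and are reduced by the subsampling_factor of 4; the occasional batches of 99 cuts / 24750 frames are simply one cut short:

    clip_seconds, fbank_fps, subsampling = 10.0, 100, 4
    frames_per_cut = clip_seconds * fbank_fps / subsampling   # 250 frames per clip
    print(frames_per_cut * 100, frames_per_cut * 99)          # 25000.0, 24750.0
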
], batch size: 99, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:08:15,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1241826.6666666667, ans=0.2 2023-12-23 17:08:19,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1241826.6666666667, ans=0.125 2023-12-23 17:08:21,793 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.642e+01 3.774e+01 3.991e+01 4.784e+01, threshold=7.549e+01, percent-clipped=0.0 2023-12-23 17:08:29,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1241893.3333333333, ans=0.04949747468305833 2023-12-23 17:08:47,835 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:08:54,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2023-12-23 17:09:06,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=12.0 2023-12-23 17:09:08,241 INFO [train.py:886] (0/4) Epoch 40, batch 450, loss[loss=0.01108, audio_tagging_loss=0.01108, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4425070.94 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:09:34,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1242293.3333333333, ans=0.1 2023-12-23 17:09:36,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1242293.3333333333, ans=0.0 2023-12-23 17:09:55,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1242426.6666666667, ans=0.125 2023-12-23 17:09:56,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=1242426.6666666667, ans=12.0 2023-12-23 17:09:58,577 INFO [train.py:886] (0/4) Epoch 40, batch 500, loss[loss=0.01194, audio_tagging_loss=0.01194, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4545128.62 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:10:04,902 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.282e+01 3.617e+01 3.778e+01 3.928e+01 4.794e+01, threshold=7.557e+01, percent-clipped=0.0 2023-12-23 17:10:09,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1242560.0, ans=0.1 2023-12-23 17:10:18,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1242626.6666666667, ans=0.125 2023-12-23 17:10:20,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1242626.6666666667, ans=0.125 2023-12-23 17:10:21,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.87 vs. 
limit=22.5 2023-12-23 17:10:25,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1242626.6666666667, ans=0.125 2023-12-23 17:10:27,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1242626.6666666667, ans=0.125 2023-12-23 17:10:28,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1242693.3333333333, ans=0.125 2023-12-23 17:10:44,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1242760.0, ans=0.1 2023-12-23 17:10:50,637 INFO [train.py:886] (0/4) Epoch 40, batch 550, loss[loss=0.01098, audio_tagging_loss=0.01098, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4638231.04 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:10:51,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1242826.6666666667, ans=0.125 2023-12-23 17:10:54,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1242826.6666666667, ans=0.125 2023-12-23 17:11:05,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1242893.3333333333, ans=0.125 2023-12-23 17:11:26,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1243026.6666666667, ans=0.125 2023-12-23 17:11:27,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.59 vs. limit=22.5 2023-12-23 17:11:29,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1243026.6666666667, ans=0.125 2023-12-23 17:11:29,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1243026.6666666667, ans=0.2 2023-12-23 17:11:35,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1243093.3333333333, ans=0.0 2023-12-23 17:11:36,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.69 vs. limit=10.0 2023-12-23 17:11:42,144 INFO [train.py:886] (0/4) Epoch 40, batch 600, loss[loss=0.01111, audio_tagging_loss=0.01111, over 24750.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4711065.49 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:11:47,801 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.374e+01 3.661e+01 3.804e+01 3.984e+01 4.384e+01, threshold=7.608e+01, percent-clipped=0.0 2023-12-23 17:11:52,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1243226.6666666667, ans=0.125 2023-12-23 17:12:09,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.10 vs. 
limit=10.0 2023-12-23 17:12:34,239 INFO [train.py:886] (0/4) Epoch 40, batch 650, loss[loss=0.01183, audio_tagging_loss=0.01183, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4760451.53 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:12:34,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1243493.3333333333, ans=0.1 2023-12-23 17:12:36,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1243493.3333333333, ans=0.1 2023-12-23 17:12:47,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1243560.0, ans=0.125 2023-12-23 17:13:01,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1243626.6666666667, ans=0.0 2023-12-23 17:13:06,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1243693.3333333333, ans=0.1 2023-12-23 17:13:23,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=12.0 2023-12-23 17:13:25,776 INFO [train.py:886] (0/4) Epoch 40, batch 700, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4801629.15 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:13:32,120 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.657e+01 3.830e+01 4.035e+01 4.623e+01, threshold=7.660e+01, percent-clipped=0.0 2023-12-23 17:14:01,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1244026.6666666667, ans=0.1 2023-12-23 17:14:18,462 INFO [train.py:886] (0/4) Epoch 40, batch 750, loss[loss=0.0102, audio_tagging_loss=0.0102, over 24750.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4838862.93 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:14:29,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1244226.6666666667, ans=0.035 2023-12-23 17:14:30,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1244226.6666666667, ans=0.125 2023-12-23 17:14:42,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1244293.3333333333, ans=0.95 2023-12-23 17:14:42,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.39 vs. 
limit=15.0 2023-12-23 17:14:51,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1244360.0, ans=0.125 2023-12-23 17:14:52,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1244360.0, ans=0.2 2023-12-23 17:15:01,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1244426.6666666667, ans=0.0 2023-12-23 17:15:09,545 INFO [train.py:886] (0/4) Epoch 40, batch 800, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4869342.83 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:15:16,529 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.149e+01 3.626e+01 3.799e+01 3.971e+01 4.663e+01, threshold=7.598e+01, percent-clipped=0.0 2023-12-23 17:15:26,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1244560.0, ans=0.125 2023-12-23 17:15:34,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1244626.6666666667, ans=0.2 2023-12-23 17:15:42,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2023-12-23 17:15:43,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1244693.3333333333, ans=0.0 2023-12-23 17:15:53,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.34 vs. limit=10.0 2023-12-23 17:15:59,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1244760.0, ans=0.0 2023-12-23 17:16:01,959 INFO [train.py:886] (0/4) Epoch 40, batch 850, loss[loss=0.01382, audio_tagging_loss=0.01382, over 25000.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4890274.66 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:16:10,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=15.0 2023-12-23 17:16:11,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1244893.3333333333, ans=22.5 2023-12-23 17:16:12,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0 2023-12-23 17:16:15,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1244893.3333333333, ans=0.125 2023-12-23 17:16:29,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1244960.0, ans=0.125 2023-12-23 17:16:54,244 INFO [train.py:886] (0/4) Epoch 40, batch 900, loss[loss=0.008333, audio_tagging_loss=0.008333, over 21397.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4901601.22 frames. 
2023-12-23 17:16:54,244 INFO [train.py:886] (0/4) Epoch 40, batch 900, loss[loss=0.008333, audio_tagging_loss=0.008333, over 21397.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4901601.22 frames. ], batch size: 107, lr: 2.69e-03, grad_scale: 32.0
2023-12-23 17:17:00,644 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.293e+01 3.650e+01 3.794e+01 3.949e+01 4.349e+01, threshold=7.587e+01, percent-clipped=0.0
2023-12-23 17:17:01,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1245160.0, ans=0.125
2023-12-23 17:17:06,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1245226.6666666667, ans=0.125
2023-12-23 17:17:10,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1245226.6666666667, ans=0.125
2023-12-23 17:17:16,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1245293.3333333333, ans=0.035
2023-12-23 17:17:46,218 INFO [train.py:886] (0/4) Epoch 40, batch 950, loss[loss=0.01094, audio_tagging_loss=0.01094, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4907647.01 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 32.0
2023-12-23 17:17:52,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1245493.3333333333, ans=0.125
2023-12-23 17:18:07,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1245626.6666666667, ans=0.125
2023-12-23 17:18:35,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1245760.0, ans=0.5
2023-12-23 17:18:37,900 INFO [train.py:886] (0/4) Epoch 40, batch 1000, loss[loss=0.01159, audio_tagging_loss=0.01159, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4913853.44 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 32.0
2023-12-23 17:18:44,299 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.312e+01 3.606e+01 3.769e+01 4.019e+01 4.543e+01, threshold=7.537e+01, percent-clipped=0.0
2023-12-23 17:18:57,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1245960.0, ans=0.125
2023-12-23 17:19:01,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1245960.0, ans=0.04949747468305833
2023-12-23 17:19:01,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1245960.0, ans=0.0
2023-12-23 17:19:16,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0
2023-12-23 17:19:28,903 INFO [train.py:886] (0/4) Epoch 40, batch 1050, loss[loss=0.008975, audio_tagging_loss=0.008975, over 24750.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4922788.56 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:19:47,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0
2023-12-23 17:19:53,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1246293.3333333333, ans=0.0
2023-12-23 17:19:59,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1246360.0, ans=0.0
2023-12-23 17:20:20,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1246493.3333333333, ans=0.125
2023-12-23 17:20:21,897 INFO [train.py:886] (0/4) Epoch 40, batch 1100, loss[loss=0.01069, audio_tagging_loss=0.01069, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4930491.24 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:20:23,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1246493.3333333333, ans=0.0
2023-12-23 17:20:25,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.61 vs. limit=10.0
2023-12-23 17:20:27,710 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.272e+01 3.694e+01 3.833e+01 4.003e+01 4.303e+01, threshold=7.667e+01, percent-clipped=0.0
2023-12-23 17:20:32,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1246560.0, ans=0.1
2023-12-23 17:20:45,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1246626.6666666667, ans=0.0
2023-12-23 17:20:59,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1246693.3333333333, ans=0.125
2023-12-23 17:21:03,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1246760.0, ans=0.04949747468305833
2023-12-23 17:21:07,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1246760.0, ans=0.125
2023-12-23 17:21:14,029 INFO [train.py:886] (0/4) Epoch 40, batch 1150, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4939246.08 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:21:15,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1246826.6666666667, ans=15.0
2023-12-23 17:21:51,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.00 vs. limit=22.5
2023-12-23 17:21:54,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1247093.3333333333, ans=0.125
2023-12-23 17:21:59,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1247093.3333333333, ans=0.0
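Note: grad_scale in the train.py:886 lines jumps from 32.0 to 64.0 at batch 1050 and then holds. With use_fp16=True this is consistent with torch's AMP GradScaler doubling its loss scale after a stretch of overflow-free steps; the growth factor and interval below are the torch defaults, not values read from this log:

    import torch

    scaler = torch.cuda.amp.GradScaler(growth_factor=2.0, growth_interval=2000)

    def fp16_step(model, optimizer, features, targets, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()   # gradients carry the current grad_scale
        scaler.step(optimizer)          # unscales; skips the step on inf/nan
        scaler.update()                 # doubles the scale after enough clean steps
        return loss.detach()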
2023-12-23 17:22:05,586 INFO [train.py:886] (0/4) Epoch 40, batch 1200, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4946771.83 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:22:11,282 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.252e+01 3.680e+01 3.818e+01 4.022e+01 4.894e+01, threshold=7.635e+01, percent-clipped=0.0
2023-12-23 17:22:18,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1247226.6666666667, ans=0.0
2023-12-23 17:22:22,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1247226.6666666667, ans=0.0
2023-12-23 17:22:27,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0
2023-12-23 17:22:28,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1247293.3333333333, ans=0.025
2023-12-23 17:22:31,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=22.5
2023-12-23 17:22:41,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1247360.0, ans=0.125
2023-12-23 17:22:43,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1247360.0, ans=0.0
2023-12-23 17:22:44,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1247426.6666666667, ans=0.125
2023-12-23 17:22:53,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1247426.6666666667, ans=0.0
2023-12-23 17:22:57,004 INFO [train.py:886] (0/4) Epoch 40, batch 1250, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4943684.01 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:23:24,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1247626.6666666667, ans=0.125
2023-12-23 17:23:45,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1247760.0, ans=0.1
2023-12-23 17:23:48,809 INFO [train.py:886] (0/4) Epoch 40, batch 1300, loss[loss=0.01389, audio_tagging_loss=0.01389, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4936881.06 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:23:54,414 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.303e+01 3.637e+01 3.803e+01 3.958e+01 5.134e+01, threshold=7.605e+01, percent-clipped=0.0
2023-12-23 17:24:00,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1247893.3333333333, ans=0.05
2023-12-23 17:24:39,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1248093.3333333333, ans=0.2
2023-12-23 17:24:41,338 INFO [train.py:886] (0/4) Epoch 40, batch 1350, loss[loss=0.01027, audio_tagging_loss=0.01027, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4937741.52 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:24:48,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. limit=10.0
2023-12-23 17:25:15,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1248360.0, ans=0.0
2023-12-23 17:25:32,071 INFO [train.py:886] (0/4) Epoch 40, batch 1400, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4934010.85 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:25:34,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0
2023-12-23 17:25:39,078 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.177e+01 3.609e+01 3.720e+01 3.896e+01 4.432e+01, threshold=7.440e+01, percent-clipped=0.0
2023-12-23 17:26:06,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1248693.3333333333, ans=10.0
2023-12-23 17:26:06,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0
2023-12-23 17:26:12,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1248693.3333333333, ans=0.0
2023-12-23 17:26:24,058 INFO [train.py:886] (0/4) Epoch 40, batch 1450, loss[loss=0.01172, audio_tagging_loss=0.01172, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4937526.96 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:26:38,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1248893.3333333333, ans=0.125
2023-12-23 17:26:39,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0
2023-12-23 17:26:41,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1248893.3333333333, ans=0.2
2023-12-23 17:26:55,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1249026.6666666667, ans=0.125
2023-12-23 17:26:57,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0
2023-12-23 17:26:58,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1249026.6666666667, ans=0.1
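Note: the printed lr decays 2.70e-03 -> 2.69e-03 -> 2.68e-03 across this epoch. With base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the run configuration, this is consistent with icefall's Eden schedule; a sketch of the formula as I recall it (worth checking against icefall's optim.py):

    def eden_lr(base_lr: float, step: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Smooth power-law decay in both the batch and the epoch dimension.
        batch_factor = ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # A global step of ~188000 (see the checkpoint-188000.pt save later in
    # this log) and epoch ~39 give ~2.69e-03, matching the printed lr.
    print(eden_lr(0.045, 188000, 39))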
2023-12-23 17:27:15,479 INFO [train.py:886] (0/4) Epoch 40, batch 1500, loss[loss=0.0121, audio_tagging_loss=0.0121, over 21669.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4940213.25 frames. ], batch size: 107, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:27:20,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1249160.0, ans=0.125
2023-12-23 17:27:22,484 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.173e+01 3.651e+01 3.787e+01 4.002e+01 4.522e+01, threshold=7.575e+01, percent-clipped=0.0
2023-12-23 17:27:28,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1249226.6666666667, ans=0.1
2023-12-23 17:27:35,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1249293.3333333333, ans=0.125
2023-12-23 17:27:38,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1249293.3333333333, ans=0.125
2023-12-23 17:27:44,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1249293.3333333333, ans=0.125
2023-12-23 17:27:49,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1249360.0, ans=0.0
2023-12-23 17:28:00,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1249426.6666666667, ans=0.02
2023-12-23 17:28:08,006 INFO [train.py:886] (0/4) Epoch 40, batch 1550, loss[loss=0.009963, audio_tagging_loss=0.009963, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4944224.05 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:28:13,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1249493.3333333333, ans=0.125
2023-12-23 17:28:44,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1249693.3333333333, ans=0.125
2023-12-23 17:28:47,121 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:28:59,830 INFO [train.py:886] (0/4) Epoch 40, batch 1600, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4942073.47 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:29:03,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.92 vs. limit=22.5
2023-12-23 17:29:04,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1249826.6666666667, ans=0.125
2023-12-23 17:29:05,486 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.668e+01 3.855e+01 4.043e+01 4.586e+01, threshold=7.710e+01, percent-clipped=0.0
2023-12-23 17:29:05,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1249826.6666666667, ans=0.0
2023-12-23 17:29:05,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1249826.6666666667, ans=0.0
2023-12-23 17:29:14,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=12.0
2023-12-23 17:29:28,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1249960.0, ans=0.125
2023-12-23 17:29:33,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1250026.6666666667, ans=0.125
2023-12-23 17:29:36,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1250026.6666666667, ans=0.2
2023-12-23 17:29:38,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1250026.6666666667, ans=0.1
2023-12-23 17:29:42,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0
2023-12-23 17:29:48,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1250093.3333333333, ans=0.2
2023-12-23 17:29:51,687 INFO [train.py:886] (0/4) Epoch 40, batch 1650, loss[loss=0.01075, audio_tagging_loss=0.01075, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4942937.44 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:29:54,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1250160.0, ans=0.0
2023-12-23 17:30:15,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1250293.3333333333, ans=0.125
2023-12-23 17:30:18,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1250293.3333333333, ans=0.0
2023-12-23 17:30:27,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1250360.0, ans=6.0
2023-12-23 17:30:42,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1250426.6666666667, ans=0.5
2023-12-23 17:30:43,758 INFO [train.py:886] (0/4) Epoch 40, batch 1700, loss[loss=0.009868, audio_tagging_loss=0.009868, over 25000.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4941438.60 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:30:46,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0
2023-12-23 17:30:47,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1250493.3333333333, ans=0.07
2023-12-23 17:30:50,034 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.211e+01 3.572e+01 3.767e+01 3.944e+01 4.587e+01, threshold=7.535e+01, percent-clipped=0.0
2023-12-23 17:30:53,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1250560.0, ans=0.2
2023-12-23 17:31:34,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=12.0
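Note: the scaling.py:1022 Whitening lines compare a per-module anisotropy statistic ("metric") against a scheduled limit; the whitening penalty only activates when the metric exceeds the limit. I am not certain which exact statistic icefall computes, but an illustrative measure with the right range (1.0 for perfectly white features, up to num_channels when energy collapses into one direction) would be:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations of one module.
        # Illustrative anisotropy measure over the channel covariance C:
        # d * trace(C @ C) / trace(C)**2, i.e. 1.0 when C is a multiple of
        # the identity and larger the less "white" the features are.
        x = x - x.mean(dim=0, keepdim=True)
        c = (x.T @ x) / x.shape[0]
        d = c.shape[0]
        return (d * torch.trace(c @ c) / torch.trace(c) ** 2).item()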
2023-12-23 17:31:36,368 INFO [train.py:886] (0/4) Epoch 40, batch 1750, loss[loss=0.01129, audio_tagging_loss=0.01129, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4948064.34 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:31:43,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1250826.6666666667, ans=0.0
2023-12-23 17:31:48,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1250893.3333333333, ans=0.2
2023-12-23 17:31:52,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1250893.3333333333, ans=0.1
2023-12-23 17:31:55,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1250893.3333333333, ans=0.125
2023-12-23 17:31:57,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1250960.0, ans=0.125
2023-12-23 17:32:00,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1250960.0, ans=0.125
2023-12-23 17:32:03,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1250960.0, ans=0.1
2023-12-23 17:32:04,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1250960.0, ans=0.125
2023-12-23 17:32:11,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1251026.6666666667, ans=0.125
2023-12-23 17:32:14,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1251026.6666666667, ans=0.125
2023-12-23 17:32:21,841 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:32:28,101 INFO [train.py:886] (0/4) Epoch 40, batch 1800, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4952029.66 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:32:34,596 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.246e+01 3.649e+01 3.797e+01 4.032e+01 4.855e+01, threshold=7.595e+01, percent-clipped=0.0
2023-12-23 17:32:44,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1251226.6666666667, ans=0.1
2023-12-23 17:32:44,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1251226.6666666667, ans=0.1
2023-12-23 17:32:55,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1251293.3333333333, ans=0.2
2023-12-23 17:33:01,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.45 vs. limit=10.0
2023-12-23 17:33:10,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1251426.6666666667, ans=0.1
2023-12-23 17:33:20,809 INFO [train.py:886] (0/4) Epoch 40, batch 1850, loss[loss=0.01057, audio_tagging_loss=0.01057, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4950944.25 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:33:30,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1251560.0, ans=0.1
2023-12-23 17:33:45,877 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:34:01,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1251760.0, ans=0.125
2023-12-23 17:34:01,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1251760.0, ans=0.0
2023-12-23 17:34:12,190 INFO [train.py:886] (0/4) Epoch 40, batch 1900, loss[loss=0.01172, audio_tagging_loss=0.01172, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4943226.50 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:34:12,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1251826.6666666667, ans=0.125
2023-12-23 17:34:15,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1251826.6666666667, ans=0.0
2023-12-23 17:34:18,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=22.5
2023-12-23 17:34:18,621 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.248e+01 3.703e+01 3.895e+01 4.075e+01 4.598e+01, threshold=7.791e+01, percent-clipped=0.0
2023-12-23 17:34:19,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1251826.6666666667, ans=0.125
2023-12-23 17:34:20,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1251826.6666666667, ans=0.1
2023-12-23 17:34:26,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1251893.3333333333, ans=0.125
2023-12-23 17:34:33,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0
2023-12-23 17:34:37,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1251960.0, ans=0.07
2023-12-23 17:34:44,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1252026.6666666667, ans=0.0
2023-12-23 17:34:46,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=12.0
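Note: nearly every scaling.py:213 line prints a ScheduledFloat, a module constant (dropout_p, skip_rate, prob, min_abs, ...) whose value ("ans") is scheduled against batch_count. By this point batch_count is ~1.25e6, far past any schedule knot, so every value sits at its final endpoint. A sketch of piecewise-linear scheduling, assuming (x, y) knots in the style of icefall's scaling.py:

    class ScheduledFloat:
        """Float-like value interpolated piecewise-linearly against batch_count."""
        def __init__(self, *knots):  # e.g. ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
            self.knots = list(knots)
            self.batch_count = 0.0
        def value(self) -> float:
            ks = self.knots
            if self.batch_count <= ks[0][0]:
                return ks[0][1]
            if self.batch_count >= ks[-1][0]:
                return ks[-1][1]  # past the last knot, as everywhere in this log
            for (x0, y0), (x1, y1) in zip(ks, ks[1:]):
                if x0 <= self.batch_count <= x1:
                    t = (self.batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    dropout = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))  # hypothetical knots
    dropout.batch_count = 1251560.0
    print(dropout.value())  # 0.1, matching ans=0.1 in the dropout_p lines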
2023-12-23 17:34:48,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0
2023-12-23 17:34:55,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1252093.3333333333, ans=0.0
2023-12-23 17:34:57,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1252093.3333333333, ans=0.0
2023-12-23 17:34:58,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0
2023-12-23 17:35:01,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.10 vs. limit=10.0
2023-12-23 17:35:04,804 INFO [train.py:886] (0/4) Epoch 40, batch 1950, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4942668.50 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:35:05,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1252160.0, ans=0.0
2023-12-23 17:35:17,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1252226.6666666667, ans=0.0
2023-12-23 17:35:24,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1252293.3333333333, ans=0.07
2023-12-23 17:35:32,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0
2023-12-23 17:35:56,379 INFO [train.py:886] (0/4) Epoch 40, batch 2000, loss[loss=0.01088, audio_tagging_loss=0.01088, over 24750.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4940157.46 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:35:56,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1252493.3333333333, ans=0.0
2023-12-23 17:35:59,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1252493.3333333333, ans=0.125
2023-12-23 17:36:02,093 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.938e+01 3.596e+01 3.830e+01 3.994e+01 4.617e+01, threshold=7.661e+01, percent-clipped=0.0
2023-12-23 17:36:17,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0
2023-12-23 17:36:29,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1252693.3333333333, ans=0.125
2023-12-23 17:36:38,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1252760.0, ans=0.2
2023-12-23 17:36:40,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0
2023-12-23 17:36:48,984 INFO [train.py:886] (0/4) Epoch 40, batch 2050, loss[loss=0.008979, audio_tagging_loss=0.008979, over 24040.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4943948.30 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:36:49,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0
2023-12-23 17:37:07,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1252960.0, ans=0.125
2023-12-23 17:37:39,745 INFO [train.py:886] (0/4) Epoch 40, batch 2100, loss[loss=0.009819, audio_tagging_loss=0.009819, over 25000.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4953623.50 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:37:44,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=22.5
2023-12-23 17:37:46,112 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.253e+01 3.675e+01 3.853e+01 3.940e+01 4.464e+01, threshold=7.707e+01, percent-clipped=0.0
2023-12-23 17:37:49,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1253226.6666666667, ans=0.1
2023-12-23 17:37:59,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1253293.3333333333, ans=0.1
2023-12-23 17:38:05,975 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-188000.pt
2023-12-23 17:38:23,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1253426.6666666667, ans=0.125
2023-12-23 17:38:26,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1253426.6666666667, ans=0.2
2023-12-23 17:38:28,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1253426.6666666667, ans=0.1
2023-12-23 17:38:31,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1253426.6666666667, ans=0.0
2023-12-23 17:38:32,978 INFO [train.py:886] (0/4) Epoch 40, batch 2150, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4955951.78 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:38:33,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1253493.3333333333, ans=0.1
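Note: the checkpoint.py:75 line saves checkpoint-188000.pt. With save_every_n=4000 from the run configuration, these batch-level checkpoints land on multiples of 4,000 global training batches (188000 = 47 x 4000), independent of the per-epoch counter printed in the train.py:886 lines. A sketch of that cadence with a hypothetical helper name:

    def should_save(batch_idx_train: int, save_every_n: int = 4000) -> bool:
        # batch_idx_train is the global counter, not the per-epoch "batch 2100".
        return batch_idx_train > 0 and batch_idx_train % save_every_n == 0

    assert should_save(188000)       # -> checkpoint-188000.pt
    assert not should_save(188001)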
2023-12-23 17:38:41,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0
2023-12-23 17:38:53,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1253626.6666666667, ans=0.125
2023-12-23 17:38:54,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1253626.6666666667, ans=0.125
2023-12-23 17:38:58,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1253626.6666666667, ans=0.2
2023-12-23 17:38:59,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1253626.6666666667, ans=0.035
2023-12-23 17:39:24,478 INFO [train.py:886] (0/4) Epoch 40, batch 2200, loss[loss=0.01046, audio_tagging_loss=0.01046, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4954014.45 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:39:30,823 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.315e+01 3.640e+01 3.872e+01 4.053e+01 7.102e+01, threshold=7.744e+01, percent-clipped=0.0
2023-12-23 17:39:32,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1253826.6666666667, ans=0.1
2023-12-23 17:39:37,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1253893.3333333333, ans=0.125
2023-12-23 17:39:57,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1254026.6666666667, ans=0.0
2023-12-23 17:40:15,869 INFO [train.py:886] (0/4) Epoch 40, batch 2250, loss[loss=0.01072, audio_tagging_loss=0.01072, over 24750.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4954618.10 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:40:17,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1254160.0, ans=0.0
2023-12-23 17:40:22,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1254160.0, ans=0.125
2023-12-23 17:40:26,310 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:40:36,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1254293.3333333333, ans=0.95
2023-12-23 17:40:38,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.54 vs. limit=22.5
2023-12-23 17:40:50,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1254360.0, ans=0.04949747468305833
2023-12-23 17:40:51,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1254360.0, ans=0.0
2023-12-23 17:41:06,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1254426.6666666667, ans=0.0
2023-12-23 17:41:08,086 INFO [train.py:886] (0/4) Epoch 40, batch 2300, loss[loss=0.009172, audio_tagging_loss=0.009172, over 25000.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4956576.77 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:41:13,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1254493.3333333333, ans=0.125
2023-12-23 17:41:13,762 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.326e+01 3.633e+01 3.746e+01 3.973e+01 4.571e+01, threshold=7.491e+01, percent-clipped=0.0
2023-12-23 17:41:16,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1254493.3333333333, ans=0.1
2023-12-23 17:41:18,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1254560.0, ans=0.0
2023-12-23 17:41:22,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5
2023-12-23 17:41:23,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1254560.0, ans=0.125
2023-12-23 17:41:25,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1254560.0, ans=0.0
2023-12-23 17:41:26,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1254560.0, ans=0.125
2023-12-23 17:41:29,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1254626.6666666667, ans=0.2
2023-12-23 17:41:33,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0
2023-12-23 17:41:44,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1254693.3333333333, ans=0.0
2023-12-23 17:41:59,038 INFO [train.py:886] (0/4) Epoch 40, batch 2350, loss[loss=0.009533, audio_tagging_loss=0.009533, over 25000.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4953082.06 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:41:59,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0
2023-12-23 17:42:09,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1254893.3333333333, ans=0.0
2023-12-23 17:42:12,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1254893.3333333333, ans=0.125
2023-12-23 17:42:17,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1254893.3333333333, ans=0.0
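Note: the "over 25000.00 frames" figures are consistent with AudioSet's fixed ~10 s clips, assuming the usual 10 ms fbank hop: 10 s x 100 frames/s = 1000 feature frames, divided by the subsampling_factor of 4 gives 250 encoder frames per cut, so a 100-cut batch covers 25,000 frames (and 99-cut batches 24,750). The arithmetic:

    clip_seconds = 10.0          # AudioSet clip length
    frames_per_second = 100      # assumed 10 ms fbank hop
    subsampling_factor = 4       # from the run configuration
    batch_size = 100

    frames_per_cut = clip_seconds * frames_per_second / subsampling_factor  # 250
    print(batch_size * frames_per_cut)  # 25000.0, matching the log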
2023-12-23 17:42:29,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.17 vs. limit=10.0
2023-12-23 17:42:30,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1255026.6666666667, ans=0.0
2023-12-23 17:42:32,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1255026.6666666667, ans=0.1
2023-12-23 17:42:34,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0
2023-12-23 17:42:51,665 INFO [train.py:886] (0/4) Epoch 40, batch 2400, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4955141.40 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:42:57,944 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.192e+01 3.652e+01 3.798e+01 3.956e+01 4.622e+01, threshold=7.596e+01, percent-clipped=0.0
2023-12-23 17:42:59,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1255160.0, ans=0.0
2023-12-23 17:43:04,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1255226.6666666667, ans=15.0
2023-12-23 17:43:09,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1255226.6666666667, ans=0.125
2023-12-23 17:43:32,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=12.0
2023-12-23 17:43:36,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1255426.6666666667, ans=0.125
2023-12-23 17:43:43,021 INFO [train.py:886] (0/4) Epoch 40, batch 2450, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4956087.40 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:43:52,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1255493.3333333333, ans=0.1
2023-12-23 17:43:54,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0
2023-12-23 17:43:55,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1255560.0, ans=0.0
2023-12-23 17:43:56,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1255560.0, ans=0.0
2023-12-23 17:44:09,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0
2023-12-23 17:44:22,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1255693.3333333333, ans=0.125
2023-12-23 17:44:24,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1255760.0, ans=0.0
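Note: each train.py:886 line prints both the current batch's loss and a tot_loss over a frame count that grows early in the epoch and then saturates near ~4.95M frames. That behavior matches a frame-weighted exponential moving aggregate whose decay comes from reset_interval=200 in the run configuration (steady state ~200 x 25000 = 5.0e6 frames); this is my reading of the numbers, not train.py verbatim:

    class RunningLoss:
        # Exponential moving aggregate; decay from reset_interval=200.
        def __init__(self, reset_interval: int = 200):
            self.alpha = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0
        def update(self, batch_loss_sum: float, batch_frames: float):
            self.loss_sum = self.loss_sum * self.alpha + batch_loss_sum
            self.frames = self.frames * self.alpha + batch_frames
        @property
        def loss(self) -> float:
            return self.loss_sum / self.frames  # the printed tot_loss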
2023-12-23 17:44:34,761 INFO [train.py:886] (0/4) Epoch 40, batch 2500, loss[loss=0.01203, audio_tagging_loss=0.01203, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4951803.59 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:44:40,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=22.5
2023-12-23 17:44:40,515 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.292e+01 3.682e+01 3.858e+01 3.997e+01 4.509e+01, threshold=7.716e+01, percent-clipped=0.0
2023-12-23 17:44:41,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1255826.6666666667, ans=0.2
2023-12-23 17:44:41,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1255826.6666666667, ans=0.0
2023-12-23 17:44:43,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5
2023-12-23 17:45:01,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1255960.0, ans=0.0
2023-12-23 17:45:26,454 INFO [train.py:886] (0/4) Epoch 40, batch 2550, loss[loss=0.01382, audio_tagging_loss=0.01382, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4949304.33 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:45:38,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1256226.6666666667, ans=0.2
2023-12-23 17:45:47,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1256293.3333333333, ans=0.1
2023-12-23 17:45:54,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1256293.3333333333, ans=0.125
2023-12-23 17:45:55,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=22.5
2023-12-23 17:46:07,234 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:46:17,274 INFO [train.py:886] (0/4) Epoch 40, batch 2600, loss[loss=0.01112, audio_tagging_loss=0.01112, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4945100.67 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:46:22,944 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.345e+01 3.708e+01 3.865e+01 4.030e+01 5.247e+01, threshold=7.731e+01, percent-clipped=0.0
2023-12-23 17:46:33,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1256560.0, ans=0.0
2023-12-23 17:46:59,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0
2023-12-23 17:47:07,395 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0
2023-12-23 17:47:09,734 INFO [train.py:886] (0/4) Epoch 40, batch 2650, loss[loss=0.01126, audio_tagging_loss=0.01126, over 25000.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4941161.14 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:47:35,862 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:47:36,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.93 vs. limit=22.5
2023-12-23 17:47:47,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1257026.6666666667, ans=0.125
2023-12-23 17:47:54,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1257093.3333333333, ans=0.0
2023-12-23 17:48:00,531 INFO [train.py:886] (0/4) Epoch 40, batch 2700, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4948773.83 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:48:06,813 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.216e+01 3.619e+01 3.803e+01 3.974e+01 4.399e+01, threshold=7.606e+01, percent-clipped=0.0
2023-12-23 17:48:30,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1257360.0, ans=0.125
2023-12-23 17:48:40,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1257360.0, ans=0.125
2023-12-23 17:48:52,686 INFO [train.py:886] (0/4) Epoch 40, batch 2750, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4956501.89 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:49:07,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1257560.0, ans=10.0
2023-12-23 17:49:21,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1257626.6666666667, ans=0.0
2023-12-23 17:49:43,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1257826.6666666667, ans=0.125
2023-12-23 17:49:44,272 INFO [train.py:886] (0/4) Epoch 40, batch 2800, loss[loss=0.01193, audio_tagging_loss=0.01193, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4959409.14 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:49:51,396 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.248e+01 3.697e+01 3.809e+01 4.002e+01 4.614e+01, threshold=7.617e+01, percent-clipped=0.0
2023-12-23 17:50:33,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1258093.3333333333, ans=0.125
2023-12-23 17:50:36,619 INFO [train.py:886] (0/4) Epoch 40, batch 2850, loss[loss=0.01133, audio_tagging_loss=0.01133, over 24750.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4953349.82 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:51:28,918 INFO [train.py:886] (0/4) Epoch 40, batch 2900, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4952869.66 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:51:31,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1258493.3333333333, ans=0.0
2023-12-23 17:51:34,565 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.415e+01 3.699e+01 3.835e+01 4.007e+01 4.764e+01, threshold=7.669e+01, percent-clipped=0.0
2023-12-23 17:51:42,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1258560.0, ans=0.0
2023-12-23 17:51:44,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0
2023-12-23 17:51:47,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.35 vs. limit=22.5
2023-12-23 17:52:20,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1258826.6666666667, ans=0.125
2023-12-23 17:52:20,884 INFO [train.py:886] (0/4) Epoch 40, batch 2950, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4951722.97 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:52:40,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1258960.0, ans=0.125
2023-12-23 17:52:46,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1258960.0, ans=0.2
2023-12-23 17:52:53,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1259026.6666666667, ans=0.125
2023-12-23 17:53:08,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1259093.3333333333, ans=0.035
2023-12-23 17:53:12,745 INFO [train.py:886] (0/4) Epoch 40, batch 3000, loss[loss=0.01183, audio_tagging_loss=0.01183, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4951246.09 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:53:12,747 INFO [train.py:909] (0/4) Computing validation loss
2023-12-23 17:53:33,973 INFO [train.py:917] (0/4) Epoch 40, validation: loss=0.03529, audio_tagging_loss=0.03529, over 3737520.00 frames.
2023-12-23 17:53:33,973 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB
2023-12-23 17:53:35,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=8.0
2023-12-23 17:53:39,601 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.171e+01 3.612e+01 3.801e+01 4.054e+01 4.780e+01, threshold=7.602e+01, percent-clipped=0.0
2023-12-23 17:53:40,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1259160.0, ans=0.2
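Note: at batch 3000 the script pauses to compute validation loss (0.03529 here, against a running train loss of ~0.0115, over the full 3,737,520-frame dev set). With num_events=527, the audio-tagging loss here is presumably the usual multi-label binary cross-entropy averaged over the 527 AudioSet classes; a sketch of that criterion under that assumption:

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # logits, targets: (batch, 527) with multi-hot AudioSet event labels.
        # Per-class BCE averaged over all classes and items; a small average
        # like 0.0115 means mostly-confident predictions across 527 classes.
        return F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")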
2023-12-23 17:53:44,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=12.0
2023-12-23 17:53:47,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1259226.6666666667, ans=0.09899494936611666
2023-12-23 17:53:50,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1259226.6666666667, ans=0.125
2023-12-23 17:53:59,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=15.0
2023-12-23 17:54:07,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1259360.0, ans=0.0
2023-12-23 17:54:17,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1259426.6666666667, ans=0.125
2023-12-23 17:54:19,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.74 vs. limit=22.5
2023-12-23 17:54:25,429 INFO [train.py:886] (0/4) Epoch 40, batch 3050, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4957745.29 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:54:25,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0
2023-12-23 17:54:50,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1259626.6666666667, ans=0.2
2023-12-23 17:55:07,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1259760.0, ans=0.0
2023-12-23 17:55:16,902 INFO [train.py:886] (0/4) Epoch 40, batch 3100, loss[loss=0.01372, audio_tagging_loss=0.01372, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4957384.37 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:55:22,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1259826.6666666667, ans=0.125
2023-12-23 17:55:22,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1259826.6666666667, ans=0.125
2023-12-23 17:55:24,103 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.262e+01 3.674e+01 3.864e+01 4.007e+01 4.526e+01, threshold=7.728e+01, percent-clipped=0.0
2023-12-23 17:55:26,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1259893.3333333333, ans=0.0
2023-12-23 17:56:08,319 INFO [train.py:886] (0/4) Epoch 40, batch 3150, loss[loss=0.01431, audio_tagging_loss=0.01431, over 24750.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4956284.65 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:56:18,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1260226.6666666667, ans=0.025
2023-12-23 17:56:24,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.66 vs. limit=15.0
2023-12-23 17:56:27,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0
2023-12-23 17:56:42,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1260360.0, ans=0.2
2023-12-23 17:56:59,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1260493.3333333333, ans=0.2
2023-12-23 17:57:00,305 INFO [train.py:886] (0/4) Epoch 40, batch 3200, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4955426.53 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:57:00,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1260493.3333333333, ans=0.1
2023-12-23 17:57:03,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1260493.3333333333, ans=0.125
2023-12-23 17:57:07,586 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.204e+01 3.736e+01 3.854e+01 4.070e+01 4.738e+01, threshold=7.708e+01, percent-clipped=0.0
2023-12-23 17:57:14,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0
2023-12-23 17:57:16,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1260560.0, ans=0.125
2023-12-23 17:57:16,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1260560.0, ans=0.0
2023-12-23 17:57:18,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1260560.0, ans=0.125
2023-12-23 17:57:45,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1260760.0, ans=0.1
2023-12-23 17:57:48,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1260760.0, ans=0.0
2023-12-23 17:57:51,657 INFO [train.py:886] (0/4) Epoch 40, batch 3250, loss[loss=0.01152, audio_tagging_loss=0.01152, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4953085.39 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:58:00,332 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0
2023-12-23 17:58:03,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1260893.3333333333, ans=0.1
2023-12-23 17:58:07,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1260893.3333333333, ans=0.0
2023-12-23 17:58:08,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.48 vs. limit=6.0
2023-12-23 17:58:11,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1260960.0, ans=0.0
2023-12-23 17:58:38,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1261093.3333333333, ans=0.125
2023-12-23 17:58:42,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1261093.3333333333, ans=0.0
2023-12-23 17:58:42,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=12.0
2023-12-23 17:58:43,812 INFO [train.py:886] (0/4) Epoch 40, batch 3300, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4958926.27 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:58:51,268 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.291e+01 3.619e+01 3.846e+01 4.004e+01 5.622e+01, threshold=7.691e+01, percent-clipped=0.0
2023-12-23 17:58:52,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0
2023-12-23 17:58:55,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1261226.6666666667, ans=0.0
2023-12-23 17:58:58,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=12.0
2023-12-23 17:59:00,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1261226.6666666667, ans=0.125
2023-12-23 17:59:06,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1261293.3333333333, ans=0.0
2023-12-23 17:59:27,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1261426.6666666667, ans=0.025
2023-12-23 17:59:35,966 INFO [train.py:886] (0/4) Epoch 40, batch 3350, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24002.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4959949.48 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:59:36,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1261493.3333333333, ans=0.125
2023-12-23 17:59:52,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0
2023-12-23 18:00:18,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=27.39 vs. limit=22.5
2023-12-23 18:00:28,517 INFO [train.py:886] (0/4) Epoch 40, batch 3400, loss[loss=0.01226, audio_tagging_loss=0.01226, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4960960.38 frames.
], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:00:35,078 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.304e+01 3.654e+01 3.811e+01 4.007e+01 4.560e+01, threshold=7.622e+01, percent-clipped=0.0 2023-12-23 18:00:42,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1261893.3333333333, ans=0.025 2023-12-23 18:01:14,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1262093.3333333333, ans=0.125 2023-12-23 18:01:20,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=12.0 2023-12-23 18:01:20,795 INFO [train.py:886] (0/4) Epoch 40, batch 3450, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4955832.11 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:01:20,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1262160.0, ans=0.04949747468305833 2023-12-23 18:01:28,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1262160.0, ans=0.125 2023-12-23 18:01:44,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1262293.3333333333, ans=0.0 2023-12-23 18:02:03,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1262426.6666666667, ans=0.2 2023-12-23 18:02:03,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1262426.6666666667, ans=0.125 2023-12-23 18:02:07,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1262426.6666666667, ans=0.125 2023-12-23 18:02:11,099 INFO [train.py:886] (0/4) Epoch 40, batch 3500, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4949365.23 frames. 
], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:02:11,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1262493.3333333333, ans=0.1 2023-12-23 18:02:16,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1262493.3333333333, ans=0.125 2023-12-23 18:02:18,416 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 3.696e+01 3.842e+01 3.983e+01 4.882e+01, threshold=7.684e+01, percent-clipped=0.0 2023-12-23 18:02:38,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1262626.6666666667, ans=0.125 2023-12-23 18:02:46,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1262693.3333333333, ans=0.125 2023-12-23 18:02:48,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1262693.3333333333, ans=0.125 2023-12-23 18:02:59,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.94 vs. limit=10.0 2023-12-23 18:03:04,000 INFO [train.py:886] (0/4) Epoch 40, batch 3550, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4946814.65 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:03:16,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1262893.3333333333, ans=0.1 2023-12-23 18:03:24,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1262960.0, ans=0.0 2023-12-23 18:03:38,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1263026.6666666667, ans=0.125 2023-12-23 18:03:52,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1263093.3333333333, ans=0.125 2023-12-23 18:03:55,061 INFO [train.py:886] (0/4) Epoch 40, batch 3600, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4950330.74 frames. 
], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:03:55,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1263160.0, ans=0.0 2023-12-23 18:03:59,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1263160.0, ans=0.125 2023-12-23 18:04:03,185 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.291e+01 3.684e+01 3.812e+01 3.997e+01 4.511e+01, threshold=7.624e+01, percent-clipped=0.0 2023-12-23 18:04:14,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1263226.6666666667, ans=0.125 2023-12-23 18:04:18,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1263293.3333333333, ans=0.05 2023-12-23 18:04:37,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1263426.6666666667, ans=0.0 2023-12-23 18:04:38,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1263426.6666666667, ans=0.125 2023-12-23 18:04:47,307 INFO [train.py:886] (0/4) Epoch 40, batch 3650, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4951755.65 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:04:49,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.97 vs. limit=15.0 2023-12-23 18:04:55,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1263493.3333333333, ans=0.09899494936611666 2023-12-23 18:05:03,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1263560.0, ans=0.125 2023-12-23 18:05:33,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.06 vs. limit=22.5 2023-12-23 18:05:38,706 INFO [train.py:886] (0/4) Epoch 40, batch 3700, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4953992.64 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:05:46,058 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.278e+01 3.659e+01 3.785e+01 3.953e+01 4.581e+01, threshold=7.570e+01, percent-clipped=0.0 2023-12-23 18:05:50,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1263893.3333333333, ans=0.0 2023-12-23 18:05:57,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.65 vs. 
limit=15.0 2023-12-23 18:05:59,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1263960.0, ans=0.1 2023-12-23 18:06:17,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1264026.6666666667, ans=0.125 2023-12-23 18:06:20,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1264093.3333333333, ans=0.125 2023-12-23 18:06:25,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1264093.3333333333, ans=0.5 2023-12-23 18:06:26,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1264093.3333333333, ans=0.0 2023-12-23 18:06:27,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1264093.3333333333, ans=0.2 2023-12-23 18:06:30,151 INFO [train.py:886] (0/4) Epoch 40, batch 3750, loss[loss=0.01384, audio_tagging_loss=0.01384, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4947742.32 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:06:40,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1264226.6666666667, ans=0.1 2023-12-23 18:06:48,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1264226.6666666667, ans=0.1 2023-12-23 18:07:04,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1264360.0, ans=0.125 2023-12-23 18:07:11,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1264426.6666666667, ans=0.125 2023-12-23 18:07:23,148 INFO [train.py:886] (0/4) Epoch 40, batch 3800, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4937678.16 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:07:23,707 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-12-23 18:07:29,695 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.212e+01 3.721e+01 3.894e+01 4.067e+01 4.769e+01, threshold=7.788e+01, percent-clipped=0.0 2023-12-23 18:07:46,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1264626.6666666667, ans=0.125 2023-12-23 18:08:03,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1264760.0, ans=0.125 2023-12-23 18:08:13,779 INFO [train.py:886] (0/4) Epoch 40, batch 3850, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4940700.31 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:08:16,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.12 vs. 
limit=15.0 2023-12-23 18:08:17,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2023-12-23 18:08:30,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1264893.3333333333, ans=0.125 2023-12-23 18:09:05,577 INFO [train.py:886] (0/4) Epoch 40, batch 3900, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4944131.01 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:09:09,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1265160.0, ans=0.125 2023-12-23 18:09:12,985 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.639e+01 3.828e+01 4.025e+01 4.570e+01, threshold=7.656e+01, percent-clipped=0.0 2023-12-23 18:09:22,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1265226.6666666667, ans=0.2 2023-12-23 18:09:56,684 INFO [train.py:886] (0/4) Epoch 40, batch 3950, loss[loss=0.01099, audio_tagging_loss=0.01099, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4946209.39 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:10:05,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.09 vs. limit=10.0 2023-12-23 18:10:13,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0 2023-12-23 18:10:47,694 INFO [train.py:886] (0/4) Epoch 40, batch 4000, loss[loss=0.01182, audio_tagging_loss=0.01182, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4952322.73 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:10:55,001 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.346e+01 3.601e+01 3.801e+01 3.964e+01 6.190e+01, threshold=7.601e+01, percent-clipped=0.0 2023-12-23 18:11:00,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1265893.3333333333, ans=0.0 2023-12-23 18:11:11,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1265960.0, ans=0.125 2023-12-23 18:11:15,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2023-12-23 18:11:34,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.22 vs. limit=22.5 2023-12-23 18:11:39,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.05 vs. limit=22.5 2023-12-23 18:11:40,054 INFO [train.py:886] (0/4) Epoch 40, batch 4050, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4948255.15 frames. 
], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:11:48,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1266160.0, ans=0.0 2023-12-23 18:11:58,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-12-23 18:12:10,550 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:12:15,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1266360.0, ans=0.2 2023-12-23 18:12:23,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1266426.6666666667, ans=0.0 2023-12-23 18:12:23,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1266426.6666666667, ans=0.125 2023-12-23 18:12:25,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1266426.6666666667, ans=0.125 2023-12-23 18:12:26,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1266426.6666666667, ans=0.04949747468305833 2023-12-23 18:12:27,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1266426.6666666667, ans=0.125 2023-12-23 18:12:31,437 INFO [train.py:886] (0/4) Epoch 40, batch 4100, loss[loss=0.01133, audio_tagging_loss=0.01133, over 24750.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4946625.03 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:12:38,723 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.330e+01 3.783e+01 3.912e+01 4.094e+01 5.068e+01, threshold=7.824e+01, percent-clipped=0.0 2023-12-23 18:12:52,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2023-12-23 18:13:15,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-12-23 18:13:24,058 INFO [train.py:886] (0/4) Epoch 40, batch 4150, loss[loss=0.01299, audio_tagging_loss=0.01299, over 24750.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4941094.05 frames. 
], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:13:26,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1266826.6666666667, ans=0.125 2023-12-23 18:13:41,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1266893.3333333333, ans=0.0 2023-12-23 18:13:43,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1266960.0, ans=0.125 2023-12-23 18:13:48,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1266960.0, ans=0.0 2023-12-23 18:13:51,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1266960.0, ans=0.0 2023-12-23 18:14:03,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1267026.6666666667, ans=0.125 2023-12-23 18:14:07,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1267093.3333333333, ans=0.0 2023-12-23 18:14:15,843 INFO [train.py:886] (0/4) Epoch 40, batch 4200, loss[loss=0.009884, audio_tagging_loss=0.009884, over 25000.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4939037.77 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:14:23,314 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.248e+01 3.687e+01 3.822e+01 4.029e+01 4.624e+01, threshold=7.645e+01, percent-clipped=0.0 2023-12-23 18:14:37,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2023-12-23 18:14:43,568 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:14:53,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=22.5 2023-12-23 18:14:55,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.92 vs. limit=22.5 2023-12-23 18:14:57,693 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:15:01,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1267426.6666666667, ans=0.125 2023-12-23 18:15:06,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1267426.6666666667, ans=0.1 2023-12-23 18:15:07,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5 2023-12-23 18:15:08,323 INFO [train.py:886] (0/4) Epoch 40, batch 4250, loss[loss=0.01055, audio_tagging_loss=0.01055, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4940205.59 frames. 
], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:15:11,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1267493.3333333333, ans=0.2 2023-12-23 18:15:18,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1267560.0, ans=0.125 2023-12-23 18:15:40,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1267693.3333333333, ans=0.1 2023-12-23 18:15:49,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1267760.0, ans=0.125 2023-12-23 18:15:59,891 INFO [train.py:886] (0/4) Epoch 40, batch 4300, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4952187.65 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:16:06,444 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.393e+01 3.668e+01 3.826e+01 3.970e+01 4.663e+01, threshold=7.653e+01, percent-clipped=0.0 2023-12-23 18:16:15,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1267893.3333333333, ans=0.1 2023-12-23 18:16:27,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1267960.0, ans=0.125 2023-12-23 18:16:35,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1268026.6666666667, ans=0.125 2023-12-23 18:16:48,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1268093.3333333333, ans=0.125 2023-12-23 18:16:51,150 INFO [train.py:886] (0/4) Epoch 40, batch 4350, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4958977.90 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:16:54,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1268160.0, ans=0.1 2023-12-23 18:16:58,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1268160.0, ans=0.125 2023-12-23 18:17:37,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1268426.6666666667, ans=0.125 2023-12-23 18:17:39,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1268426.6666666667, ans=0.0 2023-12-23 18:17:42,508 INFO [train.py:886] (0/4) Epoch 40, batch 4400, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4945902.56 frames. 
], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:17:50,634 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.283e+01 3.697e+01 3.843e+01 4.013e+01 4.471e+01, threshold=7.687e+01, percent-clipped=0.0 2023-12-23 18:17:50,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1268493.3333333333, ans=0.125 2023-12-23 18:17:51,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1268493.3333333333, ans=0.125 2023-12-23 18:18:20,934 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:18:21,120 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:18:28,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-12-23 18:18:35,362 INFO [train.py:886] (0/4) Epoch 40, batch 4450, loss[loss=0.01068, audio_tagging_loss=0.01068, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4942588.56 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:18:57,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1268960.0, ans=0.1 2023-12-23 18:19:04,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1268960.0, ans=0.0 2023-12-23 18:19:27,033 INFO [train.py:886] (0/4) Epoch 40, batch 4500, loss[loss=0.01353, audio_tagging_loss=0.01353, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4949978.05 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:19:34,274 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.454e+01 3.692e+01 3.865e+01 4.053e+01 4.653e+01, threshold=7.730e+01, percent-clipped=0.0 2023-12-23 18:19:36,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.81 vs. limit=10.0 2023-12-23 18:19:41,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1269226.6666666667, ans=0.2 2023-12-23 18:19:44,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1269226.6666666667, ans=0.125 2023-12-23 18:19:48,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-12-23 18:19:48,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1269293.3333333333, ans=0.125 2023-12-23 18:20:18,883 INFO [train.py:886] (0/4) Epoch 40, batch 4550, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4948279.72 frames. 
], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:20:35,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1269560.0, ans=0.0 2023-12-23 18:21:01,591 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:21:10,429 INFO [train.py:886] (0/4) Epoch 40, batch 4600, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4954691.20 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:21:14,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1269826.6666666667, ans=0.035 2023-12-23 18:21:17,649 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.202e+01 3.671e+01 3.796e+01 3.991e+01 4.710e+01, threshold=7.593e+01, percent-clipped=0.0 2023-12-23 18:21:20,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1269893.3333333333, ans=0.035 2023-12-23 18:21:21,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1269893.3333333333, ans=0.0 2023-12-23 18:21:23,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1269893.3333333333, ans=0.125 2023-12-23 18:21:40,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2023-12-23 18:21:41,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2023-12-23 18:21:53,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1270093.3333333333, ans=0.07 2023-12-23 18:21:57,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-12-23 18:22:00,871 INFO [train.py:886] (0/4) Epoch 40, batch 4650, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4958281.13 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:22:20,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1270226.6666666667, ans=0.125 2023-12-23 18:22:28,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1270293.3333333333, ans=0.125 2023-12-23 18:22:39,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0 2023-12-23 18:22:40,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2023-12-23 18:22:44,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. 
limit=15.0 2023-12-23 18:22:48,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1270426.6666666667, ans=0.125 2023-12-23 18:22:50,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. limit=6.0 2023-12-23 18:22:51,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2023-12-23 18:22:52,811 INFO [train.py:886] (0/4) Epoch 40, batch 4700, loss[loss=0.0114, audio_tagging_loss=0.0114, over 24750.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4956730.55 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:22:54,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1270493.3333333333, ans=0.1 2023-12-23 18:22:59,154 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.757e+01 3.899e+01 4.092e+01 5.478e+01, threshold=7.799e+01, percent-clipped=0.0 2023-12-23 18:23:03,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1270560.0, ans=0.125 2023-12-23 18:23:08,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1270560.0, ans=0.025 2023-12-23 18:23:12,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1270626.6666666667, ans=0.125 2023-12-23 18:23:17,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1270626.6666666667, ans=0.1 2023-12-23 18:23:19,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-12-23 18:23:31,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1270760.0, ans=0.125 2023-12-23 18:23:37,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1270760.0, ans=0.1 2023-12-23 18:23:39,644 INFO [train.py:886] (0/4) Epoch 40, batch 4750, loss[loss=0.0114, audio_tagging_loss=0.0114, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4949930.68 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:23:51,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1270893.3333333333, ans=0.0 2023-12-23 18:23:52,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1270893.3333333333, ans=0.2 2023-12-23 18:23:55,089 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-40.pt 2023-12-23 18:24:13,759 INFO [train.py:886] (0/4) Epoch 41, batch 0, loss[loss=0.03015, audio_tagging_loss=0.03015, over 23990.00 frames. ], tot_loss[loss=0.03015, audio_tagging_loss=0.03015, over 23990.00 frames. 
], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:24:13,760 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 18:24:32,086 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6436, 2.9720, 4.1597, 3.8080], device='cuda:0') 2023-12-23 18:24:35,139 INFO [train.py:917] (0/4) Epoch 41, validation: loss=0.03496, audio_tagging_loss=0.03496, over 3737520.00 frames. 2023-12-23 18:24:35,140 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 18:24:49,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2023-12-23 18:25:02,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1271066.6666666667, ans=0.05 2023-12-23 18:25:18,894 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.493e+01 3.874e+01 4.258e+01 5.303e+01 1.010e+02, threshold=8.517e+01, percent-clipped=7.0 2023-12-23 18:25:19,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1271200.0, ans=0.125 2023-12-23 18:25:26,261 INFO [train.py:886] (0/4) Epoch 41, batch 50, loss[loss=0.01417, audio_tagging_loss=0.01417, over 25000.00 frames. ], tot_loss[loss=0.01873, audio_tagging_loss=0.01873, over 1114196.80 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:25:30,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.39 vs. limit=22.5 2023-12-23 18:25:40,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1271333.3333333333, ans=0.0 2023-12-23 18:25:42,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1271333.3333333333, ans=0.125 2023-12-23 18:25:55,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1271400.0, ans=0.1 2023-12-23 18:26:00,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1271466.6666666667, ans=0.125 2023-12-23 18:26:08,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1271533.3333333333, ans=0.2 2023-12-23 18:26:09,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1271533.3333333333, ans=0.0 2023-12-23 18:26:13,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1271533.3333333333, ans=0.125 2023-12-23 18:26:18,034 INFO [train.py:886] (0/4) Epoch 41, batch 100, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01591, audio_tagging_loss=0.01591, over 1968755.56 frames. 
], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:26:33,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1271666.6666666667, ans=15.0 2023-12-23 18:26:41,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1271733.3333333333, ans=0.0 2023-12-23 18:26:48,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.75 vs. limit=22.5 2023-12-23 18:26:49,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1271800.0, ans=0.0 2023-12-23 18:27:03,060 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.548e+01 3.949e+01 4.182e+01 4.375e+01 5.097e+01, threshold=8.364e+01, percent-clipped=0.0 2023-12-23 18:27:09,799 INFO [train.py:886] (0/4) Epoch 41, batch 150, loss[loss=0.01445, audio_tagging_loss=0.01445, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 2631061.79 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:27:09,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1271933.3333333333, ans=0.0 2023-12-23 18:27:13,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1271933.3333333333, ans=0.0 2023-12-23 18:27:35,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0 2023-12-23 18:27:37,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1272066.6666666667, ans=0.2 2023-12-23 18:27:43,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1272133.3333333333, ans=0.1 2023-12-23 18:27:56,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2023-12-23 18:28:02,498 INFO [train.py:886] (0/4) Epoch 41, batch 200, loss[loss=0.009585, audio_tagging_loss=0.009585, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 3151326.49 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:28:07,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1272266.6666666667, ans=0.125 2023-12-23 18:28:12,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.47 vs. 
limit=15.0 2023-12-23 18:28:13,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1272333.3333333333, ans=0.125 2023-12-23 18:28:29,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1272400.0, ans=0.025 2023-12-23 18:28:30,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1272400.0, ans=0.125 2023-12-23 18:28:31,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0 2023-12-23 18:28:38,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=12.0 2023-12-23 18:28:38,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.39 vs. limit=5.0 2023-12-23 18:28:46,896 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.401e+01 3.687e+01 3.842e+01 3.967e+01 4.665e+01, threshold=7.685e+01, percent-clipped=0.0 2023-12-23 18:28:48,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.50 vs. limit=12.0 2023-12-23 18:28:54,256 INFO [train.py:886] (0/4) Epoch 41, batch 250, loss[loss=0.01176, audio_tagging_loss=0.01176, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 3551767.63 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:29:17,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1272733.3333333333, ans=0.035 2023-12-23 18:29:32,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1272800.0, ans=0.0 2023-12-23 18:29:43,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1272866.6666666667, ans=15.0 2023-12-23 18:29:45,609 INFO [train.py:886] (0/4) Epoch 41, batch 300, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 3862811.91 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:30:00,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1273000.0, ans=0.125 2023-12-23 18:30:21,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1273133.3333333333, ans=0.125 2023-12-23 18:30:29,127 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.673e+01 3.844e+01 4.121e+01 4.840e+01, threshold=7.689e+01, percent-clipped=0.0 2023-12-23 18:30:36,415 INFO [train.py:886] (0/4) Epoch 41, batch 350, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24750.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4092203.85 frames. 
], batch size: 99, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:31:00,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1273400.0, ans=0.2 2023-12-23 18:31:23,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1273533.3333333333, ans=0.035 2023-12-23 18:31:28,755 INFO [train.py:886] (0/4) Epoch 41, batch 400, loss[loss=0.0109, audio_tagging_loss=0.0109, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4283407.89 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:31:28,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1273600.0, ans=0.07 2023-12-23 18:31:49,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1273733.3333333333, ans=0.0 2023-12-23 18:31:54,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1273733.3333333333, ans=0.1 2023-12-23 18:31:55,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.66 vs. limit=22.5 2023-12-23 18:32:11,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1273866.6666666667, ans=0.2 2023-12-23 18:32:13,130 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.313e+01 3.591e+01 3.744e+01 3.939e+01 4.826e+01, threshold=7.487e+01, percent-clipped=0.0 2023-12-23 18:32:20,534 INFO [train.py:886] (0/4) Epoch 41, batch 450, loss[loss=0.01009, audio_tagging_loss=0.01009, over 22540.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4428421.69 frames. ], batch size: 107, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:32:27,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1273933.3333333333, ans=0.125 2023-12-23 18:32:38,210 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:32:40,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.48 vs. limit=22.5 2023-12-23 18:32:47,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1274066.6666666667, ans=0.2 2023-12-23 18:32:54,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=12.0 2023-12-23 18:33:10,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.09 vs. limit=22.5 2023-12-23 18:33:12,243 INFO [train.py:886] (0/4) Epoch 41, batch 500, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4547331.85 frames. 
], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:33:18,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1274266.6666666667, ans=0.0 2023-12-23 18:33:19,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1274266.6666666667, ans=0.0 2023-12-23 18:33:43,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=22.5 2023-12-23 18:33:45,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1274466.6666666667, ans=0.125 2023-12-23 18:33:56,401 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.109e+01 3.650e+01 3.820e+01 3.998e+01 4.800e+01, threshold=7.639e+01, percent-clipped=0.0 2023-12-23 18:34:02,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=12.0 2023-12-23 18:34:03,787 INFO [train.py:886] (0/4) Epoch 41, batch 550, loss[loss=0.009559, audio_tagging_loss=0.009559, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4640501.72 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:34:05,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1274600.0, ans=0.2 2023-12-23 18:34:09,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1274600.0, ans=0.0 2023-12-23 18:34:16,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2023-12-23 18:34:24,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1274733.3333333333, ans=0.125 2023-12-23 18:34:32,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1274733.3333333333, ans=22.5 2023-12-23 18:34:56,191 INFO [train.py:886] (0/4) Epoch 41, batch 600, loss[loss=0.01238, audio_tagging_loss=0.01238, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4710122.62 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:34:56,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1274933.3333333333, ans=0.0 2023-12-23 18:34:58,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1274933.3333333333, ans=0.125 2023-12-23 18:35:05,546 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:35:40,233 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.244e+01 3.747e+01 3.891e+01 4.086e+01 5.120e+01, threshold=7.782e+01, percent-clipped=0.0 2023-12-23 18:35:41,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1275200.0, ans=0.2 2023-12-23 18:35:48,372 INFO [train.py:886] (0/4) Epoch 41, batch 650, loss[loss=0.01018, audio_tagging_loss=0.01018, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4754223.56 frames. 
], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:35:53,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1275266.6666666667, ans=0.0 2023-12-23 18:35:54,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.02 vs. limit=10.0 2023-12-23 18:35:56,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1275266.6666666667, ans=0.1 2023-12-23 18:35:59,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1275333.3333333333, ans=0.1 2023-12-23 18:36:39,879 INFO [train.py:886] (0/4) Epoch 41, batch 700, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4789656.19 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:37:13,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1275800.0, ans=0.125 2023-12-23 18:37:22,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1275866.6666666667, ans=0.125 2023-12-23 18:37:24,899 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.750e+01 3.860e+01 4.061e+01 4.621e+01, threshold=7.721e+01, percent-clipped=0.0 2023-12-23 18:37:31,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1275933.3333333333, ans=0.0 2023-12-23 18:37:32,203 INFO [train.py:886] (0/4) Epoch 41, batch 750, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4825548.54 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:37:52,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1276066.6666666667, ans=0.125 2023-12-23 18:37:53,908 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:37:59,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1276066.6666666667, ans=0.0 2023-12-23 18:38:21,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1276200.0, ans=0.0 2023-12-23 18:38:22,875 INFO [train.py:886] (0/4) Epoch 41, batch 800, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4854052.83 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:38:46,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1276400.0, ans=0.0 2023-12-23 18:39:07,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1276533.3333333333, ans=0.0 2023-12-23 18:39:08,858 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.333e+01 3.674e+01 3.793e+01 3.949e+01 4.603e+01, threshold=7.587e+01, percent-clipped=0.0 2023-12-23 18:39:15,580 INFO [train.py:886] (0/4) Epoch 41, batch 850, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. 
], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4877552.48 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:39:28,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.41 vs. limit=15.0 2023-12-23 18:39:46,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1276800.0, ans=0.025 2023-12-23 18:40:07,464 INFO [train.py:886] (0/4) Epoch 41, batch 900, loss[loss=0.01346, audio_tagging_loss=0.01346, over 24750.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4896793.93 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:40:09,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1276933.3333333333, ans=0.05 2023-12-23 18:40:28,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1277066.6666666667, ans=0.1 2023-12-23 18:40:31,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1277066.6666666667, ans=0.125 2023-12-23 18:40:38,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1277133.3333333333, ans=0.1 2023-12-23 18:40:52,163 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.699e+01 3.943e+01 4.115e+01 4.708e+01, threshold=7.886e+01, percent-clipped=0.0 2023-12-23 18:40:53,739 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2023-12-23 18:40:59,575 INFO [train.py:886] (0/4) Epoch 41, batch 950, loss[loss=0.01188, audio_tagging_loss=0.01188, over 24942.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4903257.98 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:41:03,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1277266.6666666667, ans=0.1 2023-12-23 18:41:16,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.83 vs. 
limit=22.5 2023-12-23 18:41:22,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1277400.0, ans=0.1 2023-12-23 18:41:28,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1277400.0, ans=0.0 2023-12-23 18:41:30,516 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.444e-01 2023-12-23 18:41:34,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1277466.6666666667, ans=0.0 2023-12-23 18:41:35,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1277466.6666666667, ans=0.125 2023-12-23 18:41:41,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1277533.3333333333, ans=0.1 2023-12-23 18:41:52,143 INFO [train.py:886] (0/4) Epoch 41, batch 1000, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4913470.17 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:41:53,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1277600.0, ans=0.07 2023-12-23 18:42:20,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.52 vs. limit=10.0 2023-12-23 18:42:32,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=1277866.6666666667, ans=15.0 2023-12-23 18:42:32,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-12-23 18:42:35,891 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.182e+01 3.700e+01 3.857e+01 4.081e+01 5.233e+01, threshold=7.714e+01, percent-clipped=0.0 2023-12-23 18:42:43,287 INFO [train.py:886] (0/4) Epoch 41, batch 1050, loss[loss=0.01074, audio_tagging_loss=0.01074, over 25000.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4927681.25 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:42:49,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1277933.3333333333, ans=0.5 2023-12-23 18:43:12,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2023-12-23 18:43:18,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1278133.3333333333, ans=0.125 2023-12-23 18:43:24,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1278200.0, ans=0.09899494936611666 2023-12-23 18:43:30,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0 2023-12-23 18:43:35,401 INFO [train.py:886] (0/4) Epoch 41, batch 1100, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. 
], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4934848.17 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:43:44,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1278333.3333333333, ans=0.1 2023-12-23 18:43:52,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.31 vs. limit=22.5 2023-12-23 18:44:02,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1278400.0, ans=0.2 2023-12-23 18:44:12,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1278466.6666666667, ans=0.1 2023-12-23 18:44:12,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1278466.6666666667, ans=0.125 2023-12-23 18:44:14,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1278466.6666666667, ans=0.125 2023-12-23 18:44:19,474 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.288e+01 3.657e+01 3.787e+01 4.008e+01 4.824e+01, threshold=7.573e+01, percent-clipped=0.0 2023-12-23 18:44:26,097 INFO [train.py:886] (0/4) Epoch 41, batch 1150, loss[loss=0.01017, audio_tagging_loss=0.01017, over 25000.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4940587.39 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:44:27,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=12.0 2023-12-23 18:44:29,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2023-12-23 18:44:38,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1278666.6666666667, ans=0.0 2023-12-23 18:44:48,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1278733.3333333333, ans=0.125 2023-12-23 18:44:55,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2023-12-23 18:45:14,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1278866.6666666667, ans=0.125 2023-12-23 18:45:18,179 INFO [train.py:886] (0/4) Epoch 41, batch 1200, loss[loss=0.01121, audio_tagging_loss=0.01121, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4947494.17 frames. 
], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:45:20,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1278933.3333333333, ans=0.0 2023-12-23 18:45:28,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1279000.0, ans=0.0 2023-12-23 18:45:34,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1279000.0, ans=0.2 2023-12-23 18:45:35,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1279000.0, ans=0.125 2023-12-23 18:45:50,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2023-12-23 18:46:02,170 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.367e+01 3.706e+01 3.872e+01 4.011e+01 4.696e+01, threshold=7.743e+01, percent-clipped=0.0 2023-12-23 18:46:09,429 INFO [train.py:886] (0/4) Epoch 41, batch 1250, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4950798.08 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:46:19,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1279333.3333333333, ans=0.0 2023-12-23 18:46:20,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0 2023-12-23 18:46:22,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1279333.3333333333, ans=0.125 2023-12-23 18:46:44,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1279466.6666666667, ans=0.125 2023-12-23 18:46:53,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1279533.3333333333, ans=0.125 2023-12-23 18:46:54,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1279533.3333333333, ans=0.125 2023-12-23 18:46:58,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1279533.3333333333, ans=0.09899494936611666 2023-12-23 18:47:01,529 INFO [train.py:886] (0/4) Epoch 41, batch 1300, loss[loss=0.008937, audio_tagging_loss=0.008937, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4941708.89 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:47:45,471 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.348e+01 3.706e+01 3.903e+01 4.052e+01 5.836e+01, threshold=7.805e+01, percent-clipped=0.0 2023-12-23 18:47:52,841 INFO [train.py:886] (0/4) Epoch 41, batch 1350, loss[loss=0.01068, audio_tagging_loss=0.01068, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4942685.25 frames. 
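
[Editor's note] The checkpoint save just below writes zipformer/exp_at_as_full/checkpoint-192000.pt, named after the global training batch index, which is the usual fixed-interval checkpointing pattern (192000 is consistent with saving every few thousand batches). A hedged sketch of that bookkeeping; maybe_save and its every_n parameter are illustrative names, not icefall's checkpoint API:

from pathlib import Path
import torch

def maybe_save(model, optimizer, batch_idx, exp_dir, every_n=4000):
    # Write checkpoint-<batch_idx>.pt whenever the batch index hits the interval.
    if batch_idx == 0 or batch_idx % every_n != 0:
        return None
    path = Path(exp_dir) / f"checkpoint-{batch_idx}.pt"
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx,
        },
        path,
    )
    return path
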
], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:48:02,292 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-192000.pt 2023-12-23 18:48:05,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1280000.0, ans=10.0 2023-12-23 18:48:14,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0 2023-12-23 18:48:23,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1280066.6666666667, ans=0.125 2023-12-23 18:48:24,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1280133.3333333333, ans=0.1 2023-12-23 18:48:31,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1280133.3333333333, ans=0.125 2023-12-23 18:48:45,523 INFO [train.py:886] (0/4) Epoch 41, batch 1400, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4943242.73 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:48:56,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1280333.3333333333, ans=0.125 2023-12-23 18:48:58,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1280333.3333333333, ans=10.0 2023-12-23 18:49:06,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1280400.0, ans=0.035 2023-12-23 18:49:09,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1280400.0, ans=0.125 2023-12-23 18:49:10,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.12 vs. limit=10.0 2023-12-23 18:49:21,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1280466.6666666667, ans=0.1 2023-12-23 18:49:30,414 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.274e+01 3.669e+01 3.890e+01 4.024e+01 4.902e+01, threshold=7.779e+01, percent-clipped=0.0 2023-12-23 18:49:35,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1280533.3333333333, ans=0.125 2023-12-23 18:49:37,002 INFO [train.py:886] (0/4) Epoch 41, batch 1450, loss[loss=0.01406, audio_tagging_loss=0.01406, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4946966.98 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:49:54,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.25 vs. 
limit=10.0 2023-12-23 18:50:16,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1280800.0, ans=0.0 2023-12-23 18:50:18,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1280866.6666666667, ans=0.07 2023-12-23 18:50:18,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1280866.6666666667, ans=0.0 2023-12-23 18:50:28,723 INFO [train.py:886] (0/4) Epoch 41, batch 1500, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4944644.41 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:50:38,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1281000.0, ans=0.125 2023-12-23 18:50:57,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1281066.6666666667, ans=0.125 2023-12-23 18:51:13,185 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.443e+01 3.676e+01 3.888e+01 4.062e+01 4.485e+01, threshold=7.775e+01, percent-clipped=0.0 2023-12-23 18:51:20,535 INFO [train.py:886] (0/4) Epoch 41, batch 1550, loss[loss=0.01138, audio_tagging_loss=0.01138, over 24750.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4941424.38 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:51:30,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1281333.3333333333, ans=0.5 2023-12-23 18:51:42,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1281400.0, ans=0.1 2023-12-23 18:51:53,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1281466.6666666667, ans=0.125 2023-12-23 18:51:53,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1281466.6666666667, ans=0.1 2023-12-23 18:51:54,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1281466.6666666667, ans=0.1 2023-12-23 18:52:01,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1281533.3333333333, ans=0.125 2023-12-23 18:52:10,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1281533.3333333333, ans=0.0 2023-12-23 18:52:12,604 INFO [train.py:886] (0/4) Epoch 41, batch 1600, loss[loss=0.009871, audio_tagging_loss=0.009871, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4933336.17 frames. 
], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:52:27,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1281666.6666666667, ans=0.0 2023-12-23 18:52:35,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1281733.3333333333, ans=0.0 2023-12-23 18:52:45,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1281800.0, ans=0.125 2023-12-23 18:52:56,609 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.686e+01 3.838e+01 4.087e+01 6.862e+01, threshold=7.676e+01, percent-clipped=0.0 2023-12-23 18:53:03,952 INFO [train.py:886] (0/4) Epoch 41, batch 1650, loss[loss=0.01031, audio_tagging_loss=0.01031, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4930336.51 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:53:17,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1282000.0, ans=0.0 2023-12-23 18:53:42,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1282133.3333333333, ans=0.2 2023-12-23 18:53:56,369 INFO [train.py:886] (0/4) Epoch 41, batch 1700, loss[loss=0.009487, audio_tagging_loss=0.009487, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4940457.71 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:53:58,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1282266.6666666667, ans=0.1 2023-12-23 18:54:05,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1282333.3333333333, ans=0.125 2023-12-23 18:54:12,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1282333.3333333333, ans=0.125 2023-12-23 18:54:16,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.44 vs. limit=6.0 2023-12-23 18:54:21,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1282400.0, ans=0.125 2023-12-23 18:54:27,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1282466.6666666667, ans=0.0 2023-12-23 18:54:40,556 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.322e+01 3.662e+01 3.806e+01 4.000e+01 4.786e+01, threshold=7.612e+01, percent-clipped=0.0 2023-12-23 18:54:48,037 INFO [train.py:886] (0/4) Epoch 41, batch 1750, loss[loss=0.01031, audio_tagging_loss=0.01031, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4944677.15 frames. 
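
[Editor's note] The scaling.py Whitening entries (e.g. "metric=3.70 vs. limit=15.0" above) compare a per-module statistic against a scheduled limit and are logged when the metric is notable relative to it. The statistic measures how far the channel covariance of the activations is from "white", i.e. from having equal eigenvalues. One plausible formulation, sketched here as an assumption (the exact formula is in icefall's scaling.py), is the ratio E[lambda^2] / E[lambda]^2, which is 1.0 for perfectly white features and grows with eigenvalue spread:

import torch

def whitening_metric(x):
    # x: (num_frames, num_channels). Returns >= 1.0; 1.0 means the channel
    # covariance is a multiple of the identity ("white").
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues of the symmetric cov
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(1000, 288) @ torch.randn(288, 288)  # correlated channels
print(float(whitening_metric(x)))  # well above 1.0 for non-white input
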
], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:55:05,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1282666.6666666667, ans=0.125 2023-12-23 18:55:14,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1282733.3333333333, ans=0.0 2023-12-23 18:55:32,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1282866.6666666667, ans=0.0 2023-12-23 18:55:39,840 INFO [train.py:886] (0/4) Epoch 41, batch 1800, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4948360.56 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:55:45,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1282933.3333333333, ans=0.125 2023-12-23 18:55:45,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1282933.3333333333, ans=0.0 2023-12-23 18:55:51,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1283000.0, ans=0.04949747468305833 2023-12-23 18:55:59,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-12-23 18:56:23,836 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.191e+01 3.787e+01 3.915e+01 4.054e+01 5.277e+01, threshold=7.830e+01, percent-clipped=0.0 2023-12-23 18:56:31,240 INFO [train.py:886] (0/4) Epoch 41, batch 1850, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4948538.78 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:56:37,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1283266.6666666667, ans=0.125 2023-12-23 18:56:40,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=15.0 2023-12-23 18:57:22,450 INFO [train.py:886] (0/4) Epoch 41, batch 1900, loss[loss=0.01104, audio_tagging_loss=0.01104, over 24750.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4947322.89 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:57:48,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1283733.3333333333, ans=0.125 2023-12-23 18:57:51,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1283733.3333333333, ans=0.125 2023-12-23 18:57:59,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1283800.0, ans=0.0 2023-12-23 18:58:01,205 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:58:03,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.62 vs. 
limit=10.0 2023-12-23 18:58:06,574 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.411e+01 3.762e+01 3.907e+01 4.041e+01 4.562e+01, threshold=7.814e+01, percent-clipped=0.0 2023-12-23 18:58:06,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1283866.6666666667, ans=0.0 2023-12-23 18:58:13,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1283933.3333333333, ans=0.125 2023-12-23 18:58:13,898 INFO [train.py:886] (0/4) Epoch 41, batch 1950, loss[loss=0.009353, audio_tagging_loss=0.009353, over 24750.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4944590.21 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:58:18,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1283933.3333333333, ans=0.0 2023-12-23 18:58:20,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.27 vs. limit=15.0 2023-12-23 18:58:23,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1283933.3333333333, ans=0.0 2023-12-23 18:58:24,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1284000.0, ans=0.2 2023-12-23 18:59:05,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=1284266.6666666667, ans=0.2 2023-12-23 18:59:06,061 INFO [train.py:886] (0/4) Epoch 41, batch 2000, loss[loss=0.01193, audio_tagging_loss=0.01193, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4944863.04 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:59:25,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1284400.0, ans=0.0 2023-12-23 18:59:45,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1284466.6666666667, ans=0.125 2023-12-23 18:59:50,424 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.662e+01 3.860e+01 4.077e+01 4.836e+01, threshold=7.721e+01, percent-clipped=0.0 2023-12-23 18:59:57,748 INFO [train.py:886] (0/4) Epoch 41, batch 2050, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4942261.08 frames. 
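
[Editor's note] grad_scale in the per-batch lines is the mixed-precision loss scale of fp16 training: it is halved when scaled gradients overflow and doubled again after a long run of clean steps, which is why it moves between 32.0 and 64.0 across the surrounding batches. A generic torch.cuda.amp sketch of that mechanism, not the project's training loop; it assumes a CUDA device, as in the logged run:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)
model = torch.nn.Linear(10, 2).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(3):
    x = torch.randn(4, 10, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    opt.zero_grad()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(opt)               # unscales; skips the step on inf/nan grads
    scaler.update()                # halves the scale on overflow, otherwise
                                   # doubles it every growth_interval steps
    print(step, scaler.get_scale())
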
], batch size: 99, lr: 2.62e-03, grad_scale: 64.0 2023-12-23 19:00:01,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1284600.0, ans=0.125 2023-12-23 19:00:10,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1284666.6666666667, ans=15.0 2023-12-23 19:00:20,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1284733.3333333333, ans=0.0 2023-12-23 19:00:20,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1284733.3333333333, ans=0.125 2023-12-23 19:00:39,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0 2023-12-23 19:00:41,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1284866.6666666667, ans=0.2 2023-12-23 19:00:45,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1284866.6666666667, ans=0.2 2023-12-23 19:00:49,180 INFO [train.py:886] (0/4) Epoch 41, batch 2100, loss[loss=0.01329, audio_tagging_loss=0.01329, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4943141.53 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 64.0 2023-12-23 19:00:53,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1284933.3333333333, ans=0.0 2023-12-23 19:01:23,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1285133.3333333333, ans=0.1 2023-12-23 19:01:23,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0 2023-12-23 19:01:30,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1285200.0, ans=0.0 2023-12-23 19:01:34,047 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.671e+01 3.827e+01 4.014e+01 4.652e+01, threshold=7.654e+01, percent-clipped=0.0 2023-12-23 19:01:40,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1285266.6666666667, ans=0.0 2023-12-23 19:01:41,374 INFO [train.py:886] (0/4) Epoch 41, batch 2150, loss[loss=0.01198, audio_tagging_loss=0.01198, over 20802.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4943218.14 frames. 
], batch size: 107, lr: 2.62e-03, grad_scale: 64.0 2023-12-23 19:01:49,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1285266.6666666667, ans=0.125 2023-12-23 19:02:20,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1285466.6666666667, ans=0.1 2023-12-23 19:02:24,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1285533.3333333333, ans=0.2 2023-12-23 19:02:25,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1285533.3333333333, ans=0.125 2023-12-23 19:02:33,074 INFO [train.py:886] (0/4) Epoch 41, batch 2200, loss[loss=0.01052, audio_tagging_loss=0.01052, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4934047.10 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:02:46,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1285666.6666666667, ans=0.125 2023-12-23 19:02:51,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1285666.6666666667, ans=0.125 2023-12-23 19:02:59,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.16 vs. limit=15.0 2023-12-23 19:03:03,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1285800.0, ans=0.0 2023-12-23 19:03:18,841 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.225e+01 3.764e+01 3.888e+01 4.003e+01 5.031e+01, threshold=7.777e+01, percent-clipped=0.0 2023-12-23 19:03:21,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1285866.6666666667, ans=0.125 2023-12-23 19:03:24,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2023-12-23 19:03:25,262 INFO [train.py:886] (0/4) Epoch 41, batch 2250, loss[loss=0.01175, audio_tagging_loss=0.01175, over 25000.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4938309.14 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:03:32,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1285933.3333333333, ans=0.125 2023-12-23 19:03:34,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.66 vs. 
limit=15.0 2023-12-23 19:03:47,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1286066.6666666667, ans=0.125 2023-12-23 19:03:59,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1286133.3333333333, ans=0.125 2023-12-23 19:04:04,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1286133.3333333333, ans=0.025 2023-12-23 19:04:16,985 INFO [train.py:886] (0/4) Epoch 41, batch 2300, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4942085.81 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:04:22,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1286266.6666666667, ans=0.1 2023-12-23 19:04:28,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1286333.3333333333, ans=0.0 2023-12-23 19:05:02,013 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.305e+01 3.676e+01 3.827e+01 3.947e+01 4.404e+01, threshold=7.653e+01, percent-clipped=0.0 2023-12-23 19:05:08,341 INFO [train.py:886] (0/4) Epoch 41, batch 2350, loss[loss=0.01097, audio_tagging_loss=0.01097, over 25000.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4944915.59 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:05:08,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1286600.0, ans=0.05 2023-12-23 19:05:14,613 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-12-23 19:05:26,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5 2023-12-23 19:05:29,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1286733.3333333333, ans=0.0 2023-12-23 19:05:32,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1286733.3333333333, ans=0.125 2023-12-23 19:05:39,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5 2023-12-23 19:05:41,940 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:05:44,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-12-23 19:06:00,410 INFO [train.py:886] (0/4) Epoch 41, batch 2400, loss[loss=0.01169, audio_tagging_loss=0.01169, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4946795.60 frames. 
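
[Editor's note] Each train.py:886 line pairs the current batch's loss with a smoothed tot_loss whose frame count hovers just under five million instead of growing without bound, which is the signature of a frame-weighted running average with an exponential forgetting factor. A sketch under that assumption; the decay constant is illustrative, chosen because 0.995 with ~25k-frame batches plateaus near 25000 / 0.005 = 5e6 frames, matching the logged counts:

class RunningLoss:
    def __init__(self, decay=0.995):
        self.decay = decay
        self.loss_sum = 0.0  # decayed frame-weighted loss accumulator
        self.frames = 0.0    # decayed frame count (the "over N frames" figure)

    def update(self, batch_loss, batch_frames):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def tot_loss(self):
        return self.loss_sum / max(self.frames, 1.0)

rl = RunningLoss()
for _ in range(2000):
    rl.update(0.0115, 25_000.0)
print(rl.tot_loss, rl.frames)  # ~0.0115 over ~5e6 frames
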
], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:06:07,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1286933.3333333333, ans=0.0 2023-12-23 19:06:23,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.09 vs. limit=10.0 2023-12-23 19:06:26,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=12.0 2023-12-23 19:06:26,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1287066.6666666667, ans=15.0 2023-12-23 19:06:46,640 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.257e+01 3.616e+01 3.787e+01 3.992e+01 4.640e+01, threshold=7.573e+01, percent-clipped=0.0 2023-12-23 19:06:52,474 INFO [train.py:886] (0/4) Epoch 41, batch 2450, loss[loss=0.01075, audio_tagging_loss=0.01075, over 24041.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4951189.24 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:06:58,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1287266.6666666667, ans=0.1 2023-12-23 19:07:44,510 INFO [train.py:886] (0/4) Epoch 41, batch 2500, loss[loss=0.01112, audio_tagging_loss=0.01112, over 24750.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4947527.45 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:07:50,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=12.0 2023-12-23 19:08:04,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1287733.3333333333, ans=0.125 2023-12-23 19:08:27,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1287866.6666666667, ans=0.125 2023-12-23 19:08:29,631 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.365e+01 3.727e+01 3.879e+01 4.092e+01 4.648e+01, threshold=7.757e+01, percent-clipped=0.0 2023-12-23 19:08:29,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1287866.6666666667, ans=0.2 2023-12-23 19:08:36,079 INFO [train.py:886] (0/4) Epoch 41, batch 2550, loss[loss=0.01463, audio_tagging_loss=0.01463, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4944606.65 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:08:42,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1287933.3333333333, ans=0.0 2023-12-23 19:08:55,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1288000.0, ans=0.125 2023-12-23 19:08:58,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1288066.6666666667, ans=0.125 2023-12-23 19:09:01,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.98 vs. 
limit=10.0 2023-12-23 19:09:05,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0 2023-12-23 19:09:28,087 INFO [train.py:886] (0/4) Epoch 41, batch 2600, loss[loss=0.01266, audio_tagging_loss=0.01266, over 22070.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4940577.43 frames. ], batch size: 107, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:09:46,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1288333.3333333333, ans=0.2 2023-12-23 19:09:51,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1288400.0, ans=0.125 2023-12-23 19:10:05,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1288466.6666666667, ans=0.0 2023-12-23 19:10:13,064 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.409e+01 3.737e+01 3.877e+01 4.049e+01 5.026e+01, threshold=7.754e+01, percent-clipped=0.0 2023-12-23 19:10:14,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.35 vs. limit=10.0 2023-12-23 19:10:20,178 INFO [train.py:886] (0/4) Epoch 41, batch 2650, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4945594.77 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:10:41,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1288733.3333333333, ans=0.125 2023-12-23 19:10:41,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1288733.3333333333, ans=0.1 2023-12-23 19:10:53,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1288800.0, ans=0.125 2023-12-23 19:11:01,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1288866.6666666667, ans=0.1 2023-12-23 19:11:03,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1288866.6666666667, ans=0.1 2023-12-23 19:11:11,314 INFO [train.py:886] (0/4) Epoch 41, batch 2700, loss[loss=0.01047, audio_tagging_loss=0.01047, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4950890.17 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:11:34,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=12.0 2023-12-23 19:11:37,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1289066.6666666667, ans=0.125 2023-12-23 19:11:38,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1289066.6666666667, ans=0.2 2023-12-23 19:11:41,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.38 vs. 
limit=15.0 2023-12-23 19:11:50,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1289133.3333333333, ans=0.0 2023-12-23 19:11:56,739 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.268e+01 3.652e+01 3.827e+01 3.975e+01 4.292e+01, threshold=7.655e+01, percent-clipped=0.0 2023-12-23 19:12:03,181 INFO [train.py:886] (0/4) Epoch 41, batch 2750, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4955035.67 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:12:09,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1289266.6666666667, ans=0.1 2023-12-23 19:12:23,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1289400.0, ans=0.0 2023-12-23 19:12:27,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-12-23 19:12:31,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1289400.0, ans=0.05 2023-12-23 19:12:31,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1289400.0, ans=0.1 2023-12-23 19:12:38,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2023-12-23 19:12:42,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1289466.6666666667, ans=0.0 2023-12-23 19:12:42,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1289466.6666666667, ans=0.125 2023-12-23 19:12:44,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1289533.3333333333, ans=0.0 2023-12-23 19:12:54,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1289600.0, ans=0.0 2023-12-23 19:12:55,086 INFO [train.py:886] (0/4) Epoch 41, batch 2800, loss[loss=0.009994, audio_tagging_loss=0.009994, over 25000.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4950808.75 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:12:58,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1289600.0, ans=0.0 2023-12-23 19:13:21,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1289733.3333333333, ans=0.1 2023-12-23 19:13:21,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-12-23 19:13:41,850 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.762e+01 3.896e+01 4.063e+01 4.589e+01, threshold=7.791e+01, percent-clipped=0.0 2023-12-23 19:13:42,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.56 vs. 
limit=22.5 2023-12-23 19:13:47,603 INFO [train.py:886] (0/4) Epoch 41, batch 2850, loss[loss=0.01499, audio_tagging_loss=0.01499, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4940039.74 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:13:59,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1290000.0, ans=0.1 2023-12-23 19:14:07,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2023-12-23 19:14:33,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1290200.0, ans=0.5 2023-12-23 19:14:35,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1290200.0, ans=0.125 2023-12-23 19:14:37,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1290200.0, ans=0.05 2023-12-23 19:14:39,148 INFO [train.py:886] (0/4) Epoch 41, batch 2900, loss[loss=0.01288, audio_tagging_loss=0.01288, over 24750.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4940291.80 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:14:51,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1290333.3333333333, ans=0.1 2023-12-23 19:14:53,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2023-12-23 19:14:53,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1290333.3333333333, ans=0.125 2023-12-23 19:14:53,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1290333.3333333333, ans=0.1 2023-12-23 19:14:56,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1290333.3333333333, ans=0.125 2023-12-23 19:14:58,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1290333.3333333333, ans=0.1 2023-12-23 19:15:19,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1290466.6666666667, ans=0.125 2023-12-23 19:15:24,928 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.234e+01 3.667e+01 3.837e+01 4.047e+01 4.824e+01, threshold=7.673e+01, percent-clipped=0.0 2023-12-23 19:15:28,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1290533.3333333333, ans=0.1 2023-12-23 19:15:31,329 INFO [train.py:886] (0/4) Epoch 41, batch 2950, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4947130.37 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:15:38,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1290600.0, ans=0.125 2023-12-23 19:15:48,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1290666.6666666667, ans=0.125 2023-12-23 19:16:05,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1290800.0, ans=0.2 2023-12-23 19:16:06,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1290800.0, ans=0.1 2023-12-23 19:16:15,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0 2023-12-23 19:16:23,709 INFO [train.py:886] (0/4) Epoch 41, batch 3000, loss[loss=0.009106, audio_tagging_loss=0.009106, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4944794.92 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:16:23,711 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 19:16:45,001 INFO [train.py:917] (0/4) Epoch 41, validation: loss=0.03524, audio_tagging_loss=0.03524, over 3737520.00 frames. 2023-12-23 19:16:45,002 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 19:16:46,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1290933.3333333333, ans=0.0 2023-12-23 19:17:01,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1291000.0, ans=0.0 2023-12-23 19:17:06,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2023-12-23 19:17:28,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1291200.0, ans=0.0 2023-12-23 19:17:30,602 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.333e+01 3.633e+01 3.842e+01 3.990e+01 4.593e+01, threshold=7.683e+01, percent-clipped=0.0 2023-12-23 19:17:37,026 INFO [train.py:886] (0/4) Epoch 41, batch 3050, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4953430.75 frames. 
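
[Editor's note] The validation pass above (train.py:909/917) pauses training, evaluates the dev set without gradients, and reports one frame-weighted average loss over all 3,737,520 validation frames. A minimal sketch of that computation; the function name and the batch layout of the dataloader are assumptions, not the project's actual interfaces:

import torch

@torch.no_grad()
def compute_validation_loss(model, dev_loader, loss_fn, device):
    # Frame-weighted average loss over the whole dev set, reported once as
    # "validation: loss=... over N frames".
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for feats, labels, num_frames in dev_loader:  # assumed batch layout
        loss = loss_fn(model(feats.to(device)), labels.to(device))
        tot_loss += float(loss) * float(num_frames)
        tot_frames += float(num_frames)
    model.train()
    return tot_loss / max(tot_frames, 1.0)
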
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:17:40,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1291266.6666666667, ans=0.5 2023-12-23 19:17:53,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1291333.3333333333, ans=0.125 2023-12-23 19:17:57,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1291400.0, ans=0.125 2023-12-23 19:18:02,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1291400.0, ans=0.125 2023-12-23 19:18:21,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1291533.3333333333, ans=0.2 2023-12-23 19:18:28,507 INFO [train.py:886] (0/4) Epoch 41, batch 3100, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4953397.15 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:18:29,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1291600.0, ans=0.125 2023-12-23 19:19:14,101 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.432e+01 3.751e+01 3.878e+01 4.025e+01 4.905e+01, threshold=7.756e+01, percent-clipped=0.0 2023-12-23 19:19:19,790 INFO [train.py:886] (0/4) Epoch 41, batch 3150, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4950204.03 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:19:31,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2023-12-23 19:19:33,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1292000.0, ans=0.1 2023-12-23 19:19:52,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1292133.3333333333, ans=0.2 2023-12-23 19:19:55,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1292133.3333333333, ans=0.125 2023-12-23 19:19:55,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1292133.3333333333, ans=0.125 2023-12-23 19:20:06,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1292200.0, ans=0.125 2023-12-23 19:20:10,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1292200.0, ans=0.1 2023-12-23 19:20:12,134 INFO [train.py:886] (0/4) Epoch 41, batch 3200, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4949883.97 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:20:22,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1292333.3333333333, ans=0.5 2023-12-23 19:20:46,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1292466.6666666667, ans=0.125 2023-12-23 19:20:53,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1292533.3333333333, ans=0.0 2023-12-23 19:20:54,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1292533.3333333333, ans=0.125 2023-12-23 19:20:55,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1292533.3333333333, ans=0.05 2023-12-23 19:20:57,348 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.348e+01 3.736e+01 3.877e+01 4.151e+01 5.106e+01, threshold=7.754e+01, percent-clipped=0.0 2023-12-23 19:21:04,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0 2023-12-23 19:21:04,475 INFO [train.py:886] (0/4) Epoch 41, batch 3250, loss[loss=0.01121, audio_tagging_loss=0.01121, over 24750.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4948815.13 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:21:05,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.23 vs. limit=10.0 2023-12-23 19:21:21,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1292666.6666666667, ans=0.1 2023-12-23 19:21:24,203 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:21:32,296 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:21:44,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1292866.6666666667, ans=0.125 2023-12-23 19:21:49,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1292866.6666666667, ans=0.125 2023-12-23 19:21:56,102 INFO [train.py:886] (0/4) Epoch 41, batch 3300, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4951197.18 frames. 
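
[Editor's note] The scaling.py:1118 WithLoss entries above report the summed value of auxiliary penalties attached to attention weights (often 0.000e+00 once the weights are well behaved). The underlying trick is an autograd function that passes its input through unchanged but feeds a gradient of one into a side loss, so the penalty trains the module without altering its forward output. This is a hedged reconstruction of that idea, not the class in icefall's scaling.py:

import torch

class WithLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, aux_loss):
        ctx.aux_shape = aux_loss.shape
        return x.clone()  # forward output is just x

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the incoming gradient through to x, and give the auxiliary
        # loss a gradient of 1 so it is minimized alongside the main loss.
        ones = torch.ones(ctx.aux_shape, dtype=grad_output.dtype,
                          device=grad_output.device)
        return grad_output, ones

x = torch.randn(5, requires_grad=True)
aux = 0.01 * (x ** 2).sum()   # the logged loss-sum would be aux.sum()
y = WithLoss.apply(x, aux)    # y equals x, but backward also trains aux
y.sum().backward()
print(x.grad)                 # 1 + 0.02 * x from the attached penalty
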
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:22:07,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1293000.0, ans=0.125 2023-12-23 19:22:12,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1293000.0, ans=0.125 2023-12-23 19:22:17,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1293066.6666666667, ans=0.125 2023-12-23 19:22:19,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1293066.6666666667, ans=0.125 2023-12-23 19:22:36,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.21 vs. limit=22.5 2023-12-23 19:22:41,789 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.319e+01 3.618e+01 3.799e+01 4.010e+01 4.611e+01, threshold=7.598e+01, percent-clipped=0.0 2023-12-23 19:22:47,451 INFO [train.py:886] (0/4) Epoch 41, batch 3350, loss[loss=0.01091, audio_tagging_loss=0.01091, over 24750.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4956593.83 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:22:49,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.33 vs. limit=22.5 2023-12-23 19:23:00,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1293333.3333333333, ans=0.0 2023-12-23 19:23:02,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.34 vs. limit=22.5 2023-12-23 19:23:03,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1293333.3333333333, ans=0.125 2023-12-23 19:23:06,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1293400.0, ans=0.0 2023-12-23 19:23:19,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1293466.6666666667, ans=0.125 2023-12-23 19:23:22,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1293466.6666666667, ans=0.125 2023-12-23 19:23:22,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1293466.6666666667, ans=0.2 2023-12-23 19:23:25,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1293466.6666666667, ans=0.5 2023-12-23 19:23:31,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1293533.3333333333, ans=0.1 2023-12-23 19:23:39,111 INFO [train.py:886] (0/4) Epoch 41, batch 3400, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4962915.64 frames. 
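Each optim.py:484 warning prints five grad-norm statistics (min, 25th percentile, median, 75th percentile, max) plus a threshold, and in every entry above the threshold equals Clipping_scale times the median (e.g. 2.0 x 3.799e+01 = 7.598e+01). A sketch of that relationship; the rolling-window bookkeeping around it is assumed rather than taken from icefall's optimizer:

    import torch

    # Sketch: clipping threshold derived from the median of recently
    # observed gradient norms. Only the printed relationship
    # (threshold = Clipping_scale * median) is taken from the log itself.
    def quartiles_and_threshold(recent_norms: torch.Tensor,
                                clipping_scale: float = 2.0):
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        return q, clipping_scale * q[2].item()

    norms = torch.tensor([33.19, 36.18, 37.99, 40.10, 46.11])
    q, thr = quartiles_and_threshold(norms)
    print(q.tolist(), thr)   # threshold = 2.0 * 37.99 = 75.98

Gradients whose norm exceeds the threshold would be scaled down; percent-clipped=0.0 then simply means no batch in the current window crossed it.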
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:24:00,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1293733.3333333333, ans=0.125 2023-12-23 19:24:06,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1293733.3333333333, ans=0.125 2023-12-23 19:24:14,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.60 vs. limit=22.5 2023-12-23 19:24:24,805 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.278e+01 3.727e+01 3.891e+01 4.043e+01 4.833e+01, threshold=7.782e+01, percent-clipped=0.0 2023-12-23 19:24:29,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. limit=6.0 2023-12-23 19:24:30,513 INFO [train.py:886] (0/4) Epoch 41, batch 3450, loss[loss=0.01339, audio_tagging_loss=0.01339, over 22375.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4956455.12 frames. ], batch size: 107, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:24:53,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1294066.6666666667, ans=0.5 2023-12-23 19:24:57,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1294066.6666666667, ans=0.0 2023-12-23 19:25:08,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1294133.3333333333, ans=0.2 2023-12-23 19:25:08,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1294133.3333333333, ans=0.125 2023-12-23 19:25:09,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1294133.3333333333, ans=0.0 2023-12-23 19:25:23,512 INFO [train.py:886] (0/4) Epoch 41, batch 3500, loss[loss=0.01075, audio_tagging_loss=0.01075, over 24750.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4953143.10 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:25:27,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1294266.6666666667, ans=0.125 2023-12-23 19:26:07,727 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.332e+01 3.699e+01 3.860e+01 4.048e+01 4.617e+01, threshold=7.721e+01, percent-clipped=0.0 2023-12-23 19:26:11,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0 2023-12-23 19:26:14,147 INFO [train.py:886] (0/4) Epoch 41, batch 3550, loss[loss=0.00921, audio_tagging_loss=0.00921, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4950604.19 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:26:27,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1294666.6666666667, ans=0.125 2023-12-23 19:26:42,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1294733.3333333333, ans=0.2 2023-12-23 19:26:44,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1294800.0, ans=0.125 2023-12-23 19:26:51,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1294800.0, ans=0.0 2023-12-23 19:26:58,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1294866.6666666667, ans=0.125 2023-12-23 19:27:05,727 INFO [train.py:886] (0/4) Epoch 41, batch 3600, loss[loss=0.009986, audio_tagging_loss=0.009986, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4948532.55 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:27:51,208 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.401e+01 3.671e+01 3.840e+01 4.001e+01 4.394e+01, threshold=7.680e+01, percent-clipped=0.0 2023-12-23 19:27:54,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1295200.0, ans=0.125 2023-12-23 19:27:56,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1295200.0, ans=0.0 2023-12-23 19:27:58,368 INFO [train.py:886] (0/4) Epoch 41, batch 3650, loss[loss=0.008314, audio_tagging_loss=0.008314, over 22877.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4949825.56 frames. ], batch size: 107, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:27:58,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1295266.6666666667, ans=0.125 2023-12-23 19:28:01,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2023-12-23 19:28:10,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1295333.3333333333, ans=0.0 2023-12-23 19:28:11,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2023-12-23 19:28:20,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1295400.0, ans=0.125 2023-12-23 19:28:29,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1295466.6666666667, ans=0.0 2023-12-23 19:28:39,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1295533.3333333333, ans=0.0 2023-12-23 19:28:43,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1295533.3333333333, ans=0.0 2023-12-23 19:28:47,968 INFO [train.py:886] (0/4) Epoch 41, batch 3700, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. 
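The scaling.py:213 entries track ScheduledFloat parameters: module constants such as dropout rates, skip rates and balancer probabilities whose current value (ans) is a function of batch_count. A piecewise-linear schedule reproduces the qualitative behaviour; the breakpoints below are invented for illustration and are not the recipe's:

    # Sketch of a piecewise-linear float schedule keyed on batch count, in
    # the spirit of the ScheduledFloat values logged above (the real class
    # lives in zipformer's scaling.py; the details here are assumptions).
    from bisect import bisect_right

    class PiecewiseLinear:
        def __init__(self, *points):     # points: (batch_count, value)
            self.x = [p[0] for p in points]
            self.y = [p[1] for p in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect_right(self.x, batch_count)
            t = (batch_count - self.x[i - 1]) / (self.x[i] - self.x[i - 1])
            return self.y[i - 1] + t * (self.y[i] - self.y[i - 1])

    skip_rate = PiecewiseLinear((0, 0.5), (4000, 0.05), (16000, 0.0))
    print(skip_rate(1291533))   # far past the last breakpoint -> 0.0

By batch_count ~ 1.29e6 such schedules have long since reached their final values, which is why the logged ans values are constant from entry to entry (e.g. the skip rates reading 0.0).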
], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4948890.10 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:28:48,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1295600.0, ans=0.0 2023-12-23 19:28:48,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2023-12-23 19:28:52,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1295600.0, ans=0.125 2023-12-23 19:28:58,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1295666.6666666667, ans=0.0 2023-12-23 19:29:12,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=12.0 2023-12-23 19:29:34,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1295866.6666666667, ans=0.125 2023-12-23 19:29:35,183 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.261e+01 3.680e+01 3.875e+01 4.029e+01 4.590e+01, threshold=7.750e+01, percent-clipped=0.0 2023-12-23 19:29:40,933 INFO [train.py:886] (0/4) Epoch 41, batch 3750, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4948920.22 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:29:47,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1295933.3333333333, ans=0.0 2023-12-23 19:29:57,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1296000.0, ans=0.1 2023-12-23 19:29:59,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1296066.6666666667, ans=0.1 2023-12-23 19:30:05,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.52 vs. limit=10.0 2023-12-23 19:30:30,885 INFO [train.py:886] (0/4) Epoch 41, batch 3800, loss[loss=0.0137, audio_tagging_loss=0.0137, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4943048.79 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:30:44,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1296333.3333333333, ans=0.125 2023-12-23 19:30:56,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.01 vs. limit=22.5 2023-12-23 19:31:04,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.40 vs. 
limit=12.0 2023-12-23 19:31:06,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1296466.6666666667, ans=0.5 2023-12-23 19:31:17,207 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.684e+01 3.876e+01 4.085e+01 4.684e+01, threshold=7.752e+01, percent-clipped=0.0 2023-12-23 19:31:22,573 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2023-12-23 19:31:23,022 INFO [train.py:886] (0/4) Epoch 41, batch 3850, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4940594.68 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:31:43,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1296666.6666666667, ans=0.125 2023-12-23 19:32:04,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1296866.6666666667, ans=0.1 2023-12-23 19:32:10,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1296866.6666666667, ans=0.025 2023-12-23 19:32:11,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1296866.6666666667, ans=0.125 2023-12-23 19:32:16,079 INFO [train.py:886] (0/4) Epoch 41, batch 3900, loss[loss=0.01129, audio_tagging_loss=0.01129, over 24931.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4948775.21 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:32:18,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.75 vs. limit=10.0 2023-12-23 19:32:19,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1296933.3333333333, ans=0.0 2023-12-23 19:32:42,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1297066.6666666667, ans=0.1 2023-12-23 19:32:47,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1297133.3333333333, ans=0.1 2023-12-23 19:33:00,595 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.392e+01 3.712e+01 3.871e+01 3.981e+01 4.576e+01, threshold=7.742e+01, percent-clipped=0.0 2023-12-23 19:33:01,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1297200.0, ans=0.04949747468305833 2023-12-23 19:33:05,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1297200.0, ans=0.05 2023-12-23 19:33:07,017 INFO [train.py:886] (0/4) Epoch 41, batch 3950, loss[loss=0.01005, audio_tagging_loss=0.01005, over 25000.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4953513.50 frames. 
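The scaling.py:1022 entries report a per-module whitening diagnostic: a metric compared against a limit, with the corrective penalty active only when the metric exceeds the limit (e.g. metric=22.60 vs. limit=22.5 just above). One plausible formulation of such a metric, assuming it measures the eigenvalue spread of the channel covariance; the actual computation lives in zipformer's scaling.py and may differ:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # Assumed formulation: ratio mean(eig^2) / mean(eig)^2 of the
        # channel covariance. Equals 1.0 for perfectly white features and
        # grows (up to num_channels) as variance concentrates in a few
        # directions.
        x = x.reshape(-1, x.shape[-1])        # (frames, channels)
        cov = (x.T @ x) / x.shape[0]          # channel covariance
        d = cov.shape[0]
        num = torch.trace(cov @ cov) / d      # mean of squared eigenvalues
        den = (torch.trace(cov) / d) ** 2     # squared mean eigenvalue
        return num / den

    feats = torch.randn(1000, 512)            # near-white features
    print(whitening_metric(feats))            # ~1.5: sampling noise keeps
                                              # it above the ideal 1.0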
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:33:23,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1297333.3333333333, ans=0.0 2023-12-23 19:33:56,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1297533.3333333333, ans=0.125 2023-12-23 19:33:57,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=1297533.3333333333, ans=22.5 2023-12-23 19:33:59,513 INFO [train.py:886] (0/4) Epoch 41, batch 4000, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4960714.11 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:34:03,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1297600.0, ans=0.125 2023-12-23 19:34:24,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1297733.3333333333, ans=0.125 2023-12-23 19:34:33,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1297800.0, ans=0.0 2023-12-23 19:34:44,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1297866.6666666667, ans=0.125 2023-12-23 19:34:44,943 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.343e+01 3.739e+01 3.854e+01 4.037e+01 4.729e+01, threshold=7.708e+01, percent-clipped=0.0 2023-12-23 19:34:49,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5 2023-12-23 19:34:51,360 INFO [train.py:886] (0/4) Epoch 41, batch 4050, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4962318.45 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:34:52,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1297933.3333333333, ans=0.125 2023-12-23 19:34:55,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-12-23 19:35:01,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=15.0 2023-12-23 19:35:03,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1298000.0, ans=0.125 2023-12-23 19:35:13,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1298066.6666666667, ans=0.125 2023-12-23 19:35:22,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1298133.3333333333, ans=0.0 2023-12-23 19:35:23,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. 
limit=15.0 2023-12-23 19:35:36,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1298200.0, ans=0.125 2023-12-23 19:35:42,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1298266.6666666667, ans=0.125 2023-12-23 19:35:43,289 INFO [train.py:886] (0/4) Epoch 41, batch 4100, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4951586.20 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:35:47,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1298266.6666666667, ans=0.2 2023-12-23 19:35:48,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-12-23 19:35:57,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2023-12-23 19:35:57,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1298333.3333333333, ans=0.0 2023-12-23 19:36:18,108 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:36:20,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1298466.6666666667, ans=0.125 2023-12-23 19:36:29,705 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.286e+01 3.678e+01 3.896e+01 4.080e+01 4.675e+01, threshold=7.792e+01, percent-clipped=0.0 2023-12-23 19:36:35,439 INFO [train.py:886] (0/4) Epoch 41, batch 4150, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4944484.79 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:36:39,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1298600.0, ans=0.0 2023-12-23 19:36:46,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1298666.6666666667, ans=0.125 2023-12-23 19:36:55,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2023-12-23 19:36:58,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1298733.3333333333, ans=0.07 2023-12-23 19:37:03,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1298733.3333333333, ans=0.2 2023-12-23 19:37:08,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1298800.0, ans=0.2 2023-12-23 19:37:25,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.53 vs. limit=15.0 2023-12-23 19:37:27,143 INFO [train.py:886] (0/4) Epoch 41, batch 4200, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4947031.76 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 64.0 2023-12-23 19:37:39,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1299000.0, ans=0.1 2023-12-23 19:37:42,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=12.0 2023-12-23 19:37:42,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1299000.0, ans=0.125 2023-12-23 19:37:43,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1299000.0, ans=0.125 2023-12-23 19:38:02,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1299133.3333333333, ans=0.0 2023-12-23 19:38:12,737 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.418e+01 3.759e+01 3.866e+01 4.012e+01 4.707e+01, threshold=7.732e+01, percent-clipped=0.0 2023-12-23 19:38:19,229 INFO [train.py:886] (0/4) Epoch 41, batch 4250, loss[loss=0.01016, audio_tagging_loss=0.01016, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4946319.10 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 64.0 2023-12-23 19:38:22,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1299266.6666666667, ans=0.0 2023-12-23 19:38:24,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1299266.6666666667, ans=0.0 2023-12-23 19:38:37,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1299333.3333333333, ans=0.1 2023-12-23 19:38:42,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1299400.0, ans=0.125 2023-12-23 19:38:44,906 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:38:56,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.79 vs. limit=15.0 2023-12-23 19:39:01,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1299533.3333333333, ans=6.0 2023-12-23 19:39:02,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1299533.3333333333, ans=0.1 2023-12-23 19:39:11,491 INFO [train.py:886] (0/4) Epoch 41, batch 4300, loss[loss=0.009322, audio_tagging_loss=0.009322, over 24068.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4942364.45 frames. 
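From this point the entries show grad_scale: 64.0 where earlier batches showed 32.0; doubling after a long run of overflow-free steps is the characteristic behaviour of dynamic loss scaling in mixed-precision training. A sketch of such a loop using PyTorch's GradScaler; the model, optimizer, loss and growth_interval below are stand-ins, not the recipe's configuration:

    import torch

    model = torch.nn.Linear(80, 527)          # stand-in audio tagger
    optimizer = torch.optim.AdamW(model.parameters(), lr=2.61e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(feats, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                model(feats), targets)
        scaler.scale(loss).backward()   # scale loss so fp16 grads stay finite
        scaler.step(optimizer)          # unscales; skips the step on inf/nan
        scaler.update()                 # grows or shrinks the scale over time
        return loss.detach()

On overflow the scaler instead halves the scale and skips the step, which is why grad_scale can move in both directions over a long run.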
], batch size: 100, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:39:12,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1299600.0, ans=0.125 2023-12-23 19:39:32,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1299733.3333333333, ans=0.2 2023-12-23 19:39:35,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1299733.3333333333, ans=0.0 2023-12-23 19:39:40,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=12.0 2023-12-23 19:39:48,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1299800.0, ans=10.0 2023-12-23 19:39:55,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1299866.6666666667, ans=0.125 2023-12-23 19:39:55,933 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.398e+01 3.671e+01 3.818e+01 3.950e+01 4.671e+01, threshold=7.635e+01, percent-clipped=0.0 2023-12-23 19:40:01,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1299933.3333333333, ans=0.125 2023-12-23 19:40:02,283 INFO [train.py:886] (0/4) Epoch 41, batch 4350, loss[loss=0.0101, audio_tagging_loss=0.0101, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4948729.49 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:40:06,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1299933.3333333333, ans=0.0 2023-12-23 19:40:08,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2023-12-23 19:40:09,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1299933.3333333333, ans=0.0 2023-12-23 19:40:18,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1300000.0, ans=0.1 2023-12-23 19:40:24,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1300066.6666666667, ans=0.125 2023-12-23 19:40:26,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2023-12-23 19:40:28,291 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:40:54,469 INFO [train.py:886] (0/4) Epoch 41, batch 4400, loss[loss=0.01147, audio_tagging_loss=0.01147, over 24750.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4949317.06 frames. 
], batch size: 99, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:40:59,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1300266.6666666667, ans=0.125 2023-12-23 19:41:11,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1300333.3333333333, ans=0.07 2023-12-23 19:41:26,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=12.0 2023-12-23 19:41:34,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1300533.3333333333, ans=0.125 2023-12-23 19:41:39,976 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.419e+01 3.802e+01 3.969e+01 4.169e+01 5.691e+01, threshold=7.937e+01, percent-clipped=0.0 2023-12-23 19:41:43,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1300533.3333333333, ans=0.1 2023-12-23 19:41:46,422 INFO [train.py:886] (0/4) Epoch 41, batch 4450, loss[loss=0.01148, audio_tagging_loss=0.01148, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4946585.80 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:42:13,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1300733.3333333333, ans=0.2 2023-12-23 19:42:19,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.66 vs. limit=22.5 2023-12-23 19:42:37,318 INFO [train.py:886] (0/4) Epoch 41, batch 4500, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4946407.13 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:42:58,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1301066.6666666667, ans=0.1 2023-12-23 19:43:05,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1301066.6666666667, ans=0.125 2023-12-23 19:43:20,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1301200.0, ans=0.1 2023-12-23 19:43:24,739 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.397e+01 3.696e+01 3.844e+01 4.118e+01 4.848e+01, threshold=7.689e+01, percent-clipped=0.0 2023-12-23 19:43:30,445 INFO [train.py:886] (0/4) Epoch 41, batch 4550, loss[loss=0.01062, audio_tagging_loss=0.01062, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4945473.87 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:43:37,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. 
limit=15.0 2023-12-23 19:43:51,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1301400.0, ans=0.09899494936611666 2023-12-23 19:44:10,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1301533.3333333333, ans=0.2 2023-12-23 19:44:10,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1301533.3333333333, ans=0.125 2023-12-23 19:44:21,572 INFO [train.py:886] (0/4) Epoch 41, batch 4600, loss[loss=0.01207, audio_tagging_loss=0.01207, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4947815.32 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:44:28,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1301600.0, ans=0.0 2023-12-23 19:44:29,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1301600.0, ans=0.125 2023-12-23 19:44:37,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1301666.6666666667, ans=0.125 2023-12-23 19:44:38,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1301666.6666666667, ans=0.125 2023-12-23 19:44:40,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=15.0 2023-12-23 19:45:08,420 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.753e+01 3.899e+01 4.048e+01 4.729e+01, threshold=7.798e+01, percent-clipped=0.0 2023-12-23 19:45:09,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1301866.6666666667, ans=0.125 2023-12-23 19:45:13,159 INFO [train.py:886] (0/4) Epoch 41, batch 4650, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4951042.57 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 32.0 2023-12-23 19:45:28,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1302000.0, ans=0.125 2023-12-23 19:45:39,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1302066.6666666667, ans=0.125 2023-12-23 19:45:40,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1302066.6666666667, ans=0.2 2023-12-23 19:45:47,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. 
limit=15.0 2023-12-23 19:45:51,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1302133.3333333333, ans=0.07 2023-12-23 19:45:52,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1302200.0, ans=0.1 2023-12-23 19:45:56,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1302200.0, ans=0.0 2023-12-23 19:45:56,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1302200.0, ans=0.2 2023-12-23 19:45:58,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1302200.0, ans=0.125 2023-12-23 19:45:58,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1302200.0, ans=0.1 2023-12-23 19:46:03,509 INFO [train.py:886] (0/4) Epoch 41, batch 4700, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4948799.16 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 32.0 2023-12-23 19:46:03,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2023-12-23 19:46:06,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=12.0 2023-12-23 19:46:09,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1302266.6666666667, ans=0.125 2023-12-23 19:46:14,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=15.0 2023-12-23 19:46:16,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1302333.3333333333, ans=0.025 2023-12-23 19:46:33,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1302466.6666666667, ans=0.125 2023-12-23 19:46:46,306 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.396e+01 3.793e+01 3.987e+01 4.159e+01 5.000e+01, threshold=7.973e+01, percent-clipped=0.0 2023-12-23 19:46:50,805 INFO [train.py:886] (0/4) Epoch 41, batch 4750, loss[loss=0.01207, audio_tagging_loss=0.01207, over 24750.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4945436.71 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 32.0 2023-12-23 19:46:59,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2023-12-23 19:47:06,074 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-41.pt 2023-12-23 19:47:24,817 INFO [train.py:886] (0/4) Epoch 42, batch 0, loss[loss=0.02606, audio_tagging_loss=0.02606, over 24011.00 frames. ], tot_loss[loss=0.02606, audio_tagging_loss=0.02606, over 24011.00 frames. 
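The epoch boundary above pairs an end-of-epoch checkpoint (epoch-41.pt) with a fresh validation pass at Epoch 42, batch 0. Note that the running tot_loss statistics restart with the new epoch, so the batch-0 numbers reflect a single batch rather than the multi-million-frame averages seen late in epoch 41. A sketch of both steps, with the checkpoint contents and the batch layout assumed:

    import torch

    def save_epoch_checkpoint(model, optimizer, epoch, exp_dir):
        # Assumed contents; the recipe's checkpoints may carry more state
        # (sampler position, grad scaler, averaged model, ...).
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "epoch": epoch},
            f"{exp_dir}/epoch-{epoch}.pt",
        )

    @torch.no_grad()
    def validation_loss(model, dataloader, criterion):
        total, frames = 0.0, 0
        for feats, targets, num_frames in dataloader:  # assumed layout
            total += criterion(model(feats), targets).item() * num_frames
            frames += num_frames
        return total / max(frames, 1)   # frame-weighted, as in the log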
], batch size: 100, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:47:24,818 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 19:47:45,225 INFO [train.py:917] (0/4) Epoch 42, validation: loss=0.03462, audio_tagging_loss=0.03462, over 3737520.00 frames. 2023-12-23 19:47:45,226 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 19:47:46,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1302706.6666666667, ans=0.1 2023-12-23 19:47:48,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.50 vs. limit=10.0 2023-12-23 19:47:59,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1302773.3333333333, ans=0.025 2023-12-23 19:47:59,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1302773.3333333333, ans=0.07 2023-12-23 19:48:00,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1302773.3333333333, ans=0.0 2023-12-23 19:48:02,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2023-12-23 19:48:04,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2023-12-23 19:48:24,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.08 vs. limit=22.5 2023-12-23 19:48:24,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. limit=6.0 2023-12-23 19:48:37,194 INFO [train.py:886] (0/4) Epoch 42, batch 50, loss[loss=0.015, audio_tagging_loss=0.015, over 25000.00 frames. ], tot_loss[loss=0.0178, audio_tagging_loss=0.0178, over 1113535.41 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:48:47,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1303106.6666666667, ans=0.125 2023-12-23 19:49:08,455 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.767e+01 4.244e+01 4.772e+01 5.544e+01 1.220e+02, threshold=9.545e+01, percent-clipped=3.0 2023-12-23 19:49:14,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.83 vs. limit=12.0 2023-12-23 19:49:16,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1303240.0, ans=0.125 2023-12-23 19:49:28,202 INFO [train.py:886] (0/4) Epoch 42, batch 100, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24880.00 frames. ], tot_loss[loss=0.01546, audio_tagging_loss=0.01546, over 1966062.57 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:49:35,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. 
limit=15.0 2023-12-23 19:50:00,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1303573.3333333333, ans=0.2 2023-12-23 19:50:12,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1303640.0, ans=0.2 2023-12-23 19:50:20,324 INFO [train.py:886] (0/4) Epoch 42, batch 150, loss[loss=0.01656, audio_tagging_loss=0.01656, over 25000.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 2632158.39 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:50:40,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1303840.0, ans=0.1 2023-12-23 19:50:41,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1303840.0, ans=0.95 2023-12-23 19:50:49,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1303840.0, ans=0.125 2023-12-23 19:50:51,581 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.377e+01 3.833e+01 4.065e+01 4.320e+01 5.040e+01, threshold=8.130e+01, percent-clipped=0.0 2023-12-23 19:50:53,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1303906.6666666667, ans=0.125 2023-12-23 19:50:58,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1303906.6666666667, ans=0.1 2023-12-23 19:51:01,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1303973.3333333333, ans=0.125 2023-12-23 19:51:07,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1303973.3333333333, ans=0.0 2023-12-23 19:51:12,099 INFO [train.py:886] (0/4) Epoch 42, batch 200, loss[loss=0.01054, audio_tagging_loss=0.01054, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 3147676.41 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:51:29,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1304106.6666666667, ans=0.0 2023-12-23 19:51:44,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1304240.0, ans=0.125 2023-12-23 19:51:57,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=1304306.6666666667, ans=0.1 2023-12-23 19:52:00,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1304306.6666666667, ans=0.0 2023-12-23 19:52:03,096 INFO [train.py:886] (0/4) Epoch 42, batch 250, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 3552042.66 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:52:04,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.49 vs. 
limit=22.5 2023-12-23 19:52:07,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1304373.3333333333, ans=0.025 2023-12-23 19:52:21,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1304440.0, ans=0.0 2023-12-23 19:52:23,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1304506.6666666667, ans=0.1 2023-12-23 19:52:26,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1304506.6666666667, ans=0.125 2023-12-23 19:52:26,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1304506.6666666667, ans=0.1 2023-12-23 19:52:34,083 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.363e+01 3.769e+01 3.930e+01 4.156e+01 4.971e+01, threshold=7.859e+01, percent-clipped=0.0 2023-12-23 19:52:36,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1304573.3333333333, ans=0.0 2023-12-23 19:52:42,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1304573.3333333333, ans=0.09899494936611666 2023-12-23 19:52:48,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1304640.0, ans=0.2 2023-12-23 19:52:55,099 INFO [train.py:886] (0/4) Epoch 42, batch 300, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 3852037.58 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:52:59,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2023-12-23 19:53:03,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1304706.6666666667, ans=0.125 2023-12-23 19:53:46,156 INFO [train.py:886] (0/4) Epoch 42, batch 350, loss[loss=0.009902, audio_tagging_loss=0.009902, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4090813.56 frames. 
], batch size: 99, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:53:51,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1305040.0, ans=0.1 2023-12-23 19:54:02,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1305106.6666666667, ans=0.2 2023-12-23 19:54:06,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1305173.3333333333, ans=0.125 2023-12-23 19:54:08,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1305173.3333333333, ans=0.0 2023-12-23 19:54:13,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1305173.3333333333, ans=0.125 2023-12-23 19:54:17,293 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.289e+01 3.742e+01 3.933e+01 4.114e+01 4.736e+01, threshold=7.865e+01, percent-clipped=0.0 2023-12-23 19:54:35,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.97 vs. limit=22.5 2023-12-23 19:54:38,498 INFO [train.py:886] (0/4) Epoch 42, batch 400, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24938.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4280878.92 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:54:41,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1305373.3333333333, ans=0.1 2023-12-23 19:54:52,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1305440.0, ans=0.2 2023-12-23 19:55:26,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1305640.0, ans=0.125 2023-12-23 19:55:30,174 INFO [train.py:886] (0/4) Epoch 42, batch 450, loss[loss=0.01122, audio_tagging_loss=0.01122, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4426834.48 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:55:40,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1305773.3333333333, ans=0.0 2023-12-23 19:55:48,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1305773.3333333333, ans=0.125 2023-12-23 19:55:52,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1305840.0, ans=0.0 2023-12-23 19:55:55,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1305840.0, ans=0.0 2023-12-23 19:55:59,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=22.5 2023-12-23 19:56:01,851 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.347e+01 3.725e+01 3.880e+01 4.088e+01 4.948e+01, threshold=7.759e+01, percent-clipped=0.0 2023-12-23 19:56:02,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.53 vs. 
limit=12.0 2023-12-23 19:56:22,299 INFO [train.py:886] (0/4) Epoch 42, batch 500, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4547663.11 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:56:26,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1306040.0, ans=0.125 2023-12-23 19:56:28,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1306040.0, ans=0.0 2023-12-23 19:56:46,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1306173.3333333333, ans=0.125 2023-12-23 19:57:02,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1306240.0, ans=0.125 2023-12-23 19:57:15,186 INFO [train.py:886] (0/4) Epoch 42, batch 550, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4639460.66 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:57:16,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.82 vs. limit=22.5 2023-12-23 19:57:20,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1306373.3333333333, ans=0.2 2023-12-23 19:57:23,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1306373.3333333333, ans=0.2 2023-12-23 19:57:24,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1306440.0, ans=0.125 2023-12-23 19:57:24,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2023-12-23 19:57:39,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.13 vs. limit=10.0 2023-12-23 19:57:41,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1306506.6666666667, ans=0.2 2023-12-23 19:57:43,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2023-12-23 19:57:46,366 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.310e+01 3.673e+01 3.825e+01 3.941e+01 4.727e+01, threshold=7.651e+01, percent-clipped=0.0 2023-12-23 19:57:57,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1306640.0, ans=0.125 2023-12-23 19:57:59,726 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-196000.pt 2023-12-23 19:58:02,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1306640.0, ans=0.2 2023-12-23 19:58:09,077 INFO [train.py:886] (0/4) Epoch 42, batch 600, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4711099.10 frames. 
], batch size: 99, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:58:19,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1306773.3333333333, ans=0.125 2023-12-23 19:58:22,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=22.5 2023-12-23 19:58:23,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1306773.3333333333, ans=0.95 2023-12-23 19:58:25,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1306773.3333333333, ans=0.0 2023-12-23 19:58:48,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1306906.6666666667, ans=0.2 2023-12-23 19:58:51,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1306973.3333333333, ans=0.125 2023-12-23 19:59:00,391 INFO [train.py:886] (0/4) Epoch 42, batch 650, loss[loss=0.01003, audio_tagging_loss=0.01003, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4756529.76 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:59:04,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=1307040.0, ans=0.1 2023-12-23 19:59:07,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2023-12-23 19:59:31,744 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.758e+01 3.912e+01 4.116e+01 4.641e+01, threshold=7.823e+01, percent-clipped=0.0 2023-12-23 19:59:37,414 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:59:40,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1307240.0, ans=0.125 2023-12-23 19:59:53,701 INFO [train.py:886] (0/4) Epoch 42, batch 700, loss[loss=0.01027, audio_tagging_loss=0.01027, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4794239.58 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 20:00:00,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1307373.3333333333, ans=0.95 2023-12-23 20:00:22,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2023-12-23 20:00:26,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1307573.3333333333, ans=0.125 2023-12-23 20:00:45,220 INFO [train.py:886] (0/4) Epoch 42, batch 750, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4829515.68 frames. 
], batch size: 100, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 20:01:09,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1307840.0, ans=0.125 2023-12-23 20:01:15,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1307906.6666666667, ans=0.2 2023-12-23 20:01:16,354 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.283e+01 3.690e+01 3.847e+01 4.054e+01 6.024e+01, threshold=7.694e+01, percent-clipped=0.0 2023-12-23 20:01:18,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1307906.6666666667, ans=0.035 2023-12-23 20:01:24,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1307906.6666666667, ans=0.0 2023-12-23 20:01:37,179 INFO [train.py:886] (0/4) Epoch 42, batch 800, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24750.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4864153.44 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:01:38,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308040.0, ans=0.1 2023-12-23 20:01:44,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1308040.0, ans=0.1 2023-12-23 20:01:44,427 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-12-23 20:01:57,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1308173.3333333333, ans=0.125 2023-12-23 20:02:10,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1308240.0, ans=0.0 2023-12-23 20:02:11,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1308240.0, ans=0.07 2023-12-23 20:02:17,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1308306.6666666667, ans=0.125 2023-12-23 20:02:18,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308306.6666666667, ans=0.1 2023-12-23 20:02:27,605 INFO [train.py:886] (0/4) Epoch 42, batch 850, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4877761.12 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:02:29,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1308373.3333333333, ans=0.125 2023-12-23 20:02:37,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.28 vs. 
limit=15.0 2023-12-23 20:02:38,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1308440.0, ans=0.0 2023-12-23 20:02:53,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1308506.6666666667, ans=0.125 2023-12-23 20:02:54,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1308506.6666666667, ans=0.0 2023-12-23 20:02:58,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2023-12-23 20:02:58,920 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.387e+01 3.709e+01 3.857e+01 4.015e+01 4.625e+01, threshold=7.713e+01, percent-clipped=0.0 2023-12-23 20:03:15,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1308640.0, ans=0.125 2023-12-23 20:03:19,489 INFO [train.py:886] (0/4) Epoch 42, batch 900, loss[loss=0.01161, audio_tagging_loss=0.01161, over 21530.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4894802.71 frames. ], batch size: 107, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:03:29,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308773.3333333333, ans=0.1 2023-12-23 20:03:34,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1308773.3333333333, ans=0.1 2023-12-23 20:03:39,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1308773.3333333333, ans=0.0 2023-12-23 20:03:43,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2023-12-23 20:04:12,446 INFO [train.py:886] (0/4) Epoch 42, batch 950, loss[loss=0.01377, audio_tagging_loss=0.01377, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4897380.63 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:04:21,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1309106.6666666667, ans=0.125 2023-12-23 20:04:25,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1309106.6666666667, ans=0.0 2023-12-23 20:04:31,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1309106.6666666667, ans=0.0 2023-12-23 20:04:43,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.48 vs. limit=22.5 2023-12-23 20:04:43,680 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.390e+01 3.775e+01 3.915e+01 4.098e+01 4.959e+01, threshold=7.831e+01, percent-clipped=0.0 2023-12-23 20:04:47,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. 
limit=15.0 2023-12-23 20:04:53,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1309306.6666666667, ans=0.1 2023-12-23 20:04:59,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1309306.6666666667, ans=0.1 2023-12-23 20:05:04,325 INFO [train.py:886] (0/4) Epoch 42, batch 1000, loss[loss=0.01005, audio_tagging_loss=0.01005, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4902569.85 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:05:15,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1309440.0, ans=0.125 2023-12-23 20:05:18,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1309440.0, ans=0.0 2023-12-23 20:05:27,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1309506.6666666667, ans=0.2 2023-12-23 20:05:46,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1309640.0, ans=0.125 2023-12-23 20:05:49,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1309640.0, ans=15.0 2023-12-23 20:05:56,009 INFO [train.py:886] (0/4) Epoch 42, batch 1050, loss[loss=0.01303, audio_tagging_loss=0.01303, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4914187.06 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:06:13,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1309773.3333333333, ans=0.125 2023-12-23 20:06:22,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1309840.0, ans=0.125 2023-12-23 20:06:23,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1309840.0, ans=0.125 2023-12-23 20:06:27,323 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.280e+01 3.716e+01 3.855e+01 4.036e+01 4.580e+01, threshold=7.710e+01, percent-clipped=0.0 2023-12-23 20:06:35,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1309906.6666666667, ans=15.0 2023-12-23 20:06:36,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1309973.3333333333, ans=0.2 2023-12-23 20:06:42,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1309973.3333333333, ans=0.1 2023-12-23 20:06:48,267 INFO [train.py:886] (0/4) Epoch 42, batch 1100, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4926724.20 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:06:48,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.95 vs. 
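limit=22.5

The Whitening lines are diagnostics from scaling.py that compare a per-module activation statistic against a limit, 17.95 vs. 22.5 in the record just above. One plausible definition of such a metric is the largest eigenvalue of the channel covariance relative to the mean eigenvalue, computed per channel group; the sketch below is an illustration under that assumption, not the actual scaling.py computation.

# Illustrative whitening metric (assumption: largest covariance eigenvalue
# divided by the mean eigenvalue, averaged over channel groups).
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (frames, channels); channels are split evenly into num_groups.
    chans = x.shape[1] // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * chans:(g + 1) * chans]
        xg = xg - xg.mean(dim=0, keepdim=True)   # center per channel
        cov = (xg.T @ xg) / xg.shape[0]          # channel covariance
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs.max() / eigs.mean().clamp(min=1e-20)).item())
    return sum(metrics) / len(metrics)

x = torch.randn(1000, 384)   # well-conditioned activations
print(whitening_metric(x))   # a small value, far under limit=22.5

For a perfectly whitened signal all eigenvalues are equal and this metric is 1.0; larger values mean the variance is concentrating in a few directions.
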
2023-12-23 20:06:52,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1310040.0, ans=0.125
2023-12-23 20:06:56,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1310040.0, ans=0.1
2023-12-23 20:06:58,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1310106.6666666667, ans=0.1
2023-12-23 20:07:07,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1310173.3333333333, ans=0.125
2023-12-23 20:07:12,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1310173.3333333333, ans=0.125
2023-12-23 20:07:19,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1310240.0, ans=0.1
2023-12-23 20:07:30,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1310306.6666666667, ans=0.0
2023-12-23 20:07:38,690 INFO [train.py:886] (0/4) Epoch 42, batch 1150, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4937836.91 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:07:39,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1310373.3333333333, ans=0.125
2023-12-23 20:07:50,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1310440.0, ans=0.0
2023-12-23 20:07:53,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0
2023-12-23 20:08:00,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1310506.6666666667, ans=0.025
2023-12-23 20:08:10,037 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.331e+01 3.706e+01 3.860e+01 4.040e+01 4.470e+01, threshold=7.720e+01, percent-clipped=0.0
2023-12-23 20:08:10,274 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:08:11,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1310573.3333333333, ans=0.0
2023-12-23 20:08:25,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0
2023-12-23 20:08:31,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.68 vs. limit=22.5
2023-12-23 20:08:32,221 INFO [train.py:886] (0/4) Epoch 42, batch 1200, loss[loss=0.01135, audio_tagging_loss=0.01135, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4944443.73 frames.
], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:08:54,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1310840.0, ans=0.0 2023-12-23 20:09:21,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2023-12-23 20:09:22,918 INFO [train.py:886] (0/4) Epoch 42, batch 1250, loss[loss=0.01091, audio_tagging_loss=0.01091, over 24750.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4945843.47 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:09:37,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1311106.6666666667, ans=0.0 2023-12-23 20:09:50,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1311173.3333333333, ans=0.125 2023-12-23 20:09:53,462 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.422e+01 3.763e+01 3.909e+01 4.123e+01 4.708e+01, threshold=7.819e+01, percent-clipped=0.0 2023-12-23 20:09:54,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2023-12-23 20:10:10,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-12-23 20:10:11,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1311306.6666666667, ans=0.2 2023-12-23 20:10:13,801 INFO [train.py:886] (0/4) Epoch 42, batch 1300, loss[loss=0.00978, audio_tagging_loss=0.00978, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4941016.88 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:10:25,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1311440.0, ans=0.1 2023-12-23 20:10:31,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1311440.0, ans=0.125 2023-12-23 20:10:33,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1311440.0, ans=0.0 2023-12-23 20:10:51,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=12.0 2023-12-23 20:10:57,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.72 vs. limit=22.5 2023-12-23 20:10:57,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.03 vs. limit=22.5 2023-12-23 20:11:05,800 INFO [train.py:886] (0/4) Epoch 42, batch 1350, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4943372.78 frames. 
], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:11:10,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1311706.6666666667, ans=0.125 2023-12-23 20:11:22,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=8.0 2023-12-23 20:11:23,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1311773.3333333333, ans=0.1 2023-12-23 20:11:25,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1311840.0, ans=0.125 2023-12-23 20:11:36,588 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.709e+01 3.877e+01 4.071e+01 5.035e+01, threshold=7.754e+01, percent-clipped=0.0 2023-12-23 20:11:46,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1311973.3333333333, ans=0.1 2023-12-23 20:11:49,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1311973.3333333333, ans=0.125 2023-12-23 20:11:50,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1311973.3333333333, ans=0.2 2023-12-23 20:11:53,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0 2023-12-23 20:11:57,009 INFO [train.py:886] (0/4) Epoch 42, batch 1400, loss[loss=0.0102, audio_tagging_loss=0.0102, over 25000.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4949200.59 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:11:57,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.43 vs. limit=22.5 2023-12-23 20:12:19,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1312173.3333333333, ans=0.2 2023-12-23 20:12:20,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1312173.3333333333, ans=0.125 2023-12-23 20:12:20,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1312173.3333333333, ans=0.125 2023-12-23 20:12:30,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1312240.0, ans=0.1 2023-12-23 20:12:31,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.57 vs. 
limit=10.0 2023-12-23 20:12:33,293 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:12:37,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1312306.6666666667, ans=0.1 2023-12-23 20:12:42,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1312306.6666666667, ans=0.125 2023-12-23 20:12:43,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1312306.6666666667, ans=0.0 2023-12-23 20:12:49,214 INFO [train.py:886] (0/4) Epoch 42, batch 1450, loss[loss=0.01399, audio_tagging_loss=0.01399, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4952352.25 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:12:54,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1312373.3333333333, ans=0.05 2023-12-23 20:13:02,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1312440.0, ans=0.0 2023-12-23 20:13:06,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1312440.0, ans=0.0 2023-12-23 20:13:20,250 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.348e+01 3.643e+01 3.891e+01 4.028e+01 4.685e+01, threshold=7.783e+01, percent-clipped=0.0 2023-12-23 20:13:39,822 INFO [train.py:886] (0/4) Epoch 42, batch 1500, loss[loss=0.01046, audio_tagging_loss=0.01046, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4957505.87 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:13:40,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1312706.6666666667, ans=0.2 2023-12-23 20:13:48,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1312706.6666666667, ans=0.125 2023-12-23 20:14:11,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1312906.6666666667, ans=0.125 2023-12-23 20:14:32,120 INFO [train.py:886] (0/4) Epoch 42, batch 1550, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4955201.30 frames. 
], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:14:37,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1313040.0, ans=0.1 2023-12-23 20:14:47,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1313106.6666666667, ans=15.0 2023-12-23 20:15:03,378 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.724e+01 3.969e+01 4.156e+01 4.573e+01, threshold=7.937e+01, percent-clipped=0.0 2023-12-23 20:15:13,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1313306.6666666667, ans=0.1 2023-12-23 20:15:24,374 INFO [train.py:886] (0/4) Epoch 42, batch 1600, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4952200.82 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:15:25,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1313373.3333333333, ans=0.1 2023-12-23 20:15:35,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1313440.0, ans=0.07 2023-12-23 20:15:39,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1313440.0, ans=0.0 2023-12-23 20:15:46,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1313506.6666666667, ans=0.1 2023-12-23 20:15:46,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1313506.6666666667, ans=0.2 2023-12-23 20:15:47,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-12-23 20:15:59,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1313573.3333333333, ans=0.0 2023-12-23 20:15:59,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1313573.3333333333, ans=0.125 2023-12-23 20:16:16,131 INFO [train.py:886] (0/4) Epoch 42, batch 1650, loss[loss=0.0113, audio_tagging_loss=0.0113, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4951069.38 frames. 
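], batch size: 99, lr: 2.56e-03, grad_scale: 32.0

Each optim.py WARNING above summarizes recent gradient norms as five order statistics (reading as min, 25%, median, 75%, max) plus a clipping threshold, and in every record the threshold sits close to Clipping_scale times the median, e.g. 2.0 * 3.891e+01 = 7.782e+01 against the logged threshold=7.783e+01. A sketch of that bookkeeping follows, assuming a simple sliding window of norms; the class name is invented, and the real optimizer may compute and smooth these statistics differently.

# Illustrative quartile-based gradient clipping (assumption: threshold =
# clipping_scale * median of recently observed total gradient norms).
from collections import deque

import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128) -> None:
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total gradient norms

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        if norm > threshold:  # this step would count toward percent-clipped
            for g in grads:
                g.mul_(threshold / norm)
        return threshold

# Usage after loss.backward():
#   threshold = clipper.clip_(model.parameters())
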
2023-12-23 20:16:29,141 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:16:29,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1313773.3333333333, ans=0.125
2023-12-23 20:16:39,527 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:16:47,631 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.698e+01 3.919e+01 4.098e+01 5.152e+01, threshold=7.837e+01, percent-clipped=0.0
2023-12-23 20:17:07,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=1314040.0, ans=0.1
2023-12-23 20:17:07,863 INFO [train.py:886] (0/4) Epoch 42, batch 1700, loss[loss=0.009324, audio_tagging_loss=0.009324, over 24035.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4951000.33 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:17:25,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1314106.6666666667, ans=0.125
2023-12-23 20:17:34,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1314173.3333333333, ans=0.0
2023-12-23 20:17:37,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0
2023-12-23 20:18:00,804 INFO [train.py:886] (0/4) Epoch 42, batch 1750, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4947063.22 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:18:12,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=12.0
2023-12-23 20:18:18,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1314440.0, ans=0.125
2023-12-23 20:18:19,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1314440.0, ans=0.1
2023-12-23 20:18:23,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=22.5
2023-12-23 20:18:31,277 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.382e+01 3.733e+01 3.893e+01 4.031e+01 4.588e+01, threshold=7.786e+01, percent-clipped=0.0
2023-12-23 20:18:38,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1314573.3333333333, ans=0.1
2023-12-23 20:18:47,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1314640.0, ans=0.125
2023-12-23 20:18:52,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0
2023-12-23 20:18:52,555 INFO [train.py:886] (0/4) Epoch 42, batch 1800, loss[loss=0.01017, audio_tagging_loss=0.01017, over 25000.00 frames.
], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4950680.59 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:19:02,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2023-12-23 20:19:09,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1314773.3333333333, ans=0.0 2023-12-23 20:19:10,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1314773.3333333333, ans=0.1 2023-12-23 20:19:12,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=12.0 2023-12-23 20:19:28,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1314906.6666666667, ans=0.2 2023-12-23 20:19:35,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1314973.3333333333, ans=0.125 2023-12-23 20:19:45,031 INFO [train.py:886] (0/4) Epoch 42, batch 1850, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24948.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4947016.88 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:19:48,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1315040.0, ans=0.0 2023-12-23 20:19:49,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1315040.0, ans=0.1 2023-12-23 20:19:52,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1315040.0, ans=0.125 2023-12-23 20:19:53,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1315106.6666666667, ans=0.125 2023-12-23 20:20:05,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1315173.3333333333, ans=0.125 2023-12-23 20:20:13,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1315173.3333333333, ans=0.0 2023-12-23 20:20:15,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1315240.0, ans=0.125 2023-12-23 20:20:16,576 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.366e+01 3.781e+01 3.954e+01 4.123e+01 5.023e+01, threshold=7.908e+01, percent-clipped=0.0 2023-12-23 20:20:18,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1315240.0, ans=0.125 2023-12-23 20:20:37,026 INFO [train.py:886] (0/4) Epoch 42, batch 1900, loss[loss=0.009688, audio_tagging_loss=0.009688, over 24027.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4941782.57 frames. 
], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:21:09,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1315573.3333333333, ans=0.125 2023-12-23 20:21:19,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.65 vs. limit=15.0 2023-12-23 20:21:26,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1315640.0, ans=0.125 2023-12-23 20:21:28,614 INFO [train.py:886] (0/4) Epoch 42, batch 1950, loss[loss=0.009388, audio_tagging_loss=0.009388, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4943766.82 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:21:40,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1315773.3333333333, ans=0.0 2023-12-23 20:21:45,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1315773.3333333333, ans=0.2 2023-12-23 20:21:45,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-12-23 20:21:47,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2023-12-23 20:21:50,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1315840.0, ans=0.1 2023-12-23 20:21:51,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1315840.0, ans=0.125 2023-12-23 20:21:59,664 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.467e+01 3.743e+01 3.848e+01 4.008e+01 4.780e+01, threshold=7.697e+01, percent-clipped=0.0 2023-12-23 20:22:10,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1315973.3333333333, ans=0.125 2023-12-23 20:22:15,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1315973.3333333333, ans=0.125 2023-12-23 20:22:15,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1315973.3333333333, ans=0.0 2023-12-23 20:22:21,586 INFO [train.py:886] (0/4) Epoch 42, batch 2000, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4946540.40 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:23:04,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1316306.6666666667, ans=0.0 2023-12-23 20:23:07,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1316306.6666666667, ans=0.0 2023-12-23 20:23:11,330 INFO [train.py:886] (0/4) Epoch 42, batch 2050, loss[loss=0.008981, audio_tagging_loss=0.008981, over 25000.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4947519.58 frames. 
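], batch size: 100, lr: 2.56e-03, grad_scale: 64.0

The ScheduledFloat lines print the current value (ans) of hyperparameters that are scheduled against batch_count; this deep into training (batch_count around 1.3 million) most of them sit at final constants such as prob=0.125 or dropout_p=0.1. The grad_scale field, which has just doubled from 32.0 to 64.0 in the record above, is the fp16 loss scaler growing and is separate from these schedules. A minimal piecewise-linear schedule in this spirit is sketched below; the class name and breakpoints are invented for illustration, and the real ScheduledFloat in scaling.py has its own API.

# Illustrative piecewise-linear schedule keyed on batch_count.
import bisect

class ScheduledFloatSketch:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Invented breakpoints: start at 0.3, settle at the logged constant 0.125.
prob = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.125))
assert prob(1317040.0) == 0.125  # matches ans=0.125 at these batch_counts
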
2023-12-23 20:23:34,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0
2023-12-23 20:23:41,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0
2023-12-23 20:23:41,641 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.285e+01 3.684e+01 3.851e+01 4.044e+01 4.837e+01, threshold=7.702e+01, percent-clipped=0.0
2023-12-23 20:23:56,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1316640.0, ans=0.125
2023-12-23 20:24:02,927 INFO [train.py:886] (0/4) Epoch 42, batch 2100, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4949613.37 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 64.0
2023-12-23 20:24:07,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1316706.6666666667, ans=15.0
2023-12-23 20:24:28,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1316840.0, ans=0.1
2023-12-23 20:24:46,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1316973.3333333333, ans=0.125
2023-12-23 20:24:53,620 INFO [train.py:886] (0/4) Epoch 42, batch 2150, loss[loss=0.01097, audio_tagging_loss=0.01097, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4953591.75 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 64.0
2023-12-23 20:24:53,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1317040.0, ans=0.0
2023-12-23 20:24:53,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1317040.0, ans=0.04949747468305833
2023-12-23 20:25:02,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=15.0
2023-12-23 20:25:03,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1317106.6666666667, ans=0.04949747468305833
2023-12-23 20:25:08,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs.
limit=15.0 2023-12-23 20:25:15,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1317173.3333333333, ans=0.1 2023-12-23 20:25:16,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1317173.3333333333, ans=0.0 2023-12-23 20:25:17,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1317173.3333333333, ans=0.2 2023-12-23 20:25:24,374 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.516e+01 3.741e+01 3.904e+01 4.093e+01 4.524e+01, threshold=7.808e+01, percent-clipped=0.0 2023-12-23 20:25:31,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1317240.0, ans=10.0 2023-12-23 20:25:35,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1317306.6666666667, ans=0.0 2023-12-23 20:25:45,640 INFO [train.py:886] (0/4) Epoch 42, batch 2200, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4953000.44 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 64.0 2023-12-23 20:25:52,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=1317373.3333333333, ans=0.1 2023-12-23 20:25:56,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-12-23 20:26:01,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1317440.0, ans=0.035 2023-12-23 20:26:08,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1317506.6666666667, ans=0.125 2023-12-23 20:26:35,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2023-12-23 20:26:36,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1317706.6666666667, ans=0.125 2023-12-23 20:26:37,035 INFO [train.py:886] (0/4) Epoch 42, batch 2250, loss[loss=0.01235, audio_tagging_loss=0.01235, over 24750.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4945207.84 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 64.0 2023-12-23 20:26:45,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1317773.3333333333, ans=0.0 2023-12-23 20:26:48,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2023-12-23 20:26:54,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. 
limit=15.0 2023-12-23 20:27:05,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1317840.0, ans=0.1 2023-12-23 20:27:07,751 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.424e+01 3.775e+01 3.919e+01 4.105e+01 4.695e+01, threshold=7.838e+01, percent-clipped=0.0 2023-12-23 20:27:26,580 INFO [train.py:886] (0/4) Epoch 42, batch 2300, loss[loss=0.007466, audio_tagging_loss=0.007466, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4944582.75 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 64.0 2023-12-23 20:27:44,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1318106.6666666667, ans=0.0 2023-12-23 20:28:01,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1318240.0, ans=0.125 2023-12-23 20:28:03,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1318240.0, ans=0.1 2023-12-23 20:28:12,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1318306.6666666667, ans=0.125 2023-12-23 20:28:20,060 INFO [train.py:886] (0/4) Epoch 42, batch 2350, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4943263.43 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:28:25,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2023-12-23 20:28:41,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1318506.6666666667, ans=0.125 2023-12-23 20:28:47,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1318506.6666666667, ans=0.2 2023-12-23 20:28:49,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1318573.3333333333, ans=0.0 2023-12-23 20:28:50,511 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.647e+01 3.807e+01 4.043e+01 4.652e+01, threshold=7.613e+01, percent-clipped=0.0 2023-12-23 20:29:12,144 INFO [train.py:886] (0/4) Epoch 42, batch 2400, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4951084.11 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:29:17,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1318706.6666666667, ans=0.2 2023-12-23 20:29:18,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1318706.6666666667, ans=0.125 2023-12-23 20:29:19,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. 
limit=15.0 2023-12-23 20:29:22,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1318773.3333333333, ans=0.0 2023-12-23 20:29:24,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1318773.3333333333, ans=0.1 2023-12-23 20:29:26,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-12-23 20:29:27,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=1318773.3333333333, ans=0.1 2023-12-23 20:29:46,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1318906.6666666667, ans=0.0 2023-12-23 20:29:51,247 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:29:57,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2023-12-23 20:29:59,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.08 vs. limit=10.0 2023-12-23 20:30:03,217 INFO [train.py:886] (0/4) Epoch 42, batch 2450, loss[loss=0.01413, audio_tagging_loss=0.01413, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4957344.52 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:30:03,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1319040.0, ans=0.125 2023-12-23 20:30:23,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1319173.3333333333, ans=0.125 2023-12-23 20:30:29,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2023-12-23 20:30:34,569 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.396e+01 3.760e+01 3.936e+01 4.132e+01 5.379e+01, threshold=7.871e+01, percent-clipped=0.0 2023-12-23 20:30:36,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2023-12-23 20:30:49,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1319306.6666666667, ans=0.05 2023-12-23 20:30:54,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=15.0 2023-12-23 20:30:55,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=12.0 2023-12-23 20:30:55,605 INFO [train.py:886] (0/4) Epoch 42, batch 2500, loss[loss=0.01346, audio_tagging_loss=0.01346, over 24750.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4953207.38 frames. 
], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:30:55,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1319373.3333333333, ans=0.05 2023-12-23 20:31:12,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1319440.0, ans=0.125 2023-12-23 20:31:32,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1319573.3333333333, ans=0.125 2023-12-23 20:31:38,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1319640.0, ans=0.125 2023-12-23 20:31:43,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1319640.0, ans=0.2 2023-12-23 20:31:46,300 INFO [train.py:886] (0/4) Epoch 42, batch 2550, loss[loss=0.0134, audio_tagging_loss=0.0134, over 24750.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4950390.38 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:32:04,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1319773.3333333333, ans=0.125 2023-12-23 20:32:13,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1319840.0, ans=0.125 2023-12-23 20:32:17,485 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.464e+01 3.824e+01 3.969e+01 4.161e+01 4.691e+01, threshold=7.938e+01, percent-clipped=0.0 2023-12-23 20:32:27,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1319906.6666666667, ans=0.0 2023-12-23 20:32:39,489 INFO [train.py:886] (0/4) Epoch 42, batch 2600, loss[loss=0.009652, audio_tagging_loss=0.009652, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4945501.17 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:32:52,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1320106.6666666667, ans=0.125 2023-12-23 20:33:20,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1320306.6666666667, ans=0.1 2023-12-23 20:33:31,689 INFO [train.py:886] (0/4) Epoch 42, batch 2650, loss[loss=0.01067, audio_tagging_loss=0.01067, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4948014.74 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:34:02,587 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.266e+01 3.676e+01 3.866e+01 4.014e+01 4.776e+01, threshold=7.733e+01, percent-clipped=0.0 2023-12-23 20:34:04,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1320573.3333333333, ans=0.2 2023-12-23 20:34:22,619 INFO [train.py:886] (0/4) Epoch 42, batch 2700, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4948907.78 frames. 
], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:34:23,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=12.0 2023-12-23 20:34:35,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1320773.3333333333, ans=0.2 2023-12-23 20:34:51,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1320840.0, ans=0.2 2023-12-23 20:34:56,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1320906.6666666667, ans=0.125 2023-12-23 20:34:57,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1320906.6666666667, ans=0.125 2023-12-23 20:34:57,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1320906.6666666667, ans=0.1 2023-12-23 20:35:15,199 INFO [train.py:886] (0/4) Epoch 42, batch 2750, loss[loss=0.01234, audio_tagging_loss=0.01234, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4946536.97 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:35:17,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1321040.0, ans=0.125 2023-12-23 20:35:19,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1321040.0, ans=0.125 2023-12-23 20:35:20,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1321040.0, ans=0.1 2023-12-23 20:35:23,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1321040.0, ans=0.0 2023-12-23 20:35:24,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1321106.6666666667, ans=0.1 2023-12-23 20:35:34,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1321173.3333333333, ans=0.2 2023-12-23 20:35:35,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1321173.3333333333, ans=0.125 2023-12-23 20:35:35,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2023-12-23 20:35:46,366 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.326e+01 3.739e+01 3.858e+01 4.112e+01 5.048e+01, threshold=7.715e+01, percent-clipped=0.0 2023-12-23 20:36:05,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2023-12-23 20:36:07,104 INFO [train.py:886] (0/4) Epoch 42, batch 2800, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4949828.03 frames. 
], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:36:12,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1321373.3333333333, ans=0.0 2023-12-23 20:36:19,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1321440.0, ans=0.0 2023-12-23 20:36:20,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1321440.0, ans=0.125 2023-12-23 20:36:27,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1321506.6666666667, ans=0.125 2023-12-23 20:36:52,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1321640.0, ans=0.2 2023-12-23 20:36:55,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1321640.0, ans=0.0 2023-12-23 20:36:59,091 INFO [train.py:886] (0/4) Epoch 42, batch 2850, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24048.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4946719.27 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:37:03,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1321706.6666666667, ans=0.0 2023-12-23 20:37:29,693 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.343e+01 3.804e+01 3.965e+01 4.086e+01 5.143e+01, threshold=7.930e+01, percent-clipped=0.0 2023-12-23 20:37:32,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1321906.6666666667, ans=0.0 2023-12-23 20:37:38,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1321906.6666666667, ans=0.0 2023-12-23 20:37:40,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1321973.3333333333, ans=0.0 2023-12-23 20:37:51,428 INFO [train.py:886] (0/4) Epoch 42, batch 2900, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4945868.07 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:37:53,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1322040.0, ans=0.125 2023-12-23 20:37:56,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. 
limit=15.0 2023-12-23 20:38:05,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1322106.6666666667, ans=0.025 2023-12-23 20:38:09,246 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:38:15,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1322173.3333333333, ans=0.125 2023-12-23 20:38:24,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1322240.0, ans=0.125 2023-12-23 20:38:41,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1322373.3333333333, ans=0.125 2023-12-23 20:38:42,608 INFO [train.py:886] (0/4) Epoch 42, batch 2950, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4943635.99 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:39:06,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2023-12-23 20:39:07,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1322506.6666666667, ans=0.125 2023-12-23 20:39:13,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1322573.3333333333, ans=0.0 2023-12-23 20:39:14,049 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.161e+01 3.693e+01 3.854e+01 4.054e+01 4.408e+01, threshold=7.709e+01, percent-clipped=0.0 2023-12-23 20:39:26,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1322640.0, ans=0.125 2023-12-23 20:39:35,454 INFO [train.py:886] (0/4) Epoch 42, batch 3000, loss[loss=0.009325, audio_tagging_loss=0.009325, over 25000.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4948820.40 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:39:35,455 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 20:39:50,194 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6926, 2.9100, 3.7193, 3.6361], device='cuda:0') 2023-12-23 20:39:56,133 INFO [train.py:917] (0/4) Epoch 42, validation: loss=0.03585, audio_tagging_loss=0.03585, over 3737520.00 frames. 2023-12-23 20:39:56,134 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 20:39:59,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1322706.6666666667, ans=0.1 2023-12-23 20:40:00,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.94 vs. limit=15.0 2023-12-23 20:40:03,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1322706.6666666667, ans=0.0 2023-12-23 20:40:46,848 INFO [train.py:886] (0/4) Epoch 42, batch 3050, loss[loss=0.008957, audio_tagging_loss=0.008957, over 24750.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4946013.48 frames. 
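The zipformer.py:1858 record emitted while computing validation loss prints attn_weights_entropy for one self-attention module, one value per head (four heads here, hence four numbers). Roughly, a low value means the head concentrates on a few key positions and a high value means it spreads attention broadly. A sketch of that diagnostic, under the assumption that the weights have shape (num_heads, num_queries, num_keys) and sum to 1 over keys:

    import torch

    def attn_weights_entropy(attn_weights, eps=1e-20):
        # attn_weights: (num_heads, num_queries, num_keys), rows sum to 1.
        p = attn_weights.clamp(min=eps)
        entropy = -(p * p.log()).sum(dim=-1)  # (num_heads, num_queries), nats
        return entropy.mean(dim=-1)           # average over queries, per head

    # Near-uniform heads give entropies close to log(num_keys).
    weights = torch.softmax(torch.randn(4, 100, 100), dim=-1)
    print(attn_weights_entropy(weights))  # four values, one per head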
], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:41:17,727 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.294e+01 3.719e+01 3.883e+01 4.038e+01 4.632e+01, threshold=7.765e+01, percent-clipped=0.0 2023-12-23 20:41:32,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1323306.6666666667, ans=0.0 2023-12-23 20:41:39,286 INFO [train.py:886] (0/4) Epoch 42, batch 3100, loss[loss=0.01389, audio_tagging_loss=0.01389, over 24750.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4952831.65 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:41:44,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1323373.3333333333, ans=0.0 2023-12-23 20:41:53,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1323440.0, ans=0.05 2023-12-23 20:41:55,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1323440.0, ans=0.125 2023-12-23 20:42:03,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1323506.6666666667, ans=0.1 2023-12-23 20:42:23,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1323640.0, ans=0.125 2023-12-23 20:42:31,533 INFO [train.py:886] (0/4) Epoch 42, batch 3150, loss[loss=0.009617, audio_tagging_loss=0.009617, over 23096.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4942461.78 frames. ], batch size: 107, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:42:31,780 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:42:39,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1323706.6666666667, ans=0.125 2023-12-23 20:42:41,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1323773.3333333333, ans=0.1 2023-12-23 20:42:46,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1323773.3333333333, ans=0.125 2023-12-23 20:42:46,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1323773.3333333333, ans=0.125 2023-12-23 20:43:01,736 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.454e+01 3.789e+01 3.951e+01 4.110e+01 5.774e+01, threshold=7.902e+01, percent-clipped=0.0 2023-12-23 20:43:19,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1323973.3333333333, ans=0.125 2023-12-23 20:43:21,067 INFO [train.py:886] (0/4) Epoch 42, batch 3200, loss[loss=0.008421, audio_tagging_loss=0.008421, over 24750.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4941606.67 frames. 
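Each optim.py:484 WARNING is periodic bookkeeping rather than a fault: the five numbers are the min/25%/median/75%/max of recent gradient norms, and with Clipping_scale=2.0 the clipping threshold tracks twice the running median (in the record above, 2 x 3.883e+01 = 7.766e+01 against the reported 7.765e+01, matching up to rounding); percent-clipped=0.0 says no recent step actually exceeded it. A rough sketch of that scheme, assuming a simple sliding window rather than the optimizer's internal state:

    from collections import deque
    import torch

    class MedianGradClipper:
        # Clip gradients against clipping_scale * median of recent norms.
        def __init__(self, clipping_scale=2.0, window=128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def clip_(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()  # 2.0 * median
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return q, threshold  # the quartiles and threshold in the WARNING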
], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:43:40,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1324106.6666666667, ans=0.0 2023-12-23 20:43:43,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1324173.3333333333, ans=0.125 2023-12-23 20:43:51,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2023-12-23 20:43:57,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2023-12-23 20:44:01,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1324306.6666666667, ans=0.125 2023-12-23 20:44:13,409 INFO [train.py:886] (0/4) Epoch 42, batch 3250, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4944870.84 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:44:22,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1324440.0, ans=0.0 2023-12-23 20:44:44,155 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.306e+01 3.683e+01 3.870e+01 4.032e+01 4.579e+01, threshold=7.741e+01, percent-clipped=0.0 2023-12-23 20:44:51,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1324573.3333333333, ans=0.0 2023-12-23 20:44:55,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1324640.0, ans=0.0 2023-12-23 20:45:03,260 INFO [train.py:886] (0/4) Epoch 42, batch 3300, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4950700.92 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:45:03,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1324706.6666666667, ans=0.125 2023-12-23 20:45:07,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1324706.6666666667, ans=0.2 2023-12-23 20:45:11,299 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:45:15,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.34 vs. 
limit=15.0 2023-12-23 20:45:20,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1324773.3333333333, ans=0.125 2023-12-23 20:45:39,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1324906.6666666667, ans=0.2 2023-12-23 20:45:48,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1324973.3333333333, ans=0.0 2023-12-23 20:45:53,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1324973.3333333333, ans=0.0 2023-12-23 20:45:55,686 INFO [train.py:886] (0/4) Epoch 42, batch 3350, loss[loss=0.01234, audio_tagging_loss=0.01234, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4954597.65 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:45:56,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1325040.0, ans=0.035 2023-12-23 20:46:27,256 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.349e+01 3.683e+01 3.878e+01 4.092e+01 4.733e+01, threshold=7.755e+01, percent-clipped=0.0 2023-12-23 20:46:29,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1325240.0, ans=0.125 2023-12-23 20:46:48,520 INFO [train.py:886] (0/4) Epoch 42, batch 3400, loss[loss=0.01018, audio_tagging_loss=0.01018, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4963601.04 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:46:50,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1325373.3333333333, ans=0.0 2023-12-23 20:46:56,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0 2023-12-23 20:46:56,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=12.0 2023-12-23 20:47:00,773 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:47:00,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1325440.0, ans=0.125 2023-12-23 20:47:22,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1325573.3333333333, ans=0.0 2023-12-23 20:47:27,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1325573.3333333333, ans=0.0 2023-12-23 20:47:37,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1325706.6666666667, ans=0.0 2023-12-23 20:47:38,511 INFO [train.py:886] (0/4) Epoch 42, batch 3450, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4962987.71 frames. 
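The scaling.py:1022 Whitening records compare a per-module covariance statistic against that module's whitening_limit (6.0 for attention keys, 12.0/15.0/22.5 for larger activations in this section); a corrective gradient is applied only when the metric exceeds the limit, so these lines are diagnostics, not errors. Roughly, the metric measures how far the grouped feature covariance is from a multiple of the identity: 1.0 for perfectly whitened features, growing as activations collapse into fewer directions. A hedged reconstruction of such a metric (simplified; the library's exact implementation may differ):

    import torch

    def whitening_metric(x, num_groups):
        # x: (num_frames, num_channels). In the large-sample limit this is
        # 1.0 when each group's covariance is a multiple of the identity.
        num_channels = x.shape[-1]
        d = num_channels // num_groups
        x = x.reshape(-1, num_groups, d).transpose(0, 1)  # (groups, N, d)
        cov = x.transpose(1, 2) @ x                        # (groups, d, d)
        mean_diag = cov.diagonal(dim1=-2, dim2=-1).mean()
        return (cov ** 2).sum() / (num_groups * d * mean_diag ** 2)

    print(whitening_metric(torch.randn(10000, 384), num_groups=1))  # ~1.04
    rank1 = torch.randn(10000, 1) * torch.ones(1, 384)
    print(whitening_metric(rank1, num_groups=1))  # ~384, fully collapsed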
], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:47:48,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1325773.3333333333, ans=0.125 2023-12-23 20:48:08,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1325840.0, ans=0.0 2023-12-23 20:48:09,691 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.370e+01 3.738e+01 3.925e+01 4.101e+01 5.009e+01, threshold=7.849e+01, percent-clipped=0.0 2023-12-23 20:48:22,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1325973.3333333333, ans=0.1 2023-12-23 20:48:31,447 INFO [train.py:886] (0/4) Epoch 42, batch 3500, loss[loss=0.01148, audio_tagging_loss=0.01148, over 24750.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4954529.69 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:48:32,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0 2023-12-23 20:48:35,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1326040.0, ans=0.125 2023-12-23 20:49:07,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1326240.0, ans=0.05 2023-12-23 20:49:15,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1326306.6666666667, ans=0.1 2023-12-23 20:49:22,271 INFO [train.py:886] (0/4) Epoch 42, batch 3550, loss[loss=0.008163, audio_tagging_loss=0.008163, over 24750.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4948621.82 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:49:29,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2023-12-23 20:49:36,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1326440.0, ans=0.125 2023-12-23 20:49:42,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1326506.6666666667, ans=0.125 2023-12-23 20:49:46,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2023-12-23 20:49:48,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0 2023-12-23 20:49:53,672 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.537e+01 3.732e+01 3.866e+01 4.048e+01 4.845e+01, threshold=7.731e+01, percent-clipped=0.0 2023-12-23 20:50:14,571 INFO [train.py:886] (0/4) Epoch 42, batch 3600, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4951066.16 frames. 
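In every batch summary here, loss and audio_tagging_loss coincide because this recipe trains a single objective: multi-label classification over the 527 AudioSet event classes, presumably the usual binary cross-entropy on logits. A minimal sketch with hypothetical shapes (100 cuts per batch, as in the 'batch size: 100' summaries); the reduction and per-frame normalization used for the logged numbers may differ:

    import torch
    import torch.nn.functional as F

    num_events = 527                                  # AudioSet ontology size
    logits = torch.randn(100, num_events)             # output of tagging head
    targets = (torch.rand(100, num_events) < 0.01).float()  # multi-hot labels
    audio_tagging_loss = F.binary_cross_entropy_with_logits(logits, targets)
    print(audio_tagging_loss)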
], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:50:15,767 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:50:25,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1326773.3333333333, ans=0.0 2023-12-23 20:50:33,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1326773.3333333333, ans=0.025 2023-12-23 20:50:42,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1326840.0, ans=0.0 2023-12-23 20:51:07,219 INFO [train.py:886] (0/4) Epoch 42, batch 3650, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4958311.69 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:51:19,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1327106.6666666667, ans=0.125 2023-12-23 20:51:24,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.73 vs. limit=15.0 2023-12-23 20:51:31,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1327173.3333333333, ans=10.0 2023-12-23 20:51:38,587 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.685e+01 3.869e+01 4.096e+01 4.964e+01, threshold=7.739e+01, percent-clipped=0.0 2023-12-23 20:51:58,199 INFO [train.py:886] (0/4) Epoch 42, batch 3700, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4958706.81 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:52:00,331 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.212e-01 2023-12-23 20:52:01,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1327373.3333333333, ans=0.125 2023-12-23 20:52:06,813 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0 2023-12-23 20:52:23,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1327506.6666666667, ans=0.0 2023-12-23 20:52:25,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=15.0 2023-12-23 20:52:37,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1327573.3333333333, ans=0.95 2023-12-23 20:52:45,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1327640.0, ans=0.125 2023-12-23 20:52:50,630 INFO [train.py:886] (0/4) Epoch 42, batch 3750, loss[loss=0.01003, audio_tagging_loss=0.01003, over 24750.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4954600.14 frames. 
], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:52:58,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1327706.6666666667, ans=6.0 2023-12-23 20:53:15,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1327840.0, ans=0.0 2023-12-23 20:53:21,793 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.847e+01 3.984e+01 4.181e+01 4.844e+01, threshold=7.969e+01, percent-clipped=0.0 2023-12-23 20:53:28,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1327906.6666666667, ans=0.0 2023-12-23 20:53:35,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1327973.3333333333, ans=0.0 2023-12-23 20:53:42,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1328040.0, ans=0.125 2023-12-23 20:53:43,715 INFO [train.py:886] (0/4) Epoch 42, batch 3800, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4945454.73 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:53:57,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1328106.6666666667, ans=0.1 2023-12-23 20:54:05,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1328173.3333333333, ans=0.1 2023-12-23 20:54:12,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1328173.3333333333, ans=0.125 2023-12-23 20:54:19,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1328240.0, ans=0.125 2023-12-23 20:54:30,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1328306.6666666667, ans=0.0 2023-12-23 20:54:34,333 INFO [train.py:886] (0/4) Epoch 42, batch 3850, loss[loss=0.009404, audio_tagging_loss=0.009404, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4949871.22 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:54:41,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1328373.3333333333, ans=0.1 2023-12-23 20:54:42,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. 
limit=15.0 2023-12-23 20:54:56,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1328506.6666666667, ans=0.0 2023-12-23 20:55:05,490 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.259e+01 3.729e+01 3.866e+01 4.038e+01 4.994e+01, threshold=7.731e+01, percent-clipped=0.0 2023-12-23 20:55:05,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1328573.3333333333, ans=0.09899494936611666 2023-12-23 20:55:05,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-12-23 20:55:21,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1328640.0, ans=0.2 2023-12-23 20:55:26,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2023-12-23 20:55:26,612 INFO [train.py:886] (0/4) Epoch 42, batch 3900, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4951539.56 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 64.0 2023-12-23 20:55:31,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1328706.6666666667, ans=0.1 2023-12-23 20:56:10,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1328973.3333333333, ans=0.125 2023-12-23 20:56:13,012 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:56:17,557 INFO [train.py:886] (0/4) Epoch 42, batch 3950, loss[loss=0.009492, audio_tagging_loss=0.009492, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4948191.17 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 64.0 2023-12-23 20:56:34,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1329106.6666666667, ans=0.125 2023-12-23 20:56:38,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1329173.3333333333, ans=0.1 2023-12-23 20:56:46,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-12-23 20:56:49,246 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.418e+01 3.779e+01 3.890e+01 4.083e+01 4.663e+01, threshold=7.780e+01, percent-clipped=0.0 2023-12-23 20:56:52,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1329240.0, ans=0.0 2023-12-23 20:57:10,634 INFO [train.py:886] (0/4) Epoch 42, batch 4000, loss[loss=0.01066, audio_tagging_loss=0.01066, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4951317.39 frames. 
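The printed lr slips from 2.55e-03 to 2.54e-03 within the epoch: the schedule depends smoothly on both the batch and epoch counters, so this late in training it only drifts in the third significant digit. Assuming an Eden-style schedule, and treating the constants below as placeholders for this run's base_lr / lr_batches / lr_epochs settings, a sketch:

    def eden_lr(base_lr, batch, epoch, lr_batches, lr_epochs):
        # Smooth decay in both counters; nearly flat once batch >> lr_batches.
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    # Placeholder settings; around 200k training batches and epoch 42 this
    # comes out on the order of 2.5e-03, the ballpark of the printed lr.
    print(eden_lr(base_lr=0.045, batch=200_000, epoch=42,
                  lr_batches=7500, lr_epochs=3.5))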
], batch size: 100, lr: 2.54e-03, grad_scale: 64.0 2023-12-23 20:57:10,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1329373.3333333333, ans=0.125 2023-12-23 20:57:24,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1329440.0, ans=0.125 2023-12-23 20:57:38,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.54 vs. limit=15.0 2023-12-23 20:57:40,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1329573.3333333333, ans=0.0 2023-12-23 20:57:44,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1329573.3333333333, ans=0.125 2023-12-23 20:57:46,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5 2023-12-23 20:57:49,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0 2023-12-23 20:57:54,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1329640.0, ans=0.125 2023-12-23 20:57:54,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329640.0, ans=0.1 2023-12-23 20:58:03,476 INFO [train.py:886] (0/4) Epoch 42, batch 4050, loss[loss=0.009543, audio_tagging_loss=0.009543, over 24054.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4947414.40 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 64.0 2023-12-23 20:58:13,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1329773.3333333333, ans=0.125 2023-12-23 20:58:15,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1329773.3333333333, ans=0.1 2023-12-23 20:58:21,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1329773.3333333333, ans=0.05 2023-12-23 20:58:23,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1329840.0, ans=0.1 2023-12-23 20:58:27,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.99 vs. 
limit=15.0 2023-12-23 20:58:32,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1329840.0, ans=0.125 2023-12-23 20:58:35,418 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.762e+01 3.929e+01 4.087e+01 5.094e+01, threshold=7.857e+01, percent-clipped=0.0 2023-12-23 20:58:47,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329973.3333333333, ans=0.1 2023-12-23 20:58:48,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1329973.3333333333, ans=0.0 2023-12-23 20:58:48,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1329973.3333333333, ans=0.125 2023-12-23 20:58:53,483 INFO [train.py:886] (0/4) Epoch 42, batch 4100, loss[loss=0.009247, audio_tagging_loss=0.009247, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4943477.16 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 64.0 2023-12-23 20:58:55,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1330040.0, ans=0.0 2023-12-23 20:59:06,250 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:59:20,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1330173.3333333333, ans=0.2 2023-12-23 20:59:45,163 INFO [train.py:886] (0/4) Epoch 42, batch 4150, loss[loss=0.00963, audio_tagging_loss=0.00963, over 24750.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4942454.53 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 64.0 2023-12-23 20:59:54,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1330440.0, ans=0.125 2023-12-23 21:00:06,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1330506.6666666667, ans=0.125 2023-12-23 21:00:16,781 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.428e+01 3.800e+01 3.926e+01 4.112e+01 4.569e+01, threshold=7.851e+01, percent-clipped=0.0 2023-12-23 21:00:18,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2023-12-23 21:00:25,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1330640.0, ans=0.125 2023-12-23 21:00:25,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1330640.0, ans=0.125 2023-12-23 21:00:28,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1330640.0, ans=0.0 2023-12-23 21:00:36,530 INFO [train.py:886] (0/4) Epoch 42, batch 4200, loss[loss=0.0101, audio_tagging_loss=0.0101, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4941335.97 frames. 
], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:00:41,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1330706.6666666667, ans=0.125 2023-12-23 21:01:00,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=12.0 2023-12-23 21:01:08,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1330906.6666666667, ans=0.125 2023-12-23 21:01:26,731 INFO [train.py:886] (0/4) Epoch 42, batch 4250, loss[loss=0.0102, audio_tagging_loss=0.0102, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4944256.35 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:01:42,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1331106.6666666667, ans=0.2 2023-12-23 21:01:47,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1331173.3333333333, ans=0.125 2023-12-23 21:01:58,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1331240.0, ans=0.125 2023-12-23 21:01:59,510 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.477e+01 3.725e+01 3.912e+01 4.109e+01 5.134e+01, threshold=7.824e+01, percent-clipped=0.0 2023-12-23 21:02:07,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1331306.6666666667, ans=0.0 2023-12-23 21:02:18,806 INFO [train.py:886] (0/4) Epoch 42, batch 4300, loss[loss=0.01029, audio_tagging_loss=0.01029, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4954323.70 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:02:20,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1331373.3333333333, ans=0.09899494936611666 2023-12-23 21:02:25,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1331373.3333333333, ans=0.125 2023-12-23 21:02:26,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1331373.3333333333, ans=0.0 2023-12-23 21:02:29,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1331440.0, ans=0.0 2023-12-23 21:02:32,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1331440.0, ans=0.125 2023-12-23 21:02:51,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1331573.3333333333, ans=0.125 2023-12-23 21:03:09,451 INFO [train.py:886] (0/4) Epoch 42, batch 4350, loss[loss=0.009731, audio_tagging_loss=0.009731, over 24750.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4958916.17 frames. 
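grad_scale has just stepped down from 64.0 to 32.0 (and later sits at 16.0): with fp16 training, the loss scale is halved whenever scaled gradients overflow and is grown back periodically while steps keep succeeding. A generic torch.cuda.amp illustration of where that number comes from (the actual training loop is more involved; this only shows the mechanism):

    import torch

    model = torch.nn.Linear(80, 527).cuda()  # assumes a CUDA device
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=64.0)

    for _ in range(3):
        opt.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(torch.randn(8, 80, device="cuda")).square().mean()
        scaler.scale(loss).backward()  # backprop on loss * grad_scale
        scaler.step(opt)               # unscales; skips the step on inf/nan
        scaler.update()                # halves the scale on overflow, else
                                       # doubles it every growth_interval steps
        print(scaler.get_scale())      # the grad_scale figure in these logs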
], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:03:10,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1331706.6666666667, ans=0.0 2023-12-23 21:03:14,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1331706.6666666667, ans=0.04949747468305833 2023-12-23 21:03:20,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1331773.3333333333, ans=0.125 2023-12-23 21:03:32,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0 2023-12-23 21:03:37,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1331840.0, ans=0.04949747468305833 2023-12-23 21:03:37,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1331840.0, ans=0.0 2023-12-23 21:03:41,578 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.795e+01 3.947e+01 4.145e+01 4.884e+01, threshold=7.894e+01, percent-clipped=0.0 2023-12-23 21:03:41,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1331906.6666666667, ans=0.0 2023-12-23 21:03:50,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1331973.3333333333, ans=0.1 2023-12-23 21:03:57,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1331973.3333333333, ans=0.125 2023-12-23 21:04:01,126 INFO [train.py:886] (0/4) Epoch 42, batch 4400, loss[loss=0.009907, audio_tagging_loss=0.009907, over 24750.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4954099.24 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:04:04,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.20 vs. limit=22.5 2023-12-23 21:04:15,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1332106.6666666667, ans=0.0 2023-12-23 21:04:15,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1332106.6666666667, ans=0.0 2023-12-23 21:04:30,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=12.0 2023-12-23 21:04:52,964 INFO [train.py:886] (0/4) Epoch 42, batch 4450, loss[loss=0.01209, audio_tagging_loss=0.01209, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4950058.15 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:05:03,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1332440.0, ans=0.1 2023-12-23 21:05:14,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1332506.6666666667, ans=0.0 2023-12-23 21:05:25,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.89 vs. 
limit=12.0 2023-12-23 21:05:26,397 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.806e+01 3.979e+01 4.216e+01 4.903e+01, threshold=7.957e+01, percent-clipped=0.0 2023-12-23 21:05:35,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-12-23 21:05:43,549 INFO [train.py:886] (0/4) Epoch 42, batch 4500, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4948673.51 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:05:47,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1332706.6666666667, ans=0.5 2023-12-23 21:06:00,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0 2023-12-23 21:06:02,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1332773.3333333333, ans=0.0 2023-12-23 21:06:10,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.72 vs. limit=15.0 2023-12-23 21:06:17,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1332906.6666666667, ans=0.125 2023-12-23 21:06:36,315 INFO [train.py:886] (0/4) Epoch 42, batch 4550, loss[loss=0.01079, audio_tagging_loss=0.01079, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4944844.83 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:06:36,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1333040.0, ans=0.125 2023-12-23 21:06:49,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1333106.6666666667, ans=0.125 2023-12-23 21:06:57,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1333173.3333333333, ans=0.2 2023-12-23 21:07:02,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.39 vs. limit=22.5 2023-12-23 21:07:09,416 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.739e+01 3.901e+01 4.086e+01 5.054e+01, threshold=7.802e+01, percent-clipped=0.0 2023-12-23 21:07:12,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2023-12-23 21:07:18,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1333306.6666666667, ans=0.125 2023-12-23 21:07:20,119 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-200000.pt 2023-12-23 21:07:23,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1333306.6666666667, ans=0.125 2023-12-23 21:07:28,504 INFO [train.py:886] (0/4) Epoch 42, batch 4600, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. 
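checkpoint-200000.pt lands mid-epoch because batch-indexed checkpoints are written every save_every_n training batches, independently of the epoch-*.pt file that appears a little further on at the epoch 42/43 boundary. A minimal sketch of that trigger; the helper name and the save_every_n default shown are assumptions, and the payload is trimmed to the essentials:

    from pathlib import Path
    import torch

    def maybe_save_batch_checkpoint(model, optimizer, batch_idx_train,
                                    exp_dir, save_every_n=4000):
        # Writes exp_dir/checkpoint-<batch_idx_train>.pt on multiples of
        # save_every_n (200000 = 50 * 4000 under the assumed default).
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "batch_idx_train": batch_idx_train},
            Path(exp_dir) / f"checkpoint-{batch_idx_train}.pt",
        )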
], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4951100.10 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:07:33,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1333373.3333333333, ans=0.1 2023-12-23 21:07:33,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=12.0 2023-12-23 21:07:35,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1333373.3333333333, ans=0.07 2023-12-23 21:07:46,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1333440.0, ans=0.125 2023-12-23 21:07:58,573 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.52 vs. limit=15.0 2023-12-23 21:08:00,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1333573.3333333333, ans=0.2 2023-12-23 21:08:13,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1333640.0, ans=0.125 2023-12-23 21:08:19,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1333640.0, ans=0.0 2023-12-23 21:08:20,865 INFO [train.py:886] (0/4) Epoch 42, batch 4650, loss[loss=0.01362, audio_tagging_loss=0.01362, over 24938.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4957749.24 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:08:54,139 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.819e+01 3.947e+01 4.083e+01 4.797e+01, threshold=7.893e+01, percent-clipped=0.0 2023-12-23 21:08:54,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1333906.6666666667, ans=0.0 2023-12-23 21:08:59,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1333906.6666666667, ans=0.0 2023-12-23 21:09:10,775 INFO [train.py:886] (0/4) Epoch 42, batch 4700, loss[loss=0.009843, audio_tagging_loss=0.009843, over 24750.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4953380.92 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:09:12,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.96 vs. limit=22.5 2023-12-23 21:09:17,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-12-23 21:09:21,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1334106.6666666667, ans=0.2 2023-12-23 21:09:22,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1334106.6666666667, ans=10.0 2023-12-23 21:09:37,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. 
limit=6.0 2023-12-23 21:09:55,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1334306.6666666667, ans=0.125 2023-12-23 21:09:58,622 INFO [train.py:886] (0/4) Epoch 42, batch 4750, loss[loss=0.01236, audio_tagging_loss=0.01236, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4946678.74 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:10:00,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1334373.3333333333, ans=0.0 2023-12-23 21:10:09,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1334440.0, ans=0.1 2023-12-23 21:10:13,668 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-42.pt 2023-12-23 21:10:32,652 INFO [train.py:886] (0/4) Epoch 43, batch 0, loss[loss=0.02361, audio_tagging_loss=0.02361, over 25000.00 frames. ], tot_loss[loss=0.02361, audio_tagging_loss=0.02361, over 25000.00 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 32.0 2023-12-23 21:10:32,653 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 21:10:41,422 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6805, 2.8344, 3.7123, 3.7199], device='cuda:0') 2023-12-23 21:10:53,526 INFO [train.py:917] (0/4) Epoch 43, validation: loss=0.0346, audio_tagging_loss=0.0346, over 3737520.00 frames. 2023-12-23 21:10:53,526 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 21:10:53,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1334480.0, ans=0.0 2023-12-23 21:11:01,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.72 vs. limit=22.5 2023-12-23 21:11:02,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1334480.0, ans=0.125 2023-12-23 21:11:02,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.89 vs. limit=10.0 2023-12-23 21:11:07,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.00 vs. limit=10.0 2023-12-23 21:11:11,322 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.518e+01 3.885e+01 4.049e+01 4.321e+01 9.986e+01, threshold=8.099e+01, percent-clipped=5.0 2023-12-23 21:11:16,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.20 vs. limit=22.5 2023-12-23 21:11:17,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1334613.3333333333, ans=0.125 2023-12-23 21:11:45,496 INFO [train.py:886] (0/4) Epoch 43, batch 50, loss[loss=0.01595, audio_tagging_loss=0.01595, over 25000.00 frames. ], tot_loss[loss=0.01788, audio_tagging_loss=0.01788, over 1107903.70 frames. 
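Each 'Computing validation loss' / 'validation: loss=...' pair, like the epoch 43 one above, is a full pass over the held-out cuts with gradients disabled, reported as a frame-weighted average (hence the constant 'over 3737520.00 frames.' at every validation). A sketch of that loop under generic assumptions about the batch layout and criterion:

    import torch

    def compute_validation_loss(model, criterion, valid_loader, device):
        # Frame-weighted average loss over the whole dev set, no gradients.
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for feats, num_frames, targets in valid_loader:  # assumed layout
                loss = criterion(model(feats.to(device)), targets.to(device))
                tot_loss += loss.item() * num_frames.sum().item()
                tot_frames += num_frames.sum().item()
        model.train()
        return tot_loss / tot_frames  # the 'over 3737520.00 frames.' average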
], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:12:11,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1334946.6666666667, ans=0.2 2023-12-23 21:12:17,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1335013.3333333333, ans=0.1 2023-12-23 21:12:20,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.46 vs. limit=15.0 2023-12-23 21:12:32,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1335080.0, ans=0.125 2023-12-23 21:12:37,575 INFO [train.py:886] (0/4) Epoch 43, batch 100, loss[loss=0.01455, audio_tagging_loss=0.01455, over 25000.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 1961051.90 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:12:54,981 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.840e+01 4.268e+01 4.587e+01 4.998e+01 5.925e+01, threshold=9.173e+01, percent-clipped=0.0 2023-12-23 21:13:07,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1335346.6666666667, ans=0.125 2023-12-23 21:13:21,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1335413.3333333333, ans=0.0 2023-12-23 21:13:23,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5 2023-12-23 21:13:27,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1335413.3333333333, ans=0.1 2023-12-23 21:13:28,989 INFO [train.py:886] (0/4) Epoch 43, batch 150, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 2627480.41 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:13:33,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1335480.0, ans=0.1 2023-12-23 21:13:39,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1335546.6666666667, ans=0.0 2023-12-23 21:14:06,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.55 vs. limit=12.0 2023-12-23 21:14:21,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1335813.3333333333, ans=0.0 2023-12-23 21:14:22,070 INFO [train.py:886] (0/4) Epoch 43, batch 200, loss[loss=0.0102, audio_tagging_loss=0.0102, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 3145808.79 frames. 
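The early epoch-43 summaries show tot_loss warming back up: batch 0 reports it over just 25000 frames, batch 50 over ~1.1M, batch 200 over ~3.1M, climbing toward the ~5M-frame steady state seen throughout epoch 42. That is consistent with tot_loss being an exponentially discounted (loss x frames, frames) accumulator that resets at each epoch boundary; a sketch, with a 1 - 1/200 per-batch decay chosen to reproduce the observed steady state of about 200 batches' worth of frames (the constant is an inference from the log, not a quoted setting):

    class RunningLoss:
        # Exponentially discounted accumulator behind the tot_loss summaries.
        def __init__(self, reset_interval=200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, loss, num_frames):
            self.weighted_loss = self.weighted_loss * self.decay + loss * num_frames
            self.frames = self.frames * self.decay + num_frames

        @property
        def tot_loss(self):
            return self.weighted_loss / self.frames

    tracker = RunningLoss()
    for _ in range(2000):
        tracker.update(loss=0.0113, num_frames=25000)
    print(tracker.frames)    # -> ~5.0e6, the steady 'over ...' frame count
    print(tracker.tot_loss)  # -> 0.0113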
], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:14:30,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1335813.3333333333, ans=0.2 2023-12-23 21:14:30,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1335880.0, ans=0.125 2023-12-23 21:14:34,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=12.0 2023-12-23 21:14:38,284 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.529e+01 3.831e+01 3.990e+01 4.242e+01 5.537e+01, threshold=7.979e+01, percent-clipped=0.0 2023-12-23 21:14:45,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1335946.6666666667, ans=0.2 2023-12-23 21:15:07,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-12-23 21:15:12,763 INFO [train.py:886] (0/4) Epoch 43, batch 250, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 3548601.44 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:15:30,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1336213.3333333333, ans=0.0 2023-12-23 21:15:37,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1336280.0, ans=0.0 2023-12-23 21:15:49,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1336346.6666666667, ans=0.125 2023-12-23 21:15:50,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1336346.6666666667, ans=0.0 2023-12-23 21:15:51,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=12.0 2023-12-23 21:15:52,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1336346.6666666667, ans=0.125 2023-12-23 21:16:06,118 INFO [train.py:886] (0/4) Epoch 43, batch 300, loss[loss=0.01116, audio_tagging_loss=0.01116, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 3858743.02 frames. 
], batch size: 99, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:16:06,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1336480.0, ans=0.05 2023-12-23 21:16:10,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1336480.0, ans=0.125 2023-12-23 21:16:22,915 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.462e+01 3.757e+01 3.899e+01 4.075e+01 4.664e+01, threshold=7.797e+01, percent-clipped=0.0 2023-12-23 21:16:30,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1336613.3333333333, ans=0.0 2023-12-23 21:16:40,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1336680.0, ans=0.0 2023-12-23 21:16:42,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1336680.0, ans=0.1 2023-12-23 21:16:48,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1336746.6666666667, ans=0.0 2023-12-23 21:16:49,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1336746.6666666667, ans=0.125 2023-12-23 21:16:57,858 INFO [train.py:886] (0/4) Epoch 43, batch 350, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4088982.07 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:17:19,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.53 vs. limit=22.5 2023-12-23 21:17:20,495 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:17:23,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1336946.6666666667, ans=0.1 2023-12-23 21:17:38,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1337013.3333333333, ans=0.125 2023-12-23 21:17:49,707 INFO [train.py:886] (0/4) Epoch 43, batch 400, loss[loss=0.01157, audio_tagging_loss=0.01157, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4277779.42 frames. 
2023-12-23 21:17:51,743 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:18:08,774 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.477e+01 3.775e+01 3.908e+01 4.060e+01 5.626e+01, threshold=7.816e+01, percent-clipped=0.0
2023-12-23 21:18:22,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1337346.6666666667, ans=0.1
2023-12-23 21:18:24,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1337346.6666666667, ans=0.95
2023-12-23 21:18:24,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1337346.6666666667, ans=0.125
2023-12-23 21:18:28,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0
2023-12-23 21:18:42,388 INFO [train.py:886] (0/4) Epoch 43, batch 450, loss[loss=0.009388, audio_tagging_loss=0.009388, over 23948.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4432297.30 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:18:50,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1337480.0, ans=0.125
2023-12-23 21:18:56,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0
2023-12-23 21:19:08,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1337613.3333333333, ans=0.125
2023-12-23 21:19:29,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1337746.6666666667, ans=0.0
2023-12-23 21:19:33,117 INFO [train.py:886] (0/4) Epoch 43, batch 500, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4546162.28 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:19:51,661 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.285e+01 3.753e+01 3.929e+01 4.110e+01 4.615e+01, threshold=7.857e+01, percent-clipped=0.0
2023-12-23 21:19:54,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5
2023-12-23 21:19:55,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1337946.6666666667, ans=0.2
2023-12-23 21:19:59,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1337946.6666666667, ans=0.125
2023-12-23 21:20:10,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1338013.3333333333, ans=0.125
2023-12-23 21:20:19,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1338080.0, ans=0.2
2023-12-23 21:20:23,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1338080.0, ans=0.125
2023-12-23 21:20:25,814 INFO [train.py:886] (0/4) Epoch 43, batch 550, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4642530.42 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:20:31,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1338146.6666666667, ans=0.125
2023-12-23 21:20:46,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0
2023-12-23 21:20:55,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0
2023-12-23 21:20:57,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1338346.6666666667, ans=0.125
2023-12-23 21:20:59,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1338346.6666666667, ans=0.125
2023-12-23 21:21:16,799 INFO [train.py:886] (0/4) Epoch 43, batch 600, loss[loss=0.01178, audio_tagging_loss=0.01178, over 24941.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4707968.67 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:21:18,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1338480.0, ans=0.1
2023-12-23 21:21:21,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1338480.0, ans=0.09899494936611666
2023-12-23 21:21:34,499 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.403e+01 3.800e+01 3.964e+01 4.144e+01 4.733e+01, threshold=7.928e+01, percent-clipped=0.0
2023-12-23 21:21:36,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.87 vs. limit=22.5
2023-12-23 21:21:51,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1338680.0, ans=15.0
2023-12-23 21:22:08,606 INFO [train.py:886] (0/4) Epoch 43, batch 650, loss[loss=0.01131, audio_tagging_loss=0.01131, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4759290.60 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 32.0
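
The Whitening lines track how close each module's activations are to having a "white" (isotropic) covariance; the metric is compared against a scheduled limit, and a penalty applies when the limit is exceeded. One plausible way to compute such a metric, the eigenvalue-spread ratio mean(eig^2) / mean(eig)^2, which equals 1.0 for perfectly white features and grows with correlation, is sketched below; the real Whiten module in scaling.py may differ in details:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        """x: (num_frames, num_channels) activations."""
        num_frames, num_channels = x.shape
        g = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, g).transpose(0, 1)  # (groups, frames, g)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / num_frames   # per-group covariance (g, g)
        tr = cov.diagonal(dim1=1, dim2=2).sum(-1)  # trace = sum of eigenvalues
        tr_sq = (cov * cov).sum(dim=(1, 2))        # trace(C^2) = sum of squared eigenvalues
        return float((g * tr_sq / tr ** 2).mean()) # 1.0 == perfectly white
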
2023-12-23 21:22:22,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1338880.0, ans=0.1
2023-12-23 21:22:26,848 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:22:32,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1338946.6666666667, ans=0.125
2023-12-23 21:22:33,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1338946.6666666667, ans=0.0
2023-12-23 21:22:34,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1338946.6666666667, ans=0.125
2023-12-23 21:22:41,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1339013.3333333333, ans=0.125
2023-12-23 21:22:53,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1339080.0, ans=0.2
2023-12-23 21:23:01,669 INFO [train.py:886] (0/4) Epoch 43, batch 700, loss[loss=0.00957, audio_tagging_loss=0.00957, over 24750.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4798201.08 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:23:09,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1339146.6666666667, ans=0.125
2023-12-23 21:23:17,913 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.445e+01 3.835e+01 3.964e+01 4.110e+01 4.993e+01, threshold=7.927e+01, percent-clipped=0.0
2023-12-23 21:23:27,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1339280.0, ans=0.2
2023-12-23 21:23:31,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1339280.0, ans=0.125
2023-12-23 21:23:39,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1339346.6666666667, ans=10.0
2023-12-23 21:23:52,665 INFO [train.py:886] (0/4) Epoch 43, batch 750, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4830199.72 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:24:23,591 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:24:28,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0
2023-12-23 21:24:44,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1339813.3333333333, ans=0.0
2023-12-23 21:24:45,221 INFO [train.py:886] (0/4) Epoch 43, batch 800, loss[loss=0.01373, audio_tagging_loss=0.01373, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4857258.22 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:24:50,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1339813.3333333333, ans=0.125
2023-12-23 21:24:52,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1339813.3333333333, ans=0.2
2023-12-23 21:25:03,507 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.756e+01 3.877e+01 4.084e+01 5.332e+01, threshold=7.753e+01, percent-clipped=0.0
2023-12-23 21:25:08,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1339946.6666666667, ans=10.0
2023-12-23 21:25:16,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1340013.3333333333, ans=0.2
2023-12-23 21:25:21,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1340013.3333333333, ans=0.125
2023-12-23 21:25:24,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1340013.3333333333, ans=0.125
2023-12-23 21:25:27,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1340080.0, ans=0.125
2023-12-23 21:25:31,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=22.5
2023-12-23 21:25:36,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1340080.0, ans=0.0
2023-12-23 21:25:38,239 INFO [train.py:886] (0/4) Epoch 43, batch 850, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4882479.71 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:25:38,532 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:26:24,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1340413.3333333333, ans=0.0
2023-12-23 21:26:29,277 INFO [train.py:886] (0/4) Epoch 43, batch 900, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4900723.47 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:26:31,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1340480.0, ans=0.09899494936611666
2023-12-23 21:26:34,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1340480.0, ans=0.2
2023-12-23 21:26:45,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.78 vs. limit=10.0
2023-12-23 21:26:46,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1340546.6666666667, ans=0.2
2023-12-23 21:26:47,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0
2023-12-23 21:26:47,365 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.377e+01 3.810e+01 3.949e+01 4.133e+01 4.627e+01, threshold=7.898e+01, percent-clipped=0.0
2023-12-23 21:27:00,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1340680.0, ans=10.0
2023-12-23 21:27:01,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1340680.0, ans=0.1
2023-12-23 21:27:15,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1340746.6666666667, ans=0.125
2023-12-23 21:27:17,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0
2023-12-23 21:27:18,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0
2023-12-23 21:27:20,772 INFO [train.py:886] (0/4) Epoch 43, batch 950, loss[loss=0.01139, audio_tagging_loss=0.01139, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4908607.92 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:28:09,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1341080.0, ans=0.0
2023-12-23 21:28:13,277 INFO [train.py:886] (0/4) Epoch 43, batch 1000, loss[loss=0.009902, audio_tagging_loss=0.009902, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4906945.69 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:28:17,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1341146.6666666667, ans=0.125
2023-12-23 21:28:23,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1341213.3333333333, ans=0.125
2023-12-23 21:28:29,739 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0
2023-12-23 21:28:30,323 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.266e+01 3.775e+01 3.954e+01 4.117e+01 5.148e+01, threshold=7.909e+01, percent-clipped=0.0
2023-12-23 21:28:30,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1341213.3333333333, ans=0.1
2023-12-23 21:28:50,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1341346.6666666667, ans=0.125
2023-12-23 21:29:04,834 INFO [train.py:886] (0/4) Epoch 43, batch 1050, loss[loss=0.01108, audio_tagging_loss=0.01108, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4919484.67 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:29:11,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0
2023-12-23 21:29:17,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1341546.6666666667, ans=0.125
2023-12-23 21:29:19,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1341546.6666666667, ans=0.95
2023-12-23 21:29:19,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1341546.6666666667, ans=0.125
2023-12-23 21:29:22,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1341546.6666666667, ans=0.1
2023-12-23 21:29:25,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1341613.3333333333, ans=0.0
2023-12-23 21:29:53,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1341746.6666666667, ans=0.0
2023-12-23 21:29:57,291 INFO [train.py:886] (0/4) Epoch 43, batch 1100, loss[loss=0.01074, audio_tagging_loss=0.01074, over 24061.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4922227.68 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:29:58,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1341813.3333333333, ans=0.125
2023-12-23 21:30:05,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1341880.0, ans=0.1
2023-12-23 21:30:07,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1341880.0, ans=0.2
2023-12-23 21:30:08,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.58 vs. limit=12.0
2023-12-23 21:30:14,086 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.243e+01 3.738e+01 3.893e+01 4.064e+01 5.194e+01, threshold=7.786e+01, percent-clipped=0.0
2023-12-23 21:30:14,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1341880.0, ans=0.2
2023-12-23 21:30:21,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1341946.6666666667, ans=0.125
2023-12-23 21:30:28,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1342013.3333333333, ans=0.125
2023-12-23 21:30:40,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1342080.0, ans=0.125
2023-12-23 21:30:41,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1342080.0, ans=0.125
2023-12-23 21:30:48,491 INFO [train.py:886] (0/4) Epoch 43, batch 1150, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4931161.28 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:31:00,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1342213.3333333333, ans=0.125
2023-12-23 21:31:05,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1342213.3333333333, ans=0.1
2023-12-23 21:31:29,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1342346.6666666667, ans=0.0
2023-12-23 21:31:40,781 INFO [train.py:886] (0/4) Epoch 43, batch 1200, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4934785.85 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:31:41,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1342480.0, ans=0.0
2023-12-23 21:31:59,052 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.434e+01 3.771e+01 3.906e+01 4.055e+01 4.735e+01, threshold=7.811e+01, percent-clipped=0.0
2023-12-23 21:32:00,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1342546.6666666667, ans=0.035
2023-12-23 21:32:10,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0
2023-12-23 21:32:24,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0
2023-12-23 21:32:24,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1342746.6666666667, ans=0.2
2023-12-23 21:32:26,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1342746.6666666667, ans=0.0
2023-12-23 21:32:26,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1342746.6666666667, ans=0.125
2023-12-23 21:32:33,204 INFO [train.py:886] (0/4) Epoch 43, batch 1250, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4933092.36 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:32:48,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.49 vs. limit=15.0
2023-12-23 21:33:18,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1343080.0, ans=0.0
2023-12-23 21:33:19,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1343080.0, ans=0.0
2023-12-23 21:33:20,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1343080.0, ans=0.2
2023-12-23 21:33:22,779 INFO [train.py:886] (0/4) Epoch 43, batch 1300, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4932246.00 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:33:39,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1343213.3333333333, ans=0.2
2023-12-23 21:33:39,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0
2023-12-23 21:33:41,674 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.269e+01 3.738e+01 3.919e+01 4.127e+01 4.801e+01, threshold=7.839e+01, percent-clipped=0.0
2023-12-23 21:33:50,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1343280.0, ans=0.0
2023-12-23 21:33:59,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1343346.6666666667, ans=0.015
2023-12-23 21:34:14,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1343413.3333333333, ans=0.1
2023-12-23 21:34:16,092 INFO [train.py:886] (0/4) Epoch 43, batch 1350, loss[loss=0.01256, audio_tagging_loss=0.01256, over 23980.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4934919.93 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:34:27,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1343546.6666666667, ans=0.0
2023-12-23 21:34:29,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1343546.6666666667, ans=0.2
2023-12-23 21:34:41,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1343613.3333333333, ans=0.0
2023-12-23 21:34:46,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1343680.0, ans=0.125
2023-12-23 21:34:50,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1343680.0, ans=0.0
2023-12-23 21:34:52,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0
2023-12-23 21:34:54,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1343680.0, ans=0.0
2023-12-23 21:35:00,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1343746.6666666667, ans=0.125
2023-12-23 21:35:08,013 INFO [train.py:886] (0/4) Epoch 43, batch 1400, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4938142.59 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:35:25,465 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.254e+01 3.705e+01 3.872e+01 4.110e+01 5.093e+01, threshold=7.744e+01, percent-clipped=0.0
2023-12-23 21:35:25,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1343880.0, ans=0.0
2023-12-23 21:36:00,017 INFO [train.py:886] (0/4) Epoch 43, batch 1450, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4944205.76 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
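
In the train.py:886 summaries, loss[...] is the current batch while tot_loss[...] is a running aggregate. The "over N frames" counts settle near ~4.9M rather than growing without bound, which is consistent with an exponential decay of 1 - 1/reset_interval (reset_interval=200 in this run's config) applied before each batch is added. The sketch below is a guess at that bookkeeping inferred from the logged numbers, not icefall's MetricsTracker itself:

    class RunningLoss:
        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0  # decayed sum of loss * frames
            self.frames = 0.0    # decayed sum of frames

        def update(self, loss: float, num_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames

        @property
        def value(self) -> float:  # the number printed as tot_loss[loss=...]
            return self.loss_sum / max(self.frames, 1.0)
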
2023-12-23 21:36:00,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1344146.6666666667, ans=0.125
2023-12-23 21:36:14,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1344213.3333333333, ans=0.05
2023-12-23 21:36:27,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1344280.0, ans=0.0
2023-12-23 21:36:53,298 INFO [train.py:886] (0/4) Epoch 43, batch 1500, loss[loss=0.01077, audio_tagging_loss=0.01077, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4948682.37 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:36:56,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1344480.0, ans=0.025
2023-12-23 21:37:09,393 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.371e+01 3.779e+01 3.922e+01 4.118e+01 4.489e+01, threshold=7.843e+01, percent-clipped=0.0
2023-12-23 21:37:21,510 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:37:24,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1344680.0, ans=0.125
2023-12-23 21:37:27,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1344680.0, ans=0.1
2023-12-23 21:37:38,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1344746.6666666667, ans=0.125
2023-12-23 21:37:42,891 INFO [train.py:886] (0/4) Epoch 43, batch 1550, loss[loss=0.009517, audio_tagging_loss=0.009517, over 24750.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4950901.53 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:37:44,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1344813.3333333333, ans=0.0
2023-12-23 21:37:44,478 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0
2023-12-23 21:37:49,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0
2023-12-23 21:38:00,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1344880.0, ans=0.125
2023-12-23 21:38:02,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1344880.0, ans=0.0
2023-12-23 21:38:06,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1344946.6666666667, ans=0.1
2023-12-23 21:38:07,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344946.6666666667, ans=0.1
2023-12-23 21:38:18,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1345013.3333333333, ans=0.04949747468305833
2023-12-23 21:38:36,338 INFO [train.py:886] (0/4) Epoch 43, batch 1600, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4947879.54 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:38:43,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.37 vs. limit=15.0
2023-12-23 21:38:45,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=12.0
2023-12-23 21:38:52,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. limit=6.0
2023-12-23 21:38:53,035 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.409e+01 3.814e+01 3.969e+01 4.149e+01 4.804e+01, threshold=7.938e+01, percent-clipped=0.0
2023-12-23 21:38:53,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1345213.3333333333, ans=0.0
2023-12-23 21:38:53,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1345213.3333333333, ans=0.125
2023-12-23 21:39:05,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1345280.0, ans=0.125
2023-12-23 21:39:22,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1345413.3333333333, ans=0.125
2023-12-23 21:39:28,169 INFO [train.py:886] (0/4) Epoch 43, batch 1650, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4945820.12 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:39:29,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1345480.0, ans=0.2
2023-12-23 21:39:44,422 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:39:46,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1345546.6666666667, ans=0.0
2023-12-23 21:40:00,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0
2023-12-23 21:40:03,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=22.5
2023-12-23 21:40:19,279 INFO [train.py:886] (0/4) Epoch 43, batch 1700, loss[loss=0.01037, audio_tagging_loss=0.01037, over 21705.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4938821.44 frames. ], batch size: 107, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:40:32,194 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:40:36,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1345880.0, ans=0.1
2023-12-23 21:40:38,052 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.392e+01 3.780e+01 3.982e+01 4.194e+01 5.083e+01, threshold=7.964e+01, percent-clipped=0.0
2023-12-23 21:40:45,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1345946.6666666667, ans=0.125
2023-12-23 21:40:52,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1346013.3333333333, ans=0.2
2023-12-23 21:41:00,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1346080.0, ans=0.0
2023-12-23 21:41:11,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1346146.6666666667, ans=0.125
2023-12-23 21:41:12,543 INFO [train.py:886] (0/4) Epoch 43, batch 1750, loss[loss=0.00942, audio_tagging_loss=0.00942, over 24075.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4940387.10 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:41:18,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1346146.6666666667, ans=0.125
2023-12-23 21:41:24,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1346213.3333333333, ans=0.125
2023-12-23 21:41:26,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1346213.3333333333, ans=0.0
2023-12-23 21:41:51,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1346346.6666666667, ans=0.125
2023-12-23 21:41:55,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1346413.3333333333, ans=0.0
2023-12-23 21:42:01,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1346413.3333333333, ans=0.0
2023-12-23 21:42:02,806 INFO [train.py:886] (0/4) Epoch 43, batch 1800, loss[loss=0.009624, audio_tagging_loss=0.009624, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4947375.43 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:42:03,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1346480.0, ans=0.1
2023-12-23 21:42:18,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1346546.6666666667, ans=0.125
2023-12-23 21:42:21,658 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.422e+01 3.726e+01 3.927e+01 4.030e+01 4.598e+01, threshold=7.853e+01, percent-clipped=0.0
2023-12-23 21:42:23,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.10 vs. limit=12.0
2023-12-23 21:42:26,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1346613.3333333333, ans=0.125
2023-12-23 21:42:27,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1346613.3333333333, ans=0.125
2023-12-23 21:42:29,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1346613.3333333333, ans=0.125
2023-12-23 21:42:35,544 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:42:56,175 INFO [train.py:886] (0/4) Epoch 43, batch 1850, loss[loss=0.01411, audio_tagging_loss=0.01411, over 24952.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4950794.04 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:43:13,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1346880.0, ans=0.1
2023-12-23 21:43:14,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1346880.0, ans=0.0
2023-12-23 21:43:26,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1347013.3333333333, ans=0.125
2023-12-23 21:43:48,404 INFO [train.py:886] (0/4) Epoch 43, batch 1900, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4942344.26 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:43:57,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1347213.3333333333, ans=0.1
2023-12-23 21:44:05,356 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.418e+01 3.802e+01 3.982e+01 4.158e+01 4.820e+01, threshold=7.964e+01, percent-clipped=0.0
2023-12-23 21:44:32,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0
2023-12-23 21:44:39,938 INFO [train.py:886] (0/4) Epoch 43, batch 1950, loss[loss=0.009649, audio_tagging_loss=0.009649, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4936641.46 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:44:40,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0
2023-12-23 21:45:02,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1347613.3333333333, ans=0.1
2023-12-23 21:45:14,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1347680.0, ans=0.125
2023-12-23 21:45:27,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1347746.6666666667, ans=0.1
2023-12-23 21:45:32,964 INFO [train.py:886] (0/4) Epoch 43, batch 2000, loss[loss=0.008898, audio_tagging_loss=0.008898, over 24140.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4945526.39 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:45:50,013 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.411e+01 3.774e+01 3.907e+01 4.123e+01 6.126e+01, threshold=7.815e+01, percent-clipped=0.0
2023-12-23 21:45:56,461 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:46:14,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1348080.0, ans=0.125
2023-12-23 21:46:17,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1348080.0, ans=0.0
2023-12-23 21:46:21,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1348080.0, ans=0.125
2023-12-23 21:46:25,177 INFO [train.py:886] (0/4) Epoch 43, batch 2050, loss[loss=0.009537, audio_tagging_loss=0.009537, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4953684.01 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 64.0
2023-12-23 21:46:29,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.95 vs. limit=15.0
2023-12-23 21:46:30,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1348146.6666666667, ans=0.125
2023-12-23 21:46:31,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0
2023-12-23 21:46:33,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=12.0
2023-12-23 21:46:46,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1348280.0, ans=0.125
2023-12-23 21:46:50,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1348280.0, ans=0.05
2023-12-23 21:47:10,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1348413.3333333333, ans=0.1
2023-12-23 21:47:17,038 INFO [train.py:886] (0/4) Epoch 43, batch 2100, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4959923.54 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 64.0
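
grad_scale doubles from 16.0 to 32.0 and, at batch 2050 just above, to 64.0, the signature of dynamic loss scaling under fp16 training (use_fp16: True in this run's config): the scale grows geometrically after a fixed number of overflow-free steps and backs off on overflow. Below is a self-contained sketch using PyTorch's stock GradScaler; the model, data, and intervals are placeholders, not the recipe's own wiring:

    import torch
    from torch import nn

    model = nn.Linear(80, 527).cuda()  # toy stand-in for the tagging model
    optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       growth_interval=2000)
    for step in range(4001):
        x = torch.randn(8, 80, device="cuda")
        y = torch.randint(0, 2, (8, 527), device="cuda", dtype=torch.float)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()  # doubles the scale after each overflow-free interval
    print(scaler.get_scale())  # 16 -> 32 -> 64, the progression seen in the log
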
2023-12-23 21:47:36,272 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.344e+01 3.711e+01 3.880e+01 4.032e+01 4.676e+01, threshold=7.761e+01, percent-clipped=0.0
2023-12-23 21:47:45,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1348613.3333333333, ans=0.0
2023-12-23 21:48:01,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1348746.6666666667, ans=0.2
2023-12-23 21:48:10,530 INFO [train.py:886] (0/4) Epoch 43, batch 2150, loss[loss=0.01125, audio_tagging_loss=0.01125, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4961247.83 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 64.0
2023-12-23 21:48:30,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1348946.6666666667, ans=0.125
2023-12-23 21:48:52,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1349080.0, ans=0.125
2023-12-23 21:48:52,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1349080.0, ans=0.125
2023-12-23 21:48:59,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1349080.0, ans=0.0
2023-12-23 21:49:00,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1349080.0, ans=0.1
2023-12-23 21:49:01,977 INFO [train.py:886] (0/4) Epoch 43, batch 2200, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24941.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4954556.32 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 64.0
2023-12-23 21:49:03,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0
2023-12-23 21:49:13,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1349213.3333333333, ans=0.125
2023-12-23 21:49:20,204 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.408e+01 3.810e+01 3.972e+01 4.173e+01 4.722e+01, threshold=7.943e+01, percent-clipped=0.0
2023-12-23 21:49:34,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1349346.6666666667, ans=0.125
2023-12-23 21:49:44,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1349413.3333333333, ans=0.1
2023-12-23 21:49:47,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0
2023-12-23 21:49:52,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=26.22 vs. limit=15.0
2023-12-23 21:49:54,811 INFO [train.py:886] (0/4) Epoch 43, batch 2250, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4947540.04 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 64.0
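
Since the loss curve is buried among these interleaved scaling entries, a small parser for the train.py:886 summary lines (matching exactly the format seen above) is handy for extracting it:

    import re

    SUMMARY = re.compile(
        r"Epoch (\d+), batch (\d+), loss\[loss=([0-9.e-]+).*?"
        r"tot_loss\[loss=([0-9.e-]+)"
    )

    def parse_losses(log_text: str):
        """Return (epoch, batch, batch_loss, tot_loss) per summary line."""
        return [(int(e), int(b), float(l), float(t))
                for e, b, l, t in SUMMARY.findall(log_text)]

    # e.g. parse_losses(open("train.log").read())[:2]
    # -> [(43, 250, 0.01058, 0.01285), (43, 300, 0.01116, 0.01237)]
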
2023-12-23 21:49:56,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1349480.0, ans=0.125
2023-12-23 21:49:56,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1349480.0, ans=0.125
2023-12-23 21:50:08,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1349546.6666666667, ans=0.125
2023-12-23 21:50:09,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1349546.6666666667, ans=0.125
2023-12-23 21:50:29,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1349680.0, ans=0.0
2023-12-23 21:50:36,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.54 vs. limit=10.0
2023-12-23 21:50:45,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1349746.6666666667, ans=0.07
2023-12-23 21:50:48,222 INFO [train.py:886] (0/4) Epoch 43, batch 2300, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4950226.09 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:50:56,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.87 vs. limit=6.0
2023-12-23 21:51:04,573 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.400e+01 3.719e+01 3.860e+01 4.088e+01 4.787e+01, threshold=7.721e+01, percent-clipped=0.0
2023-12-23 21:51:06,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1349946.6666666667, ans=0.0
2023-12-23 21:51:20,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1350013.3333333333, ans=0.1
2023-12-23 21:51:29,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1350080.0, ans=0.0
2023-12-23 21:51:38,602 INFO [train.py:886] (0/4) Epoch 43, batch 2350, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4949651.98 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:52:12,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1350346.6666666667, ans=0.0
2023-12-23 21:52:19,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1350413.3333333333, ans=0.125
2023-12-23 21:52:31,280 INFO [train.py:886] (0/4) Epoch 43, batch 2400, loss[loss=0.01172, audio_tagging_loss=0.01172, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4950872.73 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:52:47,480 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.759e+01 3.931e+01 4.096e+01 4.502e+01, threshold=7.861e+01, percent-clipped=0.0
2023-12-23 21:52:48,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5
2023-12-23 21:52:52,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1350613.3333333333, ans=0.0
2023-12-23 21:53:00,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1350613.3333333333, ans=0.0
2023-12-23 21:53:01,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=15.0
2023-12-23 21:53:07,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1350680.0, ans=0.0
2023-12-23 21:53:21,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1350813.3333333333, ans=0.125
2023-12-23 21:53:22,057 INFO [train.py:886] (0/4) Epoch 43, batch 2450, loss[loss=0.008878, audio_tagging_loss=0.008878, over 25000.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4956380.43 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:53:45,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.29 vs. limit=15.0
2023-12-23 21:53:50,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1350946.6666666667, ans=0.125
2023-12-23 21:54:12,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1351080.0, ans=0.2
2023-12-23 21:54:14,887 INFO [train.py:886] (0/4) Epoch 43, batch 2500, loss[loss=0.01007, audio_tagging_loss=0.01007, over 24750.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4954319.84 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:54:17,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=15.0
2023-12-23 21:54:19,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1351146.6666666667, ans=0.125
2023-12-23 21:54:19,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1351146.6666666667, ans=0.125
2023-12-23 21:54:32,404 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.451e+01 3.815e+01 4.045e+01 4.230e+01 4.864e+01, threshold=8.091e+01, percent-clipped=0.0
2023-12-23 21:54:39,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1351280.0, ans=0.0
2023-12-23 21:54:58,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1351413.3333333333, ans=0.2
2023-12-23 21:55:07,175 INFO [train.py:886] (0/4) Epoch 43, batch 2550, loss[loss=0.0105, audio_tagging_loss=0.0105, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4943286.31 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:55:15,068 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:55:21,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1351546.6666666667, ans=0.0
2023-12-23 21:55:23,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1351546.6666666667, ans=0.125
2023-12-23 21:55:57,342 INFO [train.py:886] (0/4) Epoch 43, batch 2600, loss[loss=0.01088, audio_tagging_loss=0.01088, over 25000.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4946960.36 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:56:01,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1351813.3333333333, ans=0.2
2023-12-23 21:56:16,367 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.494e+01 3.793e+01 3.971e+01 4.202e+01 4.613e+01, threshold=7.943e+01, percent-clipped=0.0
2023-12-23 21:56:30,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1352013.3333333333, ans=0.2
2023-12-23 21:56:38,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1352080.0, ans=0.07
2023-12-23 21:56:49,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1352146.6666666667, ans=0.125
2023-12-23 21:56:50,196 INFO [train.py:886] (0/4) Epoch 43, batch 2650, loss[loss=0.01075, audio_tagging_loss=0.01075, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4947754.40 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:56:52,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=1352146.6666666667, ans=0.02
2023-12-23 21:56:55,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1352146.6666666667, ans=0.125
2023-12-23 21:56:56,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0
2023-12-23 21:57:01,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1352213.3333333333, ans=0.0
2023-12-23 21:57:01,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1352213.3333333333, ans=0.125
2023-12-23 21:57:33,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1352413.3333333333, ans=0.125
2023-12-23 21:57:41,546 INFO [train.py:886] (0/4) Epoch 43, batch 2700, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4952378.77 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:57:41,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1352480.0, ans=0.0
2023-12-23 21:57:56,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1352546.6666666667, ans=0.125
2023-12-23 21:57:59,802 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.334e+01 3.701e+01 3.871e+01 4.080e+01 4.871e+01, threshold=7.742e+01, percent-clipped=0.0
2023-12-23 21:58:12,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1352680.0, ans=0.125
2023-12-23 21:58:23,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1352746.6666666667, ans=0.2
2023-12-23 21:58:25,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1352746.6666666667, ans=0.09899494936611666
2023-12-23 21:58:34,119 INFO [train.py:886] (0/4) Epoch 43, batch 2750, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4949771.43 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:58:46,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1352880.0, ans=0.1
2023-12-23 21:59:06,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1353013.3333333333, ans=0.125
2023-12-23 21:59:24,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1353080.0, ans=0.0
2023-12-23 21:59:26,432 INFO [train.py:886] (0/4) Epoch 43, batch 2800, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4952813.96 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:59:28,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1353146.6666666667, ans=0.125
2023-12-23 21:59:43,153 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.443e+01 3.835e+01 3.982e+01 4.160e+01 5.061e+01, threshold=7.963e+01, percent-clipped=0.0
2023-12-23 21:59:47,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1353280.0, ans=0.1
2023-12-23 21:59:49,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1353280.0, ans=0.125
2023-12-23 21:59:51,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1353280.0, ans=0.125
2023-12-23 22:00:15,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1353413.3333333333, ans=0.125
2023-12-23 22:00:16,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1353480.0, ans=0.2
2023-12-23 22:00:18,039 INFO [train.py:886] (0/4) Epoch 43, batch 2850, loss[loss=0.009702, audio_tagging_loss=0.009702, over 24750.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4949754.24 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4949754.24 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:00:23,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0 2023-12-23 22:00:29,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1353546.6666666667, ans=0.1 2023-12-23 22:00:39,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5 2023-12-23 22:00:56,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=22.5 2023-12-23 22:00:59,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1353746.6666666667, ans=0.2 2023-12-23 22:01:10,325 INFO [train.py:886] (0/4) Epoch 43, batch 2900, loss[loss=0.009243, audio_tagging_loss=0.009243, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4944851.58 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:01:28,113 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.361e+01 3.812e+01 3.915e+01 4.095e+01 4.854e+01, threshold=7.829e+01, percent-clipped=0.0 2023-12-23 22:01:30,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1353946.6666666667, ans=0.2 2023-12-23 22:01:48,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1354013.3333333333, ans=0.125 2023-12-23 22:01:51,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1354080.0, ans=0.2 2023-12-23 22:01:52,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1354080.0, ans=0.1 2023-12-23 22:02:02,227 INFO [train.py:886] (0/4) Epoch 43, batch 2950, loss[loss=0.009559, audio_tagging_loss=0.009559, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4949050.79 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:02:07,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. 
limit=15.0 2023-12-23 22:02:13,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1354213.3333333333, ans=0.0 2023-12-23 22:02:21,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1354280.0, ans=0.0 2023-12-23 22:02:30,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1354280.0, ans=0.0 2023-12-23 22:02:30,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1354280.0, ans=0.0 2023-12-23 22:02:34,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1354346.6666666667, ans=0.07 2023-12-23 22:02:49,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1354413.3333333333, ans=0.0 2023-12-23 22:02:50,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1354413.3333333333, ans=0.125 2023-12-23 22:02:52,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1354413.3333333333, ans=0.1 2023-12-23 22:02:53,851 INFO [train.py:886] (0/4) Epoch 43, batch 3000, loss[loss=0.01262, audio_tagging_loss=0.01262, over 25000.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4949046.15 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:02:53,852 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 22:03:15,297 INFO [train.py:917] (0/4) Epoch 43, validation: loss=0.03559, audio_tagging_loss=0.03559, over 3737520.00 frames. 2023-12-23 22:03:15,297 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 22:03:21,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1354480.0, ans=0.2 2023-12-23 22:03:31,978 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.311e+01 3.762e+01 3.904e+01 4.055e+01 4.746e+01, threshold=7.807e+01, percent-clipped=0.0 2023-12-23 22:03:33,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1354546.6666666667, ans=0.2 2023-12-23 22:03:54,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1354680.0, ans=0.0 2023-12-23 22:03:54,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1354680.0, ans=0.0 2023-12-23 22:03:54,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5 2023-12-23 22:03:55,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1354746.6666666667, ans=0.0 2023-12-23 22:04:01,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1354746.6666666667, ans=0.125 2023-12-23 22:04:07,151 INFO [train.py:886] (0/4) Epoch 43, batch 3050, loss[loss=0.009865, audio_tagging_loss=0.009865, over 23966.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4952014.63 frames. 
], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:04:08,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1354813.3333333333, ans=0.1 2023-12-23 22:04:09,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1354813.3333333333, ans=0.125 2023-12-23 22:04:20,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.81 vs. limit=15.0 2023-12-23 22:04:21,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1354880.0, ans=0.125 2023-12-23 22:04:21,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1354880.0, ans=0.2 2023-12-23 22:04:27,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1354946.6666666667, ans=0.0 2023-12-23 22:04:45,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1355013.3333333333, ans=0.1 2023-12-23 22:04:58,878 INFO [train.py:886] (0/4) Epoch 43, batch 3100, loss[loss=0.01255, audio_tagging_loss=0.01255, over 24750.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4956440.05 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:05:02,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1355146.6666666667, ans=0.125 2023-12-23 22:05:04,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1355146.6666666667, ans=0.125 2023-12-23 22:05:04,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1355146.6666666667, ans=0.1 2023-12-23 22:05:06,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1355146.6666666667, ans=0.125 2023-12-23 22:05:17,029 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.431e+01 3.827e+01 3.980e+01 4.191e+01 5.132e+01, threshold=7.960e+01, percent-clipped=0.0 2023-12-23 22:05:18,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-12-23 22:05:41,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=15.0 2023-12-23 22:05:51,039 INFO [train.py:886] (0/4) Epoch 43, batch 3150, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4954560.43 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:06:21,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.39 vs. limit=15.0 2023-12-23 22:06:26,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. 
limit=22.5 2023-12-23 22:06:42,739 INFO [train.py:886] (0/4) Epoch 43, batch 3200, loss[loss=0.008143, audio_tagging_loss=0.008143, over 23984.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4945019.08 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:07:00,362 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.493e+01 3.795e+01 4.020e+01 4.191e+01 4.610e+01, threshold=8.041e+01, percent-clipped=0.0 2023-12-23 22:07:07,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1355946.6666666667, ans=0.0 2023-12-23 22:07:07,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1355946.6666666667, ans=0.1 2023-12-23 22:07:09,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1355946.6666666667, ans=0.0 2023-12-23 22:07:13,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1356013.3333333333, ans=0.125 2023-12-23 22:07:16,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2023-12-23 22:07:34,561 INFO [train.py:886] (0/4) Epoch 43, batch 3250, loss[loss=0.01047, audio_tagging_loss=0.01047, over 25000.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4945612.17 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:07:35,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1356146.6666666667, ans=0.0 2023-12-23 22:07:35,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1356146.6666666667, ans=0.125 2023-12-23 22:07:42,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.54 vs. limit=12.0 2023-12-23 22:07:52,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1356213.3333333333, ans=0.125 2023-12-23 22:07:58,592 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:08:06,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=12.0 2023-12-23 22:08:15,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1356413.3333333333, ans=0.0 2023-12-23 22:08:22,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2023-12-23 22:08:27,701 INFO [train.py:886] (0/4) Epoch 43, batch 3300, loss[loss=0.01083, audio_tagging_loss=0.01083, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4951354.52 frames. 
], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:08:43,756 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.382e+01 3.775e+01 3.931e+01 4.135e+01 5.251e+01, threshold=7.863e+01, percent-clipped=0.0 2023-12-23 22:08:48,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1356613.3333333333, ans=0.025 2023-12-23 22:09:05,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2023-12-23 22:09:17,315 INFO [train.py:886] (0/4) Epoch 43, batch 3350, loss[loss=0.01051, audio_tagging_loss=0.01051, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4957408.86 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:09:18,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1356813.3333333333, ans=0.1 2023-12-23 22:09:22,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0 2023-12-23 22:09:29,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=12.0 2023-12-23 22:09:33,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1356880.0, ans=0.05 2023-12-23 22:09:34,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1356880.0, ans=0.1 2023-12-23 22:09:40,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1356946.6666666667, ans=0.1 2023-12-23 22:09:41,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1356946.6666666667, ans=0.2 2023-12-23 22:10:07,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1357080.0, ans=0.04949747468305833 2023-12-23 22:10:10,578 INFO [train.py:886] (0/4) Epoch 43, batch 3400, loss[loss=0.01174, audio_tagging_loss=0.01174, over 25000.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4960184.14 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:10:11,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1357146.6666666667, ans=0.1 2023-12-23 22:10:23,969 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:10:23,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1357213.3333333333, ans=0.0 2023-12-23 22:10:26,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. 
limit=15.0 2023-12-23 22:10:27,284 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.821e+01 3.927e+01 4.185e+01 4.649e+01, threshold=7.854e+01, percent-clipped=0.0 2023-12-23 22:10:39,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1357280.0, ans=0.1 2023-12-23 22:10:54,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1357413.3333333333, ans=0.0 2023-12-23 22:10:56,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1357413.3333333333, ans=0.125 2023-12-23 22:10:59,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-12-23 22:11:02,496 INFO [train.py:886] (0/4) Epoch 43, batch 3450, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4957572.62 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:11:04,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.29 vs. limit=12.0 2023-12-23 22:11:29,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0 2023-12-23 22:11:35,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0 2023-12-23 22:11:42,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1357680.0, ans=0.2 2023-12-23 22:11:50,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1357746.6666666667, ans=0.125 2023-12-23 22:11:50,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1357746.6666666667, ans=0.125 2023-12-23 22:11:54,011 INFO [train.py:886] (0/4) Epoch 43, batch 3500, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4946240.56 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:11:56,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1357813.3333333333, ans=0.0 2023-12-23 22:12:00,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=6.0 2023-12-23 22:12:06,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=12.0 2023-12-23 22:12:12,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.50 vs. 
limit=22.5 2023-12-23 22:12:14,013 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.870e+01 4.030e+01 4.254e+01 6.630e+01, threshold=8.060e+01, percent-clipped=0.0 2023-12-23 22:12:22,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1357946.6666666667, ans=0.125 2023-12-23 22:12:47,452 INFO [train.py:886] (0/4) Epoch 43, batch 3550, loss[loss=0.01109, audio_tagging_loss=0.01109, over 22211.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4945372.35 frames. ], batch size: 107, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:13:17,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1358346.6666666667, ans=0.0 2023-12-23 22:13:22,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1358346.6666666667, ans=0.0 2023-12-23 22:13:29,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1358413.3333333333, ans=0.0 2023-12-23 22:13:31,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1358413.3333333333, ans=0.125 2023-12-23 22:13:33,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1358413.3333333333, ans=0.125 2023-12-23 22:13:38,453 INFO [train.py:886] (0/4) Epoch 43, batch 3600, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4947552.99 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:13:49,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2023-12-23 22:13:53,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1358546.6666666667, ans=0.125 2023-12-23 22:13:55,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1358546.6666666667, ans=0.125 2023-12-23 22:13:57,054 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.362e+01 3.773e+01 3.900e+01 4.130e+01 5.213e+01, threshold=7.800e+01, percent-clipped=0.0 2023-12-23 22:14:02,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1358613.3333333333, ans=0.0 2023-12-23 22:14:13,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1358680.0, ans=0.125 2023-12-23 22:14:30,410 INFO [train.py:886] (0/4) Epoch 43, batch 3650, loss[loss=0.01032, audio_tagging_loss=0.01032, over 24750.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4950264.67 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:14:31,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.30 vs. 
limit=6.0 2023-12-23 22:14:58,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1358946.6666666667, ans=0.2 2023-12-23 22:15:10,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1359080.0, ans=0.125 2023-12-23 22:15:16,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1359080.0, ans=0.0 2023-12-23 22:15:22,573 INFO [train.py:886] (0/4) Epoch 43, batch 3700, loss[loss=0.008802, audio_tagging_loss=0.008802, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4952173.35 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:15:26,241 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:15:36,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0 2023-12-23 22:15:36,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1359213.3333333333, ans=0.2 2023-12-23 22:15:38,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1359213.3333333333, ans=0.05 2023-12-23 22:15:41,176 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.439e+01 3.800e+01 4.029e+01 4.222e+01 4.954e+01, threshold=8.057e+01, percent-clipped=0.0 2023-12-23 22:15:44,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1359280.0, ans=0.125 2023-12-23 22:16:01,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1359346.6666666667, ans=0.0 2023-12-23 22:16:10,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1359413.3333333333, ans=0.125 2023-12-23 22:16:14,318 INFO [train.py:886] (0/4) Epoch 43, batch 3750, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24963.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4953702.18 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:16:15,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.49 vs. 
limit=15.0 2023-12-23 22:16:16,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1359480.0, ans=0.0 2023-12-23 22:16:20,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1359480.0, ans=0.125 2023-12-23 22:16:24,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1359546.6666666667, ans=0.125 2023-12-23 22:16:37,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1359613.3333333333, ans=0.125 2023-12-23 22:16:44,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1359680.0, ans=0.0 2023-12-23 22:16:50,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1359680.0, ans=0.125 2023-12-23 22:16:54,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1359746.6666666667, ans=0.5 2023-12-23 22:17:07,189 INFO [train.py:886] (0/4) Epoch 43, batch 3800, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4948515.68 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:17:14,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1359813.3333333333, ans=0.125 2023-12-23 22:17:15,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0 2023-12-23 22:17:20,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1359880.0, ans=0.125 2023-12-23 22:17:24,800 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.435e+01 3.811e+01 3.969e+01 4.137e+01 5.499e+01, threshold=7.937e+01, percent-clipped=0.0 2023-12-23 22:17:31,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1359946.6666666667, ans=0.0 2023-12-23 22:17:34,860 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-204000.pt 2023-12-23 22:17:42,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1360013.3333333333, ans=0.1 2023-12-23 22:17:59,943 INFO [train.py:886] (0/4) Epoch 43, batch 3850, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4938580.14 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:18:02,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.71 vs. 
limit=15.0 2023-12-23 22:18:05,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1360146.6666666667, ans=0.125 2023-12-23 22:18:11,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1360213.3333333333, ans=0.2 2023-12-23 22:18:17,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1360213.3333333333, ans=0.125 2023-12-23 22:18:23,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1360280.0, ans=0.09899494936611666 2023-12-23 22:18:26,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1360280.0, ans=0.2 2023-12-23 22:18:42,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1360413.3333333333, ans=0.125 2023-12-23 22:18:45,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1360413.3333333333, ans=0.125 2023-12-23 22:18:52,574 INFO [train.py:886] (0/4) Epoch 43, batch 3900, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4947141.32 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:18:53,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1360480.0, ans=0.125 2023-12-23 22:19:11,866 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.297e+01 3.810e+01 3.949e+01 4.143e+01 4.595e+01, threshold=7.898e+01, percent-clipped=0.0 2023-12-23 22:19:16,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.17 vs. limit=15.0 2023-12-23 22:19:39,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1360746.6666666667, ans=0.125 2023-12-23 22:19:45,230 INFO [train.py:886] (0/4) Epoch 43, batch 3950, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4954247.76 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:19:57,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1360880.0, ans=0.125 2023-12-23 22:20:05,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2023-12-23 22:20:36,384 INFO [train.py:886] (0/4) Epoch 43, batch 4000, loss[loss=0.009448, audio_tagging_loss=0.009448, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4953734.19 frames. 
], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:20:39,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1361146.6666666667, ans=0.1 2023-12-23 22:20:55,669 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.315e+01 3.773e+01 3.926e+01 4.187e+01 4.858e+01, threshold=7.853e+01, percent-clipped=0.0 2023-12-23 22:21:08,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1361346.6666666667, ans=0.125 2023-12-23 22:21:09,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1361346.6666666667, ans=0.0 2023-12-23 22:21:13,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1361346.6666666667, ans=0.125 2023-12-23 22:21:17,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1361346.6666666667, ans=0.0 2023-12-23 22:21:19,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1361413.3333333333, ans=0.0 2023-12-23 22:21:28,980 INFO [train.py:886] (0/4) Epoch 43, batch 4050, loss[loss=0.01314, audio_tagging_loss=0.01314, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4956500.42 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:21:30,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0 2023-12-23 22:21:45,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1361546.6666666667, ans=0.2 2023-12-23 22:21:57,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1361613.3333333333, ans=0.125 2023-12-23 22:22:10,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2023-12-23 22:22:12,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1361746.6666666667, ans=0.1 2023-12-23 22:22:17,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1361746.6666666667, ans=0.2 2023-12-23 22:22:20,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1361813.3333333333, ans=0.1 2023-12-23 22:22:21,371 INFO [train.py:886] (0/4) Epoch 43, batch 4100, loss[loss=0.01132, audio_tagging_loss=0.01132, over 24085.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4944704.31 frames. 
], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:22:31,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1361880.0, ans=0.2 2023-12-23 22:22:31,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1361880.0, ans=0.125 2023-12-23 22:22:35,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1361880.0, ans=0.125 2023-12-23 22:22:38,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1361880.0, ans=0.0 2023-12-23 22:22:39,745 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.848e+01 3.982e+01 4.277e+01 4.961e+01, threshold=7.964e+01, percent-clipped=0.0 2023-12-23 22:22:57,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0 2023-12-23 22:23:12,809 INFO [train.py:886] (0/4) Epoch 43, batch 4150, loss[loss=0.009302, audio_tagging_loss=0.009302, over 23976.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4940311.95 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:23:23,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1362213.3333333333, ans=0.1 2023-12-23 22:23:27,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1362213.3333333333, ans=0.0 2023-12-23 22:23:50,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2023-12-23 22:23:53,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1362413.3333333333, ans=0.0 2023-12-23 22:23:58,022 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:24:04,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1362480.0, ans=0.0 2023-12-23 22:24:05,199 INFO [train.py:886] (0/4) Epoch 43, batch 4200, loss[loss=0.01026, audio_tagging_loss=0.01026, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4943804.94 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:24:06,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=12.0 2023-12-23 22:24:07,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.41 vs. 
limit=15.0 2023-12-23 22:24:14,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1362546.6666666667, ans=0.125 2023-12-23 22:24:16,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1362546.6666666667, ans=0.0 2023-12-23 22:24:23,809 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.761e+01 3.920e+01 4.090e+01 4.676e+01, threshold=7.840e+01, percent-clipped=0.0 2023-12-23 22:24:55,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1362746.6666666667, ans=0.0 2023-12-23 22:24:57,374 INFO [train.py:886] (0/4) Epoch 43, batch 4250, loss[loss=0.01398, audio_tagging_loss=0.01398, over 25000.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4947189.26 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:25:05,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1362813.3333333333, ans=0.0 2023-12-23 22:25:14,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1362880.0, ans=0.04949747468305833 2023-12-23 22:25:25,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2023-12-23 22:25:29,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1363013.3333333333, ans=0.0 2023-12-23 22:25:49,271 INFO [train.py:886] (0/4) Epoch 43, batch 4300, loss[loss=0.007998, audio_tagging_loss=0.007998, over 24045.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4950564.35 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:25:53,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1363146.6666666667, ans=0.1 2023-12-23 22:26:00,425 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:26:06,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.24 vs. limit=22.5 2023-12-23 22:26:08,481 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.750e+01 3.975e+01 4.135e+01 4.734e+01, threshold=7.950e+01, percent-clipped=0.0 2023-12-23 22:26:08,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1363213.3333333333, ans=0.125 2023-12-23 22:26:09,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1363280.0, ans=0.125 2023-12-23 22:26:10,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0 2023-12-23 22:26:17,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1363280.0, ans=0.0 2023-12-23 22:26:41,261 INFO [train.py:886] (0/4) Epoch 43, batch 4350, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4954411.18 frames. 
], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:26:43,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1363480.0, ans=0.0 2023-12-23 22:26:48,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1363480.0, ans=0.125 2023-12-23 22:26:48,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1363480.0, ans=0.0 2023-12-23 22:26:52,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1363546.6666666667, ans=0.2 2023-12-23 22:26:53,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1363546.6666666667, ans=0.125 2023-12-23 22:27:28,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1363746.6666666667, ans=0.125 2023-12-23 22:27:29,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1363746.6666666667, ans=0.5 2023-12-23 22:27:31,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1363813.3333333333, ans=0.1 2023-12-23 22:27:32,590 INFO [train.py:886] (0/4) Epoch 43, batch 4400, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4949576.29 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:27:34,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1363813.3333333333, ans=0.2 2023-12-23 22:27:34,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2023-12-23 22:27:39,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1363813.3333333333, ans=0.2 2023-12-23 22:27:41,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1363813.3333333333, ans=0.125 2023-12-23 22:27:47,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1363880.0, ans=0.125 2023-12-23 22:27:52,464 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.866e+01 4.022e+01 4.177e+01 4.881e+01, threshold=8.045e+01, percent-clipped=0.0 2023-12-23 22:28:23,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1364080.0, ans=0.125 2023-12-23 22:28:25,656 INFO [train.py:886] (0/4) Epoch 43, batch 4450, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4948898.33 frames. 
], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:28:26,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1364146.6666666667, ans=0.1 2023-12-23 22:28:29,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1364146.6666666667, ans=0.125 2023-12-23 22:28:37,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1364213.3333333333, ans=0.125 2023-12-23 22:28:54,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-12-23 22:28:55,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1364280.0, ans=10.0 2023-12-23 22:28:59,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-12-23 22:29:05,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1364413.3333333333, ans=0.125 2023-12-23 22:29:05,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1364413.3333333333, ans=0.125 2023-12-23 22:29:17,686 INFO [train.py:886] (0/4) Epoch 43, batch 4500, loss[loss=0.009481, audio_tagging_loss=0.009481, over 24750.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4949710.46 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:29:28,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1364546.6666666667, ans=10.0 2023-12-23 22:29:36,251 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.279e+01 3.827e+01 3.976e+01 4.152e+01 9.618e+01, threshold=7.952e+01, percent-clipped=1.0 2023-12-23 22:29:45,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2023-12-23 22:29:50,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1364680.0, ans=0.125 2023-12-23 22:30:08,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1364813.3333333333, ans=0.2 2023-12-23 22:30:09,437 INFO [train.py:886] (0/4) Epoch 43, batch 4550, loss[loss=0.007701, audio_tagging_loss=0.007701, over 25000.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4948154.57 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:30:17,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1364813.3333333333, ans=0.1 2023-12-23 22:31:01,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1365146.6666666667, ans=0.5 2023-12-23 22:31:02,236 INFO [train.py:886] (0/4) Epoch 43, batch 4600, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4954187.17 frames. 
], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:31:08,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1365146.6666666667, ans=0.5 2023-12-23 22:31:19,361 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.512e+01 3.855e+01 4.026e+01 4.238e+01 4.840e+01, threshold=8.052e+01, percent-clipped=0.0 2023-12-23 22:31:32,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1365346.6666666667, ans=0.125 2023-12-23 22:31:45,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1365413.3333333333, ans=0.125 2023-12-23 22:31:49,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1365413.3333333333, ans=0.0 2023-12-23 22:31:52,332 INFO [train.py:886] (0/4) Epoch 43, batch 4650, loss[loss=0.01216, audio_tagging_loss=0.01216, over 25000.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4954716.53 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:31:56,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1365480.0, ans=0.0 2023-12-23 22:31:56,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1365480.0, ans=0.125 2023-12-23 22:32:03,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5 2023-12-23 22:32:09,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1365546.6666666667, ans=0.125 2023-12-23 22:32:33,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=15.0 2023-12-23 22:32:38,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1365746.6666666667, ans=0.125 2023-12-23 22:32:38,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1365746.6666666667, ans=0.1 2023-12-23 22:32:43,782 INFO [train.py:886] (0/4) Epoch 43, batch 4700, loss[loss=0.01516, audio_tagging_loss=0.01516, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4949089.43 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:32:50,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1365813.3333333333, ans=0.04949747468305833 2023-12-23 22:32:58,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-12-23 22:33:00,263 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.428e+01 3.819e+01 4.036e+01 4.197e+01 4.891e+01, threshold=8.073e+01, percent-clipped=0.0 2023-12-23 22:33:01,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.73 vs. 
limit=8.0 2023-12-23 22:33:12,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1366013.3333333333, ans=0.0 2023-12-23 22:33:25,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1366080.0, ans=0.1 2023-12-23 22:33:26,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1366080.0, ans=0.07 2023-12-23 22:33:29,792 INFO [train.py:886] (0/4) Epoch 43, batch 4750, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4945995.96 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:33:37,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.37 vs. limit=22.5 2023-12-23 22:33:41,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1366213.3333333333, ans=0.125 2023-12-23 22:33:45,695 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-43.pt 2023-12-23 22:34:06,057 INFO [train.py:886] (0/4) Epoch 44, batch 0, loss[loss=0.02938, audio_tagging_loss=0.02938, over 20955.00 frames. ], tot_loss[loss=0.02938, audio_tagging_loss=0.02938, over 20955.00 frames. ], batch size: 107, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:34:06,059 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 22:34:27,396 INFO [train.py:917] (0/4) Epoch 44, validation: loss=0.03574, audio_tagging_loss=0.03574, over 3737520.00 frames. 2023-12-23 22:34:27,397 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 22:34:33,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1366253.3333333333, ans=0.125 2023-12-23 22:34:42,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1366320.0, ans=0.125 2023-12-23 22:34:44,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2023-12-23 22:34:55,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1366386.6666666667, ans=0.1 2023-12-23 22:35:00,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1366453.3333333333, ans=0.125 2023-12-23 22:35:07,387 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:35:15,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1366520.0, ans=0.125 2023-12-23 22:35:17,582 INFO [train.py:886] (0/4) Epoch 44, batch 50, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24064.00 frames. ], tot_loss[loss=0.01796, audio_tagging_loss=0.01796, over 1119197.15 frames. 
], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:35:20,386 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.606e+01 4.065e+01 4.610e+01 5.536e+01 1.097e+02, threshold=9.221e+01, percent-clipped=8.0 2023-12-23 22:35:20,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1366586.6666666667, ans=0.2 2023-12-23 22:35:25,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1366586.6666666667, ans=0.125 2023-12-23 22:35:41,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1366720.0, ans=0.2 2023-12-23 22:35:53,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1366786.6666666667, ans=0.125 2023-12-23 22:35:55,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1366786.6666666667, ans=0.125 2023-12-23 22:35:56,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1366853.3333333333, ans=0.125 2023-12-23 22:36:03,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.17 vs. limit=15.0 2023-12-23 22:36:08,384 INFO [train.py:886] (0/4) Epoch 44, batch 100, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 1971790.76 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:36:09,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1366920.0, ans=0.1 2023-12-23 22:36:16,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1366920.0, ans=0.0 2023-12-23 22:36:17,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1366986.6666666667, ans=0.125 2023-12-23 22:36:22,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.17 vs. 
limit=10.0 2023-12-23 22:36:29,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1367053.3333333333, ans=0.0 2023-12-23 22:36:39,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1367120.0, ans=0.125 2023-12-23 22:36:40,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1367120.0, ans=0.05 2023-12-23 22:36:41,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1367120.0, ans=0.125 2023-12-23 22:36:49,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1367186.6666666667, ans=0.125 2023-12-23 22:36:52,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1367186.6666666667, ans=0.125 2023-12-23 22:36:54,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1367186.6666666667, ans=0.125 2023-12-23 22:36:55,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1367186.6666666667, ans=0.5 2023-12-23 22:36:56,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1367186.6666666667, ans=0.125 2023-12-23 22:36:59,352 INFO [train.py:886] (0/4) Epoch 44, batch 150, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 2634681.29 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:36:59,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1367253.3333333333, ans=0.125 2023-12-23 22:37:02,155 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.808e+01 4.071e+01 4.284e+01 4.499e+01 5.493e+01, threshold=8.567e+01, percent-clipped=0.0 2023-12-23 22:37:11,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1367320.0, ans=0.125 2023-12-23 22:37:12,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1367320.0, ans=0.125 2023-12-23 22:37:49,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1367520.0, ans=0.2 2023-12-23 22:37:51,613 INFO [train.py:886] (0/4) Epoch 44, batch 200, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 3151634.71 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:37:58,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1367586.6666666667, ans=0.0 2023-12-23 22:38:17,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. 
limit=15.0 2023-12-23 22:38:23,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1367786.6666666667, ans=0.0 2023-12-23 22:38:24,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.05 vs. limit=6.0 2023-12-23 22:38:30,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2023-12-23 22:38:35,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1367853.3333333333, ans=0.1 2023-12-23 22:38:38,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1367853.3333333333, ans=0.125 2023-12-23 22:38:42,577 INFO [train.py:886] (0/4) Epoch 44, batch 250, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 3553783.34 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:38:45,354 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.532e+01 3.886e+01 4.052e+01 4.204e+01 5.117e+01, threshold=8.104e+01, percent-clipped=0.0 2023-12-23 22:38:49,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1367920.0, ans=0.125 2023-12-23 22:39:02,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2023-12-23 22:39:09,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1368053.3333333333, ans=0.05 2023-12-23 22:39:11,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2023-12-23 22:39:25,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1368186.6666666667, ans=0.0 2023-12-23 22:39:34,328 INFO [train.py:886] (0/4) Epoch 44, batch 300, loss[loss=0.01297, audio_tagging_loss=0.01297, over 22052.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 3860144.52 frames. ], batch size: 107, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:39:55,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1368386.6666666667, ans=0.0 2023-12-23 22:39:59,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2023-12-23 22:40:02,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1368386.6666666667, ans=0.2 2023-12-23 22:40:26,300 INFO [train.py:886] (0/4) Epoch 44, batch 350, loss[loss=0.01264, audio_tagging_loss=0.01264, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4098142.52 frames. 
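
The scaling.py:213 records above each print the current value (ans=...) of a ScheduledFloat: a scalar hyperparameter such as a dropout probability, a skip rate, or a balancer limit that is interpolated against the global batch count, which is why the same name can log different values early in training but has settled to constants like ans=0.125 or ans=0.0 by batch_count ~1.37M. A minimal sketch of such a schedule, assuming simple piecewise-linear interpolation between (batch_count, value) breakpoints; the actual ScheduledFloat in icefall's scaling.py carries more machinery:

    # Piecewise-linear schedule over batch_count. Illustrative sketch only,
    # not the actual icefall ScheduledFloat implementation.
    from bisect import bisect_right

    class ScheduledFloat:
        def __init__(self, *points):
            # points: (batch_count, value) pairs in increasing batch_count,
            # e.g. ScheduledFloat((0.0, 0.3), (20000.0, 0.1)).
            self.x = [p[0] for p in points]
            self.y = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect_right(self.x, batch_count)
            t = (batch_count - self.x[i - 1]) / (self.x[i] - self.x[i - 1])
            return self.y[i - 1] + t * (self.y[i] - self.y[i - 1])

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(1366920.0))  # -> 0.1, long past the last breakpoint
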
], batch size: 99, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:40:29,093 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.502e+01 3.809e+01 3.953e+01 4.148e+01 4.528e+01, threshold=7.906e+01, percent-clipped=0.0 2023-12-23 22:40:36,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2023-12-23 22:40:43,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1368653.3333333333, ans=0.125 2023-12-23 22:40:44,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1368653.3333333333, ans=0.125 2023-12-23 22:40:54,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1368720.0, ans=0.05 2023-12-23 22:40:55,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1368720.0, ans=0.125 2023-12-23 22:41:16,689 INFO [train.py:886] (0/4) Epoch 44, batch 400, loss[loss=0.01124, audio_tagging_loss=0.01124, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4283819.64 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:41:21,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1368920.0, ans=0.0 2023-12-23 22:41:22,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1368920.0, ans=0.125 2023-12-23 22:41:32,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1368986.6666666667, ans=0.0 2023-12-23 22:41:46,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1369120.0, ans=0.125 2023-12-23 22:41:59,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=12.0 2023-12-23 22:42:06,115 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0 2023-12-23 22:42:08,312 INFO [train.py:886] (0/4) Epoch 44, batch 450, loss[loss=0.01091, audio_tagging_loss=0.01091, over 21806.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4431321.25 frames. 
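
The optim.py:484 warnings print quartiles (min/25%/median/75%/max) of recent gradient norms together with the active clipping threshold and the fraction of recent batches that were clipped; percent-clipped=8.0 appears once at the top of this section and then stays at 0.0, meaning the gradients remain comfortably under the threshold. A rough sketch of statistics-driven clipping in this spirit, assuming the threshold is a Clipping_scale multiple of a recent-norm statistic; the real logic inside icefall's optimizer differs in detail:

    # Sketch: clip gradients to a multiple of the recent median norm and
    # keep the quartile statistics that the optim.py:484 warning prints.
    # Hypothetical helper, not icefall's actual optimizer code.
    import torch

    class GradNormClipper:
        def __init__(self, window: int = 200, clipping_scale: float = 2.0):
            self.window = window
            self.clipping_scale = clipping_scale
            self.norms = []

        def clip_(self, params) -> float:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms = (self.norms + [norm])[-self.window:]
            q = torch.quantile(torch.tensor(self.norms),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()  # e.g. 2x median
            if norm > threshold:
                for g in grads:
                    g.mul_(threshold / norm)
            return threshold
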
], batch size: 107, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:42:11,758 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.414e+01 3.763e+01 3.912e+01 4.071e+01 4.674e+01, threshold=7.824e+01, percent-clipped=0.0 2023-12-23 22:42:17,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1369320.0, ans=0.125 2023-12-23 22:42:19,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1369320.0, ans=0.0 2023-12-23 22:42:21,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1369320.0, ans=0.0 2023-12-23 22:42:28,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1369386.6666666667, ans=0.125 2023-12-23 22:42:57,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1369520.0, ans=0.0 2023-12-23 22:42:59,428 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2023-12-23 22:42:59,943 INFO [train.py:886] (0/4) Epoch 44, batch 500, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4547486.05 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:43:20,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1369720.0, ans=0.1 2023-12-23 22:43:20,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2023-12-23 22:43:23,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1369720.0, ans=0.0 2023-12-23 22:43:32,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-12-23 22:43:35,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1369786.6666666667, ans=0.1 2023-12-23 22:43:37,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1369786.6666666667, ans=0.125 2023-12-23 22:43:38,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2023-12-23 22:43:47,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1369853.3333333333, ans=0.1 2023-12-23 22:43:51,615 INFO [train.py:886] (0/4) Epoch 44, batch 550, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4638037.90 frames. 
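
Each scaling.py:1022 line reports the "whitening metric" of some activation, a measure of how far its channel covariance is from isotropic, against a scheduled limit; a Whiten module logs (and applies its corrective gradient) only when the metric approaches or exceeds that limit, as in the metric=19.96 vs. limit=22.5 record above. One plausible form for such a metric, assuming it is the mean squared covariance eigenvalue divided by the squared mean eigenvalue (exactly 1.0 for perfectly white features); the precise definition lives in icefall's scaling.py and may differ:

    # Illustrative whitening metric: eigenvalue spread of the channel
    # covariance. 1.0 for white features, larger the more anisotropic.
    # Assumed form; the real Whiten module in scaling.py may differ.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        n, c = x.shape                       # (num_frames, num_channels)
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)  # zero-mean per group
        cov = x.transpose(1, 2) @ x / n      # per-group channel covariance
        eigs = torch.linalg.eigvalsh(cov)    # real eigenvalues, ascending
        return ((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)).item()

    x = torch.randn(10000, 64)
    print(whitening_metric(x))  # ~1.0 for i.i.d. Gaussian features
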
], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:43:51,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1369920.0, ans=0.2 2023-12-23 22:43:53,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1369920.0, ans=0.0 2023-12-23 22:43:53,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1369920.0, ans=0.125 2023-12-23 22:43:54,463 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.575e+01 3.800e+01 3.977e+01 4.148e+01 4.797e+01, threshold=7.954e+01, percent-clipped=0.0 2023-12-23 22:44:07,443 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:44:12,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1370053.3333333333, ans=0.0 2023-12-23 22:44:16,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=15.0 2023-12-23 22:44:19,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1370053.3333333333, ans=0.0 2023-12-23 22:44:20,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1370120.0, ans=0.2 2023-12-23 22:44:29,852 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:44:41,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2023-12-23 22:44:43,220 INFO [train.py:886] (0/4) Epoch 44, batch 600, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4706278.57 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:44:51,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1370253.3333333333, ans=0.0 2023-12-23 22:45:17,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0 2023-12-23 22:45:19,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2023-12-23 22:45:34,270 INFO [train.py:886] (0/4) Epoch 44, batch 650, loss[loss=0.01059, audio_tagging_loss=0.01059, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4753028.93 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:45:37,794 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.451e+01 3.789e+01 3.956e+01 4.142e+01 5.204e+01, threshold=7.912e+01, percent-clipped=0.0 2023-12-23 22:45:39,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.34 vs. 
limit=22.5 2023-12-23 22:45:49,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1370653.3333333333, ans=0.125 2023-12-23 22:45:51,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1370653.3333333333, ans=0.125 2023-12-23 22:45:53,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1370653.3333333333, ans=0.05 2023-12-23 22:45:55,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2023-12-23 22:46:04,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1370786.6666666667, ans=0.125 2023-12-23 22:46:10,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1370786.6666666667, ans=0.125 2023-12-23 22:46:14,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1370786.6666666667, ans=0.0 2023-12-23 22:46:22,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1370853.3333333333, ans=0.0 2023-12-23 22:46:26,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1370920.0, ans=0.125 2023-12-23 22:46:26,825 INFO [train.py:886] (0/4) Epoch 44, batch 700, loss[loss=0.01448, audio_tagging_loss=0.01448, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4795129.82 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:46:28,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2023-12-23 22:46:30,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1370920.0, ans=0.0 2023-12-23 22:46:39,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1370986.6666666667, ans=0.125 2023-12-23 22:46:47,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1371053.3333333333, ans=0.2 2023-12-23 22:47:17,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0 2023-12-23 22:47:18,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1371253.3333333333, ans=0.0 2023-12-23 22:47:19,227 INFO [train.py:886] (0/4) Epoch 44, batch 750, loss[loss=0.01002, audio_tagging_loss=0.01002, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4830139.06 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:47:21,985 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.757e+01 3.902e+01 4.113e+01 5.017e+01, threshold=7.805e+01, percent-clipped=0.0 2023-12-23 22:47:35,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.25 vs. 
limit=15.0 2023-12-23 22:47:37,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-23 22:47:55,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1371453.3333333333, ans=0.0 2023-12-23 22:48:09,721 INFO [train.py:886] (0/4) Epoch 44, batch 800, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4865600.25 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:48:32,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1371720.0, ans=0.125 2023-12-23 22:48:33,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1371720.0, ans=0.05 2023-12-23 22:48:44,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1371786.6666666667, ans=0.0 2023-12-23 22:49:02,934 INFO [train.py:886] (0/4) Epoch 44, batch 850, loss[loss=0.009994, audio_tagging_loss=0.009994, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4881791.29 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:49:05,729 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.453e+01 3.773e+01 3.934e+01 4.147e+01 6.054e+01, threshold=7.868e+01, percent-clipped=0.0 2023-12-23 22:49:05,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1371920.0, ans=0.125 2023-12-23 22:49:11,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1371986.6666666667, ans=0.125 2023-12-23 22:49:16,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0 2023-12-23 22:49:17,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.62 vs. limit=22.5 2023-12-23 22:49:21,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1372053.3333333333, ans=0.125 2023-12-23 22:49:23,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1372053.3333333333, ans=0.125 2023-12-23 22:49:27,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1372053.3333333333, ans=0.2 2023-12-23 22:49:40,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1372120.0, ans=0.2 2023-12-23 22:49:46,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1372186.6666666667, ans=0.0 2023-12-23 22:49:53,205 INFO [train.py:886] (0/4) Epoch 44, batch 900, loss[loss=0.01085, audio_tagging_loss=0.01085, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4899026.01 frames. 
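
In the train.py:886 records, loss[...] is the loss of the single batch being logged while tot_loss[...] is a running average over recent batches weighted by the frames each contributed, which is why tot_loss declines smoothly across this section (0.01565 at batch 100 down to roughly 0.0112) while the per-batch loss jumps around. The frame totals printed next to tot_loss plateau near 4.9M; with ~25000 frames per batch that is consistent with a decayed window whose steady state is about 25000 x 200 = 5,000,000 frames, i.e. a decay factor of roughly 1 - 1/200, though that constant is an inference from the log, not something the log states. A small sketch of frame-weighted exponential averaging:

    # Frame-weighted running loss: one way to get the smooth tot_loss shown
    # next to the noisy per-batch loss. The decay 1 - 1/200 is inferred from
    # the ~5M-frame plateau in this log, not read from icefall's code.
    class RunningLoss:
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss: float, num_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
            self.frame_sum = self.decay * self.frame_sum + num_frames
            return self.loss_sum / self.frame_sum  # current tot_loss
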
], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:49:55,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1372253.3333333333, ans=0.125 2023-12-23 22:50:01,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.69 vs. limit=10.0 2023-12-23 22:50:02,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2023-12-23 22:50:04,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1372320.0, ans=0.125 2023-12-23 22:50:14,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1372386.6666666667, ans=0.0 2023-12-23 22:50:21,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.61 vs. limit=15.0 2023-12-23 22:50:35,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=12.0 2023-12-23 22:50:36,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1372520.0, ans=0.1 2023-12-23 22:50:45,612 INFO [train.py:886] (0/4) Epoch 44, batch 950, loss[loss=0.009836, audio_tagging_loss=0.009836, over 24007.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4909931.85 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:50:48,460 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.389e+01 3.853e+01 3.991e+01 4.175e+01 5.097e+01, threshold=7.983e+01, percent-clipped=0.0 2023-12-23 22:50:59,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1372653.3333333333, ans=0.125 2023-12-23 22:51:16,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1372786.6666666667, ans=0.125 2023-12-23 22:51:25,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1372853.3333333333, ans=0.125 2023-12-23 22:51:38,148 INFO [train.py:886] (0/4) Epoch 44, batch 1000, loss[loss=0.01199, audio_tagging_loss=0.01199, over 24750.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4917820.79 frames. 
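
The audio_tagging_loss that equals the total loss in every train.py:886 record is, for an AudioSet tagging recipe like this, a multi-label binary cross-entropy: each event class gets an independent sigmoid so that one clip can carry several labels at once. A minimal sketch, where the batch shape and the 527-class count are assumptions based on the standard AudioSet ontology rather than read from this code:

    # Multi-label audio tagging loss: independent sigmoid per event class.
    # Shapes and the 527-class count are assumptions (standard AudioSet).
    import torch
    import torch.nn.functional as F

    num_events = 527
    logits = torch.randn(100, num_events)  # (batch, classes), model output
    labels = torch.randint(0, 2, (100, num_events)).float()  # multi-hot

    # Mean over classes and batch; values near 0.011, as in this log, mean
    # the per-class probabilities sit close to their 0/1 targets on average.
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    print(loss.item())
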
], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:51:39,376 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:51:46,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1372920.0, ans=0.125 2023-12-23 22:52:19,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1373186.6666666667, ans=0.0 2023-12-23 22:52:24,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1373186.6666666667, ans=0.1 2023-12-23 22:52:24,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1373186.6666666667, ans=0.125 2023-12-23 22:52:24,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1373186.6666666667, ans=0.125 2023-12-23 22:52:27,677 INFO [train.py:886] (0/4) Epoch 44, batch 1050, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4921751.66 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:52:31,161 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.448e+01 3.809e+01 4.004e+01 4.174e+01 4.765e+01, threshold=8.009e+01, percent-clipped=0.0 2023-12-23 22:52:32,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1373253.3333333333, ans=0.125 2023-12-23 22:52:46,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1373320.0, ans=0.1 2023-12-23 22:52:48,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1373386.6666666667, ans=0.125 2023-12-23 22:52:52,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1373386.6666666667, ans=0.125 2023-12-23 22:53:10,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1373520.0, ans=0.0 2023-12-23 22:53:11,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2023-12-23 22:53:15,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1373520.0, ans=0.0 2023-12-23 22:53:21,033 INFO [train.py:886] (0/4) Epoch 44, batch 1100, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4927218.47 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:53:38,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1373653.3333333333, ans=0.2 2023-12-23 22:53:57,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1373786.6666666667, ans=0.125 2023-12-23 22:53:59,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. 
limit=15.0 2023-12-23 22:54:12,603 INFO [train.py:886] (0/4) Epoch 44, batch 1150, loss[loss=0.01013, audio_tagging_loss=0.01013, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4930546.60 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:54:16,271 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.466e+01 3.752e+01 3.932e+01 4.115e+01 4.811e+01, threshold=7.864e+01, percent-clipped=0.0 2023-12-23 22:54:20,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1373920.0, ans=0.125 2023-12-23 22:54:21,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1373920.0, ans=0.125 2023-12-23 22:54:27,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1373986.6666666667, ans=0.2 2023-12-23 22:54:37,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1374053.3333333333, ans=0.025 2023-12-23 22:55:04,706 INFO [train.py:886] (0/4) Epoch 44, batch 1200, loss[loss=0.01465, audio_tagging_loss=0.01465, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4937795.16 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:55:07,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1374253.3333333333, ans=0.0 2023-12-23 22:55:17,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1374320.0, ans=0.0 2023-12-23 22:55:20,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1374320.0, ans=0.0 2023-12-23 22:55:21,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1374320.0, ans=0.125 2023-12-23 22:55:24,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1374386.6666666667, ans=0.125 2023-12-23 22:55:34,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1374453.3333333333, ans=0.2 2023-12-23 22:55:56,753 INFO [train.py:886] (0/4) Epoch 44, batch 1250, loss[loss=0.01047, audio_tagging_loss=0.01047, over 24750.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4931205.36 frames. 
], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:56:00,329 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.517e+01 3.822e+01 4.031e+01 4.210e+01 4.983e+01, threshold=8.061e+01, percent-clipped=0.0 2023-12-23 22:56:08,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1374653.3333333333, ans=0.1 2023-12-23 22:56:10,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1374653.3333333333, ans=0.125 2023-12-23 22:56:12,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1374653.3333333333, ans=0.05 2023-12-23 22:56:32,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1374786.6666666667, ans=0.1 2023-12-23 22:56:43,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2023-12-23 22:56:47,067 INFO [train.py:886] (0/4) Epoch 44, batch 1300, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4934643.73 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:57:10,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1375053.3333333333, ans=0.0 2023-12-23 22:57:10,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1375053.3333333333, ans=0.125 2023-12-23 22:57:11,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0 2023-12-23 22:57:31,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1375186.6666666667, ans=0.0 2023-12-23 22:57:33,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1375186.6666666667, ans=0.1 2023-12-23 22:57:39,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1375253.3333333333, ans=0.1 2023-12-23 22:57:39,988 INFO [train.py:886] (0/4) Epoch 44, batch 1350, loss[loss=0.01381, audio_tagging_loss=0.01381, over 23988.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4935483.51 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:57:42,818 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.513e+01 3.826e+01 3.975e+01 4.146e+01 4.619e+01, threshold=7.951e+01, percent-clipped=0.0 2023-12-23 22:57:58,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1375320.0, ans=0.0 2023-12-23 22:58:06,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1375386.6666666667, ans=0.125 2023-12-23 22:58:19,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1375453.3333333333, ans=0.1 2023-12-23 22:58:29,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1375520.0, ans=0.1 2023-12-23 22:58:32,843 INFO [train.py:886] (0/4) Epoch 44, batch 1400, loss[loss=0.01103, audio_tagging_loss=0.01103, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4929493.51 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:58:46,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1375653.3333333333, ans=0.0 2023-12-23 22:58:49,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1375653.3333333333, ans=0.1 2023-12-23 22:58:51,847 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:59:00,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1375720.0, ans=0.125 2023-12-23 22:59:10,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=12.0 2023-12-23 22:59:14,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1375853.3333333333, ans=10.0 2023-12-23 22:59:16,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1375853.3333333333, ans=0.125 2023-12-23 22:59:23,960 INFO [train.py:886] (0/4) Epoch 44, batch 1450, loss[loss=0.0106, audio_tagging_loss=0.0106, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4936968.64 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:59:26,774 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.506e+01 3.727e+01 3.932e+01 4.156e+01 5.029e+01, threshold=7.864e+01, percent-clipped=0.0 2023-12-23 22:59:44,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1376053.3333333333, ans=0.2 2023-12-23 22:59:50,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1376053.3333333333, ans=0.125 2023-12-23 22:59:57,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2023-12-23 23:00:16,695 INFO [train.py:886] (0/4) Epoch 44, batch 1500, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. 
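
The scaling.py:1118 records (e.g. the 22:58:51 line above) report loss-sum=0.000e+00 for a WithLoss wrapper around a module's self_attn_weights. This appears to be an auxiliary penalty attached directly to the attention weights, with a zero sum indicating the constraint is currently satisfied and the wrapper contributes nothing to the gradient; that reading is an inference from the names, not confirmed by the log. A schematic of an identity-in-forward, penalty-in-backward wrapper of that general kind, purely illustrative:

    # Sketch: attach an auxiliary gradient penalty to an intermediate tensor
    # without changing its forward value. Hypothetical; icefall's WithLoss
    # in scaling.py is not necessarily implemented this way.
    import torch

    class PenalizeNegative(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, scale: float):
            ctx.save_for_backward(x)
            ctx.scale = scale
            return x.clone()  # identity in the forward pass

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            # Gradient of scale * relu(-x).sum(): pushes negative entries up.
            penalty = ctx.scale * torch.where(x < 0, -torch.ones_like(x),
                                              torch.zeros_like(x))
            return grad_out + penalty, None

    attn = torch.rand(4, 8, 16, requires_grad=True)  # all >= 0 here
    PenalizeNegative.apply(attn, 1e-4).sum().backward()
    # With the constraint satisfied, the penalty part of the gradient is
    # zero, matching the loss-sum=0.000e+00 lines in this log.
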
], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4935169.14 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:00:18,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1376253.3333333333, ans=0.125 2023-12-23 23:00:18,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1376253.3333333333, ans=0.125 2023-12-23 23:00:28,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1376320.0, ans=0.2 2023-12-23 23:00:28,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1376320.0, ans=0.2 2023-12-23 23:00:45,596 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:00:46,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.88 vs. limit=15.0 2023-12-23 23:00:50,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1376453.3333333333, ans=0.125 2023-12-23 23:00:51,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1376453.3333333333, ans=0.125 2023-12-23 23:01:09,439 INFO [train.py:886] (0/4) Epoch 44, batch 1550, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4940021.38 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:01:13,027 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.562e+01 3.917e+01 4.065e+01 4.220e+01 5.107e+01, threshold=8.130e+01, percent-clipped=0.0 2023-12-23 23:01:25,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-12-23 23:01:29,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1376720.0, ans=0.1 2023-12-23 23:01:36,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1376720.0, ans=0.125 2023-12-23 23:01:51,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1376853.3333333333, ans=0.125 2023-12-23 23:01:54,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1376853.3333333333, ans=0.2 2023-12-23 23:02:00,079 INFO [train.py:886] (0/4) Epoch 44, batch 1600, loss[loss=0.009166, audio_tagging_loss=0.009166, over 24750.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4936248.86 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:02:11,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1376986.6666666667, ans=0.125 2023-12-23 23:02:14,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.71 vs. 
limit=22.5 2023-12-23 23:02:14,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.01 vs. limit=22.5 2023-12-23 23:02:17,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1376986.6666666667, ans=0.125 2023-12-23 23:02:17,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1376986.6666666667, ans=0.0 2023-12-23 23:02:33,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.39 vs. limit=15.0 2023-12-23 23:02:36,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1377120.0, ans=0.125 2023-12-23 23:02:39,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1377120.0, ans=0.125 2023-12-23 23:02:51,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1377253.3333333333, ans=0.125 2023-12-23 23:02:52,375 INFO [train.py:886] (0/4) Epoch 44, batch 1650, loss[loss=0.009962, audio_tagging_loss=0.009962, over 24750.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4931347.92 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:02:55,194 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.228e+01 3.859e+01 4.003e+01 4.218e+01 7.648e+01, threshold=8.006e+01, percent-clipped=0.0 2023-12-23 23:02:59,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1377253.3333333333, ans=0.025 2023-12-23 23:03:05,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2023-12-23 23:03:23,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1377453.3333333333, ans=0.2 2023-12-23 23:03:39,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.61 vs. limit=15.0 2023-12-23 23:03:43,276 INFO [train.py:886] (0/4) Epoch 44, batch 1700, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4934928.81 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:03:48,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-12-23 23:03:57,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0 2023-12-23 23:03:59,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1377653.3333333333, ans=0.0 2023-12-23 23:04:03,131 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.35 vs. 
limit=15.0 2023-12-23 23:04:06,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1377720.0, ans=0.125 2023-12-23 23:04:10,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1377720.0, ans=0.125 2023-12-23 23:04:14,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1377786.6666666667, ans=0.0 2023-12-23 23:04:35,377 INFO [train.py:886] (0/4) Epoch 44, batch 1750, loss[loss=0.01173, audio_tagging_loss=0.01173, over 25000.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4940448.66 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:04:38,160 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.480e+01 3.774e+01 3.975e+01 4.116e+01 4.755e+01, threshold=7.949e+01, percent-clipped=0.0 2023-12-23 23:04:53,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2023-12-23 23:05:13,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1378120.0, ans=0.0 2023-12-23 23:05:23,594 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:05:27,871 INFO [train.py:886] (0/4) Epoch 44, batch 1800, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4943972.95 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:05:29,780 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:05:30,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1378253.3333333333, ans=0.1 2023-12-23 23:06:05,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1378453.3333333333, ans=0.0 2023-12-23 23:06:14,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1378520.0, ans=0.125 2023-12-23 23:06:18,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1378586.6666666667, ans=0.125 2023-12-23 23:06:19,000 INFO [train.py:886] (0/4) Epoch 44, batch 1850, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4948934.31 frames. 
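
The learning rate in these records has crept from 2.45e-03 down to 2.44e-03 over roughly a thousand batches of epoch 44; a decay this slow, this deep into training, is what icefall's Eden scheduler produces, since it decays with both batch count and epoch and both factors have nearly flattened by now. A sketch of the Eden-style formula from memory of icefall's optim.py, with illustrative constants, so the absolute values below are not expected to reproduce 2.44e-03:

    # Eden-style learning-rate decay (sketch from memory; the exact formula,
    # warmup handling, and this run's constants may differ).
    def eden_lr(base_lr, batch, epoch, lr_batches, lr_epochs):
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # 10k batches apart, the lr differs by well under one percent, matching
    # the 2.45e-03 -> 2.44e-03 drift seen in this section of the log.
    print(eden_lr(0.05, 1_370_000, 44, 7500.0, 3.5))
    print(eden_lr(0.05, 1_380_000, 44, 7500.0, 3.5))
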
], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:06:21,832 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.867e+01 4.025e+01 4.218e+01 4.619e+01, threshold=8.051e+01, percent-clipped=0.0 2023-12-23 23:06:26,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1378586.6666666667, ans=0.125 2023-12-23 23:06:31,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1378653.3333333333, ans=0.125 2023-12-23 23:06:36,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1378653.3333333333, ans=0.125 2023-12-23 23:06:38,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5 2023-12-23 23:06:55,623 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:07:10,739 INFO [train.py:886] (0/4) Epoch 44, batch 1900, loss[loss=0.01323, audio_tagging_loss=0.01323, over 24750.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4948455.30 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:07:18,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=22.5 2023-12-23 23:07:26,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1378986.6666666667, ans=0.125 2023-12-23 23:07:34,912 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:07:47,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1379120.0, ans=0.125 2023-12-23 23:08:02,092 INFO [train.py:886] (0/4) Epoch 44, batch 1950, loss[loss=0.01051, audio_tagging_loss=0.01051, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4946406.32 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:08:05,534 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.480e+01 3.823e+01 4.045e+01 4.197e+01 4.926e+01, threshold=8.090e+01, percent-clipped=0.0 2023-12-23 23:08:07,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. 
limit=6.0 2023-12-23 23:08:15,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1379320.0, ans=0.125 2023-12-23 23:08:20,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1379320.0, ans=0.125 2023-12-23 23:08:20,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1379320.0, ans=0.1 2023-12-23 23:08:20,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1379320.0, ans=0.0 2023-12-23 23:08:26,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1379386.6666666667, ans=0.1 2023-12-23 23:08:26,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1379386.6666666667, ans=0.2 2023-12-23 23:08:41,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1379453.3333333333, ans=0.2 2023-12-23 23:08:49,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1379520.0, ans=0.125 2023-12-23 23:08:53,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1379586.6666666667, ans=0.0 2023-12-23 23:08:54,353 INFO [train.py:886] (0/4) Epoch 44, batch 2000, loss[loss=0.01017, audio_tagging_loss=0.01017, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4943143.79 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:09:04,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1379653.3333333333, ans=0.2 2023-12-23 23:09:06,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2023-12-23 23:09:07,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1379653.3333333333, ans=0.125 2023-12-23 23:09:08,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=1379653.3333333333, ans=0.1 2023-12-23 23:09:20,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1379720.0, ans=0.125 2023-12-23 23:09:29,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1379786.6666666667, ans=0.125 2023-12-23 23:09:31,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1379786.6666666667, ans=0.0 2023-12-23 23:09:39,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1379853.3333333333, ans=0.0 2023-12-23 23:09:43,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1379853.3333333333, ans=0.1 2023-12-23 23:09:46,496 INFO [train.py:886] (0/4) Epoch 44, batch 2050, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. 
], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4948154.89 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:09:49,334 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.403e+01 3.844e+01 3.984e+01 4.167e+01 5.134e+01, threshold=7.969e+01, percent-clipped=0.0 2023-12-23 23:09:56,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1379986.6666666667, ans=0.1 2023-12-23 23:10:25,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1380186.6666666667, ans=0.1 2023-12-23 23:10:25,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1380186.6666666667, ans=0.125 2023-12-23 23:10:27,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1380186.6666666667, ans=0.025 2023-12-23 23:10:35,733 INFO [train.py:886] (0/4) Epoch 44, batch 2100, loss[loss=0.008577, audio_tagging_loss=0.008577, over 25000.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4952348.68 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:10:38,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.23 vs. limit=15.0 2023-12-23 23:10:39,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1380253.3333333333, ans=0.125 2023-12-23 23:10:44,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0 2023-12-23 23:11:03,234 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:11:07,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2023-12-23 23:11:08,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1380453.3333333333, ans=0.0 2023-12-23 23:11:19,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1380520.0, ans=0.125 2023-12-23 23:11:28,052 INFO [train.py:886] (0/4) Epoch 44, batch 2150, loss[loss=0.01521, audio_tagging_loss=0.01521, over 24959.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4952724.57 frames. 
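
grad_scale in the train.py:886 records steps from 16.0 to 32.0 around batch 400 and to 64.0 around batch 2050: with fp16 training, a dynamic loss scaler multiplies the loss before backward so small gradients stay representable, halves the scale when an overflow is detected, and doubles it again after a long run of overflow-free steps. A standard PyTorch sketch of that loop; the growth/backoff constants shown are plain torch.cuda.amp defaults, not necessarily what this recipe uses:

    # Standard mixed-precision step with dynamic loss scaling. Generic
    # sketch; icefall's train.py wires this into its own training loop.
    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0, growth_factor=2.0, backoff_factor=0.5,
        growth_interval=2000,  # double the scale after 2000 clean steps
    )

    def train_step(model, feats, target, criterion, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(feats), target)
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales; skips step on inf/nan
        scaler.update()                # grow or back off the scale
        return loss.item(), scaler.get_scale()  # the logged grad_scale
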
], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:11:30,900 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.445e+01 3.811e+01 3.971e+01 4.151e+01 4.761e+01, threshold=7.942e+01, percent-clipped=0.0 2023-12-23 23:11:37,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1380653.3333333333, ans=0.125 2023-12-23 23:12:09,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1380853.3333333333, ans=0.125 2023-12-23 23:12:10,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1380853.3333333333, ans=0.0 2023-12-23 23:12:11,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1380853.3333333333, ans=0.0 2023-12-23 23:12:12,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1380853.3333333333, ans=0.1 2023-12-23 23:12:18,072 INFO [train.py:886] (0/4) Epoch 44, batch 2200, loss[loss=0.01064, audio_tagging_loss=0.01064, over 24750.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4947803.81 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:12:24,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1380920.0, ans=0.09899494936611666 2023-12-23 23:12:29,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.62 vs. limit=15.0 2023-12-23 23:12:30,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1380986.6666666667, ans=0.125 2023-12-23 23:13:03,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1381186.6666666667, ans=0.0 2023-12-23 23:13:06,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1381186.6666666667, ans=0.125 2023-12-23 23:13:08,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1381186.6666666667, ans=0.125 2023-12-23 23:13:09,610 INFO [train.py:886] (0/4) Epoch 44, batch 2250, loss[loss=0.01102, audio_tagging_loss=0.01102, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4942314.00 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:13:12,392 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.439e+01 3.814e+01 4.052e+01 4.266e+01 4.733e+01, threshold=8.103e+01, percent-clipped=0.0 2023-12-23 23:13:14,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1381253.3333333333, ans=0.125 2023-12-23 23:13:24,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1381320.0, ans=0.125 2023-12-23 23:13:47,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.60 vs. 
limit=15.0 2023-12-23 23:13:54,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1381520.0, ans=15.0 2023-12-23 23:14:02,105 INFO [train.py:886] (0/4) Epoch 44, batch 2300, loss[loss=0.01192, audio_tagging_loss=0.01192, over 22443.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4941935.47 frames. ], batch size: 107, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:14:08,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=1381586.6666666667, ans=0.02 2023-12-23 23:14:10,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1381586.6666666667, ans=0.1 2023-12-23 23:14:25,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1381720.0, ans=0.125 2023-12-23 23:14:31,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1381720.0, ans=0.2 2023-12-23 23:14:53,648 INFO [train.py:886] (0/4) Epoch 44, batch 2350, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4942554.76 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:14:57,184 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.396e+01 3.788e+01 3.954e+01 4.114e+01 5.032e+01, threshold=7.908e+01, percent-clipped=0.0 2023-12-23 23:15:18,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=1382053.3333333333, ans=15.0 2023-12-23 23:15:18,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=1382053.3333333333, ans=0.2 2023-12-23 23:15:22,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1382053.3333333333, ans=0.125 2023-12-23 23:15:24,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1382120.0, ans=0.1 2023-12-23 23:15:29,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1382120.0, ans=0.0 2023-12-23 23:15:35,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1382186.6666666667, ans=0.0 2023-12-23 23:15:38,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.88 vs. limit=10.0 2023-12-23 23:15:46,383 INFO [train.py:886] (0/4) Epoch 44, batch 2400, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4946697.18 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:15:48,482 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:15:55,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1382320.0, ans=0.0 2023-12-23 23:16:09,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1382386.6666666667, ans=0.0 2023-12-23 23:16:10,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1382386.6666666667, ans=0.2 2023-12-23 23:16:29,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=22.5 2023-12-23 23:16:34,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2023-12-23 23:16:38,128 INFO [train.py:886] (0/4) Epoch 44, batch 2450, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4950366.47 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:16:41,716 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.351e+01 3.765e+01 3.945e+01 4.099e+01 8.510e+01, threshold=7.890e+01, percent-clipped=1.0 2023-12-23 23:16:44,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1382586.6666666667, ans=0.125 2023-12-23 23:16:59,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1382720.0, ans=0.2 2023-12-23 23:17:11,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1382786.6666666667, ans=0.125 2023-12-23 23:17:16,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.68 vs. limit=22.5 2023-12-23 23:17:20,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1382853.3333333333, ans=0.07 2023-12-23 23:17:30,031 INFO [train.py:886] (0/4) Epoch 44, batch 2500, loss[loss=0.01002, audio_tagging_loss=0.01002, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4948951.79 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:17:32,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1382920.0, ans=0.2 2023-12-23 23:18:20,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.01 vs. limit=10.0 2023-12-23 23:18:22,365 INFO [train.py:886] (0/4) Epoch 44, batch 2550, loss[loss=0.01033, audio_tagging_loss=0.01033, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4946351.96 frames. 
], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:18:25,167 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.531e+01 3.915e+01 4.071e+01 4.211e+01 5.058e+01, threshold=8.142e+01, percent-clipped=0.0 2023-12-23 23:18:35,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1383320.0, ans=0.125 2023-12-23 23:18:43,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1383386.6666666667, ans=0.0 2023-12-23 23:18:59,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. limit=6.0 2023-12-23 23:19:07,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=12.0 2023-12-23 23:19:08,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-12-23 23:19:15,050 INFO [train.py:886] (0/4) Epoch 44, batch 2600, loss[loss=0.009629, audio_tagging_loss=0.009629, over 25000.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4951819.62 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:19:25,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1383653.3333333333, ans=0.125 2023-12-23 23:19:38,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1383720.0, ans=0.0 2023-12-23 23:19:40,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1383720.0, ans=0.125 2023-12-23 23:20:05,555 INFO [train.py:886] (0/4) Epoch 44, batch 2650, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4952808.98 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:20:09,130 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.375e+01 3.827e+01 4.015e+01 4.224e+01 5.047e+01, threshold=8.029e+01, percent-clipped=0.0 2023-12-23 23:20:32,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1384053.3333333333, ans=0.125 2023-12-23 23:20:51,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1384186.6666666667, ans=0.1 2023-12-23 23:20:59,274 INFO [train.py:886] (0/4) Epoch 44, batch 2700, loss[loss=0.01182, audio_tagging_loss=0.01182, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4950665.34 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:21:11,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. 
limit=6.0 2023-12-23 23:21:17,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1384320.0, ans=0.0 2023-12-23 23:21:28,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1384386.6666666667, ans=10.0 2023-12-23 23:21:50,601 INFO [train.py:886] (0/4) Epoch 44, batch 2750, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4959332.53 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 64.0 2023-12-23 23:21:53,386 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.502e+01 3.801e+01 3.957e+01 4.121e+01 4.471e+01, threshold=7.914e+01, percent-clipped=0.0 2023-12-23 23:22:10,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1384720.0, ans=0.125 2023-12-23 23:22:18,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1384720.0, ans=0.125 2023-12-23 23:22:25,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0 2023-12-23 23:22:29,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.04 vs. limit=6.0 2023-12-23 23:22:39,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=12.0 2023-12-23 23:22:42,922 INFO [train.py:886] (0/4) Epoch 44, batch 2800, loss[loss=0.01016, audio_tagging_loss=0.01016, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4949555.39 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 64.0 2023-12-23 23:22:46,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1384920.0, ans=0.0 2023-12-23 23:22:50,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1384920.0, ans=0.125 2023-12-23 23:23:10,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1385053.3333333333, ans=0.0 2023-12-23 23:23:13,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1385120.0, ans=0.125 2023-12-23 23:23:17,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-12-23 23:23:19,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1385120.0, ans=0.95 2023-12-23 23:23:35,053 INFO [train.py:886] (0/4) Epoch 44, batch 2850, loss[loss=0.008999, audio_tagging_loss=0.008999, over 22539.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4942426.55 frames. 
], batch size: 107, lr: 2.43e-03, grad_scale: 64.0 2023-12-23 23:23:37,930 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.556e+01 3.856e+01 3.997e+01 4.163e+01 4.618e+01, threshold=7.995e+01, percent-clipped=0.0 2023-12-23 23:23:42,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.32 vs. limit=6.0 2023-12-23 23:23:51,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1385320.0, ans=0.1 2023-12-23 23:24:05,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1385453.3333333333, ans=0.125 2023-12-23 23:24:12,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1385453.3333333333, ans=0.1 2023-12-23 23:24:22,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1385520.0, ans=0.0 2023-12-23 23:24:26,125 INFO [train.py:886] (0/4) Epoch 44, batch 2900, loss[loss=0.009106, audio_tagging_loss=0.009106, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4939436.50 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:25:00,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1385786.6666666667, ans=0.0 2023-12-23 23:25:02,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1385786.6666666667, ans=0.125 2023-12-23 23:25:08,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1385853.3333333333, ans=0.125 2023-12-23 23:25:10,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1385853.3333333333, ans=0.125 2023-12-23 23:25:18,700 INFO [train.py:886] (0/4) Epoch 44, batch 2950, loss[loss=0.01375, audio_tagging_loss=0.01375, over 25000.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4939042.64 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:25:18,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1385920.0, ans=0.0 2023-12-23 23:25:22,472 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.528e+01 3.797e+01 3.995e+01 4.163e+01 7.263e+01, threshold=7.990e+01, percent-clipped=0.0 2023-12-23 23:25:38,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1386053.3333333333, ans=0.1 2023-12-23 23:26:05,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1386186.6666666667, ans=0.0 2023-12-23 23:26:07,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2023-12-23 23:26:10,140 INFO [train.py:886] (0/4) Epoch 44, batch 3000, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4944716.93 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:26:10,141 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 23:26:32,317 INFO [train.py:917] (0/4) Epoch 44, validation: loss=0.03602, audio_tagging_loss=0.03602, over 3737520.00 frames. 2023-12-23 23:26:32,318 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 23:26:39,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1386253.3333333333, ans=0.0 2023-12-23 23:26:45,927 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:26:46,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1386320.0, ans=0.125 2023-12-23 23:26:48,823 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2023-12-23 23:27:02,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1386453.3333333333, ans=0.125 2023-12-23 23:27:13,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1386520.0, ans=0.1 2023-12-23 23:27:13,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1386520.0, ans=0.125 2023-12-23 23:27:15,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1386520.0, ans=0.0 2023-12-23 23:27:24,344 INFO [train.py:886] (0/4) Epoch 44, batch 3050, loss[loss=0.0104, audio_tagging_loss=0.0104, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4951284.36 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:27:28,171 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.485e+01 3.888e+01 4.021e+01 4.196e+01 4.723e+01, threshold=8.042e+01, percent-clipped=0.0 2023-12-23 23:27:35,034 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-208000.pt 2023-12-23 23:28:09,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1386853.3333333333, ans=0.125 2023-12-23 23:28:11,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1386853.3333333333, ans=0.07 2023-12-23 23:28:16,891 INFO [train.py:886] (0/4) Epoch 44, batch 3100, loss[loss=0.007942, audio_tagging_loss=0.007942, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4950535.19 frames. 
], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:28:20,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1386920.0, ans=0.1 2023-12-23 23:28:29,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1386986.6666666667, ans=0.1 2023-12-23 23:28:30,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1386986.6666666667, ans=0.125 2023-12-23 23:28:54,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1387120.0, ans=0.0 2023-12-23 23:28:55,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2023-12-23 23:29:02,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1387186.6666666667, ans=0.2 2023-12-23 23:29:08,880 INFO [train.py:886] (0/4) Epoch 44, batch 3150, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4947919.51 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:29:12,625 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.491e+01 3.896e+01 4.085e+01 4.176e+01 5.009e+01, threshold=8.169e+01, percent-clipped=0.0 2023-12-23 23:29:15,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1387253.3333333333, ans=0.0 2023-12-23 23:29:17,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1387253.3333333333, ans=0.09899494936611666 2023-12-23 23:29:52,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1387520.0, ans=0.125 2023-12-23 23:30:01,888 INFO [train.py:886] (0/4) Epoch 44, batch 3200, loss[loss=0.01244, audio_tagging_loss=0.01244, over 22030.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4939098.00 frames. ], batch size: 107, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:30:05,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1387586.6666666667, ans=0.04949747468305833 2023-12-23 23:30:12,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1387653.3333333333, ans=0.125 2023-12-23 23:30:30,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1387720.0, ans=0.2 2023-12-23 23:30:46,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1387853.3333333333, ans=10.0 2023-12-23 23:30:53,190 INFO [train.py:886] (0/4) Epoch 44, batch 3250, loss[loss=0.008901, audio_tagging_loss=0.008901, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4941499.89 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:30:57,012 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.426e+01 3.832e+01 3.942e+01 4.111e+01 4.749e+01, threshold=7.885e+01, percent-clipped=0.0 2023-12-23 23:31:08,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1387986.6666666667, ans=0.0 2023-12-23 23:31:44,706 INFO [train.py:886] (0/4) Epoch 44, batch 3300, loss[loss=0.01051, audio_tagging_loss=0.01051, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4939387.05 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:31:45,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1388253.3333333333, ans=0.1 2023-12-23 23:31:49,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1388253.3333333333, ans=0.2 2023-12-23 23:31:50,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1388253.3333333333, ans=0.125 2023-12-23 23:31:58,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.70 vs. limit=10.0 2023-12-23 23:32:09,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1388386.6666666667, ans=0.125 2023-12-23 23:32:13,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1388386.6666666667, ans=0.0 2023-12-23 23:32:15,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1388453.3333333333, ans=0.2 2023-12-23 23:32:17,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2023-12-23 23:32:33,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1388520.0, ans=0.95 2023-12-23 23:32:35,943 INFO [train.py:886] (0/4) Epoch 44, batch 3350, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4946540.33 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:32:39,737 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.350e+01 3.780e+01 3.970e+01 4.173e+01 4.809e+01, threshold=7.941e+01, percent-clipped=0.0 2023-12-23 23:32:43,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1388586.6666666667, ans=0.125 2023-12-23 23:32:57,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1388720.0, ans=0.125 2023-12-23 23:33:01,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.25 vs. 
limit=22.5 2023-12-23 23:33:16,407 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:33:21,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1388853.3333333333, ans=6.0 2023-12-23 23:33:28,108 INFO [train.py:886] (0/4) Epoch 44, batch 3400, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4953838.71 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:33:36,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1388920.0, ans=0.0 2023-12-23 23:33:46,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1388986.6666666667, ans=0.125 2023-12-23 23:33:46,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1388986.6666666667, ans=0.125 2023-12-23 23:33:55,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1389053.3333333333, ans=0.0 2023-12-23 23:33:59,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1389120.0, ans=0.2 2023-12-23 23:34:20,507 INFO [train.py:886] (0/4) Epoch 44, batch 3450, loss[loss=0.01064, audio_tagging_loss=0.01064, over 24750.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4953061.78 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:34:23,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1389253.3333333333, ans=0.0 2023-12-23 23:34:24,937 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.641e+01 3.949e+01 4.077e+01 4.251e+01 4.756e+01, threshold=8.154e+01, percent-clipped=0.0 2023-12-23 23:34:28,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1389253.3333333333, ans=0.2 2023-12-23 23:34:59,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1389453.3333333333, ans=0.0 2023-12-23 23:35:00,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-12-23 23:35:12,400 INFO [train.py:886] (0/4) Epoch 44, batch 3500, loss[loss=0.01019, audio_tagging_loss=0.01019, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4950674.62 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:35:24,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1389653.3333333333, ans=0.0 2023-12-23 23:35:29,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. 
limit=6.0 2023-12-23 23:35:33,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1389720.0, ans=0.0 2023-12-23 23:35:40,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1389720.0, ans=0.125 2023-12-23 23:36:04,662 INFO [train.py:886] (0/4) Epoch 44, batch 3550, loss[loss=0.009961, audio_tagging_loss=0.009961, over 25000.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4945446.88 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:36:08,439 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.423e+01 3.841e+01 4.039e+01 4.182e+01 4.677e+01, threshold=8.078e+01, percent-clipped=0.0 2023-12-23 23:36:10,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1389920.0, ans=0.125 2023-12-23 23:36:14,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1389986.6666666667, ans=0.125 2023-12-23 23:36:15,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1389986.6666666667, ans=0.0 2023-12-23 23:36:47,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1390186.6666666667, ans=0.125 2023-12-23 23:36:56,421 INFO [train.py:886] (0/4) Epoch 44, batch 3600, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4949959.04 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:37:00,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1390253.3333333333, ans=0.1 2023-12-23 23:37:15,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1390320.0, ans=0.125 2023-12-23 23:37:48,515 INFO [train.py:886] (0/4) Epoch 44, batch 3650, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4949258.91 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:37:52,936 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.491e+01 3.828e+01 3.967e+01 4.135e+01 4.611e+01, threshold=7.934e+01, percent-clipped=0.0 2023-12-23 23:37:56,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1390586.6666666667, ans=0.125 2023-12-23 23:38:00,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1390653.3333333333, ans=0.125 2023-12-23 23:38:10,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1390720.0, ans=0.2 2023-12-23 23:38:15,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5 2023-12-23 23:38:18,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=22.5 2023-12-23 23:38:41,034 INFO [train.py:886] (0/4) Epoch 44, batch 3700, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. 
], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4946493.03 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:39:02,110 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-12-23 23:39:11,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2023-12-23 23:39:14,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1391120.0, ans=0.2 2023-12-23 23:39:14,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.70 vs. limit=15.0 2023-12-23 23:39:19,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.52 vs. limit=15.0 2023-12-23 23:39:32,653 INFO [train.py:886] (0/4) Epoch 44, batch 3750, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4950718.69 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:39:37,089 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.871e+01 4.041e+01 4.218e+01 6.039e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-23 23:39:53,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1391386.6666666667, ans=0.2 2023-12-23 23:40:18,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1391520.0, ans=0.1 2023-12-23 23:40:24,075 INFO [train.py:886] (0/4) Epoch 44, batch 3800, loss[loss=0.009888, audio_tagging_loss=0.009888, over 24750.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4948465.91 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:40:41,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1391653.3333333333, ans=0.125 2023-12-23 23:40:46,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2023-12-23 23:41:12,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1391853.3333333333, ans=0.0 2023-12-23 23:41:16,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1391920.0, ans=0.125 2023-12-23 23:41:17,265 INFO [train.py:886] (0/4) Epoch 44, batch 3850, loss[loss=0.01044, audio_tagging_loss=0.01044, over 24750.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4946888.06 frames. 
], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:41:21,130 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.492e+01 3.872e+01 4.027e+01 4.207e+01 4.777e+01, threshold=8.053e+01, percent-clipped=0.0 2023-12-23 23:41:25,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1391920.0, ans=22.5 2023-12-23 23:41:30,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1391986.6666666667, ans=0.125 2023-12-23 23:41:50,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.93 vs. limit=8.0 2023-12-23 23:41:56,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1392120.0, ans=0.0 2023-12-23 23:42:09,406 INFO [train.py:886] (0/4) Epoch 44, batch 3900, loss[loss=0.01199, audio_tagging_loss=0.01199, over 24917.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4946704.33 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:42:10,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1392253.3333333333, ans=0.125 2023-12-23 23:42:22,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1392320.0, ans=0.2 2023-12-23 23:42:25,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2023-12-23 23:42:26,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1392320.0, ans=0.125 2023-12-23 23:42:29,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1392386.6666666667, ans=0.2 2023-12-23 23:42:30,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2023-12-23 23:42:36,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1392386.6666666667, ans=0.125 2023-12-23 23:43:01,303 INFO [train.py:886] (0/4) Epoch 44, batch 3950, loss[loss=0.009962, audio_tagging_loss=0.009962, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4953307.45 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:43:05,141 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.463e+01 3.857e+01 4.007e+01 4.216e+01 4.773e+01, threshold=8.014e+01, percent-clipped=0.0 2023-12-23 23:43:10,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1392653.3333333333, ans=0.0 2023-12-23 23:43:16,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1392653.3333333333, ans=0.125 2023-12-23 23:43:38,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1392786.6666666667, ans=0.1 2023-12-23 23:43:48,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1392853.3333333333, ans=0.125 2023-12-23 23:43:53,655 INFO [train.py:886] (0/4) Epoch 44, batch 4000, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4960187.98 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:44:01,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1392920.0, ans=0.0 2023-12-23 23:44:04,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.48 vs. limit=12.0 2023-12-23 23:44:08,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1392986.6666666667, ans=0.2 2023-12-23 23:44:21,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1393053.3333333333, ans=0.125 2023-12-23 23:44:30,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1393120.0, ans=0.0 2023-12-23 23:44:34,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1393186.6666666667, ans=0.125 2023-12-23 23:44:38,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1393186.6666666667, ans=0.0 2023-12-23 23:44:39,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1393186.6666666667, ans=0.09899494936611666 2023-12-23 23:44:44,864 INFO [train.py:886] (0/4) Epoch 44, batch 4050, loss[loss=0.0116, audio_tagging_loss=0.0116, over 25000.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4947054.98 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:44:46,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1393253.3333333333, ans=0.125 2023-12-23 23:44:48,645 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.503e+01 3.797e+01 4.007e+01 4.202e+01 4.983e+01, threshold=8.013e+01, percent-clipped=0.0 2023-12-23 23:45:00,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1393320.0, ans=0.1 2023-12-23 23:45:24,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1393453.3333333333, ans=0.0 2023-12-23 23:45:25,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1393520.0, ans=0.2 2023-12-23 23:45:37,346 INFO [train.py:886] (0/4) Epoch 44, batch 4100, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4946152.48 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:45:42,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.89 vs. limit=15.0 2023-12-23 23:46:07,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1393720.0, ans=0.1 2023-12-23 23:46:14,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1393786.6666666667, ans=0.1 2023-12-23 23:46:29,084 INFO [train.py:886] (0/4) Epoch 44, batch 4150, loss[loss=0.009779, audio_tagging_loss=0.009779, over 22253.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4937698.41 frames. ], batch size: 107, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:46:33,522 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.380e+01 3.908e+01 4.057e+01 4.233e+01 4.763e+01, threshold=8.114e+01, percent-clipped=0.0 2023-12-23 23:46:38,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1393920.0, ans=0.0 2023-12-23 23:46:42,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1393986.6666666667, ans=0.125 2023-12-23 23:46:43,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1393986.6666666667, ans=22.5 2023-12-23 23:46:47,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1393986.6666666667, ans=0.2 2023-12-23 23:46:58,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1394120.0, ans=0.035 2023-12-23 23:47:08,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1394120.0, ans=0.07 2023-12-23 23:47:20,978 INFO [train.py:886] (0/4) Epoch 44, batch 4200, loss[loss=0.01125, audio_tagging_loss=0.01125, over 24750.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4939747.05 frames. 
], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:47:29,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1394320.0, ans=0.125 2023-12-23 23:47:36,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1394320.0, ans=0.125 2023-12-23 23:47:39,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1394320.0, ans=0.2 2023-12-23 23:47:39,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=22.5 2023-12-23 23:48:01,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1394520.0, ans=0.125 2023-12-23 23:48:04,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2023-12-23 23:48:05,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1394520.0, ans=0.0 2023-12-23 23:48:10,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1394520.0, ans=0.125 2023-12-23 23:48:13,013 INFO [train.py:886] (0/4) Epoch 44, batch 4250, loss[loss=0.0104, audio_tagging_loss=0.0104, over 25000.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4939253.47 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:48:13,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.46 vs. limit=10.0 2023-12-23 23:48:17,456 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.360e+01 3.794e+01 3.945e+01 4.179e+01 4.749e+01, threshold=7.890e+01, percent-clipped=0.0 2023-12-23 23:48:25,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=22.5 2023-12-23 23:48:35,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-12-23 23:48:38,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.34 vs. limit=15.0 2023-12-23 23:48:47,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1394786.6666666667, ans=0.0 2023-12-23 23:48:48,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1394786.6666666667, ans=0.0 2023-12-23 23:48:51,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1394786.6666666667, ans=0.0 2023-12-23 23:49:04,137 INFO [train.py:886] (0/4) Epoch 44, batch 4300, loss[loss=0.01269, audio_tagging_loss=0.01269, over 24750.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4941086.78 frames. 
], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:49:08,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1394920.0, ans=0.0 2023-12-23 23:49:23,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1394986.6666666667, ans=0.0 2023-12-23 23:49:34,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1395053.3333333333, ans=0.125 2023-12-23 23:49:57,526 INFO [train.py:886] (0/4) Epoch 44, batch 4350, loss[loss=0.008959, audio_tagging_loss=0.008959, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4949910.93 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:50:01,297 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.531e+01 3.887e+01 4.029e+01 4.199e+01 5.257e+01, threshold=8.057e+01, percent-clipped=0.0 2023-12-23 23:50:13,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1395320.0, ans=0.2 2023-12-23 23:50:16,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2023-12-23 23:50:18,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1395386.6666666667, ans=0.125 2023-12-23 23:50:42,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1395520.0, ans=0.0 2023-12-23 23:50:49,089 INFO [train.py:886] (0/4) Epoch 44, batch 4400, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4949347.83 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:51:06,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1395653.3333333333, ans=0.2 2023-12-23 23:51:15,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1395720.0, ans=0.0 2023-12-23 23:51:31,756 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:51:36,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1395853.3333333333, ans=0.0 2023-12-23 23:51:40,098 INFO [train.py:886] (0/4) Epoch 44, batch 4450, loss[loss=0.01046, audio_tagging_loss=0.01046, over 24750.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4949043.02 frames. 
], batch size: 99, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:51:40,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1395920.0, ans=0.0 2023-12-23 23:51:43,922 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.489e+01 3.885e+01 4.023e+01 4.248e+01 5.191e+01, threshold=8.046e+01, percent-clipped=0.0 2023-12-23 23:51:44,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1395920.0, ans=0.125 2023-12-23 23:51:49,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1395986.6666666667, ans=0.0 2023-12-23 23:52:18,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5 2023-12-23 23:52:33,687 INFO [train.py:886] (0/4) Epoch 44, batch 4500, loss[loss=0.01108, audio_tagging_loss=0.01108, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4943488.09 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:52:34,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2023-12-23 23:52:50,088 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:52:52,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1396386.6666666667, ans=0.1 2023-12-23 23:53:12,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1396453.3333333333, ans=0.1 2023-12-23 23:53:15,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2023-12-23 23:53:18,561 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:53:20,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1396520.0, ans=0.125 2023-12-23 23:53:24,798 INFO [train.py:886] (0/4) Epoch 44, batch 4550, loss[loss=0.009815, audio_tagging_loss=0.009815, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4946003.40 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:53:28,509 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.460e+01 3.833e+01 3.993e+01 4.205e+01 5.726e+01, threshold=7.986e+01, percent-clipped=0.0 2023-12-23 23:53:38,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1396653.3333333333, ans=0.125 2023-12-23 23:54:08,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1396853.3333333333, ans=0.125 2023-12-23 23:54:16,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.47 vs. limit=12.0 2023-12-23 23:54:17,173 INFO [train.py:886] (0/4) Epoch 44, batch 4600, loss[loss=0.01155, audio_tagging_loss=0.01155, over 25000.00 frames. 
], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4949459.07 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:54:52,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1397120.0, ans=0.0 2023-12-23 23:54:57,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1397186.6666666667, ans=0.0 2023-12-23 23:55:07,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5 2023-12-23 23:55:08,879 INFO [train.py:886] (0/4) Epoch 44, batch 4650, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4945663.04 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:55:13,397 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.838e+01 4.030e+01 4.199e+01 4.777e+01, threshold=8.060e+01, percent-clipped=0.0 2023-12-23 23:55:22,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.57 vs. limit=15.0 2023-12-23 23:55:23,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1397320.0, ans=0.125 2023-12-23 23:55:27,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.55 vs. limit=12.0 2023-12-23 23:55:27,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=15.0 2023-12-23 23:55:29,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1397386.6666666667, ans=0.0 2023-12-23 23:55:36,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1397386.6666666667, ans=0.2 2023-12-23 23:55:46,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1397453.3333333333, ans=0.125 2023-12-23 23:56:00,026 INFO [train.py:886] (0/4) Epoch 44, batch 4700, loss[loss=0.009624, audio_tagging_loss=0.009624, over 24750.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4942826.68 frames. ], batch size: 99, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:56:12,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1397653.3333333333, ans=0.125 2023-12-23 23:56:22,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1397720.0, ans=0.1 2023-12-23 23:56:35,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. 
limit=10.0 2023-12-23 23:56:37,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1397853.3333333333, ans=0.1 2023-12-23 23:56:44,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1397853.3333333333, ans=0.125 2023-12-23 23:56:46,657 INFO [train.py:886] (0/4) Epoch 44, batch 4750, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4940342.89 frames. ], batch size: 99, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:56:49,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=12.0 2023-12-23 23:56:50,271 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.598e+01 3.838e+01 4.057e+01 4.238e+01 5.270e+01, threshold=8.115e+01, percent-clipped=0.0 2023-12-23 23:56:56,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2023-12-23 23:57:01,858 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-44.pt 2023-12-23 23:57:22,210 INFO [train.py:886] (0/4) Epoch 45, batch 0, loss[loss=0.02412, audio_tagging_loss=0.02412, over 23993.00 frames. ], tot_loss[loss=0.02412, audio_tagging_loss=0.02412, over 23993.00 frames. ], batch size: 100, lr: 2.40e-03, grad_scale: 32.0 2023-12-23 23:57:22,212 INFO [train.py:909] (0/4) Computing validation loss 2023-12-23 23:57:43,160 INFO [train.py:917] (0/4) Epoch 45, validation: loss=0.03554, audio_tagging_loss=0.03554, over 3737520.00 frames. 2023-12-23 23:57:43,160 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-23 23:58:10,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1398160.0, ans=0.0 2023-12-23 23:58:14,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1398226.6666666667, ans=0.0 2023-12-23 23:58:21,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1398226.6666666667, ans=0.0 2023-12-23 23:58:31,008 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:58:33,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.46 vs. limit=12.0 2023-12-23 23:58:33,600 INFO [train.py:886] (0/4) Epoch 45, batch 50, loss[loss=0.0156, audio_tagging_loss=0.0156, over 25000.00 frames. ], tot_loss[loss=0.01805, audio_tagging_loss=0.01805, over 1120140.45 frames. 
], batch size: 100, lr: 2.40e-03, grad_scale: 16.0 2023-12-23 23:58:39,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1398360.0, ans=0.125 2023-12-23 23:58:51,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1398426.6666666667, ans=10.0 2023-12-23 23:58:54,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1398493.3333333333, ans=0.025 2023-12-23 23:58:59,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1398493.3333333333, ans=0.04949747468305833 2023-12-23 23:59:04,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1398560.0, ans=0.125 2023-12-23 23:59:11,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1398560.0, ans=0.2 2023-12-23 23:59:14,331 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.685e+01 4.411e+01 4.844e+01 5.631e+01 1.112e+02, threshold=9.688e+01, percent-clipped=7.0 2023-12-23 23:59:22,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1398626.6666666667, ans=0.125 2023-12-23 23:59:24,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1398626.6666666667, ans=15.0 2023-12-23 23:59:26,356 INFO [train.py:886] (0/4) Epoch 45, batch 100, loss[loss=0.01315, audio_tagging_loss=0.01315, over 25000.00 frames. ], tot_loss[loss=0.01548, audio_tagging_loss=0.01548, over 1974651.04 frames. ], batch size: 100, lr: 2.40e-03, grad_scale: 16.0 2023-12-23 23:59:30,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=12.0 2023-12-23 23:59:41,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1398760.0, ans=0.0 2023-12-23 23:59:58,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1398893.3333333333, ans=0.125 2023-12-23 23:59:59,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1398893.3333333333, ans=0.1 2023-12-24 00:00:11,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1398960.0, ans=0.0 2023-12-24 00:00:14,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0 2023-12-24 00:00:14,824 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:00:18,181 INFO [train.py:886] (0/4) Epoch 45, batch 150, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 2642449.06 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:00:20,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.48 vs. 
2023-12-24 00:00:22,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1399026.6666666667, ans=0.125 2023-12-24 00:00:34,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1399093.3333333333, ans=10.0 2023-12-24 00:00:52,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1399226.6666666667, ans=0.2 2023-12-24 00:00:59,191 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.571e+01 3.963e+01 4.110e+01 4.348e+01 5.500e+01, threshold=8.220e+01, percent-clipped=0.0 2023-12-24 00:01:09,651 INFO [train.py:886] (0/4) Epoch 45, batch 200, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 3155394.92 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:01:15,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1399360.0, ans=0.0 2023-12-24 00:01:46,745 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:02:02,098 INFO [train.py:886] (0/4) Epoch 45, batch 250, loss[loss=0.01471, audio_tagging_loss=0.01471, over 24943.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 3554436.93 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:02:14,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1399760.0, ans=0.125 2023-12-24 00:02:23,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2023-12-24 00:02:27,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1399826.6666666667, ans=0.0 2023-12-24 00:02:42,532 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.460e+01 3.868e+01 4.040e+01 4.212e+01 5.003e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-24 00:02:53,881 INFO [train.py:886] (0/4) Epoch 45, batch 300, loss[loss=0.01221, audio_tagging_loss=0.01221, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 3855346.12 frames.
], batch size: 99, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:03:02,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1400026.6666666667, ans=0.025 2023-12-24 00:03:04,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1400093.3333333333, ans=0.1 2023-12-24 00:03:07,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1400093.3333333333, ans=0.2 2023-12-24 00:03:21,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1400160.0, ans=0.1 2023-12-24 00:03:25,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1400226.6666666667, ans=0.125 2023-12-24 00:03:46,174 INFO [train.py:886] (0/4) Epoch 45, batch 350, loss[loss=0.009593, audio_tagging_loss=0.009593, over 22554.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4093561.36 frames. ], batch size: 107, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:03:46,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1400360.0, ans=0.09899494936611666 2023-12-24 00:03:55,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1400426.6666666667, ans=0.0 2023-12-24 00:03:58,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.29 vs. limit=22.5 2023-12-24 00:04:11,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1400493.3333333333, ans=0.125 2023-12-24 00:04:14,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1400493.3333333333, ans=0.1 2023-12-24 00:04:16,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1400560.0, ans=0.0 2023-12-24 00:04:19,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1400560.0, ans=0.0 2023-12-24 00:04:24,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1400560.0, ans=0.0 2023-12-24 00:04:25,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1400560.0, ans=0.125 2023-12-24 00:04:25,970 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.453e+01 3.862e+01 4.002e+01 4.189e+01 4.449e+01, threshold=8.005e+01, percent-clipped=0.0 2023-12-24 00:04:26,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1400626.6666666667, ans=0.1 2023-12-24 00:04:26,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.44 vs. 
limit=10.0 2023-12-24 00:04:30,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1400626.6666666667, ans=0.1 2023-12-24 00:04:30,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1400626.6666666667, ans=0.125 2023-12-24 00:04:37,802 INFO [train.py:886] (0/4) Epoch 45, batch 400, loss[loss=0.01019, audio_tagging_loss=0.01019, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4279982.92 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:04:49,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1400760.0, ans=0.0 2023-12-24 00:05:22,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1400960.0, ans=0.0 2023-12-24 00:05:25,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2023-12-24 00:05:28,530 INFO [train.py:886] (0/4) Epoch 45, batch 450, loss[loss=0.01004, audio_tagging_loss=0.01004, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4430861.13 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:05:28,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1401026.6666666667, ans=0.2 2023-12-24 00:05:48,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1401093.3333333333, ans=0.0 2023-12-24 00:05:49,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1401160.0, ans=0.125 2023-12-24 00:05:50,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0 2023-12-24 00:05:53,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.36 vs. limit=15.0 2023-12-24 00:05:57,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5 2023-12-24 00:06:08,953 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.455e+01 3.831e+01 3.994e+01 4.191e+01 6.478e+01, threshold=7.987e+01, percent-clipped=0.0 2023-12-24 00:06:12,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2023-12-24 00:06:18,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1401293.3333333333, ans=0.125 2023-12-24 00:06:21,059 INFO [train.py:886] (0/4) Epoch 45, batch 500, loss[loss=0.01037, audio_tagging_loss=0.01037, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4553594.46 frames. 
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:06:46,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1401493.3333333333, ans=0.2 2023-12-24 00:06:48,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-12-24 00:07:10,609 INFO [train.py:886] (0/4) Epoch 45, batch 550, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4647715.81 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:07:14,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1401693.3333333333, ans=0.1 2023-12-24 00:07:21,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1401760.0, ans=0.1 2023-12-24 00:07:21,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2023-12-24 00:07:24,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0 2023-12-24 00:07:31,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-12-24 00:07:41,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1401893.3333333333, ans=0.1 2023-12-24 00:07:47,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1401893.3333333333, ans=0.125 2023-12-24 00:07:50,485 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.468e+01 3.862e+01 4.012e+01 4.255e+01 6.507e+01, threshold=8.024e+01, percent-clipped=0.0 2023-12-24 00:07:53,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.45 vs. limit=15.0 2023-12-24 00:07:55,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1401960.0, ans=0.2 2023-12-24 00:08:01,735 INFO [train.py:886] (0/4) Epoch 45, batch 600, loss[loss=0.01085, audio_tagging_loss=0.01085, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4718010.37 frames. 
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:08:10,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1402093.3333333333, ans=0.125 2023-12-24 00:08:31,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1402160.0, ans=0.125 2023-12-24 00:08:35,237 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:08:41,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1402293.3333333333, ans=0.0 2023-12-24 00:08:47,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2023-12-24 00:08:54,262 INFO [train.py:886] (0/4) Epoch 45, batch 650, loss[loss=0.01014, audio_tagging_loss=0.01014, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4762267.21 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:08:54,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1402360.0, ans=0.125 2023-12-24 00:09:06,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1402426.6666666667, ans=0.07 2023-12-24 00:09:17,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1402493.3333333333, ans=0.0 2023-12-24 00:09:21,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1402493.3333333333, ans=0.2 2023-12-24 00:09:23,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1402560.0, ans=0.125 2023-12-24 00:09:34,586 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.523e+01 3.819e+01 4.030e+01 4.235e+01 4.740e+01, threshold=8.060e+01, percent-clipped=0.0 2023-12-24 00:09:42,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1402626.6666666667, ans=0.125 2023-12-24 00:09:45,208 INFO [train.py:886] (0/4) Epoch 45, batch 700, loss[loss=0.01113, audio_tagging_loss=0.01113, over 25000.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4802438.21 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:09:45,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=12.0 2023-12-24 00:09:59,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.52 vs. limit=10.0 2023-12-24 00:10:16,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1402893.3333333333, ans=0.0 2023-12-24 00:10:16,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.20 vs. 
limit=15.0 2023-12-24 00:10:22,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1402893.3333333333, ans=0.2 2023-12-24 00:10:24,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.74 vs. limit=6.0 2023-12-24 00:10:31,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1402960.0, ans=0.0 2023-12-24 00:10:37,461 INFO [train.py:886] (0/4) Epoch 45, batch 750, loss[loss=0.0104, audio_tagging_loss=0.0104, over 21021.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4825423.33 frames. ], batch size: 107, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:10:49,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1403093.3333333333, ans=0.2 2023-12-24 00:11:15,686 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.861e+01 4.093e+01 4.221e+01 4.879e+01, threshold=8.187e+01, percent-clipped=0.0 2023-12-24 00:11:26,757 INFO [train.py:886] (0/4) Epoch 45, batch 800, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4856398.88 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:11:33,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2023-12-24 00:11:35,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1403360.0, ans=0.125 2023-12-24 00:11:58,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1403560.0, ans=0.125 2023-12-24 00:12:09,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5 2023-12-24 00:12:13,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2023-12-24 00:12:15,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1403626.6666666667, ans=0.1 2023-12-24 00:12:18,619 INFO [train.py:886] (0/4) Epoch 45, batch 850, loss[loss=0.0105, audio_tagging_loss=0.0105, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4881924.87 frames. 
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:12:39,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1403826.6666666667, ans=0.125 2023-12-24 00:12:43,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1403826.6666666667, ans=0.1 2023-12-24 00:12:44,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1403826.6666666667, ans=0.125 2023-12-24 00:12:54,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1403893.3333333333, ans=0.0 2023-12-24 00:12:58,728 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.415e+01 3.861e+01 4.035e+01 4.222e+01 4.797e+01, threshold=8.070e+01, percent-clipped=0.0 2023-12-24 00:13:06,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1403960.0, ans=0.125 2023-12-24 00:13:10,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1404026.6666666667, ans=0.125 2023-12-24 00:13:11,517 INFO [train.py:886] (0/4) Epoch 45, batch 900, loss[loss=0.009689, audio_tagging_loss=0.009689, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4894693.08 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:13:32,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=6.0 2023-12-24 00:13:34,153 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:13:49,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0 2023-12-24 00:14:02,249 INFO [train.py:886] (0/4) Epoch 45, batch 950, loss[loss=0.00733, audio_tagging_loss=0.00733, over 23965.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4901887.21 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:14:14,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1404426.6666666667, ans=0.2 2023-12-24 00:14:43,648 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.636e+01 3.902e+01 4.041e+01 4.266e+01 4.782e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-24 00:14:49,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1404626.6666666667, ans=0.125 2023-12-24 00:14:50,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.75 vs. 
limit=12.0 2023-12-24 00:14:51,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1404626.6666666667, ans=0.125 2023-12-24 00:14:52,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1404626.6666666667, ans=0.0 2023-12-24 00:14:53,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1404626.6666666667, ans=0.125 2023-12-24 00:14:54,680 INFO [train.py:886] (0/4) Epoch 45, batch 1000, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4903327.31 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:14:57,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1404693.3333333333, ans=0.1 2023-12-24 00:15:03,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1404760.0, ans=0.125 2023-12-24 00:15:04,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.67 vs. limit=10.0 2023-12-24 00:15:45,805 INFO [train.py:886] (0/4) Epoch 45, batch 1050, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4912567.93 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:15:47,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1405026.6666666667, ans=0.125 2023-12-24 00:15:58,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1405093.3333333333, ans=0.125 2023-12-24 00:16:12,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0 2023-12-24 00:16:20,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1405226.6666666667, ans=0.125 2023-12-24 00:16:24,807 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 3.844e+01 4.005e+01 4.224e+01 5.202e+01, threshold=8.010e+01, percent-clipped=0.0 2023-12-24 00:16:33,529 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:16:36,163 INFO [train.py:886] (0/4) Epoch 45, batch 1100, loss[loss=0.01053, audio_tagging_loss=0.01053, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4914364.31 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:16:47,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1405426.6666666667, ans=0.125 2023-12-24 00:16:50,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.30 vs. 
limit=15.0 2023-12-24 00:17:01,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1405493.3333333333, ans=0.0 2023-12-24 00:17:02,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1405493.3333333333, ans=0.1 2023-12-24 00:17:09,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1405560.0, ans=0.125 2023-12-24 00:17:09,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2023-12-24 00:17:15,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1405626.6666666667, ans=0.0 2023-12-24 00:17:17,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1405626.6666666667, ans=0.125 2023-12-24 00:17:17,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1405626.6666666667, ans=0.125 2023-12-24 00:17:22,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1405626.6666666667, ans=0.0 2023-12-24 00:17:27,000 INFO [train.py:886] (0/4) Epoch 45, batch 1150, loss[loss=0.01075, audio_tagging_loss=0.01075, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4928349.31 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:17:42,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1405760.0, ans=0.125 2023-12-24 00:17:44,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1405760.0, ans=0.1 2023-12-24 00:17:47,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1405826.6666666667, ans=0.125 2023-12-24 00:17:58,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1405893.3333333333, ans=0.125 2023-12-24 00:18:05,977 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.457e+01 3.770e+01 3.983e+01 4.144e+01 4.747e+01, threshold=7.965e+01, percent-clipped=0.0 2023-12-24 00:18:11,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.75 vs. limit=8.0 2023-12-24 00:18:15,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.38 vs. limit=10.0 2023-12-24 00:18:17,353 INFO [train.py:886] (0/4) Epoch 45, batch 1200, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4939976.57 frames. 
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:18:17,499 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:18:33,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1406093.3333333333, ans=0.125 2023-12-24 00:18:35,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1406093.3333333333, ans=0.0 2023-12-24 00:19:03,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1406293.3333333333, ans=0.1 2023-12-24 00:19:05,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1406293.3333333333, ans=0.125 2023-12-24 00:19:09,375 INFO [train.py:886] (0/4) Epoch 45, batch 1250, loss[loss=0.008822, audio_tagging_loss=0.008822, over 24750.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4935890.17 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:19:22,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2023-12-24 00:19:49,847 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.464e+01 3.882e+01 4.077e+01 4.282e+01 6.825e+01, threshold=8.153e+01, percent-clipped=0.0 2023-12-24 00:20:02,546 INFO [train.py:886] (0/4) Epoch 45, batch 1300, loss[loss=0.008993, audio_tagging_loss=0.008993, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4940728.95 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:20:10,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1406693.3333333333, ans=0.2 2023-12-24 00:20:25,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1406826.6666666667, ans=0.09899494936611666 2023-12-24 00:20:29,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1406826.6666666667, ans=0.0 2023-12-24 00:20:39,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1406893.3333333333, ans=0.0 2023-12-24 00:20:48,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.46 vs. limit=15.0 2023-12-24 00:20:53,372 INFO [train.py:886] (0/4) Epoch 45, batch 1350, loss[loss=0.01028, audio_tagging_loss=0.01028, over 25000.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4945758.89 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:21:08,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1407093.3333333333, ans=0.125 2023-12-24 00:21:17,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. 
limit=15.0 2023-12-24 00:21:22,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1407160.0, ans=0.125 2023-12-24 00:21:34,832 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.481e+01 3.816e+01 3.963e+01 4.132e+01 5.053e+01, threshold=7.926e+01, percent-clipped=0.0 2023-12-24 00:21:45,960 INFO [train.py:886] (0/4) Epoch 45, batch 1400, loss[loss=0.01024, audio_tagging_loss=0.01024, over 25000.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4952629.08 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:21:55,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1407426.6666666667, ans=0.2 2023-12-24 00:22:09,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1407493.3333333333, ans=0.0 2023-12-24 00:22:29,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1407626.6666666667, ans=0.0 2023-12-24 00:22:38,198 INFO [train.py:886] (0/4) Epoch 45, batch 1450, loss[loss=0.01047, audio_tagging_loss=0.01047, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4952868.74 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:22:48,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5 2023-12-24 00:22:51,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1407760.0, ans=0.125 2023-12-24 00:22:56,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2023-12-24 00:22:58,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2023-12-24 00:23:07,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1407893.3333333333, ans=0.125 2023-12-24 00:23:13,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0 2023-12-24 00:23:17,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1407893.3333333333, ans=0.0 2023-12-24 00:23:17,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0 2023-12-24 00:23:18,723 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.497e+01 3.851e+01 3.995e+01 4.151e+01 4.657e+01, threshold=7.989e+01, percent-clipped=0.0 2023-12-24 00:23:20,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. 
limit=6.0 2023-12-24 00:23:21,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1407960.0, ans=0.0 2023-12-24 00:23:28,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1408026.6666666667, ans=0.0 2023-12-24 00:23:29,319 INFO [train.py:886] (0/4) Epoch 45, batch 1500, loss[loss=0.01003, audio_tagging_loss=0.01003, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4954200.82 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:23:39,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1408093.3333333333, ans=0.125 2023-12-24 00:23:39,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1408093.3333333333, ans=0.125 2023-12-24 00:24:22,035 INFO [train.py:886] (0/4) Epoch 45, batch 1550, loss[loss=0.01193, audio_tagging_loss=0.01193, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4953371.77 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:24:33,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1408426.6666666667, ans=0.0 2023-12-24 00:24:36,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1408426.6666666667, ans=0.0 2023-12-24 00:24:40,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2023-12-24 00:24:49,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1408493.3333333333, ans=0.125 2023-12-24 00:24:56,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1408560.0, ans=0.1 2023-12-24 00:25:01,864 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.911e+01 4.058e+01 4.249e+01 4.989e+01, threshold=8.116e+01, percent-clipped=0.0 2023-12-24 00:25:13,076 INFO [train.py:886] (0/4) Epoch 45, batch 1600, loss[loss=0.01088, audio_tagging_loss=0.01088, over 24750.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4947145.31 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:25:16,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1408693.3333333333, ans=0.0 2023-12-24 00:25:48,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1408893.3333333333, ans=0.1 2023-12-24 00:25:57,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1408960.0, ans=0.125 2023-12-24 00:26:05,275 INFO [train.py:886] (0/4) Epoch 45, batch 1650, loss[loss=0.01125, audio_tagging_loss=0.01125, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4949317.92 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0
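Note: the frequent ScheduledFloat lines print the current value (ans=...) of a batch-count-dependent hyperparameter, such as a dropout probability or a skip rate, and by this stage of training most of them have settled at their final values (ans=0.0, 0.1, 0.125, and so on). A piecewise-linear schedule over batch_count, clamped at its endpoints, is consistent with that behaviour; the helper below is a sketch under that reading, and its signature is illustrative rather than the ScheduledFloat API from scaling.py.

    from bisect import bisect_right

    # Piecewise-linear schedule over batch_count, clamped at the ends.
    # points is a sorted list of (batch_count, value) breakpoints.
    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        xs = [x for x, _ in points]
        if batch_count <= xs[0]:
            return points[0][1]
        if batch_count >= xs[-1]:
            return points[-1][1]
        i = bisect_right(xs, batch_count)
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # e.g. a dropout that anneals from 0.3 to 0.1 over the first 20000
    # batches and then stays at 0.1:
    # scheduled_float(1408000.0, [(0.0, 0.3), (20000.0, 0.1)]) -> 0.1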
2023-12-24 00:26:24,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1409093.3333333333, ans=0.125 2023-12-24 00:26:44,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-12-24 00:26:45,326 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.532e+01 3.821e+01 4.016e+01 4.279e+01 4.895e+01, threshold=8.031e+01, percent-clipped=0.0 2023-12-24 00:26:53,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1409293.3333333333, ans=0.125 2023-12-24 00:26:58,117 INFO [train.py:886] (0/4) Epoch 45, batch 1700, loss[loss=0.01115, audio_tagging_loss=0.01115, over 25000.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4950099.17 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:27:13,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1409426.6666666667, ans=0.125 2023-12-24 00:27:38,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1409626.6666666667, ans=0.0 2023-12-24 00:27:49,110 INFO [train.py:886] (0/4) Epoch 45, batch 1750, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4956191.06 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:28:13,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1409826.6666666667, ans=0.0 2023-12-24 00:28:15,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1409826.6666666667, ans=0.125 2023-12-24 00:28:20,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1409893.3333333333, ans=0.125 2023-12-24 00:28:23,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1409893.3333333333, ans=0.1 2023-12-24 00:28:29,630 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.800e+01 3.993e+01 4.173e+01 4.854e+01, threshold=7.987e+01, percent-clipped=0.0 2023-12-24 00:28:41,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1410026.6666666667, ans=0.125 2023-12-24 00:28:42,213 INFO [train.py:886] (0/4) Epoch 45, batch 1800, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4957963.93 frames.
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:28:52,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1410093.3333333333, ans=0.125 2023-12-24 00:28:58,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1410093.3333333333, ans=0.0 2023-12-24 00:29:04,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1410160.0, ans=0.0 2023-12-24 00:29:06,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1410160.0, ans=0.035 2023-12-24 00:29:06,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=22.5 2023-12-24 00:29:11,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1410160.0, ans=0.1 2023-12-24 00:29:13,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0 2023-12-24 00:29:21,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1410226.6666666667, ans=0.0 2023-12-24 00:29:32,414 INFO [train.py:886] (0/4) Epoch 45, batch 1850, loss[loss=0.01011, audio_tagging_loss=0.01011, over 24048.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4962271.12 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:29:39,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.80 vs. limit=15.0 2023-12-24 00:29:55,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1410493.3333333333, ans=0.2 2023-12-24 00:30:14,389 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.493e+01 3.903e+01 4.080e+01 4.257e+01 5.183e+01, threshold=8.160e+01, percent-clipped=0.0 2023-12-24 00:30:24,981 INFO [train.py:886] (0/4) Epoch 45, batch 1900, loss[loss=0.01084, audio_tagging_loss=0.01084, over 24750.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4953283.36 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 32.0 2023-12-24 00:30:36,510 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:30:57,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1410893.3333333333, ans=0.125 2023-12-24 00:31:14,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1410960.0, ans=0.07 2023-12-24 00:31:16,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1411026.6666666667, ans=0.125 2023-12-24 00:31:16,972 INFO [train.py:886] (0/4) Epoch 45, batch 1950, loss[loss=0.01158, audio_tagging_loss=0.01158, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4954757.33 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 32.0 2023-12-24 00:31:21,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1411026.6666666667, ans=0.0 2023-12-24 00:31:49,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1411226.6666666667, ans=0.125 2023-12-24 00:31:56,107 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.534e+01 3.819e+01 3.949e+01 4.162e+01 4.750e+01, threshold=7.898e+01, percent-clipped=0.0 2023-12-24 00:32:06,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-12-24 00:32:06,739 INFO [train.py:886] (0/4) Epoch 45, batch 2000, loss[loss=0.008543, audio_tagging_loss=0.008543, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4950946.48 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 32.0 2023-12-24 00:32:14,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1411360.0, ans=0.1 2023-12-24 00:32:15,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2023-12-24 00:32:26,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1411426.6666666667, ans=0.125 2023-12-24 00:32:34,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. limit=6.0 2023-12-24 00:32:40,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.68 vs. limit=10.0 2023-12-24 00:32:52,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1411626.6666666667, ans=0.125 2023-12-24 00:32:59,701 INFO [train.py:886] (0/4) Epoch 45, batch 2050, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24750.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4951626.88 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:33:30,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1411893.3333333333, ans=0.1 2023-12-24 00:33:32,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2023-12-24 00:33:35,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1411893.3333333333, ans=0.0 2023-12-24 00:33:39,578 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.466e+01 3.826e+01 3.966e+01 4.166e+01 5.288e+01, threshold=7.932e+01, percent-clipped=0.0 2023-12-24 00:33:39,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1411960.0, ans=0.0 2023-12-24 00:33:43,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. 
limit=15.0 2023-12-24 00:33:44,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1411960.0, ans=0.2 2023-12-24 00:33:51,012 INFO [train.py:886] (0/4) Epoch 45, batch 2100, loss[loss=0.01102, audio_tagging_loss=0.01102, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4948484.61 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:33:56,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1412026.6666666667, ans=0.125 2023-12-24 00:34:13,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1412160.0, ans=0.125 2023-12-24 00:34:14,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1412160.0, ans=0.125 2023-12-24 00:34:34,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1412293.3333333333, ans=0.125 2023-12-24 00:34:43,230 INFO [train.py:886] (0/4) Epoch 45, batch 2150, loss[loss=0.01056, audio_tagging_loss=0.01056, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4953224.00 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:34:58,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1412426.6666666667, ans=0.125 2023-12-24 00:35:10,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1412493.3333333333, ans=0.0 2023-12-24 00:35:11,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1412493.3333333333, ans=0.025 2023-12-24 00:35:11,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1412493.3333333333, ans=0.0 2023-12-24 00:35:12,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-12-24 00:35:22,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1412560.0, ans=0.0 2023-12-24 00:35:22,846 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 3.885e+01 4.074e+01 4.292e+01 5.969e+01, threshold=8.149e+01, percent-clipped=0.0 2023-12-24 00:35:27,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1412626.6666666667, ans=0.95 2023-12-24 00:35:34,983 INFO [train.py:886] (0/4) Epoch 45, batch 2200, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24750.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4950878.78 frames. 
], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:35:42,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1412693.3333333333, ans=0.1 2023-12-24 00:35:50,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1412760.0, ans=0.125 2023-12-24 00:35:53,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1412826.6666666667, ans=0.125 2023-12-24 00:35:56,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1412826.6666666667, ans=0.2 2023-12-24 00:36:00,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1412826.6666666667, ans=0.0 2023-12-24 00:36:03,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1412826.6666666667, ans=0.125 2023-12-24 00:36:05,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1412893.3333333333, ans=0.125 2023-12-24 00:36:17,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1412960.0, ans=0.125 2023-12-24 00:36:18,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1412960.0, ans=0.125 2023-12-24 00:36:25,233 INFO [train.py:886] (0/4) Epoch 45, batch 2250, loss[loss=0.01048, audio_tagging_loss=0.01048, over 24750.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4947090.86 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:36:33,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1413026.6666666667, ans=0.125 2023-12-24 00:36:39,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1413093.3333333333, ans=0.0 2023-12-24 00:36:59,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1413226.6666666667, ans=0.0 2023-12-24 00:37:06,937 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.863e+01 4.040e+01 4.242e+01 4.611e+01, threshold=8.080e+01, percent-clipped=0.0 2023-12-24 00:37:12,969 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-212000.pt 2023-12-24 00:37:15,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1413293.3333333333, ans=0.125 2023-12-24 00:37:15,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1413293.3333333333, ans=0.0 2023-12-24 00:37:20,255 INFO [train.py:886] (0/4) Epoch 45, batch 2300, loss[loss=0.01124, audio_tagging_loss=0.01124, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4945902.47 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:37:22,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1413360.0, ans=0.125 2023-12-24 00:37:23,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1413360.0, ans=0.0 2023-12-24 00:37:30,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1413426.6666666667, ans=0.2 2023-12-24 00:37:35,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1413426.6666666667, ans=0.07 2023-12-24 00:38:01,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0 2023-12-24 00:38:09,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1413626.6666666667, ans=0.125 2023-12-24 00:38:12,017 INFO [train.py:886] (0/4) Epoch 45, batch 2350, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4945456.54 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:38:16,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1413693.3333333333, ans=0.0 2023-12-24 00:38:17,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1413693.3333333333, ans=0.125 2023-12-24 00:38:35,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1413826.6666666667, ans=0.125 2023-12-24 00:38:35,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2023-12-24 00:38:42,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=15.0 2023-12-24 00:38:44,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1413893.3333333333, ans=0.125 2023-12-24 00:38:51,849 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.613e+01 3.856e+01 3.996e+01 4.169e+01 4.627e+01, threshold=7.993e+01, percent-clipped=0.0 2023-12-24 00:38:52,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1413960.0, ans=10.0 2023-12-24 00:38:52,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1413960.0, ans=0.1 2023-12-24 00:39:02,975 INFO [train.py:886] (0/4) Epoch 45, batch 2400, loss[loss=0.009344, audio_tagging_loss=0.009344, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4953859.79 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:39:04,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1414026.6666666667, ans=0.1 2023-12-24 00:39:09,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1414026.6666666667, ans=0.125 2023-12-24 00:39:16,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1414093.3333333333, ans=0.125 2023-12-24 00:39:28,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1414160.0, ans=0.0 2023-12-24 00:39:34,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1414226.6666666667, ans=0.0 2023-12-24 00:39:37,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1414226.6666666667, ans=0.2 2023-12-24 00:39:50,216 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-12-24 00:39:52,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1414293.3333333333, ans=0.125 2023-12-24 00:39:54,296 INFO [train.py:886] (0/4) Epoch 45, batch 2450, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4958078.60 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:39:55,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1414360.0, ans=0.0 2023-12-24 00:39:55,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2023-12-24 00:40:03,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1414426.6666666667, ans=0.125 2023-12-24 00:40:07,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1414426.6666666667, ans=0.025 2023-12-24 00:40:14,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1414493.3333333333, ans=0.0 2023-12-24 00:40:19,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1414493.3333333333, ans=0.07 2023-12-24 00:40:33,308 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.484e+01 3.936e+01 4.079e+01 4.274e+01 6.379e+01, threshold=8.158e+01, percent-clipped=0.0 2023-12-24 00:40:38,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=12.0 2023-12-24 00:40:44,581 INFO [train.py:886] (0/4) Epoch 45, batch 2500, loss[loss=0.009455, audio_tagging_loss=0.009455, over 24750.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4959444.71 frames. 
], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:40:55,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1414760.0, ans=0.04949747468305833 2023-12-24 00:40:58,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.43 vs. limit=22.5 2023-12-24 00:41:02,051 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:41:07,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1414826.6666666667, ans=0.125 2023-12-24 00:41:09,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1414826.6666666667, ans=0.1 2023-12-24 00:41:28,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1414960.0, ans=0.5 2023-12-24 00:41:36,874 INFO [train.py:886] (0/4) Epoch 45, batch 2550, loss[loss=0.008863, audio_tagging_loss=0.008863, over 24750.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4952663.16 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:41:38,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1415026.6666666667, ans=0.125 2023-12-24 00:41:47,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1415093.3333333333, ans=0.09899494936611666 2023-12-24 00:41:47,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1415093.3333333333, ans=0.0 2023-12-24 00:42:01,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1415160.0, ans=0.125 2023-12-24 00:42:07,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1415226.6666666667, ans=0.125 2023-12-24 00:42:15,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1415226.6666666667, ans=0.1 2023-12-24 00:42:16,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1415226.6666666667, ans=0.0 2023-12-24 00:42:17,206 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.482e+01 3.969e+01 4.108e+01 4.307e+01 5.107e+01, threshold=8.216e+01, percent-clipped=0.0 2023-12-24 00:42:29,761 INFO [train.py:886] (0/4) Epoch 45, batch 2600, loss[loss=0.012, audio_tagging_loss=0.012, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4944497.92 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:42:38,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1415426.6666666667, ans=0.04949747468305833 2023-12-24 00:42:44,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1415426.6666666667, ans=0.2 2023-12-24 00:43:08,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1415560.0, ans=0.125 2023-12-24 00:43:08,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.80 vs. limit=15.0 2023-12-24 00:43:20,985 INFO [train.py:886] (0/4) Epoch 45, batch 2650, loss[loss=0.007164, audio_tagging_loss=0.007164, over 21894.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4945447.44 frames. ], batch size: 107, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:43:24,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1415693.3333333333, ans=0.0 2023-12-24 00:43:29,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1415693.3333333333, ans=0.125 2023-12-24 00:43:37,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2023-12-24 00:43:58,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1415893.3333333333, ans=0.2 2023-12-24 00:44:00,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1415893.3333333333, ans=0.125 2023-12-24 00:44:02,337 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.298e+01 3.813e+01 3.941e+01 4.164e+01 4.704e+01, threshold=7.881e+01, percent-clipped=0.0 2023-12-24 00:44:13,726 INFO [train.py:886] (0/4) Epoch 45, batch 2700, loss[loss=0.01026, audio_tagging_loss=0.01026, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4946705.02 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:44:44,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1416226.6666666667, ans=0.0 2023-12-24 00:45:05,364 INFO [train.py:886] (0/4) Epoch 45, batch 2750, loss[loss=0.01342, audio_tagging_loss=0.01342, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4947683.94 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:45:08,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1416360.0, ans=0.125 2023-12-24 00:45:46,095 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 3.826e+01 4.053e+01 4.238e+01 4.704e+01, threshold=8.107e+01, percent-clipped=0.0 2023-12-24 00:45:50,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1416626.6666666667, ans=0.125 2023-12-24 00:45:54,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1416626.6666666667, ans=0.05 2023-12-24 00:45:55,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1416693.3333333333, ans=0.125 2023-12-24 00:45:56,551 INFO [train.py:886] (0/4) Epoch 45, batch 2800, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4944220.63 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:46:06,074 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:46:06,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1416760.0, ans=0.0 2023-12-24 00:46:15,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1416760.0, ans=0.125 2023-12-24 00:46:31,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1416893.3333333333, ans=0.0 2023-12-24 00:46:48,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1416960.0, ans=0.125 2023-12-24 00:46:49,806 INFO [train.py:886] (0/4) Epoch 45, batch 2850, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4940177.62 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:46:54,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1417026.6666666667, ans=0.125 2023-12-24 00:46:54,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1417026.6666666667, ans=0.125 2023-12-24 00:46:58,567 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:47:04,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1417093.3333333333, ans=0.0 2023-12-24 00:47:29,705 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.527e+01 3.915e+01 4.123e+01 4.264e+01 5.597e+01, threshold=8.246e+01, percent-clipped=0.0 2023-12-24 00:47:30,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2023-12-24 00:47:40,336 INFO [train.py:886] (0/4) Epoch 45, batch 2900, loss[loss=0.01102, audio_tagging_loss=0.01102, over 24750.00 frames. 
], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4941771.11 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:47:40,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1417360.0, ans=0.2 2023-12-24 00:47:44,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.23 vs. limit=10.0 2023-12-24 00:47:51,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-12-24 00:47:58,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1417426.6666666667, ans=0.0 2023-12-24 00:47:59,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1417426.6666666667, ans=0.1 2023-12-24 00:48:04,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1417493.3333333333, ans=0.5 2023-12-24 00:48:24,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1417626.6666666667, ans=0.125 2023-12-24 00:48:30,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=12.0 2023-12-24 00:48:32,212 INFO [train.py:886] (0/4) Epoch 45, batch 2950, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4946678.41 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:48:33,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1417693.3333333333, ans=0.125 2023-12-24 00:48:39,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1417693.3333333333, ans=0.0 2023-12-24 00:48:42,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1417760.0, ans=0.0 2023-12-24 00:48:50,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1417760.0, ans=0.0 2023-12-24 00:48:55,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.48 vs. limit=10.0 2023-12-24 00:48:58,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1417826.6666666667, ans=0.09899494936611666 2023-12-24 00:49:11,468 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.530e+01 3.805e+01 3.987e+01 4.227e+01 4.629e+01, threshold=7.974e+01, percent-clipped=0.0 2023-12-24 00:49:18,909 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:49:23,962 INFO [train.py:886] (0/4) Epoch 45, batch 3000, loss[loss=0.01055, audio_tagging_loss=0.01055, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4946431.45 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:49:23,963 INFO [train.py:909] (0/4) Computing validation loss 2023-12-24 00:49:45,383 INFO [train.py:917] (0/4) Epoch 45, validation: loss=0.03669, audio_tagging_loss=0.03669, over 3737520.00 frames. 2023-12-24 00:49:45,384 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 00:49:52,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1418026.6666666667, ans=0.035 2023-12-24 00:49:59,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1418093.3333333333, ans=0.09899494936611666 2023-12-24 00:50:05,603 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:50:13,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1418160.0, ans=0.0 2023-12-24 00:50:25,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1418293.3333333333, ans=0.1 2023-12-24 00:50:36,118 INFO [train.py:886] (0/4) Epoch 45, batch 3050, loss[loss=0.009089, audio_tagging_loss=0.009089, over 22027.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4952837.66 frames. ], batch size: 107, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:51:15,234 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.535e+01 3.857e+01 4.011e+01 4.213e+01 4.793e+01, threshold=8.022e+01, percent-clipped=0.0 2023-12-24 00:51:28,646 INFO [train.py:886] (0/4) Epoch 45, batch 3100, loss[loss=0.0113, audio_tagging_loss=0.0113, over 24750.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4953921.23 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:51:38,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1418760.0, ans=0.125 2023-12-24 00:51:52,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1418826.6666666667, ans=0.125 2023-12-24 00:52:05,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1418893.3333333333, ans=0.125 2023-12-24 00:52:18,791 INFO [train.py:886] (0/4) Epoch 45, batch 3150, loss[loss=0.0121, audio_tagging_loss=0.0121, over 21912.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4950400.67 frames. ], batch size: 107, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:52:27,896 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:52:50,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.55 vs. limit=22.5 2023-12-24 00:53:00,686 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.707e+01 3.938e+01 4.100e+01 4.263e+01 4.919e+01, threshold=8.199e+01, percent-clipped=0.0 2023-12-24 00:53:09,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1419293.3333333333, ans=0.015 2023-12-24 00:53:11,860 INFO [train.py:886] (0/4) Epoch 45, batch 3200, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. 
], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4946719.77 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:53:15,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=22.62 vs. limit=22.5 2023-12-24 00:53:18,018 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0 2023-12-24 00:53:19,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1419360.0, ans=0.0 2023-12-24 00:53:20,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2023-12-24 00:53:52,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-12-24 00:53:53,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.42 vs. limit=15.0 2023-12-24 00:54:03,200 INFO [train.py:886] (0/4) Epoch 45, batch 3250, loss[loss=0.009505, audio_tagging_loss=0.009505, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4946673.73 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:54:03,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.73 vs. limit=22.5 2023-12-24 00:54:11,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1419693.3333333333, ans=0.1 2023-12-24 00:54:23,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1419826.6666666667, ans=0.0 2023-12-24 00:54:30,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2023-12-24 00:54:39,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.09 vs. limit=22.5 2023-12-24 00:54:43,800 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.456e+01 3.834e+01 4.007e+01 4.238e+01 5.618e+01, threshold=8.014e+01, percent-clipped=0.0 2023-12-24 00:54:55,289 INFO [train.py:886] (0/4) Epoch 45, batch 3300, loss[loss=0.01175, audio_tagging_loss=0.01175, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4954037.58 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:54:58,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.81 vs. limit=15.0 2023-12-24 00:55:04,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.13 vs. 
limit=22.5 2023-12-24 00:55:09,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1420093.3333333333, ans=0.125 2023-12-24 00:55:22,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1420160.0, ans=0.1 2023-12-24 00:55:25,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1420226.6666666667, ans=0.125 2023-12-24 00:55:42,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1420293.3333333333, ans=0.1 2023-12-24 00:55:46,835 INFO [train.py:886] (0/4) Epoch 45, batch 3350, loss[loss=0.01158, audio_tagging_loss=0.01158, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4952646.59 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:56:05,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1420493.3333333333, ans=0.125 2023-12-24 00:56:10,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1420493.3333333333, ans=0.0 2023-12-24 00:56:26,530 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.535e+01 3.924e+01 4.094e+01 4.263e+01 4.631e+01, threshold=8.187e+01, percent-clipped=0.0 2023-12-24 00:56:26,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1420626.6666666667, ans=0.0 2023-12-24 00:56:35,411 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:56:36,975 INFO [train.py:886] (0/4) Epoch 45, batch 3400, loss[loss=0.01174, audio_tagging_loss=0.01174, over 24750.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4958617.83 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:56:44,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1420693.3333333333, ans=0.0 2023-12-24 00:56:49,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1420760.0, ans=0.125 2023-12-24 00:56:59,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1420826.6666666667, ans=0.125 2023-12-24 00:57:29,305 INFO [train.py:886] (0/4) Epoch 45, batch 3450, loss[loss=0.008777, audio_tagging_loss=0.008777, over 24008.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4958791.50 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:57:45,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2023-12-24 00:57:49,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1421160.0, ans=0.125 2023-12-24 00:58:05,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. 
limit=15.0 2023-12-24 00:58:08,515 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 3.963e+01 4.132e+01 4.315e+01 4.821e+01, threshold=8.264e+01, percent-clipped=0.0 2023-12-24 00:58:20,518 INFO [train.py:886] (0/4) Epoch 45, batch 3500, loss[loss=0.011, audio_tagging_loss=0.011, over 24750.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4955564.71 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:58:23,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1421360.0, ans=0.0 2023-12-24 00:58:26,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1421360.0, ans=0.0 2023-12-24 00:58:28,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1421360.0, ans=0.0 2023-12-24 00:58:43,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1421493.3333333333, ans=0.035 2023-12-24 00:59:10,978 INFO [train.py:886] (0/4) Epoch 45, batch 3550, loss[loss=0.01062, audio_tagging_loss=0.01062, over 25000.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4956348.03 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:59:14,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2023-12-24 00:59:21,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1421760.0, ans=0.125 2023-12-24 00:59:22,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1421760.0, ans=0.07 2023-12-24 00:59:32,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1421826.6666666667, ans=0.0 2023-12-24 00:59:34,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1421826.6666666667, ans=0.2 2023-12-24 00:59:34,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=15.0 2023-12-24 00:59:41,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1421893.3333333333, ans=0.1 2023-12-24 00:59:44,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1421893.3333333333, ans=0.1 2023-12-24 00:59:50,931 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.379e+01 3.787e+01 4.000e+01 4.230e+01 4.921e+01, threshold=7.999e+01, percent-clipped=0.0 2023-12-24 00:59:59,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2023-12-24 01:00:00,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1421960.0, ans=0.125 2023-12-24 01:00:02,141 INFO [train.py:886] (0/4) Epoch 45, batch 3600, loss[loss=0.009519, audio_tagging_loss=0.009519, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4954453.05 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 01:00:29,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1422160.0, ans=0.125 2023-12-24 01:00:47,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1422293.3333333333, ans=0.1 2023-12-24 01:00:53,734 INFO [train.py:886] (0/4) Epoch 45, batch 3650, loss[loss=0.0139, audio_tagging_loss=0.0139, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4958755.62 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 01:00:57,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1422360.0, ans=0.125 2023-12-24 01:01:07,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1422426.6666666667, ans=0.125 2023-12-24 01:01:17,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1422493.3333333333, ans=0.2 2023-12-24 01:01:32,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.93 vs. limit=22.5 2023-12-24 01:01:34,292 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.468e+01 3.852e+01 3.987e+01 4.174e+01 4.561e+01, threshold=7.974e+01, percent-clipped=0.0 2023-12-24 01:01:44,932 INFO [train.py:886] (0/4) Epoch 45, batch 3700, loss[loss=0.01347, audio_tagging_loss=0.01347, over 24750.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4954786.72 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:01:47,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1422693.3333333333, ans=0.125 2023-12-24 01:01:49,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1422693.3333333333, ans=0.125 2023-12-24 01:01:51,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1422693.3333333333, ans=0.125 2023-12-24 01:01:59,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1422760.0, ans=0.0 2023-12-24 01:02:01,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2023-12-24 01:02:07,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.03 vs. limit=10.0 2023-12-24 01:02:11,040 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:02:11,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1422826.6666666667, ans=0.125 2023-12-24 01:02:20,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1422893.3333333333, ans=0.1 2023-12-24 01:02:37,415 INFO [train.py:886] (0/4) Epoch 45, batch 3750, loss[loss=0.0101, audio_tagging_loss=0.0101, over 24750.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4952664.11 frames. 
], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:02:41,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1423026.6666666667, ans=0.125 2023-12-24 01:03:05,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2023-12-24 01:03:17,399 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.344e+01 3.874e+01 4.070e+01 4.272e+01 4.635e+01, threshold=8.140e+01, percent-clipped=0.0 2023-12-24 01:03:21,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1423293.3333333333, ans=0.0 2023-12-24 01:03:24,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1423293.3333333333, ans=0.125 2023-12-24 01:03:28,533 INFO [train.py:886] (0/4) Epoch 45, batch 3800, loss[loss=0.01131, audio_tagging_loss=0.01131, over 21247.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4943109.88 frames. ], batch size: 107, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:03:29,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2023-12-24 01:03:36,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2023-12-24 01:03:50,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1423493.3333333333, ans=0.125 2023-12-24 01:04:09,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1423560.0, ans=0.5 2023-12-24 01:04:10,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1423626.6666666667, ans=0.0 2023-12-24 01:04:14,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1423626.6666666667, ans=0.0 2023-12-24 01:04:20,896 INFO [train.py:886] (0/4) Epoch 45, batch 3850, loss[loss=0.009894, audio_tagging_loss=0.009894, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4937990.75 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:04:24,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1423693.3333333333, ans=0.125 2023-12-24 01:04:29,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. 
limit=15.0 2023-12-24 01:04:40,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1423760.0, ans=0.0 2023-12-24 01:04:45,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1423826.6666666667, ans=0.125 2023-12-24 01:04:59,990 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.544e+01 3.849e+01 4.042e+01 4.188e+01 4.936e+01, threshold=8.085e+01, percent-clipped=0.0 2023-12-24 01:05:04,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1423960.0, ans=0.125 2023-12-24 01:05:08,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1423960.0, ans=0.125 2023-12-24 01:05:11,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2023-12-24 01:05:12,415 INFO [train.py:886] (0/4) Epoch 45, batch 3900, loss[loss=0.01393, audio_tagging_loss=0.01393, over 24750.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4944457.05 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:05:23,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2023-12-24 01:05:31,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1424160.0, ans=0.1 2023-12-24 01:05:31,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2023-12-24 01:05:44,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1424226.6666666667, ans=0.0 2023-12-24 01:05:45,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1424226.6666666667, ans=0.05 2023-12-24 01:06:01,973 INFO [train.py:886] (0/4) Epoch 45, batch 3950, loss[loss=0.01101, audio_tagging_loss=0.01101, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4945376.46 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:06:14,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1424426.6666666667, ans=0.2 2023-12-24 01:06:17,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1424426.6666666667, ans=0.0 2023-12-24 01:06:33,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1424560.0, ans=0.2 2023-12-24 01:06:42,076 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.512e+01 3.875e+01 4.019e+01 4.169e+01 5.128e+01, threshold=8.038e+01, percent-clipped=0.0 2023-12-24 01:06:43,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1424626.6666666667, ans=0.125 2023-12-24 01:06:51,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1424626.6666666667, ans=0.0 2023-12-24 01:06:53,883 INFO [train.py:886] (0/4) Epoch 45, batch 4000, loss[loss=0.009652, audio_tagging_loss=0.009652, over 21959.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4946829.37 frames. ], batch size: 107, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:06:56,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1424693.3333333333, ans=0.0 2023-12-24 01:06:57,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1424693.3333333333, ans=0.025 2023-12-24 01:07:18,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1424826.6666666667, ans=0.125 2023-12-24 01:07:18,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1424826.6666666667, ans=0.125 2023-12-24 01:07:25,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1424893.3333333333, ans=0.125 2023-12-24 01:07:25,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2023-12-24 01:07:28,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1424893.3333333333, ans=0.2 2023-12-24 01:07:29,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1424893.3333333333, ans=0.125 2023-12-24 01:07:33,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1424960.0, ans=0.125 2023-12-24 01:07:42,911 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=12.0 2023-12-24 01:07:43,277 INFO [train.py:886] (0/4) Epoch 45, batch 4050, loss[loss=0.01071, audio_tagging_loss=0.01071, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4948681.17 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:07:49,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.83 vs. 
limit=22.5 2023-12-24 01:07:54,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1425093.3333333333, ans=0.125 2023-12-24 01:08:01,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1425093.3333333333, ans=0.125 2023-12-24 01:08:05,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1425160.0, ans=0.125 2023-12-24 01:08:05,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1425160.0, ans=0.07 2023-12-24 01:08:21,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1425226.6666666667, ans=0.0 2023-12-24 01:08:24,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2023-12-24 01:08:25,217 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.455e+01 3.868e+01 4.107e+01 4.296e+01 5.422e+01, threshold=8.214e+01, percent-clipped=0.0 2023-12-24 01:08:26,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1425293.3333333333, ans=0.0 2023-12-24 01:08:34,914 INFO [train.py:886] (0/4) Epoch 45, batch 4100, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24750.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4943277.51 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:08:36,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2023-12-24 01:08:42,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1425360.0, ans=0.125 2023-12-24 01:08:43,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2023-12-24 01:08:53,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1425426.6666666667, ans=0.125 2023-12-24 01:08:55,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1425426.6666666667, ans=0.125 2023-12-24 01:09:02,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1425493.3333333333, ans=0.2 2023-12-24 01:09:07,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1425560.0, ans=0.125 2023-12-24 01:09:10,654 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2023-12-24 01:09:14,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2023-12-24 01:09:27,202 INFO [train.py:886] (0/4) Epoch 45, batch 4150, loss[loss=0.009416, audio_tagging_loss=0.009416, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4937762.69 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:09:32,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1425693.3333333333, ans=0.1 2023-12-24 01:10:08,339 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.433e+01 3.920e+01 4.056e+01 4.290e+01 4.928e+01, threshold=8.113e+01, percent-clipped=0.0 2023-12-24 01:10:16,929 INFO [train.py:886] (0/4) Epoch 45, batch 4200, loss[loss=0.0112, audio_tagging_loss=0.0112, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4947448.55 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:10:18,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1426026.6666666667, ans=0.0 2023-12-24 01:10:20,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1426026.6666666667, ans=0.125 2023-12-24 01:10:23,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.19 vs. limit=22.5 2023-12-24 01:10:33,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1426093.3333333333, ans=0.0 2023-12-24 01:10:40,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1426160.0, ans=0.125 2023-12-24 01:10:42,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1426160.0, ans=0.125 2023-12-24 01:10:47,038 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:11:08,469 INFO [train.py:886] (0/4) Epoch 45, batch 4250, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4947737.25 frames. 
], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:11:21,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1426426.6666666667, ans=0.125 2023-12-24 01:11:27,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1426493.3333333333, ans=0.125 2023-12-24 01:11:32,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1426493.3333333333, ans=0.125 2023-12-24 01:11:33,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1426493.3333333333, ans=0.2 2023-12-24 01:11:34,288 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:11:41,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1426560.0, ans=0.2 2023-12-24 01:11:41,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1426560.0, ans=0.05 2023-12-24 01:11:49,245 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.608e+01 3.890e+01 4.019e+01 4.191e+01 4.680e+01, threshold=8.038e+01, percent-clipped=0.0 2023-12-24 01:11:51,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1426626.6666666667, ans=0.0 2023-12-24 01:11:55,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1426626.6666666667, ans=0.125 2023-12-24 01:11:57,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1426693.3333333333, ans=0.125 2023-12-24 01:11:58,746 INFO [train.py:886] (0/4) Epoch 45, batch 4300, loss[loss=0.008383, audio_tagging_loss=0.008383, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4952845.25 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:12:02,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1426693.3333333333, ans=0.125 2023-12-24 01:12:10,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1426760.0, ans=0.1 2023-12-24 01:12:41,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2023-12-24 01:12:45,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2023-12-24 01:12:50,102 INFO [train.py:886] (0/4) Epoch 45, batch 4350, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4959486.46 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:12:53,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1427026.6666666667, ans=0.0 2023-12-24 01:13:19,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1427160.0, ans=0.125 2023-12-24 01:13:24,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1427226.6666666667, ans=0.125 2023-12-24 01:13:31,697 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.626e+01 3.980e+01 4.130e+01 4.328e+01 5.553e+01, threshold=8.260e+01, percent-clipped=0.0 2023-12-24 01:13:43,000 INFO [train.py:886] (0/4) Epoch 45, batch 4400, loss[loss=0.01195, audio_tagging_loss=0.01195, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4951688.64 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:13:57,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1427426.6666666667, ans=0.125 2023-12-24 01:14:01,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1427493.3333333333, ans=0.0 2023-12-24 01:14:10,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2023-12-24 01:14:11,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2023-12-24 01:14:28,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1427626.6666666667, ans=0.125 2023-12-24 01:14:32,948 INFO [train.py:886] (0/4) Epoch 45, batch 4450, loss[loss=0.009754, audio_tagging_loss=0.009754, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4951621.19 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:14:59,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1427826.6666666667, ans=0.125 2023-12-24 01:15:15,501 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.412e+01 3.945e+01 4.084e+01 4.273e+01 5.400e+01, threshold=8.168e+01, percent-clipped=0.0 2023-12-24 01:15:16,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1427960.0, ans=0.0 2023-12-24 01:15:22,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.63 vs. limit=22.5 2023-12-24 01:15:24,894 INFO [train.py:886] (0/4) Epoch 45, batch 4500, loss[loss=0.01079, audio_tagging_loss=0.01079, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4955657.91 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:15:38,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1428093.3333333333, ans=0.125 2023-12-24 01:16:01,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1428226.6666666667, ans=0.125 2023-12-24 01:16:06,597 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:16:09,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1428293.3333333333, ans=0.125 2023-12-24 01:16:11,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1428293.3333333333, ans=0.0 2023-12-24 01:16:17,041 INFO [train.py:886] (0/4) Epoch 45, batch 4550, loss[loss=0.008783, audio_tagging_loss=0.008783, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4959506.88 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:16:37,205 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:16:39,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1428493.3333333333, ans=0.1 2023-12-24 01:16:59,849 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.365e+01 3.907e+01 4.021e+01 4.185e+01 4.612e+01, threshold=8.042e+01, percent-clipped=0.0 2023-12-24 01:17:04,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1428626.6666666667, ans=0.125 2023-12-24 01:17:05,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1428626.6666666667, ans=0.125 2023-12-24 01:17:07,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1428693.3333333333, ans=0.2 2023-12-24 01:17:08,552 INFO [train.py:886] (0/4) Epoch 45, batch 4600, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4957547.29 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:17:14,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1428693.3333333333, ans=0.0 2023-12-24 01:17:15,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1428693.3333333333, ans=0.0 2023-12-24 01:17:18,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1428760.0, ans=0.0 2023-12-24 01:17:20,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.52 vs. limit=22.5 2023-12-24 01:17:50,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1428960.0, ans=0.125 2023-12-24 01:18:00,685 INFO [train.py:886] (0/4) Epoch 45, batch 4650, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4960079.40 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:18:12,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1429093.3333333333, ans=0.5 2023-12-24 01:18:42,116 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.658e+01 3.928e+01 4.124e+01 4.353e+01 4.841e+01, threshold=8.247e+01, percent-clipped=0.0 2023-12-24 01:18:50,469 INFO [train.py:886] (0/4) Epoch 45, batch 4700, loss[loss=0.009775, audio_tagging_loss=0.009775, over 22217.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4952218.98 frames. ], batch size: 107, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:18:53,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1429360.0, ans=0.125 2023-12-24 01:19:01,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0 2023-12-24 01:19:08,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1429493.3333333333, ans=0.0 2023-12-24 01:19:11,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0 2023-12-24 01:19:28,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1429626.6666666667, ans=0.2 2023-12-24 01:19:31,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1429626.6666666667, ans=0.1 2023-12-24 01:19:37,540 INFO [train.py:886] (0/4) Epoch 45, batch 4750, loss[loss=0.0105, audio_tagging_loss=0.0105, over 24750.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4944441.56 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:19:53,207 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-45.pt 2023-12-24 01:20:12,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1429800.0, ans=0.0 2023-12-24 01:20:13,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2023-12-24 01:20:13,328 INFO [train.py:886] (0/4) Epoch 46, batch 0, loss[loss=0.02642, audio_tagging_loss=0.02642, over 23989.00 frames. ], tot_loss[loss=0.02642, audio_tagging_loss=0.02642, over 23989.00 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:20:13,329 INFO [train.py:909] (0/4) Computing validation loss 2023-12-24 01:20:24,956 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1079, 3.5770, 3.8959, 3.9095], device='cuda:0') 2023-12-24 01:20:34,592 INFO [train.py:917] (0/4) Epoch 46, validation: loss=0.03601, audio_tagging_loss=0.03601, over 3737520.00 frames. 
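A note on reading the train.py:886 entries: the loss[...] figure is the raw loss of the one batch being reported, while tot_loss[...] is a smoothed running value. Its frame count sits near 4.95e6 throughout this stretch, about 200 of the ~24750-frame batches, which is where an exponentially weighted sum with decay 1 - 1/200 converges. A minimal sketch under that assumption; the helper below is illustrative, not train.py's actual bookkeeping:

def ema_update(sum_loss, sum_frames, batch_loss, batch_frames, window=200):
    # Decay the old statistics so that roughly `window` recent batches
    # dominate the average; older batches fade out geometrically.
    alpha = 1.0 - 1.0 / window
    sum_loss = alpha * sum_loss + batch_loss * batch_frames
    sum_frames = alpha * sum_frames + batch_frames
    return sum_loss, sum_frames

# Feeding ~24750-frame batches drives sum_frames toward 24750 * 200 = 4.95e6,
# matching the tot_loss frame counts printed in this section.
sum_loss = sum_frames = 0.0
for _ in range(2000):
    sum_loss, sum_frames = ema_update(sum_loss, sum_frames, 0.011, 24750)
print(round(sum_frames), sum_loss / sum_frames)  # ~4950000 frames, loss ~0.011

The validation entries (train.py:917), by contrast, are computed in a single pass over the full 3737520-frame dev set rather than smoothed.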
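The scaling.py:213 entries that dominate the log print ScheduledFloat values: hyperparameters such as balancer probabilities, skip rates, and dropout_p that are functions of the global batch_count. By this stage (batch_count around 1.41e6 to 1.43e6) they have settled at their final values, e.g. prob=0.125 and skip rates of 0.0. A sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name, method, and example breakpoints are illustrative, not scaling.py's actual API:

import bisect

class ScheduledFloatSketch:
    """A float-valued hyperparameter interpolated piecewise-linearly in the
    global batch count; an illustrative stand-in for ScheduledFloat."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs in increasing batch_count order.
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value_at(self, batch_count):
        # Clamp outside the breakpoints, interpolate linearly between them.
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count) - 1
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# E.g. a dropout that anneals from 0.3 at batch 0 to 0.1 by batch 8000, then
# holds; consistent with the constant ans=0.1 seen at batch_count ~ 1.4e6.
dropout_p = ScheduledFloatSketch((0, 0.3), (8000, 0.1))
print(dropout_p.value_at(4000))     # 0.2, mid-ramp
print(dropout_p.value_at(1414026))  # 0.1, long past the last breakpoint

The companion Whitening entries (scaling.py:1022) compare a measured whitening metric against a limit that appears to follow the same kind of schedule, which is why the limits (6.0, 10.0, 12.0, 15.0, 22.5) differ per module but hold steady this late in training.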
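The optim.py:484 warnings summarize the gradient norms seen since the previous report as five quantiles (min, 25%, 50%, 75%, max) plus a clipping threshold. In every entry here the threshold is twice the reported median, matching Clipping_scale=2.0 (for instance 2.0 x 4.079e+01 = 8.158e+01), and percent-clipped is the share of batches whose norm exceeded it: 0.0 through most of epoch 45, briefly 5.0 just after the epoch 46 restart below. A sketch that reproduces the summary line under that relative-threshold rule; the function is an illustrative reconstruction, not ScaledAdam's internals:

import torch

def clipping_summary(grad_norms, clipping_scale=2.0):
    # grad_norms: 1-D tensor of gradient norms collected since the last report.
    # Quantiles in the order the log prints them: min, 25%, 50%, 75%, max.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    # Assumed rule, consistent with the logged numbers: clip whenever the
    # norm exceeds clipping_scale times the median of recent norms.
    threshold = clipping_scale * q[2]
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean().item()
    quartiles = " ".join(f"{v:.3e}" for v in q.tolist())
    return (
        f"Clipping_scale={clipping_scale}, grad-norm quartiles {quartiles}, "
        f"threshold={threshold.item():.3e}, percent-clipped={percent_clipped}"
    )

# Norms in the range reported in this section produce a matching summary line.
norms = torch.tensor([36.1, 38.6, 40.0, 41.7, 46.3])
print(clipping_summary(norms))

Defining the threshold relative to the recent median lets it track the slow decay of gradient norms over training instead of relying on a hand-tuned absolute cutoff.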
2023-12-24 01:20:34,593 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 01:20:43,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1429866.6666666667, ans=10.0 2023-12-24 01:20:45,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1429866.6666666667, ans=0.2 2023-12-24 01:20:48,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.49 vs. limit=22.5 2023-12-24 01:20:57,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1429933.3333333333, ans=0.125 2023-12-24 01:21:00,402 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.422e+01 4.025e+01 4.232e+01 5.097e+01 1.112e+02, threshold=8.463e+01, percent-clipped=5.0 2023-12-24 01:21:17,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1430066.6666666667, ans=0.1 2023-12-24 01:21:25,403 INFO [train.py:886] (0/4) Epoch 46, batch 50, loss[loss=0.01482, audio_tagging_loss=0.01482, over 23993.00 frames. ], tot_loss[loss=0.01748, audio_tagging_loss=0.01748, over 1115459.54 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:22:09,709 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:22:11,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1430400.0, ans=0.09899494936611666 2023-12-24 01:22:13,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1430400.0, ans=0.125 2023-12-24 01:22:17,806 INFO [train.py:886] (0/4) Epoch 46, batch 100, loss[loss=0.01078, audio_tagging_loss=0.01078, over 25000.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 1971402.27 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:22:43,338 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.772e+01 4.262e+01 4.601e+01 4.856e+01 5.800e+01, threshold=9.203e+01, percent-clipped=0.0 2023-12-24 01:22:45,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1430600.0, ans=0.07 2023-12-24 01:22:53,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1430666.6666666667, ans=0.125 2023-12-24 01:23:01,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1430733.3333333333, ans=0.0 2023-12-24 01:23:09,994 INFO [train.py:886] (0/4) Epoch 46, batch 150, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 2639812.82 frames. 
], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:23:14,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1430800.0, ans=0.125 2023-12-24 01:23:20,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1430866.6666666667, ans=10.0 2023-12-24 01:23:21,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1430866.6666666667, ans=0.2 2023-12-24 01:23:39,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1431000.0, ans=0.125 2023-12-24 01:23:41,057 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.91 vs. limit=10.0 2023-12-24 01:24:01,173 INFO [train.py:886] (0/4) Epoch 46, batch 200, loss[loss=0.00932, audio_tagging_loss=0.00932, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 3156950.30 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:24:14,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1431200.0, ans=0.125 2023-12-24 01:24:18,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0 2023-12-24 01:24:18,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1431200.0, ans=0.2 2023-12-24 01:24:26,295 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.683e+01 3.918e+01 4.124e+01 4.291e+01 5.491e+01, threshold=8.249e+01, percent-clipped=0.0 2023-12-24 01:24:26,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1431266.6666666667, ans=0.0 2023-12-24 01:24:30,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1431333.3333333333, ans=0.1 2023-12-24 01:24:43,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1431400.0, ans=0.2 2023-12-24 01:24:49,237 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:24:51,864 INFO [train.py:886] (0/4) Epoch 46, batch 250, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24949.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 3557553.35 frames. 
], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:24:54,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1431466.6666666667, ans=0.125 2023-12-24 01:24:55,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1431466.6666666667, ans=0.1 2023-12-24 01:25:23,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1431666.6666666667, ans=0.125 2023-12-24 01:25:30,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1431666.6666666667, ans=0.125 2023-12-24 01:25:37,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1431733.3333333333, ans=0.0 2023-12-24 01:25:42,509 INFO [train.py:886] (0/4) Epoch 46, batch 300, loss[loss=0.01142, audio_tagging_loss=0.01142, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 3868669.05 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:25:42,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1431800.0, ans=0.125 2023-12-24 01:25:43,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1431800.0, ans=0.0 2023-12-24 01:26:00,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.56 vs. limit=22.5 2023-12-24 01:26:08,050 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.528e+01 3.903e+01 4.066e+01 4.292e+01 4.827e+01, threshold=8.132e+01, percent-clipped=0.0 2023-12-24 01:26:08,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1431933.3333333333, ans=0.0 2023-12-24 01:26:26,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1432066.6666666667, ans=0.2 2023-12-24 01:26:31,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1432066.6666666667, ans=0.125 2023-12-24 01:26:33,589 INFO [train.py:886] (0/4) Epoch 46, batch 350, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4104251.67 frames. 
], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:27:00,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1432266.6666666667, ans=0.0 2023-12-24 01:27:04,032 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:27:04,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1432333.3333333333, ans=0.125 2023-12-24 01:27:09,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1432333.3333333333, ans=0.1 2023-12-24 01:27:14,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1432400.0, ans=0.0 2023-12-24 01:27:26,604 INFO [train.py:886] (0/4) Epoch 46, batch 400, loss[loss=0.009913, audio_tagging_loss=0.009913, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4292177.94 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:27:52,248 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.333e+01 3.856e+01 4.044e+01 4.244e+01 4.925e+01, threshold=8.088e+01, percent-clipped=0.0 2023-12-24 01:27:52,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0 2023-12-24 01:27:56,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1432666.6666666667, ans=0.0 2023-12-24 01:27:58,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1432666.6666666667, ans=0.2 2023-12-24 01:28:04,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1432666.6666666667, ans=0.2 2023-12-24 01:28:04,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1432666.6666666667, ans=0.0 2023-12-24 01:28:17,145 INFO [train.py:886] (0/4) Epoch 46, batch 450, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4436750.87 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:28:27,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2023-12-24 01:29:09,434 INFO [train.py:886] (0/4) Epoch 46, batch 500, loss[loss=0.009597, audio_tagging_loss=0.009597, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4549371.66 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:29:35,996 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.577e+01 3.884e+01 4.049e+01 4.174e+01 4.739e+01, threshold=8.098e+01, percent-clipped=0.0 2023-12-24 01:29:40,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1433333.3333333333, ans=0.0 2023-12-24 01:30:01,316 INFO [train.py:886] (0/4) Epoch 46, batch 550, loss[loss=0.009748, audio_tagging_loss=0.009748, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4642728.36 frames. 
], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:30:05,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2023-12-24 01:30:26,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1433600.0, ans=0.1 2023-12-24 01:30:38,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1433666.6666666667, ans=0.125 2023-12-24 01:30:41,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1433666.6666666667, ans=0.1 2023-12-24 01:30:52,352 INFO [train.py:886] (0/4) Epoch 46, batch 600, loss[loss=0.009036, audio_tagging_loss=0.009036, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4708204.00 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:30:53,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1433800.0, ans=0.0 2023-12-24 01:30:55,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1433800.0, ans=0.125 2023-12-24 01:31:01,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1433866.6666666667, ans=0.0 2023-12-24 01:31:11,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1433866.6666666667, ans=0.125 2023-12-24 01:31:13,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1433933.3333333333, ans=0.1 2023-12-24 01:31:18,056 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.528e+01 3.934e+01 4.117e+01 4.300e+01 4.914e+01, threshold=8.233e+01, percent-clipped=0.0 2023-12-24 01:31:31,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1434066.6666666667, ans=0.0 2023-12-24 01:31:32,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1434066.6666666667, ans=0.125 2023-12-24 01:31:40,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1434066.6666666667, ans=0.125 2023-12-24 01:31:43,852 INFO [train.py:886] (0/4) Epoch 46, batch 650, loss[loss=0.01168, audio_tagging_loss=0.01168, over 24750.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4755279.51 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:31:44,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0 2023-12-24 01:31:47,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1434133.3333333333, ans=0.125 2023-12-24 01:31:52,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.55 vs. 
limit=22.5 2023-12-24 01:31:52,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1434200.0, ans=0.0 2023-12-24 01:31:59,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1434200.0, ans=0.125 2023-12-24 01:32:13,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1434333.3333333333, ans=0.5 2023-12-24 01:32:17,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1434333.3333333333, ans=0.0 2023-12-24 01:32:33,564 INFO [train.py:886] (0/4) Epoch 46, batch 700, loss[loss=0.009988, audio_tagging_loss=0.009988, over 24750.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4796759.45 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:32:38,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2023-12-24 01:32:39,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1434466.6666666667, ans=0.2 2023-12-24 01:32:49,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1434533.3333333333, ans=0.2 2023-12-24 01:32:58,777 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.569e+01 3.947e+01 4.093e+01 4.311e+01 5.149e+01, threshold=8.186e+01, percent-clipped=0.0 2023-12-24 01:33:05,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1434666.6666666667, ans=0.0 2023-12-24 01:33:19,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2023-12-24 01:33:25,080 INFO [train.py:886] (0/4) Epoch 46, batch 750, loss[loss=0.0106, audio_tagging_loss=0.0106, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4828897.50 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:33:29,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1434800.0, ans=0.125 2023-12-24 01:33:34,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1434866.6666666667, ans=0.125 2023-12-24 01:33:53,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1434933.3333333333, ans=0.125 2023-12-24 01:33:53,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2023-12-24 01:34:03,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=12.0 2023-12-24 01:34:06,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.51 vs. 
limit=15.0 2023-12-24 01:34:07,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1435066.6666666667, ans=0.0 2023-12-24 01:34:14,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1435066.6666666667, ans=0.125 2023-12-24 01:34:16,969 INFO [train.py:886] (0/4) Epoch 46, batch 800, loss[loss=0.01011, audio_tagging_loss=0.01011, over 25000.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4862795.52 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:34:21,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.66 vs. limit=10.0 2023-12-24 01:34:41,861 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.539e+01 3.875e+01 4.046e+01 4.240e+01 5.244e+01, threshold=8.092e+01, percent-clipped=0.0 2023-12-24 01:34:45,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2023-12-24 01:34:49,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1435333.3333333333, ans=0.125 2023-12-24 01:34:52,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0 2023-12-24 01:34:58,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1435400.0, ans=0.0 2023-12-24 01:35:04,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.84 vs. limit=15.0 2023-12-24 01:35:08,500 INFO [train.py:886] (0/4) Epoch 46, batch 850, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4889352.59 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:35:12,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1435466.6666666667, ans=0.0 2023-12-24 01:35:17,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1435533.3333333333, ans=0.0 2023-12-24 01:35:22,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1435533.3333333333, ans=0.125 2023-12-24 01:35:26,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1435533.3333333333, ans=0.125 2023-12-24 01:35:29,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1435600.0, ans=0.125 2023-12-24 01:35:43,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1435666.6666666667, ans=15.0 2023-12-24 01:35:44,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. 
limit=15.0 2023-12-24 01:35:49,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1435733.3333333333, ans=0.0 2023-12-24 01:35:57,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. limit=6.0 2023-12-24 01:36:00,424 INFO [train.py:886] (0/4) Epoch 46, batch 900, loss[loss=0.009955, audio_tagging_loss=0.009955, over 24750.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4902450.71 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:36:16,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1435866.6666666667, ans=0.0 2023-12-24 01:36:20,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1435933.3333333333, ans=0.0 2023-12-24 01:36:25,490 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.452e+01 3.869e+01 4.061e+01 4.225e+01 5.084e+01, threshold=8.122e+01, percent-clipped=0.0 2023-12-24 01:36:33,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1436000.0, ans=0.07 2023-12-24 01:36:50,209 INFO [train.py:886] (0/4) Epoch 46, batch 950, loss[loss=0.00872, audio_tagging_loss=0.00872, over 24003.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4907742.88 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:36:55,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1436133.3333333333, ans=0.0 2023-12-24 01:37:03,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1436200.0, ans=0.0 2023-12-24 01:37:14,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1436266.6666666667, ans=0.1 2023-12-24 01:37:42,109 INFO [train.py:886] (0/4) Epoch 46, batch 1000, loss[loss=0.01057, audio_tagging_loss=0.01057, over 23997.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4911278.16 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:37:42,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.06 vs. 
limit=10.0 2023-12-24 01:38:00,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1436533.3333333333, ans=0.2 2023-12-24 01:38:06,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1436600.0, ans=0.125 2023-12-24 01:38:06,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1436600.0, ans=0.2 2023-12-24 01:38:07,787 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.584e+01 3.871e+01 4.031e+01 4.254e+01 4.824e+01, threshold=8.061e+01, percent-clipped=0.0 2023-12-24 01:38:12,749 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:38:14,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1436666.6666666667, ans=0.0 2023-12-24 01:38:16,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1436666.6666666667, ans=0.0 2023-12-24 01:38:24,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1436733.3333333333, ans=0.2 2023-12-24 01:38:24,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1436733.3333333333, ans=0.1 2023-12-24 01:38:31,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1436733.3333333333, ans=0.1 2023-12-24 01:38:32,884 INFO [train.py:886] (0/4) Epoch 46, batch 1050, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24750.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4921888.57 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:39:03,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1437000.0, ans=0.125 2023-12-24 01:39:23,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1437133.3333333333, ans=0.2 2023-12-24 01:39:24,085 INFO [train.py:886] (0/4) Epoch 46, batch 1100, loss[loss=0.01055, audio_tagging_loss=0.01055, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4926189.68 frames. 
], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:39:46,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1437266.6666666667, ans=0.1 2023-12-24 01:39:49,751 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.427e+01 3.840e+01 4.057e+01 4.285e+01 6.085e+01, threshold=8.114e+01, percent-clipped=0.0 2023-12-24 01:39:52,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1437266.6666666667, ans=0.125 2023-12-24 01:39:54,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1437333.3333333333, ans=0.0 2023-12-24 01:39:59,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1437333.3333333333, ans=0.125 2023-12-24 01:40:15,278 INFO [train.py:886] (0/4) Epoch 46, batch 1150, loss[loss=0.009766, audio_tagging_loss=0.009766, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4935101.64 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:40:22,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1437466.6666666667, ans=0.125 2023-12-24 01:40:24,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1437533.3333333333, ans=0.125 2023-12-24 01:40:33,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1437533.3333333333, ans=0.0 2023-12-24 01:40:39,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1437600.0, ans=0.1 2023-12-24 01:40:45,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.95 vs. limit=10.0 2023-12-24 01:40:57,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.15 vs. limit=6.0 2023-12-24 01:41:03,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1437733.3333333333, ans=0.125 2023-12-24 01:41:04,526 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:41:05,275 INFO [train.py:886] (0/4) Epoch 46, batch 1200, loss[loss=0.009025, audio_tagging_loss=0.009025, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4944228.37 frames. 
], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:41:30,966 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.478e+01 3.918e+01 4.092e+01 4.253e+01 4.725e+01, threshold=8.185e+01, percent-clipped=0.0 2023-12-24 01:41:33,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1437933.3333333333, ans=0.125 2023-12-24 01:41:45,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1438000.0, ans=0.0 2023-12-24 01:41:50,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1438066.6666666667, ans=0.0 2023-12-24 01:41:51,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1438066.6666666667, ans=0.07 2023-12-24 01:41:57,097 INFO [train.py:886] (0/4) Epoch 46, batch 1250, loss[loss=0.01348, audio_tagging_loss=0.01348, over 24750.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4942569.61 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:42:00,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.78 vs. limit=15.0 2023-12-24 01:42:03,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.04 vs. limit=22.5 2023-12-24 01:42:09,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1438200.0, ans=0.1 2023-12-24 01:42:30,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1438333.3333333333, ans=0.125 2023-12-24 01:42:30,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1438333.3333333333, ans=0.125 2023-12-24 01:42:37,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2023-12-24 01:42:47,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1438466.6666666667, ans=0.125 2023-12-24 01:42:47,840 INFO [train.py:886] (0/4) Epoch 46, batch 1300, loss[loss=0.0109, audio_tagging_loss=0.0109, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4940835.76 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:42:53,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2023-12-24 01:43:02,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. 
limit=15.0 2023-12-24 01:43:13,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1438600.0, ans=0.125 2023-12-24 01:43:14,370 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.645e+01 3.930e+01 4.058e+01 4.275e+01 4.949e+01, threshold=8.116e+01, percent-clipped=0.0 2023-12-24 01:43:18,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1438666.6666666667, ans=0.0 2023-12-24 01:43:19,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1438666.6666666667, ans=0.0 2023-12-24 01:43:24,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1438666.6666666667, ans=0.125 2023-12-24 01:43:32,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1438733.3333333333, ans=0.0 2023-12-24 01:43:39,277 INFO [train.py:886] (0/4) Epoch 46, batch 1350, loss[loss=0.009293, audio_tagging_loss=0.009293, over 24750.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4944236.84 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:43:54,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1438866.6666666667, ans=0.125 2023-12-24 01:43:56,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1438866.6666666667, ans=0.1 2023-12-24 01:43:56,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1438866.6666666667, ans=0.0 2023-12-24 01:44:07,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1438933.3333333333, ans=0.0 2023-12-24 01:44:21,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2023-12-24 01:44:30,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1439133.3333333333, ans=0.125 2023-12-24 01:44:31,774 INFO [train.py:886] (0/4) Epoch 46, batch 1400, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4949015.28 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 64.0 2023-12-24 01:44:40,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2023-12-24 01:44:43,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2023-12-24 01:44:44,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1439200.0, ans=0.0 2023-12-24 01:44:51,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1439200.0, ans=0.125 2023-12-24 01:44:55,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. 
limit=15.0 2023-12-24 01:44:58,226 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.346e+01 3.871e+01 4.064e+01 4.207e+01 5.055e+01, threshold=8.128e+01, percent-clipped=0.0 2023-12-24 01:45:01,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.29 vs. limit=10.0 2023-12-24 01:45:24,842 INFO [train.py:886] (0/4) Epoch 46, batch 1450, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4956093.95 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:45:31,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1439466.6666666667, ans=0.125 2023-12-24 01:45:48,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2023-12-24 01:46:02,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1439666.6666666667, ans=0.5 2023-12-24 01:46:04,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1439666.6666666667, ans=0.125 2023-12-24 01:46:11,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1439733.3333333333, ans=0.125 2023-12-24 01:46:15,266 INFO [train.py:886] (0/4) Epoch 46, batch 1500, loss[loss=0.01003, audio_tagging_loss=0.01003, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4951968.77 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:46:24,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1439800.0, ans=0.125 2023-12-24 01:46:33,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2023-12-24 01:46:40,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1439933.3333333333, ans=0.2 2023-12-24 01:46:41,577 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.511e+01 3.911e+01 4.080e+01 4.273e+01 5.286e+01, threshold=8.160e+01, percent-clipped=0.0 2023-12-24 01:46:45,669 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-216000.pt 2023-12-24 01:46:57,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1440000.0, ans=10.0 2023-12-24 01:47:10,225 INFO [train.py:886] (0/4) Epoch 46, batch 1550, loss[loss=0.01127, audio_tagging_loss=0.01127, over 24750.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4952771.17 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:47:10,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.62 vs. 
limit=15.0 2023-12-24 01:47:22,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1440200.0, ans=0.125 2023-12-24 01:47:22,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2023-12-24 01:47:33,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=22.5 2023-12-24 01:47:38,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2023-12-24 01:47:54,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1440400.0, ans=0.0 2023-12-24 01:47:56,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0 2023-12-24 01:48:02,293 INFO [train.py:886] (0/4) Epoch 46, batch 1600, loss[loss=0.01012, audio_tagging_loss=0.01012, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4948955.82 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:48:15,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1440533.3333333333, ans=0.125 2023-12-24 01:48:21,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1440600.0, ans=0.0 2023-12-24 01:48:21,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.81 vs. limit=22.5 2023-12-24 01:48:26,622 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.905e+01 4.113e+01 4.286e+01 4.788e+01, threshold=8.225e+01, percent-clipped=0.0 2023-12-24 01:48:26,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1440600.0, ans=0.125 2023-12-24 01:48:27,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. limit=6.0 2023-12-24 01:48:48,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1440733.3333333333, ans=0.125 2023-12-24 01:48:50,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1440733.3333333333, ans=0.125 2023-12-24 01:48:52,885 INFO [train.py:886] (0/4) Epoch 46, batch 1650, loss[loss=0.0102, audio_tagging_loss=0.0102, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4948346.38 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:48:58,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. 
limit=15.0 2023-12-24 01:49:12,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1440866.6666666667, ans=0.0 2023-12-24 01:49:44,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1441066.6666666667, ans=0.2 2023-12-24 01:49:45,229 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:49:46,070 INFO [train.py:886] (0/4) Epoch 46, batch 1700, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4955537.96 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:49:54,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1441200.0, ans=0.125 2023-12-24 01:50:01,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1441200.0, ans=0.125 2023-12-24 01:50:11,912 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.496e+01 3.860e+01 4.005e+01 4.201e+01 5.398e+01, threshold=8.010e+01, percent-clipped=0.0 2023-12-24 01:50:13,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.21 vs. limit=22.5 2023-12-24 01:50:16,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1441333.3333333333, ans=0.125 2023-12-24 01:50:19,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1441333.3333333333, ans=0.125 2023-12-24 01:50:21,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1441333.3333333333, ans=0.0 2023-12-24 01:50:27,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1441400.0, ans=0.125 2023-12-24 01:50:29,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1441400.0, ans=0.0 2023-12-24 01:50:36,428 INFO [train.py:886] (0/4) Epoch 46, batch 1750, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4953413.06 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:51:00,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1441600.0, ans=0.125 2023-12-24 01:51:11,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1441666.6666666667, ans=0.95 2023-12-24 01:51:21,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.53 vs. limit=22.5 2023-12-24 01:51:29,191 INFO [train.py:886] (0/4) Epoch 46, batch 1800, loss[loss=0.008774, audio_tagging_loss=0.008774, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4958509.74 frames. 
], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:51:32,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1441800.0, ans=0.125 2023-12-24 01:51:37,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1441800.0, ans=0.0 2023-12-24 01:51:42,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1441866.6666666667, ans=0.125 2023-12-24 01:51:55,675 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.556e+01 3.867e+01 4.060e+01 4.230e+01 5.187e+01, threshold=8.121e+01, percent-clipped=0.0 2023-12-24 01:52:01,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1442000.0, ans=0.125 2023-12-24 01:52:20,844 INFO [train.py:886] (0/4) Epoch 46, batch 1850, loss[loss=0.01288, audio_tagging_loss=0.01288, over 24750.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4959890.32 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:52:43,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1442266.6666666667, ans=0.07 2023-12-24 01:52:45,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1442266.6666666667, ans=0.125 2023-12-24 01:53:01,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1442400.0, ans=0.125 2023-12-24 01:53:04,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2023-12-24 01:53:12,140 INFO [train.py:886] (0/4) Epoch 46, batch 1900, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4945525.47 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:53:27,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1442533.3333333333, ans=0.0 2023-12-24 01:53:31,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1442533.3333333333, ans=0.125 2023-12-24 01:53:38,753 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.559e+01 3.923e+01 4.090e+01 4.316e+01 4.935e+01, threshold=8.180e+01, percent-clipped=0.0 2023-12-24 01:53:42,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1442666.6666666667, ans=0.125 2023-12-24 01:54:05,334 INFO [train.py:886] (0/4) Epoch 46, batch 1950, loss[loss=0.01066, audio_tagging_loss=0.01066, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4938187.44 frames. 
], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:54:19,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1442866.6666666667, ans=0.0 2023-12-24 01:54:26,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1442933.3333333333, ans=0.05 2023-12-24 01:54:33,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.54 vs. limit=15.0 2023-12-24 01:54:56,372 INFO [train.py:886] (0/4) Epoch 46, batch 2000, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24750.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4946001.44 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 01:54:56,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1443133.3333333333, ans=10.0 2023-12-24 01:55:09,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1443200.0, ans=0.125 2023-12-24 01:55:12,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=12.0 2023-12-24 01:55:16,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1443266.6666666667, ans=0.125 2023-12-24 01:55:22,165 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 3.855e+01 4.032e+01 4.223e+01 5.008e+01, threshold=8.064e+01, percent-clipped=0.0 2023-12-24 01:55:28,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1443333.3333333333, ans=0.04949747468305833 2023-12-24 01:55:29,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0 2023-12-24 01:55:36,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1443333.3333333333, ans=0.125 2023-12-24 01:55:43,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1443400.0, ans=0.125 2023-12-24 01:55:48,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1443466.6666666667, ans=0.0 2023-12-24 01:55:48,848 INFO [train.py:886] (0/4) Epoch 46, batch 2050, loss[loss=0.01063, audio_tagging_loss=0.01063, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4950896.64 frames. 
], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:56:05,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1443533.3333333333, ans=0.125
2023-12-24 01:56:08,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1443533.3333333333, ans=0.125
2023-12-24 01:56:10,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1443600.0, ans=0.0
2023-12-24 01:56:15,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1443600.0, ans=0.0
2023-12-24 01:56:17,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1443600.0, ans=0.125
2023-12-24 01:56:40,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. limit=6.0
2023-12-24 01:56:41,054 INFO [train.py:886] (0/4) Epoch 46, batch 2100, loss[loss=0.008784, audio_tagging_loss=0.008784, over 21570.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4947759.22 frames. ], batch size: 107, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:56:51,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0
2023-12-24 01:57:05,945 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.524e+01 3.905e+01 4.022e+01 4.224e+01 4.545e+01, threshold=8.045e+01, percent-clipped=0.0
2023-12-24 01:57:08,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.43 vs. limit=10.0
2023-12-24 01:57:12,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5
2023-12-24 01:57:24,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1444066.6666666667, ans=0.1
2023-12-24 01:57:32,015 INFO [train.py:886] (0/4) Epoch 46, batch 2150, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4958179.09 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:57:48,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1444200.0, ans=0.125
2023-12-24 01:57:51,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1444200.0, ans=0.2
2023-12-24 01:58:08,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1444333.3333333333, ans=0.2
2023-12-24 01:58:24,352 INFO [train.py:886] (0/4) Epoch 46, batch 2200, loss[loss=0.01188, audio_tagging_loss=0.01188, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4954746.31 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:58:31,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1444466.6666666667, ans=0.125
2023-12-24 01:58:50,836 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.513e+01 3.958e+01 4.112e+01 4.314e+01 5.314e+01, threshold=8.224e+01, percent-clipped=0.0
2023-12-24 01:58:58,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1444666.6666666667, ans=0.2
2023-12-24 01:59:04,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1444666.6666666667, ans=0.0
2023-12-24 01:59:09,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. limit=6.0
2023-12-24 01:59:16,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1444800.0, ans=0.125
2023-12-24 01:59:16,820 INFO [train.py:886] (0/4) Epoch 46, batch 2250, loss[loss=0.009507, audio_tagging_loss=0.009507, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4954895.28 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:59:22,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1444800.0, ans=0.125
2023-12-24 01:59:36,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1444933.3333333333, ans=0.2
2023-12-24 02:00:05,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1445066.6666666667, ans=0.1
2023-12-24 02:00:08,438 INFO [train.py:886] (0/4) Epoch 46, batch 2300, loss[loss=0.01172, audio_tagging_loss=0.01172, over 22064.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4951812.87 frames. ], batch size: 107, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:00:15,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1445133.3333333333, ans=0.125
2023-12-24 02:00:34,261 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.386e+01 3.894e+01 4.073e+01 4.228e+01 5.336e+01, threshold=8.145e+01, percent-clipped=0.0
2023-12-24 02:00:43,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1445333.3333333333, ans=0.1
2023-12-24 02:00:55,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1445400.0, ans=0.0
2023-12-24 02:00:55,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0
2023-12-24 02:01:00,762 INFO [train.py:886] (0/4) Epoch 46, batch 2350, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4951135.93 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:01:15,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1445533.3333333333, ans=0.125
2023-12-24 02:01:31,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0
2023-12-24 02:01:35,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1445666.6666666667, ans=0.0
2023-12-24 02:01:36,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1445666.6666666667, ans=0.0
2023-12-24 02:01:40,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1445733.3333333333, ans=0.0
2023-12-24 02:01:51,088 INFO [train.py:886] (0/4) Epoch 46, batch 2400, loss[loss=0.01362, audio_tagging_loss=0.01362, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4956086.86 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:02:08,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.22 vs. limit=22.5
2023-12-24 02:02:15,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1445933.3333333333, ans=0.0
2023-12-24 02:02:16,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0
2023-12-24 02:02:16,977 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.926e+01 4.069e+01 4.266e+01 5.020e+01, threshold=8.138e+01, percent-clipped=0.0
2023-12-24 02:02:19,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1445933.3333333333, ans=0.125
2023-12-24 02:02:23,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1446000.0, ans=0.1
2023-12-24 02:02:43,282 INFO [train.py:886] (0/4) Epoch 46, batch 2450, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4960182.89 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:02:45,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.45 vs. limit=22.5
2023-12-24 02:03:15,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1446333.3333333333, ans=0.04949747468305833
2023-12-24 02:03:20,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1446333.3333333333, ans=0.025
2023-12-24 02:03:22,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.63 vs. limit=10.0
2023-12-24 02:03:35,347 INFO [train.py:886] (0/4) Epoch 46, batch 2500, loss[loss=0.01109, audio_tagging_loss=0.01109, over 24750.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4950713.55 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:03:50,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0
2023-12-24 02:03:56,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1446600.0, ans=0.125
2023-12-24 02:04:00,415 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.680e+01 3.970e+01 4.120e+01 4.239e+01 5.060e+01, threshold=8.241e+01, percent-clipped=0.0
2023-12-24 02:04:16,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1446733.3333333333, ans=0.0
2023-12-24 02:04:19,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1446733.3333333333, ans=0.1
2023-12-24 02:04:19,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1446733.3333333333, ans=0.0
2023-12-24 02:04:25,311 INFO [train.py:886] (0/4) Epoch 46, batch 2550, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4943912.42 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:04:34,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1446800.0, ans=0.125
2023-12-24 02:04:46,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1446933.3333333333, ans=0.125
2023-12-24 02:04:55,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1447000.0, ans=0.125
2023-12-24 02:05:07,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0
2023-12-24 02:05:18,345 INFO [train.py:886] (0/4) Epoch 46, batch 2600, loss[loss=0.01073, audio_tagging_loss=0.01073, over 24750.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4944661.32 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:05:22,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1447133.3333333333, ans=0.125
2023-12-24 02:05:29,947 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-24 02:05:36,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0
2023-12-24 02:05:44,708 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.567e+01 3.903e+01 4.068e+01 4.253e+01 4.776e+01, threshold=8.137e+01, percent-clipped=0.0
2023-12-24 02:05:46,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0
2023-12-24 02:05:51,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1447333.3333333333, ans=0.125
2023-12-24 02:06:00,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1447400.0, ans=0.125
2023-12-24 02:06:09,962 INFO [train.py:886] (0/4) Epoch 46, batch 2650, loss[loss=0.01002, audio_tagging_loss=0.01002, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4944001.81 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:06:19,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1447466.6666666667, ans=0.0
2023-12-24 02:06:33,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1447600.0, ans=0.025
2023-12-24 02:06:36,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1447600.0, ans=0.0
2023-12-24 02:06:38,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1447600.0, ans=0.125
2023-12-24 02:06:44,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1447666.6666666667, ans=0.2
2023-12-24 02:06:47,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1447666.6666666667, ans=0.2
2023-12-24 02:06:50,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1447666.6666666667, ans=0.125
2023-12-24 02:06:51,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1447733.3333333333, ans=0.2
2023-12-24 02:07:01,523 INFO [train.py:886] (0/4) Epoch 46, batch 2700, loss[loss=0.009833, audio_tagging_loss=0.009833, over 21829.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4942216.60 frames. ], batch size: 107, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:07:06,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1447800.0, ans=0.07
2023-12-24 02:07:10,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0
2023-12-24 02:07:27,936 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.425e+01 3.925e+01 4.049e+01 4.308e+01 4.721e+01, threshold=8.099e+01, percent-clipped=0.0
2023-12-24 02:07:53,861 INFO [train.py:886] (0/4) Epoch 46, batch 2750, loss[loss=0.008559, audio_tagging_loss=0.008559, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4943826.38 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:08:17,739 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.59 vs. limit=15.0
2023-12-24 02:08:18,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1448266.6666666667, ans=0.125
2023-12-24 02:08:38,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1448400.0, ans=0.125
2023-12-24 02:08:41,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1448400.0, ans=0.0
2023-12-24 02:08:43,411 INFO [train.py:886] (0/4) Epoch 46, batch 2800, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4944388.75 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:08:48,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.06 vs. limit=22.5
2023-12-24 02:09:00,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0
2023-12-24 02:09:03,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0
2023-12-24 02:09:06,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1448600.0, ans=0.125
2023-12-24 02:09:09,794 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.523e+01 3.906e+01 4.083e+01 4.345e+01 5.056e+01, threshold=8.167e+01, percent-clipped=0.0
2023-12-24 02:09:11,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0
2023-12-24 02:09:31,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1448733.3333333333, ans=0.0
2023-12-24 02:09:36,211 INFO [train.py:886] (0/4) Epoch 46, batch 2850, loss[loss=0.01363, audio_tagging_loss=0.01363, over 24750.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4943185.19 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:09:54,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1448866.6666666667, ans=0.125
2023-12-24 02:10:02,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1448933.3333333333, ans=0.125
2023-12-24 02:10:20,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0
2023-12-24 02:10:28,356 INFO [train.py:886] (0/4) Epoch 46, batch 2900, loss[loss=0.009968, audio_tagging_loss=0.009968, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4942701.01 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:10:28,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1449133.3333333333, ans=0.125
2023-12-24 02:10:34,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1449133.3333333333, ans=0.0
2023-12-24 02:10:53,594 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.454e+01 3.893e+01 4.087e+01 4.310e+01 5.363e+01, threshold=8.174e+01, percent-clipped=0.0
2023-12-24 02:10:55,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1449266.6666666667, ans=0.1
2023-12-24 02:11:00,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=1449333.3333333333, ans=0.2
2023-12-24 02:11:11,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1449400.0, ans=0.07
2023-12-24 02:11:13,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1449400.0, ans=0.125
2023-12-24 02:11:19,436 INFO [train.py:886] (0/4) Epoch 46, batch 2950, loss[loss=0.009931, audio_tagging_loss=0.009931, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4936486.37 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:11:20,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1449466.6666666667, ans=0.025
2023-12-24 02:11:28,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0
2023-12-24 02:11:36,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5
2023-12-24 02:11:42,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1449600.0, ans=0.125
2023-12-24 02:12:06,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1449733.3333333333, ans=0.125
2023-12-24 02:12:12,483 INFO [train.py:886] (0/4) Epoch 46, batch 3000, loss[loss=0.009253, audio_tagging_loss=0.009253, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4942276.98 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:12:12,484 INFO [train.py:909] (0/4) Computing validation loss
2023-12-24 02:12:30,552 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6781, 2.9676, 3.6767, 3.6948], device='cuda:0')
2023-12-24 02:12:34,124 INFO [train.py:917] (0/4) Epoch 46, validation: loss=0.03679, audio_tagging_loss=0.03679, over 3737520.00 frames.
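The train.py:909/917 entries above record a periodic validation pass interleaved with training: optimization pauses, the model is scored on a fixed held-out cut set, and the resulting loss is logged against the total frame count. As a rough illustration only, the sketch below shows what such a pass might compute for a multi-label audio tagger scored with binary cross-entropy; `model`, `valid_loader`, the batch keys, and the per-frame normalization are all assumptions for this sketch, not icefall's actual interfaces.

```python
# Hedged sketch of a "Computing validation loss" pass like the one logged
# above. All names and the BCE objective are illustrative assumptions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def compute_validation_loss(model, valid_loader, device="cuda:0"):
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    for batch in valid_loader:
        feats = batch["features"].to(device)   # (N, T, 80) fbank features
        labels = batch["labels"].to(device)    # (N, num_events) multi-hot targets
        logits = model(feats)                  # (N, num_events) tag scores
        # Audio tagging is multi-label, so each event class gets an
        # independent binary cross-entropy term.
        loss = F.binary_cross_entropy_with_logits(logits, labels, reduction="sum")
        tot_loss += loss.item()
        tot_frames += feats.size(0) * feats.size(1)
    model.train()
    # Dividing by frames is an assumption; the log only reports the total
    # frame count ("over 3737520.00 frames").
    return tot_loss / max(tot_frames, 1)
```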
2023-12-24 02:12:34,125 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 02:12:43,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1449866.6666666667, ans=0.125 2023-12-24 02:12:58,522 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.409e+01 3.890e+01 4.114e+01 4.303e+01 5.269e+01, threshold=8.229e+01, percent-clipped=0.0 2023-12-24 02:12:58,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1449933.3333333333, ans=0.125 2023-12-24 02:13:16,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-12-24 02:13:24,999 INFO [train.py:886] (0/4) Epoch 46, batch 3050, loss[loss=0.01078, audio_tagging_loss=0.01078, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4942675.30 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 02:13:29,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1450133.3333333333, ans=0.125 2023-12-24 02:13:33,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=15.0 2023-12-24 02:13:48,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1450266.6666666667, ans=0.1 2023-12-24 02:13:49,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1450266.6666666667, ans=0.125 2023-12-24 02:13:54,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1450266.6666666667, ans=0.5 2023-12-24 02:14:00,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1450333.3333333333, ans=0.125 2023-12-24 02:14:01,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1450333.3333333333, ans=0.0 2023-12-24 02:14:02,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1450333.3333333333, ans=0.07 2023-12-24 02:14:16,872 INFO [train.py:886] (0/4) Epoch 46, batch 3100, loss[loss=0.00966, audio_tagging_loss=0.00966, over 25000.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4947364.03 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 02:14:20,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.99 vs. limit=15.0 2023-12-24 02:14:22,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.08 vs. 
limit=10.0 2023-12-24 02:14:24,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1450466.6666666667, ans=0.125 2023-12-24 02:14:38,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1450600.0, ans=0.125 2023-12-24 02:14:41,854 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.614e+01 3.922e+01 4.127e+01 4.313e+01 5.087e+01, threshold=8.254e+01, percent-clipped=0.0 2023-12-24 02:14:58,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1450733.3333333333, ans=0.125 2023-12-24 02:15:07,064 INFO [train.py:886] (0/4) Epoch 46, batch 3150, loss[loss=0.01216, audio_tagging_loss=0.01216, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4948389.11 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 02:15:12,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=12.0 2023-12-24 02:15:14,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. limit=15.0 2023-12-24 02:15:21,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2023-12-24 02:15:27,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.12 vs. limit=15.0 2023-12-24 02:15:32,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2023-12-24 02:15:49,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1451066.6666666667, ans=0.125 2023-12-24 02:15:50,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2023-12-24 02:15:58,397 INFO [train.py:886] (0/4) Epoch 46, batch 3200, loss[loss=0.009022, audio_tagging_loss=0.009022, over 24750.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4942367.76 frames. 
], batch size: 99, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 02:16:00,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1451133.3333333333, ans=0.125 2023-12-24 02:16:14,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1451200.0, ans=0.2 2023-12-24 02:16:24,222 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.595e+01 3.931e+01 4.109e+01 4.308e+01 5.073e+01, threshold=8.218e+01, percent-clipped=0.0 2023-12-24 02:16:35,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1451333.3333333333, ans=0.1 2023-12-24 02:16:36,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1451333.3333333333, ans=0.04949747468305833 2023-12-24 02:16:48,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1451400.0, ans=0.125 2023-12-24 02:16:50,739 INFO [train.py:886] (0/4) Epoch 46, batch 3250, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4942116.99 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 02:16:58,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1451466.6666666667, ans=0.035 2023-12-24 02:16:58,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1451466.6666666667, ans=0.125 2023-12-24 02:16:58,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.47 vs. limit=10.0 2023-12-24 02:17:01,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.35 vs. limit=10.0 2023-12-24 02:17:07,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1451533.3333333333, ans=0.125 2023-12-24 02:17:07,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1451533.3333333333, ans=0.125 2023-12-24 02:17:32,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1451733.3333333333, ans=0.125 2023-12-24 02:17:34,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1451733.3333333333, ans=0.125 2023-12-24 02:17:41,154 INFO [train.py:886] (0/4) Epoch 46, batch 3300, loss[loss=0.0125, audio_tagging_loss=0.0125, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4942249.02 frames. 
], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:17:56,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1451866.6666666667, ans=0.1 2023-12-24 02:17:58,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1451866.6666666667, ans=0.0 2023-12-24 02:18:07,484 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.526e+01 3.879e+01 4.032e+01 4.165e+01 5.063e+01, threshold=8.064e+01, percent-clipped=0.0 2023-12-24 02:18:33,747 INFO [train.py:886] (0/4) Epoch 46, batch 3350, loss[loss=0.01006, audio_tagging_loss=0.01006, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4942656.53 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:18:51,744 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2023-12-24 02:18:54,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1452266.6666666667, ans=0.125 2023-12-24 02:18:55,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1452266.6666666667, ans=0.125 2023-12-24 02:18:58,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1452266.6666666667, ans=0.125 2023-12-24 02:19:04,478 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0 2023-12-24 02:19:08,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1452333.3333333333, ans=0.035 2023-12-24 02:19:24,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1452466.6666666667, ans=0.0 2023-12-24 02:19:25,264 INFO [train.py:886] (0/4) Epoch 46, batch 3400, loss[loss=0.01182, audio_tagging_loss=0.01182, over 24750.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4944393.62 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:19:31,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0 2023-12-24 02:19:41,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1452533.3333333333, ans=0.05 2023-12-24 02:19:47,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1452600.0, ans=0.04949747468305833 2023-12-24 02:19:48,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1452600.0, ans=0.0 2023-12-24 02:19:51,845 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.591e+01 3.965e+01 4.111e+01 4.276e+01 8.253e+01, threshold=8.223e+01, percent-clipped=1.0 2023-12-24 02:19:57,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.58 vs. 
limit=15.0 2023-12-24 02:19:59,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.41 vs. limit=15.0 2023-12-24 02:20:01,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1452666.6666666667, ans=0.0 2023-12-24 02:20:10,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1452733.3333333333, ans=0.0 2023-12-24 02:20:17,521 INFO [train.py:886] (0/4) Epoch 46, batch 3450, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4946577.80 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:20:49,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1453000.0, ans=0.2 2023-12-24 02:20:56,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1453000.0, ans=0.0 2023-12-24 02:21:09,814 INFO [train.py:886] (0/4) Epoch 46, batch 3500, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4945400.29 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:21:09,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1453133.3333333333, ans=0.2 2023-12-24 02:21:15,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1453133.3333333333, ans=0.125 2023-12-24 02:21:22,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1453200.0, ans=0.125 2023-12-24 02:21:23,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1453200.0, ans=0.0 2023-12-24 02:21:24,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1453200.0, ans=0.125 2023-12-24 02:21:28,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1453266.6666666667, ans=0.125 2023-12-24 02:21:37,136 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.445e+01 3.883e+01 4.040e+01 4.247e+01 5.009e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-24 02:21:42,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1453333.3333333333, ans=0.0 2023-12-24 02:21:45,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1453333.3333333333, ans=0.0 2023-12-24 02:21:49,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2023-12-24 02:22:01,499 INFO [train.py:886] (0/4) Epoch 46, batch 3550, loss[loss=0.009889, audio_tagging_loss=0.009889, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4949073.00 frames. 
], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:22:11,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1453533.3333333333, ans=0.2 2023-12-24 02:22:14,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1453533.3333333333, ans=0.125 2023-12-24 02:22:32,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1453666.6666666667, ans=0.2 2023-12-24 02:22:53,309 INFO [train.py:886] (0/4) Epoch 46, batch 3600, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4944833.86 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:22:55,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2023-12-24 02:23:06,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1453866.6666666667, ans=0.125 2023-12-24 02:23:11,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1453866.6666666667, ans=0.0 2023-12-24 02:23:16,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1453933.3333333333, ans=0.1 2023-12-24 02:23:16,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1453933.3333333333, ans=0.2 2023-12-24 02:23:17,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1453933.3333333333, ans=0.0 2023-12-24 02:23:18,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1453933.3333333333, ans=0.025 2023-12-24 02:23:20,530 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.432e+01 3.928e+01 4.098e+01 4.249e+01 6.702e+01, threshold=8.195e+01, percent-clipped=0.0 2023-12-24 02:23:28,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1454000.0, ans=0.125 2023-12-24 02:23:44,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1454133.3333333333, ans=0.09899494936611666 2023-12-24 02:23:46,088 INFO [train.py:886] (0/4) Epoch 46, batch 3650, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4945562.91 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:23:50,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1454133.3333333333, ans=0.2 2023-12-24 02:23:55,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.08 vs. 
limit=10.0 2023-12-24 02:24:16,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1454333.3333333333, ans=0.2 2023-12-24 02:24:27,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1454400.0, ans=0.125 2023-12-24 02:24:31,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1454400.0, ans=0.125 2023-12-24 02:24:32,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1454400.0, ans=0.0 2023-12-24 02:24:36,118 INFO [train.py:886] (0/4) Epoch 46, batch 3700, loss[loss=0.00842, audio_tagging_loss=0.00842, over 25000.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4954484.81 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:24:36,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1454466.6666666667, ans=0.0 2023-12-24 02:24:57,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1454600.0, ans=0.0 2023-12-24 02:24:57,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1454600.0, ans=0.125 2023-12-24 02:24:59,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1454600.0, ans=0.125 2023-12-24 02:25:00,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1454600.0, ans=0.035 2023-12-24 02:25:03,711 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.626e+01 3.918e+01 4.062e+01 4.194e+01 4.815e+01, threshold=8.124e+01, percent-clipped=0.0 2023-12-24 02:25:03,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1454600.0, ans=0.0 2023-12-24 02:25:13,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.38 vs. limit=22.5 2023-12-24 02:25:24,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1454733.3333333333, ans=0.025 2023-12-24 02:25:29,635 INFO [train.py:886] (0/4) Epoch 46, batch 3750, loss[loss=0.01009, audio_tagging_loss=0.01009, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4951499.71 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:25:31,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2023-12-24 02:25:44,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1454866.6666666667, ans=0.125 2023-12-24 02:25:46,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.82 vs. 
limit=15.0 2023-12-24 02:26:07,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1455000.0, ans=0.2 2023-12-24 02:26:15,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1455066.6666666667, ans=0.0 2023-12-24 02:26:20,503 INFO [train.py:886] (0/4) Epoch 46, batch 3800, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4951113.07 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:26:35,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1455200.0, ans=0.0 2023-12-24 02:26:37,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1455200.0, ans=0.1 2023-12-24 02:26:46,644 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.525e+01 3.960e+01 4.079e+01 4.273e+01 4.996e+01, threshold=8.158e+01, percent-clipped=0.0 2023-12-24 02:26:46,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1455266.6666666667, ans=0.125 2023-12-24 02:26:47,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1455266.6666666667, ans=0.125 2023-12-24 02:26:51,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1455333.3333333333, ans=0.125 2023-12-24 02:26:51,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.16 vs. limit=22.5 2023-12-24 02:27:11,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.91 vs. limit=22.5 2023-12-24 02:27:11,912 INFO [train.py:886] (0/4) Epoch 46, batch 3850, loss[loss=0.009951, audio_tagging_loss=0.009951, over 25000.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4943043.24 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:27:40,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1455600.0, ans=0.0 2023-12-24 02:27:43,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1455666.6666666667, ans=0.125 2023-12-24 02:28:03,995 INFO [train.py:886] (0/4) Epoch 46, batch 3900, loss[loss=0.01245, audio_tagging_loss=0.01245, over 23162.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4947103.59 frames. 
], batch size: 107, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:28:12,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1455866.6666666667, ans=0.125 2023-12-24 02:28:22,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1455933.3333333333, ans=0.0 2023-12-24 02:28:30,027 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.541e+01 3.894e+01 4.050e+01 4.339e+01 5.039e+01, threshold=8.100e+01, percent-clipped=0.0 2023-12-24 02:28:37,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1456000.0, ans=0.125 2023-12-24 02:28:42,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.48 vs. limit=15.0 2023-12-24 02:28:54,394 INFO [train.py:886] (0/4) Epoch 46, batch 3950, loss[loss=0.009227, audio_tagging_loss=0.009227, over 25000.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4955008.79 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:28:58,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1456133.3333333333, ans=0.125 2023-12-24 02:29:01,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1456133.3333333333, ans=0.07 2023-12-24 02:29:04,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5 2023-12-24 02:29:07,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1456200.0, ans=0.125 2023-12-24 02:29:10,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1456200.0, ans=0.0 2023-12-24 02:29:22,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=15.0 2023-12-24 02:29:23,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=1456266.6666666667, ans=0.02 2023-12-24 02:29:25,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0 2023-12-24 02:29:28,532 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:29:31,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1456333.3333333333, ans=0.125 2023-12-24 02:29:46,380 INFO [train.py:886] (0/4) Epoch 46, batch 4000, loss[loss=0.008782, audio_tagging_loss=0.008782, over 24750.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4952835.70 frames. 
], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:29:46,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1456466.6666666667, ans=0.0 2023-12-24 02:29:47,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1456466.6666666667, ans=0.125 2023-12-24 02:29:47,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1456466.6666666667, ans=0.1 2023-12-24 02:30:13,629 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.650e+01 3.977e+01 4.098e+01 4.271e+01 5.184e+01, threshold=8.196e+01, percent-clipped=0.0 2023-12-24 02:30:37,711 INFO [train.py:886] (0/4) Epoch 46, batch 4050, loss[loss=0.01118, audio_tagging_loss=0.01118, over 22227.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4955463.59 frames. ], batch size: 107, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:30:52,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1456866.6666666667, ans=0.125 2023-12-24 02:30:53,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1456866.6666666667, ans=0.125 2023-12-24 02:30:53,391 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:31:08,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.13 vs. limit=15.0 2023-12-24 02:31:19,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=15.0 2023-12-24 02:31:28,344 INFO [train.py:886] (0/4) Epoch 46, batch 4100, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4952816.84 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:31:33,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1457133.3333333333, ans=0.125 2023-12-24 02:31:36,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1457133.3333333333, ans=15.0 2023-12-24 02:31:42,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1457200.0, ans=0.125 2023-12-24 02:31:47,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.21 vs. 
limit=15.0 2023-12-24 02:31:50,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1457266.6666666667, ans=0.125 2023-12-24 02:31:53,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1457266.6666666667, ans=0.125 2023-12-24 02:31:55,136 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.551e+01 3.971e+01 4.093e+01 4.225e+01 5.395e+01, threshold=8.186e+01, percent-clipped=0.0 2023-12-24 02:32:12,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1457400.0, ans=0.5 2023-12-24 02:32:19,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2023-12-24 02:32:20,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1457466.6666666667, ans=0.0 2023-12-24 02:32:21,003 INFO [train.py:886] (0/4) Epoch 46, batch 4150, loss[loss=0.00914, audio_tagging_loss=0.00914, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4951660.02 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:33:05,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1457733.3333333333, ans=0.125 2023-12-24 02:33:10,865 INFO [train.py:886] (0/4) Epoch 46, batch 4200, loss[loss=0.00977, audio_tagging_loss=0.00977, over 25000.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4952523.47 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:33:15,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1457800.0, ans=0.125 2023-12-24 02:33:27,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1457866.6666666667, ans=0.05 2023-12-24 02:33:38,197 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.408e+01 3.862e+01 4.047e+01 4.185e+01 5.649e+01, threshold=8.095e+01, percent-clipped=0.0 2023-12-24 02:33:54,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1458066.6666666667, ans=0.1 2023-12-24 02:33:54,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2023-12-24 02:33:54,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1458066.6666666667, ans=0.125 2023-12-24 02:34:03,961 INFO [train.py:886] (0/4) Epoch 46, batch 4250, loss[loss=0.01073, audio_tagging_loss=0.01073, over 25000.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4949272.60 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:34:07,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1458133.3333333333, ans=0.125 2023-12-24 02:34:18,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.79 vs. 
limit=22.5 2023-12-24 02:34:49,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1458400.0, ans=0.1 2023-12-24 02:34:55,812 INFO [train.py:886] (0/4) Epoch 46, batch 4300, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4954683.32 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:35:01,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1458466.6666666667, ans=0.1 2023-12-24 02:35:19,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1458600.0, ans=0.0 2023-12-24 02:35:21,228 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.535e+01 3.901e+01 4.132e+01 4.342e+01 5.346e+01, threshold=8.265e+01, percent-clipped=0.0 2023-12-24 02:35:36,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1458733.3333333333, ans=0.1 2023-12-24 02:35:44,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.99 vs. limit=22.5 2023-12-24 02:35:46,805 INFO [train.py:886] (0/4) Epoch 46, batch 4350, loss[loss=0.01036, audio_tagging_loss=0.01036, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4957663.93 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:35:49,896 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:35:51,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1458800.0, ans=0.0 2023-12-24 02:35:51,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1458800.0, ans=0.0 2023-12-24 02:36:09,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1458933.3333333333, ans=0.125 2023-12-24 02:36:39,151 INFO [train.py:886] (0/4) Epoch 46, batch 4400, loss[loss=0.006955, audio_tagging_loss=0.006955, over 24051.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4952358.48 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:36:39,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1459133.3333333333, ans=0.125 2023-12-24 02:36:42,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1459133.3333333333, ans=0.1 2023-12-24 02:36:49,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1459200.0, ans=0.125 2023-12-24 02:36:52,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1459200.0, ans=0.2 2023-12-24 02:36:52,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.83 vs. 
limit=22.5 2023-12-24 02:37:03,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1459266.6666666667, ans=0.125 2023-12-24 02:37:06,115 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.527e+01 3.966e+01 4.108e+01 4.313e+01 4.794e+01, threshold=8.216e+01, percent-clipped=0.0 2023-12-24 02:37:21,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2023-12-24 02:37:23,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1459400.0, ans=0.125 2023-12-24 02:37:30,589 INFO [train.py:886] (0/4) Epoch 46, batch 4450, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4948534.68 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:37:33,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1459466.6666666667, ans=0.05 2023-12-24 02:38:00,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0 2023-12-24 02:38:02,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1459666.6666666667, ans=0.125 2023-12-24 02:38:02,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1459666.6666666667, ans=0.0 2023-12-24 02:38:08,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.85 vs. limit=15.0 2023-12-24 02:38:09,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1459666.6666666667, ans=0.2 2023-12-24 02:38:22,184 INFO [train.py:886] (0/4) Epoch 46, batch 4500, loss[loss=0.01047, audio_tagging_loss=0.01047, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4946135.86 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:38:25,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1459800.0, ans=0.125 2023-12-24 02:38:32,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2023-12-24 02:38:49,733 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.485e+01 3.900e+01 4.113e+01 4.259e+01 4.782e+01, threshold=8.226e+01, percent-clipped=0.0 2023-12-24 02:38:49,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1459933.3333333333, ans=0.1 2023-12-24 02:38:52,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0 2023-12-24 02:39:10,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1460066.6666666667, ans=0.2 2023-12-24 02:39:14,694 INFO [train.py:886] (0/4) Epoch 46, batch 4550, loss[loss=0.0103, audio_tagging_loss=0.0103, over 25000.00 frames. 
], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4948831.72 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:39:20,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1460133.3333333333, ans=0.125 2023-12-24 02:39:30,667 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:39:52,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1460333.3333333333, ans=0.125 2023-12-24 02:40:05,487 INFO [train.py:886] (0/4) Epoch 46, batch 4600, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4950842.54 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:40:06,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1460466.6666666667, ans=0.0 2023-12-24 02:40:15,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2023-12-24 02:40:31,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1460600.0, ans=0.1 2023-12-24 02:40:33,136 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.440e+01 3.978e+01 4.125e+01 4.323e+01 5.544e+01, threshold=8.249e+01, percent-clipped=0.0 2023-12-24 02:40:35,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1460600.0, ans=0.0 2023-12-24 02:40:46,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1460733.3333333333, ans=0.0 2023-12-24 02:40:57,027 INFO [train.py:886] (0/4) Epoch 46, batch 4650, loss[loss=0.009674, audio_tagging_loss=0.009674, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4948986.00 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:41:04,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1460800.0, ans=0.125 2023-12-24 02:41:17,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1460933.3333333333, ans=0.0 2023-12-24 02:41:25,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2023-12-24 02:41:36,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1461066.6666666667, ans=0.125 2023-12-24 02:41:44,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1461066.6666666667, ans=0.125 2023-12-24 02:41:45,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.69 vs. limit=15.0 2023-12-24 02:41:46,465 INFO [train.py:886] (0/4) Epoch 46, batch 4700, loss[loss=0.01099, audio_tagging_loss=0.01099, over 24750.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4945820.54 frames. 
], batch size: 99, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:41:47,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1461133.3333333333, ans=0.125 2023-12-24 02:42:03,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1461200.0, ans=0.125 2023-12-24 02:42:12,849 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.719e+01 3.992e+01 4.134e+01 4.373e+01 5.124e+01, threshold=8.269e+01, percent-clipped=0.0 2023-12-24 02:42:13,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1461266.6666666667, ans=0.0 2023-12-24 02:42:13,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1461266.6666666667, ans=0.125 2023-12-24 02:42:14,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1461266.6666666667, ans=0.0 2023-12-24 02:42:17,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1461333.3333333333, ans=0.0 2023-12-24 02:42:34,326 INFO [train.py:886] (0/4) Epoch 46, batch 4750, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4945568.88 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:42:34,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1461466.6666666667, ans=0.125 2023-12-24 02:42:39,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1461466.6666666667, ans=0.125 2023-12-24 02:42:45,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2023-12-24 02:42:49,456 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-46.pt 2023-12-24 02:43:10,059 INFO [train.py:886] (0/4) Epoch 47, batch 0, loss[loss=0.03276, audio_tagging_loss=0.03276, over 20715.00 frames. ], tot_loss[loss=0.03276, audio_tagging_loss=0.03276, over 20715.00 frames. ], batch size: 107, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:43:10,060 INFO [train.py:909] (0/4) Computing validation loss 2023-12-24 02:43:20,500 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6488, 4.0129, 4.1865, 3.9210], device='cuda:0') 2023-12-24 02:43:30,567 INFO [train.py:917] (0/4) Epoch 47, validation: loss=0.0358, audio_tagging_loss=0.0358, over 3737520.00 frames. 
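A note on reading the train.py loss lines: each `loss[...]` is the current batch alone, while `tot_loss[...]` is a frame-weighted running average that restarts at every epoch boundary. That is why tot_loss jumps to 0.03276 at Epoch 47, batch 0 (there it equals the single-batch loss) and then decays back toward ~0.011 as its effective frame count grows toward roughly reset_interval × frames-per-batch ≈ 200 × 25000 ≈ 5e6, matching the "over 49xxxxx.xx frames" figures seen late in each epoch. Below is a minimal sketch of such a tracker, assuming the decaying-average behaviour implied by these numbers; it is an illustration, not the MetricsTracker code in icefall's train.py:

```python
class RunningLoss:
    """Frame-weighted running average behind the tot_loss[...] figures.

    A sketch assuming reset_interval=200 (per the config at the top of
    this log); not icefall's actual MetricsTracker implementation.
    """

    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval  # 0.995 per batch
        self.loss_sum = 0.0  # decayed sum of loss * frames
        self.frames = 0.0    # decayed frame count: the "over N frames" figure

    def update(self, batch_loss: float, batch_frames: float) -> float:
        """Fold in one batch; return the value printed as tot_loss."""
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames


tracker = RunningLoss()                    # a fresh tracker at each epoch start
print(tracker.update(0.03276, 20715.0))    # batch 0: tot_loss equals the batch loss
```

With decay 0.995 per batch the printed frame count saturates near 200 × the average batch size in frames, which is consistent with the ~4.95e6 ceiling the tot_loss entries approach above.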
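The grad_scale column is the loss-scaling factor for mixed-precision training (use_fp16 is enabled in the config): it drops from 64.0 to 32.0 just before Epoch 46, batch 4550, dips to 16.0 early in Epoch 47, and is back to 32.0 by batch 400. That pattern is characteristic of a dynamic gradient scaler, which halves the scale when scaled gradients overflow and grows it again after a run of clean steps. A generic sketch using PyTorch's GradScaler follows (torch 2.0.x API; the toy model, optimizer, and data are stand-ins for illustration, not the actual training loop):

```python
import torch

# Toy stand-ins so the sketch runs on a CUDA machine; the real script
# trains a Zipformer over fbank features.
model = torch.nn.Linear(80, 527).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=2.0)  # small init_scale so growth is visible

for step in range(100):
    x = torch.randn(8, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()  # forward in reduced precision where safe
    scaler.scale(loss).backward()        # backward on the scaled loss
    scaler.step(optimizer)               # unscales grads; skips the step on inf/nan
    scaler.update()                      # halves the scale on overflow, grows it otherwise
    if step % 10 == 0:
        print(step, scaler.get_scale())  # the value logged as grad_scale
```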
2023-12-24 02:43:30,568 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 02:43:32,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1461573.3333333333, ans=0.035 2023-12-24 02:43:39,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1461573.3333333333, ans=0.07 2023-12-24 02:43:40,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1461640.0, ans=0.0 2023-12-24 02:43:50,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1461706.6666666667, ans=0.2 2023-12-24 02:44:05,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1461773.3333333333, ans=0.125 2023-12-24 02:44:05,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1461773.3333333333, ans=0.0 2023-12-24 02:44:13,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.78 vs. limit=15.0 2023-12-24 02:44:22,405 INFO [train.py:886] (0/4) Epoch 47, batch 50, loss[loss=0.01584, audio_tagging_loss=0.01584, over 24872.00 frames. ], tot_loss[loss=0.01761, audio_tagging_loss=0.01761, over 1113631.49 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:44:34,687 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.488e+01 4.203e+01 4.907e+01 5.637e+01 1.199e+02, threshold=9.813e+01, percent-clipped=7.0 2023-12-24 02:44:39,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1461973.3333333333, ans=0.125 2023-12-24 02:44:47,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1462040.0, ans=0.125 2023-12-24 02:44:52,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1462106.6666666667, ans=10.0 2023-12-24 02:44:55,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1462106.6666666667, ans=0.125 2023-12-24 02:44:56,836 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:44:56,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1462106.6666666667, ans=0.2 2023-12-24 02:45:01,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1462106.6666666667, ans=0.2 2023-12-24 02:45:07,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1462173.3333333333, ans=0.04949747468305833 2023-12-24 02:45:13,767 INFO [train.py:886] (0/4) Epoch 47, batch 100, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 1967200.29 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:45:42,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1462373.3333333333, ans=0.125 2023-12-24 02:45:43,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1462373.3333333333, ans=0.0 2023-12-24 02:45:58,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1462506.6666666667, ans=0.1 2023-12-24 02:46:04,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1462506.6666666667, ans=0.1 2023-12-24 02:46:05,874 INFO [train.py:886] (0/4) Epoch 47, batch 150, loss[loss=0.008725, audio_tagging_loss=0.008725, over 24059.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 2626971.85 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:46:06,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1462573.3333333333, ans=0.125 2023-12-24 02:46:06,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1462573.3333333333, ans=0.1 2023-12-24 02:46:14,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1462640.0, ans=0.1 2023-12-24 02:46:17,191 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.815e+01 4.110e+01 4.292e+01 4.596e+01 5.407e+01, threshold=8.583e+01, percent-clipped=0.0 2023-12-24 02:46:36,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1462773.3333333333, ans=0.125 2023-12-24 02:46:37,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1462773.3333333333, ans=0.125 2023-12-24 02:46:40,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1462773.3333333333, ans=0.1 2023-12-24 02:46:58,182 INFO [train.py:886] (0/4) Epoch 47, batch 200, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24750.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 3138329.01 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:47:13,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2023-12-24 02:47:17,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1463040.0, ans=0.125 2023-12-24 02:47:18,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.56 vs. 
limit=15.0 2023-12-24 02:47:20,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1463040.0, ans=0.125 2023-12-24 02:47:36,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1463106.6666666667, ans=0.1 2023-12-24 02:47:42,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2023-12-24 02:47:49,235 INFO [train.py:886] (0/4) Epoch 47, batch 250, loss[loss=0.008257, audio_tagging_loss=0.008257, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 3543508.29 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:47:55,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2023-12-24 02:47:56,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1463240.0, ans=0.1 2023-12-24 02:48:01,227 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.589e+01 3.929e+01 4.138e+01 4.313e+01 4.926e+01, threshold=8.277e+01, percent-clipped=0.0 2023-12-24 02:48:09,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-12-24 02:48:11,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1463373.3333333333, ans=0.0 2023-12-24 02:48:18,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1463440.0, ans=0.1 2023-12-24 02:48:18,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1463440.0, ans=0.125 2023-12-24 02:48:23,538 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:48:24,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2023-12-24 02:48:25,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=12.0 2023-12-24 02:48:35,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1463506.6666666667, ans=0.125 2023-12-24 02:48:40,502 INFO [train.py:886] (0/4) Epoch 47, batch 300, loss[loss=0.01239, audio_tagging_loss=0.01239, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 3854920.53 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:48:44,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1463573.3333333333, ans=0.125 2023-12-24 02:48:58,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.92 vs. 
limit=15.0 2023-12-24 02:49:11,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.55 vs. limit=22.5 2023-12-24 02:49:23,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1463840.0, ans=0.0 2023-12-24 02:49:29,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1463840.0, ans=0.5 2023-12-24 02:49:29,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2023-12-24 02:49:31,966 INFO [train.py:886] (0/4) Epoch 47, batch 350, loss[loss=0.01253, audio_tagging_loss=0.01253, over 24750.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4094288.94 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:49:37,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1463906.6666666667, ans=0.04949747468305833 2023-12-24 02:49:44,718 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.473e+01 3.948e+01 4.136e+01 4.344e+01 5.181e+01, threshold=8.273e+01, percent-clipped=0.0 2023-12-24 02:50:01,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1464106.6666666667, ans=0.125 2023-12-24 02:50:19,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1464173.3333333333, ans=0.125 2023-12-24 02:50:20,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1464173.3333333333, ans=0.1 2023-12-24 02:50:24,241 INFO [train.py:886] (0/4) Epoch 47, batch 400, loss[loss=0.008292, audio_tagging_loss=0.008292, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4288974.42 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:50:26,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1464240.0, ans=0.0 2023-12-24 02:50:39,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1464306.6666666667, ans=0.125 2023-12-24 02:50:47,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1464373.3333333333, ans=0.125 2023-12-24 02:50:55,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1464440.0, ans=0.125 2023-12-24 02:51:00,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1464440.0, ans=0.0 2023-12-24 02:51:00,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.05 vs. limit=6.0 2023-12-24 02:51:06,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.33 vs. 
limit=22.5 2023-12-24 02:51:09,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.29 vs. limit=15.0 2023-12-24 02:51:16,127 INFO [train.py:886] (0/4) Epoch 47, batch 450, loss[loss=0.009859, audio_tagging_loss=0.009859, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4434499.57 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:51:17,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1464573.3333333333, ans=0.125 2023-12-24 02:51:28,924 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.902e+01 4.040e+01 4.251e+01 5.082e+01, threshold=8.080e+01, percent-clipped=0.0 2023-12-24 02:51:34,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1464640.0, ans=0.125 2023-12-24 02:51:42,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1464706.6666666667, ans=0.1 2023-12-24 02:51:50,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1464773.3333333333, ans=0.125 2023-12-24 02:51:55,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1464773.3333333333, ans=0.125 2023-12-24 02:51:59,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1464840.0, ans=0.125 2023-12-24 02:52:07,975 INFO [train.py:886] (0/4) Epoch 47, batch 500, loss[loss=0.01262, audio_tagging_loss=0.01262, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4550004.78 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:52:13,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-12-24 02:52:14,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1464906.6666666667, ans=0.1 2023-12-24 02:52:20,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0 2023-12-24 02:52:22,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.45 vs. limit=15.0 2023-12-24 02:52:26,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.53 vs. 
limit=10.0 2023-12-24 02:52:29,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1465040.0, ans=0.2 2023-12-24 02:52:33,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1465040.0, ans=0.0 2023-12-24 02:52:38,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1465106.6666666667, ans=0.0 2023-12-24 02:52:41,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1465106.6666666667, ans=0.0 2023-12-24 02:52:41,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1465106.6666666667, ans=0.0 2023-12-24 02:52:52,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1465173.3333333333, ans=0.2 2023-12-24 02:52:55,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2023-12-24 02:53:00,332 INFO [train.py:886] (0/4) Epoch 47, batch 550, loss[loss=0.01102, audio_tagging_loss=0.01102, over 24750.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4644332.25 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:53:04,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1465240.0, ans=0.125 2023-12-24 02:53:04,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0 2023-12-24 02:53:09,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2023-12-24 02:53:12,414 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.441e+01 3.969e+01 4.099e+01 4.262e+01 5.027e+01, threshold=8.197e+01, percent-clipped=0.0 2023-12-24 02:53:14,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1465306.6666666667, ans=0.125 2023-12-24 02:53:45,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2023-12-24 02:53:51,816 INFO [train.py:886] (0/4) Epoch 47, batch 600, loss[loss=0.01065, audio_tagging_loss=0.01065, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4710376.83 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:53:52,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1465573.3333333333, ans=0.125 2023-12-24 02:53:54,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1465573.3333333333, ans=0.2 2023-12-24 02:54:26,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.69 vs. 
limit=12.0 2023-12-24 02:54:38,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1465840.0, ans=10.0 2023-12-24 02:54:41,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1465840.0, ans=0.0 2023-12-24 02:54:43,458 INFO [train.py:886] (0/4) Epoch 47, batch 650, loss[loss=0.01027, audio_tagging_loss=0.01027, over 25000.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4759930.76 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:54:48,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1465906.6666666667, ans=15.0 2023-12-24 02:54:51,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0 2023-12-24 02:54:53,661 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:54:55,401 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.474e+01 3.862e+01 4.034e+01 4.310e+01 5.761e+01, threshold=8.068e+01, percent-clipped=0.0 2023-12-24 02:55:07,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1466040.0, ans=0.2 2023-12-24 02:55:12,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1466040.0, ans=0.125 2023-12-24 02:55:16,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.74 vs. limit=22.5 2023-12-24 02:55:17,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1466106.6666666667, ans=0.0 2023-12-24 02:55:23,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1466173.3333333333, ans=0.1 2023-12-24 02:55:30,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2023-12-24 02:55:34,945 INFO [train.py:886] (0/4) Epoch 47, batch 700, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4803483.84 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:55:46,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1466306.6666666667, ans=0.2 2023-12-24 02:55:49,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1466306.6666666667, ans=0.1 2023-12-24 02:55:50,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1466306.6666666667, ans=0.1 2023-12-24 02:55:54,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1466373.3333333333, ans=0.125 2023-12-24 02:55:58,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1466373.3333333333, ans=0.05 2023-12-24 02:56:07,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0 2023-12-24 02:56:11,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1466440.0, ans=0.0 2023-12-24 02:56:26,189 INFO [train.py:886] (0/4) Epoch 47, batch 750, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4830439.13 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:56:27,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2023-12-24 02:56:38,898 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.551e+01 3.880e+01 4.112e+01 4.307e+01 5.752e+01, threshold=8.224e+01, percent-clipped=0.0 2023-12-24 02:56:40,049 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-220000.pt 2023-12-24 02:56:50,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1466706.6666666667, ans=0.125 2023-12-24 02:56:53,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1466706.6666666667, ans=0.0 2023-12-24 02:57:17,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1466840.0, ans=0.125 2023-12-24 02:57:20,477 INFO [train.py:886] (0/4) Epoch 47, batch 800, loss[loss=0.01075, audio_tagging_loss=0.01075, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4860636.46 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:57:28,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=15.0 2023-12-24 02:57:37,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1466973.3333333333, ans=0.0 2023-12-24 02:58:12,282 INFO [train.py:886] (0/4) Epoch 47, batch 850, loss[loss=0.009964, audio_tagging_loss=0.009964, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4885041.23 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:58:25,024 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.509e+01 3.885e+01 4.046e+01 4.247e+01 5.015e+01, threshold=8.092e+01, percent-clipped=0.0 2023-12-24 02:58:27,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1467306.6666666667, ans=0.125 2023-12-24 02:58:41,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1467373.3333333333, ans=0.0 2023-12-24 02:58:41,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1467373.3333333333, ans=0.0 2023-12-24 02:59:04,297 INFO [train.py:886] (0/4) Epoch 47, batch 900, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24948.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4902968.13 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:59:24,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2023-12-24 02:59:29,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1467706.6666666667, ans=0.0 2023-12-24 02:59:34,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2023-12-24 02:59:40,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1467773.3333333333, ans=0.125 2023-12-24 02:59:43,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1467773.3333333333, ans=0.125 2023-12-24 02:59:45,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0 2023-12-24 02:59:56,681 INFO [train.py:886] (0/4) Epoch 47, batch 950, loss[loss=0.01085, audio_tagging_loss=0.01085, over 24750.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4909181.92 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:59:57,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1467906.6666666667, ans=0.0 2023-12-24 03:00:08,634 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.658e+01 3.944e+01 4.162e+01 4.322e+01 5.155e+01, threshold=8.324e+01, percent-clipped=0.0 2023-12-24 03:00:09,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1467973.3333333333, ans=10.0 2023-12-24 03:00:12,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0 2023-12-24 03:00:48,896 INFO [train.py:886] (0/4) Epoch 47, batch 1000, loss[loss=0.009799, audio_tagging_loss=0.009799, over 24750.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4910402.72 frames. 
], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:00:53,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1468240.0, ans=0.125 2023-12-24 03:01:38,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1468506.6666666667, ans=0.125 2023-12-24 03:01:40,698 INFO [train.py:886] (0/4) Epoch 47, batch 1050, loss[loss=0.01094, audio_tagging_loss=0.01094, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4920935.24 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:01:51,984 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:01:53,565 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.568e+01 3.919e+01 4.119e+01 4.342e+01 4.813e+01, threshold=8.238e+01, percent-clipped=0.0 2023-12-24 03:02:14,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1468773.3333333333, ans=0.1 2023-12-24 03:02:14,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1468773.3333333333, ans=0.125 2023-12-24 03:02:32,944 INFO [train.py:886] (0/4) Epoch 47, batch 1100, loss[loss=0.01001, audio_tagging_loss=0.01001, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4931365.64 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:02:36,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-12-24 03:02:40,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1468906.6666666667, ans=0.125 2023-12-24 03:02:40,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1468906.6666666667, ans=0.125 2023-12-24 03:02:46,251 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:03:07,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1469106.6666666667, ans=0.0 2023-12-24 03:03:20,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=15.0 2023-12-24 03:03:24,545 INFO [train.py:886] (0/4) Epoch 47, batch 1150, loss[loss=0.0104, audio_tagging_loss=0.0104, over 25000.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4937818.99 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:03:24,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1469240.0, ans=0.125 2023-12-24 03:03:30,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1469240.0, ans=0.125 2023-12-24 03:03:37,298 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.450e+01 3.892e+01 4.065e+01 4.219e+01 4.911e+01, threshold=8.130e+01, percent-clipped=0.0 2023-12-24 03:03:39,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1469306.6666666667, ans=0.125 2023-12-24 03:03:44,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1469306.6666666667, ans=15.0 2023-12-24 03:03:45,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1469373.3333333333, ans=0.125 2023-12-24 03:03:50,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1469373.3333333333, ans=0.2 2023-12-24 03:04:17,407 INFO [train.py:886] (0/4) Epoch 47, batch 1200, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4943559.04 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:04:19,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1469573.3333333333, ans=0.125 2023-12-24 03:04:25,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1469573.3333333333, ans=0.0 2023-12-24 03:04:31,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1469640.0, ans=0.0 2023-12-24 03:04:32,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1469640.0, ans=0.125 2023-12-24 03:04:43,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1469706.6666666667, ans=0.0 2023-12-24 03:04:49,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1469773.3333333333, ans=0.125 2023-12-24 03:04:50,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1469773.3333333333, ans=0.125 2023-12-24 03:04:51,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1469773.3333333333, ans=0.125 2023-12-24 03:04:51,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2023-12-24 03:05:07,735 INFO [train.py:886] (0/4) Epoch 47, batch 1250, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4943706.10 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:05:08,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1469906.6666666667, ans=0.125 2023-12-24 03:05:11,620 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:05:14,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1469906.6666666667, ans=0.0 2023-12-24 03:05:20,725 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.584e+01 3.991e+01 4.155e+01 4.310e+01 5.132e+01, threshold=8.310e+01, percent-clipped=0.0 2023-12-24 03:05:23,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1469973.3333333333, ans=0.125 2023-12-24 03:05:39,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1470106.6666666667, ans=0.125 2023-12-24 03:05:42,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.28 vs. limit=10.0 2023-12-24 03:05:49,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1470173.3333333333, ans=0.125 2023-12-24 03:05:55,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1470173.3333333333, ans=0.2 2023-12-24 03:05:59,456 INFO [train.py:886] (0/4) Epoch 47, batch 1300, loss[loss=0.00949, audio_tagging_loss=0.00949, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4941572.11 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:06:00,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2023-12-24 03:06:05,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=22.5 2023-12-24 03:06:09,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1470306.6666666667, ans=0.0 2023-12-24 03:06:10,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1470306.6666666667, ans=0.0 2023-12-24 03:06:31,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1470440.0, ans=0.125 2023-12-24 03:06:37,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1470440.0, ans=0.0 2023-12-24 03:06:44,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.09 vs. limit=22.5 2023-12-24 03:06:52,376 INFO [train.py:886] (0/4) Epoch 47, batch 1350, loss[loss=0.01011, audio_tagging_loss=0.01011, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4946762.22 frames. 
], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:07:00,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1470573.3333333333, ans=0.125 2023-12-24 03:07:03,683 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.555e+01 3.933e+01 4.111e+01 4.315e+01 5.636e+01, threshold=8.222e+01, percent-clipped=0.0 2023-12-24 03:07:18,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1470706.6666666667, ans=0.125 2023-12-24 03:07:34,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.28 vs. limit=22.5 2023-12-24 03:07:40,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1470840.0, ans=0.125 2023-12-24 03:07:42,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2023-12-24 03:07:43,725 INFO [train.py:886] (0/4) Epoch 47, batch 1400, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4944872.63 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:07:46,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5 2023-12-24 03:08:10,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1471040.0, ans=0.1 2023-12-24 03:08:26,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1471173.3333333333, ans=0.1 2023-12-24 03:08:27,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1471173.3333333333, ans=0.1 2023-12-24 03:08:31,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1471173.3333333333, ans=0.0 2023-12-24 03:08:32,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1471173.3333333333, ans=0.07 2023-12-24 03:08:34,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1471173.3333333333, ans=0.125 2023-12-24 03:08:35,969 INFO [train.py:886] (0/4) Epoch 47, batch 1450, loss[loss=0.01126, audio_tagging_loss=0.01126, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4946417.70 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:08:45,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. 
limit=15.0 2023-12-24 03:08:48,112 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.547e+01 3.853e+01 4.015e+01 4.195e+01 8.350e+01, threshold=8.030e+01, percent-clipped=1.0 2023-12-24 03:08:57,502 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:09:03,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1471373.3333333333, ans=0.2 2023-12-24 03:09:14,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=22.5 2023-12-24 03:09:18,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1471506.6666666667, ans=0.2 2023-12-24 03:09:26,250 INFO [train.py:886] (0/4) Epoch 47, batch 1500, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4941321.23 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:09:44,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1471640.0, ans=0.125 2023-12-24 03:09:46,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-12-24 03:10:17,895 INFO [train.py:886] (0/4) Epoch 47, batch 1550, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24750.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4939502.65 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:10:18,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1471906.6666666667, ans=0.125 2023-12-24 03:10:19,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1471906.6666666667, ans=0.125 2023-12-24 03:10:20,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1471906.6666666667, ans=0.125 2023-12-24 03:10:21,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2023-12-24 03:10:29,798 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.710e+01 4.030e+01 4.186e+01 4.353e+01 4.618e+01, threshold=8.371e+01, percent-clipped=0.0 2023-12-24 03:10:30,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1471973.3333333333, ans=0.5 2023-12-24 03:10:53,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1472106.6666666667, ans=0.125 2023-12-24 03:11:01,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1472173.3333333333, ans=0.0 2023-12-24 03:11:10,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2023-12-24 03:11:10,583 INFO [train.py:886] (0/4) Epoch 47, batch 1600, loss[loss=0.01032, audio_tagging_loss=0.01032, over 24750.00 frames. 
], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4931298.11 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:11:12,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1472240.0, ans=0.5 2023-12-24 03:11:22,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2023-12-24 03:11:42,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2023-12-24 03:11:42,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1472440.0, ans=0.125 2023-12-24 03:12:01,421 INFO [train.py:886] (0/4) Epoch 47, batch 1650, loss[loss=0.01014, audio_tagging_loss=0.01014, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4932372.95 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:12:14,045 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.647e+01 4.031e+01 4.196e+01 4.409e+01 5.344e+01, threshold=8.391e+01, percent-clipped=0.0 2023-12-24 03:12:14,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1472640.0, ans=0.125 2023-12-24 03:12:15,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.24 vs. limit=10.0 2023-12-24 03:12:17,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1472640.0, ans=0.125 2023-12-24 03:12:21,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1472706.6666666667, ans=0.0 2023-12-24 03:12:22,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1472706.6666666667, ans=0.035 2023-12-24 03:12:46,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1472840.0, ans=0.125 2023-12-24 03:12:52,661 INFO [train.py:886] (0/4) Epoch 47, batch 1700, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4929905.61 frames. 
], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:12:56,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1472906.6666666667, ans=0.125 2023-12-24 03:13:00,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1472906.6666666667, ans=0.0 2023-12-24 03:13:11,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1472973.3333333333, ans=0.1 2023-12-24 03:13:11,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1472973.3333333333, ans=0.0 2023-12-24 03:13:16,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=1473040.0, ans=15.0 2023-12-24 03:13:21,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1473040.0, ans=0.125 2023-12-24 03:13:23,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0 2023-12-24 03:13:40,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1473173.3333333333, ans=0.125 2023-12-24 03:13:43,958 INFO [train.py:886] (0/4) Epoch 47, batch 1750, loss[loss=0.01182, audio_tagging_loss=0.01182, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4936739.78 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:13:56,767 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.487e+01 3.924e+01 4.095e+01 4.271e+01 4.874e+01, threshold=8.190e+01, percent-clipped=0.0 2023-12-24 03:14:01,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0 2023-12-24 03:14:17,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1473440.0, ans=0.0 2023-12-24 03:14:33,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1473506.6666666667, ans=0.125 2023-12-24 03:14:35,518 INFO [train.py:886] (0/4) Epoch 47, batch 1800, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4941298.24 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:15:06,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2023-12-24 03:15:08,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.48 vs. 
2023-12-24 03:15:13,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1473773.3333333333, ans=0.125 2023-12-24 03:15:14,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1473773.3333333333, ans=0.1 2023-12-24 03:15:21,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.09 vs. limit=12.0 2023-12-24 03:15:27,763 INFO [train.py:886] (0/4) Epoch 47, batch 1850, loss[loss=0.0116, audio_tagging_loss=0.0116, over 24750.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4937057.75 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:15:34,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1473906.6666666667, ans=0.0 2023-12-24 03:15:39,848 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.565e+01 3.927e+01 4.095e+01 4.266e+01 4.764e+01, threshold=8.189e+01, percent-clipped=0.0 2023-12-24 03:16:13,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1474173.3333333333, ans=0.1 2023-12-24 03:16:19,731 INFO [train.py:886] (0/4) Epoch 47, batch 1900, loss[loss=0.01088, audio_tagging_loss=0.01088, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4937930.19 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:16:39,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1474306.6666666667, ans=0.0 2023-12-24 03:16:53,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1474440.0, ans=0.125 2023-12-24 03:16:59,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1474440.0, ans=0.5 2023-12-24 03:17:10,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1474506.6666666667, ans=0.1 2023-12-24 03:17:12,111 INFO [train.py:886] (0/4) Epoch 47, batch 1950, loss[loss=0.01446, audio_tagging_loss=0.01446, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4934141.10 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:17:12,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1474573.3333333333, ans=0.2 2023-12-24 03:17:15,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1474573.3333333333, ans=0.0 2023-12-24 03:17:15,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.02 vs.
limit=12.0 2023-12-24 03:17:19,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1474573.3333333333, ans=0.0 2023-12-24 03:17:21,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1474640.0, ans=0.125 2023-12-24 03:17:23,392 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:17:24,063 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.604e+01 3.930e+01 4.132e+01 4.340e+01 4.631e+01, threshold=8.265e+01, percent-clipped=0.0 2023-12-24 03:18:03,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1474906.6666666667, ans=10.0 2023-12-24 03:18:04,037 INFO [train.py:886] (0/4) Epoch 47, batch 2000, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4937156.29 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:18:08,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=12.0 2023-12-24 03:18:12,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1474906.6666666667, ans=0.95 2023-12-24 03:18:19,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1474973.3333333333, ans=0.125 2023-12-24 03:18:29,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1475040.0, ans=0.125 2023-12-24 03:18:34,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1475106.6666666667, ans=0.125 2023-12-24 03:18:38,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1475106.6666666667, ans=0.0 2023-12-24 03:18:56,072 INFO [train.py:886] (0/4) Epoch 47, batch 2050, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4942226.62 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:19:07,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-12-24 03:19:09,100 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.482e+01 3.891e+01 4.061e+01 4.208e+01 4.839e+01, threshold=8.122e+01, percent-clipped=0.0 2023-12-24 03:19:17,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-12-24 03:19:33,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1475440.0, ans=0.0 2023-12-24 03:19:45,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1475506.6666666667, ans=0.125 2023-12-24 03:19:47,634 INFO [train.py:886] (0/4) Epoch 47, batch 2100, loss[loss=0.008673, audio_tagging_loss=0.008673, over 24750.00 frames. 
], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4948551.91 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:19:56,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1475640.0, ans=0.125 2023-12-24 03:20:12,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=12.0 2023-12-24 03:20:14,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-12-24 03:20:28,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-12-24 03:20:28,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0 2023-12-24 03:20:38,853 INFO [train.py:886] (0/4) Epoch 47, batch 2150, loss[loss=0.01012, audio_tagging_loss=0.01012, over 24750.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4952987.81 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:20:41,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=22.5 2023-12-24 03:20:50,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1475973.3333333333, ans=0.0 2023-12-24 03:20:52,744 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.593e+01 3.982e+01 4.167e+01 4.321e+01 5.208e+01, threshold=8.335e+01, percent-clipped=0.0 2023-12-24 03:21:06,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1476040.0, ans=0.04949747468305833 2023-12-24 03:21:21,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2023-12-24 03:21:26,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1476173.3333333333, ans=0.125 2023-12-24 03:21:30,436 INFO [train.py:886] (0/4) Epoch 47, batch 2200, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24944.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4949130.19 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:21:34,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1476240.0, ans=0.2 2023-12-24 03:21:37,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1476240.0, ans=0.1 2023-12-24 03:21:46,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1476306.6666666667, ans=0.125 2023-12-24 03:22:09,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. 
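limit=10.0

The optim.py warnings report the min/q1/median/q3/max of recent gradient norms, and the printed threshold tracks Clipping_scale times the median (e.g. 2.0 * 4.167e+01 ~ 8.335e+01 in the warning above). A sketch of that bookkeeping, assuming a plain history buffer rather than whatever optim.py actually keeps:

    from collections import deque
    import torch

    class QuartileClipper:
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)      # recent global grad norms

        def clip_(self, params):
            grads = [p.grad.reshape(-1) for p in params if p.grad is not None]
            norm = torch.cat(grads).norm()
            self.norms.append(norm.item())
            hist = torch.tensor(list(self.norms))
            q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()   # scale * median
            if norm > threshold:                             # else percent-clipped stays 0
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return q, threshold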
2023-12-24 03:22:11,842 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:22:20,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1476506.6666666667, ans=0.2 2023-12-24 03:22:23,334 INFO [train.py:886] (0/4) Epoch 47, batch 2250, loss[loss=0.009819, audio_tagging_loss=0.009819, over 24023.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4945199.62 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:22:35,565 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.670e+01 3.897e+01 4.118e+01 4.297e+01 5.377e+01, threshold=8.236e+01, percent-clipped=0.0 2023-12-24 03:22:36,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1476640.0, ans=0.0 2023-12-24 03:22:44,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1476706.6666666667, ans=0.0 2023-12-24 03:22:45,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1476706.6666666667, ans=0.07 2023-12-24 03:22:49,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1476706.6666666667, ans=0.125 2023-12-24 03:22:50,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1476706.6666666667, ans=0.125 2023-12-24 03:23:00,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1476773.3333333333, ans=0.05 2023-12-24 03:23:07,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1476840.0, ans=0.0 2023-12-24 03:23:07,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1476840.0, ans=0.2 2023-12-24 03:23:11,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1476840.0, ans=0.0 2023-12-24 03:23:14,274 INFO [train.py:886] (0/4) Epoch 47, batch 2300, loss[loss=0.01077, audio_tagging_loss=0.01077, over 25000.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4949829.10 frames.
], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:23:19,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1476906.6666666667, ans=0.0 2023-12-24 03:23:23,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1476973.3333333333, ans=0.125 2023-12-24 03:23:34,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1477040.0, ans=0.05 2023-12-24 03:23:47,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1477106.6666666667, ans=0.0 2023-12-24 03:24:02,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1477173.3333333333, ans=0.0 2023-12-24 03:24:02,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=22.5 2023-12-24 03:24:05,699 INFO [train.py:886] (0/4) Epoch 47, batch 2350, loss[loss=0.0112, audio_tagging_loss=0.0112, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4948596.37 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:24:07,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1477240.0, ans=0.0 2023-12-24 03:24:08,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2023-12-24 03:24:19,377 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.580e+01 3.910e+01 4.055e+01 4.262e+01 5.306e+01, threshold=8.110e+01, percent-clipped=0.0 2023-12-24 03:24:32,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1477373.3333333333, ans=0.0 2023-12-24 03:24:51,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1477506.6666666667, ans=0.125 2023-12-24 03:24:58,084 INFO [train.py:886] (0/4) Epoch 47, batch 2400, loss[loss=0.007902, audio_tagging_loss=0.007902, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4956622.47 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:24:59,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1477573.3333333333, ans=0.125 2023-12-24 03:25:09,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1477640.0, ans=0.125 2023-12-24 03:25:49,196 INFO [train.py:886] (0/4) Epoch 47, batch 2450, loss[loss=0.01024, audio_tagging_loss=0.01024, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4964084.20 frames. 
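], batch size: 100, lr: 2.28e-03, grad_scale: 32.0

In each train.py:886 line, loss[...] is the current batch while tot_loss[...] is a running frame-weighted aggregate whose frame count hovers near 4.9M. That is the behaviour of a decayed running sum: with ~25k frames per batch, a decay of 0.995 settles near 25000 / (1 - 0.995) = 5e6 effective frames. A sketch under that assumption (the real tracker in train.py may keep its books differently):

    class RunningLoss:
        """Decayed, frame-weighted loss average (assumed mechanism)."""

        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0     # decayed sum of loss * frames
            self.frames = 0.0       # decayed sum of frames

        def update(self, loss: float, num_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tot = RunningLoss()
    for _ in range(2000):           # frames settles near 25000 / 0.005 = 5e6
        tot.update(loss=0.0107, num_frames=25000.0)
    print(tot.value, tot.frames)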
2023-12-24 03:25:50,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1477906.6666666667, ans=0.1 2023-12-24 03:26:03,618 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.638e+01 3.970e+01 4.140e+01 4.271e+01 4.902e+01, threshold=8.281e+01, percent-clipped=0.0 2023-12-24 03:26:04,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1477973.3333333333, ans=0.0 2023-12-24 03:26:08,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1477973.3333333333, ans=0.0 2023-12-24 03:26:17,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.41 vs. limit=15.0 2023-12-24 03:26:19,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1478106.6666666667, ans=0.125 2023-12-24 03:26:42,132 INFO [train.py:886] (0/4) Epoch 47, batch 2500, loss[loss=0.009586, audio_tagging_loss=0.009586, over 24750.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4952432.35 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:27:08,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=12.0 2023-12-24 03:27:08,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1478373.3333333333, ans=0.125 2023-12-24 03:27:10,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1478373.3333333333, ans=0.1 2023-12-24 03:27:12,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1478440.0, ans=0.0 2023-12-24 03:27:33,045 INFO [train.py:886] (0/4) Epoch 47, batch 2550, loss[loss=0.01135, audio_tagging_loss=0.01135, over 24750.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4943256.15 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:27:33,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1478573.3333333333, ans=0.0 2023-12-24 03:27:36,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1478573.3333333333, ans=0.1 2023-12-24 03:27:45,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1478640.0, ans=0.0 2023-12-24 03:27:47,509 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.559e+01 3.962e+01 4.101e+01 4.307e+01 5.190e+01, threshold=8.202e+01, percent-clipped=0.0 2023-12-24 03:28:03,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1478773.3333333333, ans=0.125 2023-12-24 03:28:05,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1478773.3333333333, ans=0.125 2023-12-24 03:28:13,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.56 vs.
limit=10.0 2023-12-24 03:28:17,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1478840.0, ans=0.1 2023-12-24 03:28:20,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1478840.0, ans=0.1 2023-12-24 03:28:25,397 INFO [train.py:886] (0/4) Epoch 47, batch 2600, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4943340.31 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:28:32,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1478906.6666666667, ans=0.125 2023-12-24 03:28:37,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1478973.3333333333, ans=0.125 2023-12-24 03:28:37,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1478973.3333333333, ans=0.0 2023-12-24 03:28:39,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1478973.3333333333, ans=0.1 2023-12-24 03:28:40,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1478973.3333333333, ans=0.025 2023-12-24 03:28:47,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1479040.0, ans=0.125 2023-12-24 03:28:49,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.64 vs. limit=22.5 2023-12-24 03:28:52,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2023-12-24 03:28:55,316 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:28:58,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1479106.6666666667, ans=0.0 2023-12-24 03:28:59,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1479106.6666666667, ans=0.1 2023-12-24 03:29:15,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1479173.3333333333, ans=0.125 2023-12-24 03:29:17,346 INFO [train.py:886] (0/4) Epoch 47, batch 2650, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4948172.74 frames. 
], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:29:23,110 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:29:26,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1479306.6666666667, ans=0.125 2023-12-24 03:29:30,365 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.599e+01 3.913e+01 4.117e+01 4.340e+01 5.739e+01, threshold=8.234e+01, percent-clipped=0.0 2023-12-24 03:29:48,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1479440.0, ans=0.125 2023-12-24 03:30:08,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1479573.3333333333, ans=0.1 2023-12-24 03:30:08,782 INFO [train.py:886] (0/4) Epoch 47, batch 2700, loss[loss=0.009235, audio_tagging_loss=0.009235, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4951057.84 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:30:13,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1479573.3333333333, ans=0.2 2023-12-24 03:30:27,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1479640.0, ans=0.1 2023-12-24 03:30:46,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1479773.3333333333, ans=0.125 2023-12-24 03:30:55,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1479840.0, ans=0.025 2023-12-24 03:30:57,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1479840.0, ans=0.125 2023-12-24 03:31:01,093 INFO [train.py:886] (0/4) Epoch 47, batch 2750, loss[loss=0.009217, audio_tagging_loss=0.009217, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4949329.46 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:31:02,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1479906.6666666667, ans=0.0 2023-12-24 03:31:10,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1479973.3333333333, ans=0.125 2023-12-24 03:31:14,034 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 3.907e+01 4.081e+01 4.248e+01 4.984e+01, threshold=8.163e+01, percent-clipped=0.0 2023-12-24 03:31:28,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1480040.0, ans=0.1 2023-12-24 03:31:28,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=22.5 2023-12-24 03:31:38,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1480106.6666666667, ans=0.125 2023-12-24 03:31:51,816 INFO [train.py:886] (0/4) Epoch 47, batch 2800, loss[loss=0.01032, audio_tagging_loss=0.01032, over 24750.00 frames. 
], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4944338.05 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:32:07,321 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:32:11,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1480306.6666666667, ans=0.125 2023-12-24 03:32:13,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1480373.3333333333, ans=0.1 2023-12-24 03:32:14,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2023-12-24 03:32:43,755 INFO [train.py:886] (0/4) Epoch 47, batch 2850, loss[loss=0.009876, audio_tagging_loss=0.009876, over 24025.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4939407.10 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:32:56,595 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.697e+01 4.004e+01 4.137e+01 4.360e+01 4.931e+01, threshold=8.275e+01, percent-clipped=0.0 2023-12-24 03:33:21,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1480773.3333333333, ans=0.125 2023-12-24 03:33:23,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1480773.3333333333, ans=0.1 2023-12-24 03:33:30,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1480840.0, ans=0.1 2023-12-24 03:33:34,978 INFO [train.py:886] (0/4) Epoch 47, batch 2900, loss[loss=0.01011, audio_tagging_loss=0.01011, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4940713.33 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:33:49,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.65 vs. limit=10.0 2023-12-24 03:33:51,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1480973.3333333333, ans=0.125 2023-12-24 03:33:57,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=22.5 2023-12-24 03:34:11,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0 2023-12-24 03:34:12,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1481106.6666666667, ans=0.1 2023-12-24 03:34:13,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1481106.6666666667, ans=0.125 2023-12-24 03:34:27,552 INFO [train.py:886] (0/4) Epoch 47, batch 2950, loss[loss=0.009949, audio_tagging_loss=0.009949, over 25000.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4946789.49 frames. 
], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:34:27,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1481240.0, ans=0.0 2023-12-24 03:34:37,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1481306.6666666667, ans=0.0 2023-12-24 03:34:41,282 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.488e+01 3.898e+01 4.063e+01 4.276e+01 4.870e+01, threshold=8.126e+01, percent-clipped=0.0 2023-12-24 03:34:44,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1481306.6666666667, ans=0.0 2023-12-24 03:34:44,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2023-12-24 03:34:52,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2023-12-24 03:34:52,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2023-12-24 03:35:07,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1481440.0, ans=0.125 2023-12-24 03:35:19,983 INFO [train.py:886] (0/4) Epoch 47, batch 3000, loss[loss=0.01044, audio_tagging_loss=0.01044, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4947419.88 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:35:19,984 INFO [train.py:909] (0/4) Computing validation loss 2023-12-24 03:35:27,637 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6651, 4.0199, 4.1725, 3.8694], device='cuda:0') 2023-12-24 03:35:41,611 INFO [train.py:917] (0/4) Epoch 47, validation: loss=0.03661, audio_tagging_loss=0.03661, over 3737520.00 frames. 2023-12-24 03:35:41,612 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 03:35:44,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1481573.3333333333, ans=0.2 2023-12-24 03:35:50,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1481573.3333333333, ans=0.0 2023-12-24 03:36:23,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2023-12-24 03:36:33,048 INFO [train.py:886] (0/4) Epoch 47, batch 3050, loss[loss=0.01098, audio_tagging_loss=0.01098, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4951618.22 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:36:41,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. 
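limit=6.0

The batch-3000 block above shows the periodic validation pass: training pauses, the dev set is scored in full (the same 3737520.00 frames each time), and the frame-weighted average is reported alongside a peak-memory figure and per-layer diagnostics such as the attn_weights_entropy tensor, which summarises how peaked each head's attention distribution is. A sketch of the dev-set pass, with the model returning a per-batch (summed loss, frame count) pair as a hypothetical interface:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, dev_loader) -> float:
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in dev_loader:
            # hypothetical interface: summed loss for the batch plus the
            # number of frames it covers
            loss_sum, num_frames = model(batch)
            tot_loss += float(loss_sum)
            tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames        # frame-weighted average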
2023-12-24 03:36:46,062 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.561e+01 3.890e+01 4.067e+01 4.265e+01 5.158e+01, threshold=8.135e+01, percent-clipped=0.0 2023-12-24 03:36:54,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1482040.0, ans=0.125 2023-12-24 03:36:59,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1482040.0, ans=0.0 2023-12-24 03:37:08,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-12-24 03:37:18,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1482173.3333333333, ans=0.0 2023-12-24 03:37:25,377 INFO [train.py:886] (0/4) Epoch 47, batch 3100, loss[loss=0.0102, audio_tagging_loss=0.0102, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4944494.52 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:37:28,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1482240.0, ans=0.0 2023-12-24 03:37:49,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1482373.3333333333, ans=0.125 2023-12-24 03:38:00,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-12-24 03:38:05,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1482506.6666666667, ans=0.2 2023-12-24 03:38:10,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1482506.6666666667, ans=0.0 2023-12-24 03:38:16,241 INFO [train.py:886] (0/4) Epoch 47, batch 3150, loss[loss=0.01424, audio_tagging_loss=0.01424, over 24936.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4941968.78 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:38:16,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0 2023-12-24 03:38:30,728 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.480e+01 3.934e+01 4.165e+01 4.401e+01 5.350e+01, threshold=8.330e+01, percent-clipped=0.0 2023-12-24 03:38:55,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1482773.3333333333, ans=0.0 2023-12-24 03:38:55,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1482773.3333333333, ans=0.125 2023-12-24 03:38:59,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1482840.0, ans=0.1 2023-12-24 03:39:01,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1482840.0, ans=0.1 2023-12-24 03:39:08,896 INFO [train.py:886] (0/4) Epoch 47, batch 3200, loss[loss=0.00945, audio_tagging_loss=0.00945, over 24750.00 frames.
], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4930257.33 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:39:14,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1482906.6666666667, ans=0.0 2023-12-24 03:39:29,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1483040.0, ans=15.0 2023-12-24 03:39:46,356 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:39:47,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1483106.6666666667, ans=0.1 2023-12-24 03:39:49,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2023-12-24 03:39:55,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2023-12-24 03:39:59,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1483240.0, ans=0.0 2023-12-24 03:40:00,797 INFO [train.py:886] (0/4) Epoch 47, batch 3250, loss[loss=0.01174, audio_tagging_loss=0.01174, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4928556.71 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:40:06,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1483240.0, ans=0.0 2023-12-24 03:40:13,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1483306.6666666667, ans=0.09899494936611666 2023-12-24 03:40:14,364 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.530e+01 3.967e+01 4.152e+01 4.354e+01 4.796e+01, threshold=8.304e+01, percent-clipped=0.0 2023-12-24 03:40:19,287 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:40:31,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1483440.0, ans=0.05 2023-12-24 03:40:40,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1483506.6666666667, ans=0.125 2023-12-24 03:40:49,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1483506.6666666667, ans=0.125 2023-12-24 03:40:51,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1483573.3333333333, ans=0.125 2023-12-24 03:40:52,247 INFO [train.py:886] (0/4) Epoch 47, batch 3300, loss[loss=0.01069, audio_tagging_loss=0.01069, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4932030.47 frames. 
], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:41:10,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1483640.0, ans=0.0 2023-12-24 03:41:29,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1483773.3333333333, ans=0.125 2023-12-24 03:41:30,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.25 vs. limit=15.0 2023-12-24 03:41:41,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1483840.0, ans=0.1 2023-12-24 03:41:43,711 INFO [train.py:886] (0/4) Epoch 47, batch 3350, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4935218.50 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:41:52,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1483906.6666666667, ans=0.1 2023-12-24 03:41:57,457 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.599e+01 3.920e+01 4.118e+01 4.246e+01 4.815e+01, threshold=8.236e+01, percent-clipped=0.0 2023-12-24 03:41:59,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1483973.3333333333, ans=0.125 2023-12-24 03:42:13,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0 2023-12-24 03:42:14,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1484106.6666666667, ans=0.2 2023-12-24 03:42:30,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1484173.3333333333, ans=0.125 2023-12-24 03:42:35,586 INFO [train.py:886] (0/4) Epoch 47, batch 3400, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4942169.94 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:42:50,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1484306.6666666667, ans=0.125 2023-12-24 03:43:02,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1484373.3333333333, ans=0.2 2023-12-24 03:43:13,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1484440.0, ans=0.1 2023-12-24 03:43:27,005 INFO [train.py:886] (0/4) Epoch 47, batch 3450, loss[loss=0.009498, audio_tagging_loss=0.009498, over 24750.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4942049.47 frames. 
], batch size: 99, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:43:40,798 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.504e+01 3.946e+01 4.174e+01 4.346e+01 5.015e+01, threshold=8.347e+01, percent-clipped=0.0 2023-12-24 03:43:50,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1484706.6666666667, ans=10.0 2023-12-24 03:44:13,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1484840.0, ans=0.0 2023-12-24 03:44:19,474 INFO [train.py:886] (0/4) Epoch 47, batch 3500, loss[loss=0.009501, audio_tagging_loss=0.009501, over 24750.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4942374.93 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:44:23,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1484906.6666666667, ans=0.2 2023-12-24 03:44:36,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2023-12-24 03:44:38,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1484973.3333333333, ans=0.125 2023-12-24 03:44:38,263 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:44:58,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1485106.6666666667, ans=0.125 2023-12-24 03:45:10,892 INFO [train.py:886] (0/4) Epoch 47, batch 3550, loss[loss=0.009469, audio_tagging_loss=0.009469, over 24917.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4943520.28 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:45:24,596 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.366e+01 3.931e+01 4.091e+01 4.261e+01 4.917e+01, threshold=8.182e+01, percent-clipped=0.0 2023-12-24 03:45:38,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1485373.3333333333, ans=0.0 2023-12-24 03:45:39,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1485373.3333333333, ans=0.125 2023-12-24 03:45:45,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1485440.0, ans=0.0 2023-12-24 03:45:55,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1485506.6666666667, ans=0.2 2023-12-24 03:46:02,700 INFO [train.py:886] (0/4) Epoch 47, batch 3600, loss[loss=0.009007, audio_tagging_loss=0.009007, over 25000.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4941881.81 frames. 
], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:46:28,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1485706.6666666667, ans=0.0 2023-12-24 03:46:30,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1485706.6666666667, ans=0.2 2023-12-24 03:46:33,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1485773.3333333333, ans=0.0 2023-12-24 03:46:55,304 INFO [train.py:886] (0/4) Epoch 47, batch 3650, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4947980.40 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:47:04,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-12-24 03:47:08,291 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.641e+01 3.947e+01 4.112e+01 4.320e+01 4.973e+01, threshold=8.224e+01, percent-clipped=0.0 2023-12-24 03:47:16,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1486040.0, ans=0.5 2023-12-24 03:47:26,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1486106.6666666667, ans=0.0 2023-12-24 03:47:38,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1486173.3333333333, ans=0.125 2023-12-24 03:47:39,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=12.0 2023-12-24 03:47:46,824 INFO [train.py:886] (0/4) Epoch 47, batch 3700, loss[loss=0.01029, audio_tagging_loss=0.01029, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4954830.45 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:48:07,685 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2023-12-24 03:48:08,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0 2023-12-24 03:48:11,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1486373.3333333333, ans=0.0 2023-12-24 03:48:37,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1486506.6666666667, ans=0.0 2023-12-24 03:48:39,064 INFO [train.py:886] (0/4) Epoch 47, batch 3750, loss[loss=0.009042, audio_tagging_loss=0.009042, over 24054.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4949417.00 frames. 
], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:48:44,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1486573.3333333333, ans=0.1 2023-12-24 03:48:44,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1486573.3333333333, ans=0.125 2023-12-24 03:48:48,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1486640.0, ans=0.0 2023-12-24 03:48:49,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1486640.0, ans=10.0 2023-12-24 03:48:50,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.45 vs. limit=5.0 2023-12-24 03:48:51,912 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.685e+01 4.010e+01 4.129e+01 4.399e+01 4.975e+01, threshold=8.259e+01, percent-clipped=0.0 2023-12-24 03:48:57,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1486640.0, ans=0.5 2023-12-24 03:48:59,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.46 vs. limit=10.0 2023-12-24 03:49:23,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1486840.0, ans=0.125 2023-12-24 03:49:30,070 INFO [train.py:886] (0/4) Epoch 47, batch 3800, loss[loss=0.009392, audio_tagging_loss=0.009392, over 24084.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4934550.39 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:49:32,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1486906.6666666667, ans=0.5 2023-12-24 03:49:39,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1486906.6666666667, ans=0.1 2023-12-24 03:49:47,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. 
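limit=15.0

The ScheduledFloat lines print the current value (ans) of a hyperparameter schedule evaluated at batch_count: skip rates, dropout probabilities, balancer bounds and similar knobs all anneal as training progresses. A sketch of such a schedule as plain piecewise-linear interpolation over (batch_count, value) breakpoints; the breakpoints below are made up for illustration and the real class in scaling.py carries extra machinery:

    import bisect

    class PiecewiseLinear:
        def __init__(self, *points):
            # points: (batch_count, value) pairs sorted by batch_count
            self.xs = [float(p[0]) for p in points]
            self.ys = [float(p[1]) for p in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count) - 1
            t = (batch_count - self.xs[i]) / (self.xs[i + 1] - self.xs[i])
            return self.ys[i] + t * (self.ys[i + 1] - self.ys[i])

    # e.g. a skip rate annealed from 0.1 to 0.0; far past the last
    # breakpoint the schedule just returns its final value, which is why
    # the entries keep printing constant ans values this late in training.
    conv_skip_rate = PiecewiseLinear((0.0, 0.1), (20000.0, 0.05), (40000.0, 0.0))
    print(conv_skip_rate(1486973.3333333333))   # -> 0.0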
2023-12-24 03:49:48,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1486973.3333333333, ans=0.0 2023-12-24 03:49:51,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1487040.0, ans=0.125 2023-12-24 03:49:55,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1487040.0, ans=0.95 2023-12-24 03:50:00,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1487106.6666666667, ans=0.07 2023-12-24 03:50:00,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1487106.6666666667, ans=0.125 2023-12-24 03:50:01,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1487106.6666666667, ans=0.0 2023-12-24 03:50:14,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1487173.3333333333, ans=0.125 2023-12-24 03:50:20,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2023-12-24 03:50:22,427 INFO [train.py:886] (0/4) Epoch 47, batch 3850, loss[loss=0.01158, audio_tagging_loss=0.01158, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4933924.73 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:50:22,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs.
limit=15.0 2023-12-24 03:50:35,806 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 3.975e+01 4.146e+01 4.354e+01 5.243e+01, threshold=8.293e+01, percent-clipped=0.0 2023-12-24 03:50:45,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1487373.3333333333, ans=0.125 2023-12-24 03:50:50,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1487373.3333333333, ans=0.125 2023-12-24 03:50:56,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1487440.0, ans=0.05 2023-12-24 03:51:00,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1487440.0, ans=0.015 2023-12-24 03:51:04,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1487506.6666666667, ans=0.125 2023-12-24 03:51:09,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1487506.6666666667, ans=0.0 2023-12-24 03:51:11,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1487506.6666666667, ans=0.0 2023-12-24 03:51:11,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1487506.6666666667, ans=0.0 2023-12-24 03:51:15,178 INFO [train.py:886] (0/4) Epoch 47, batch 3900, loss[loss=0.009574, audio_tagging_loss=0.009574, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4938924.31 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:51:31,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2023-12-24 03:51:49,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1487773.3333333333, ans=0.0 2023-12-24 03:51:50,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1487773.3333333333, ans=0.125 2023-12-24 03:52:06,195 INFO [train.py:886] (0/4) Epoch 47, batch 3950, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4939494.01 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:52:06,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1487906.6666666667, ans=0.125 2023-12-24 03:52:19,823 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.576e+01 3.897e+01 4.099e+01 4.303e+01 4.809e+01, threshold=8.198e+01, percent-clipped=0.0 2023-12-24 03:52:26,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1488040.0, ans=0.2 2023-12-24 03:52:47,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1488173.3333333333, ans=0.1 2023-12-24 03:52:58,004 INFO [train.py:886] (0/4) Epoch 47, batch 4000, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. 
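], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4946171.76 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0

grad_scale is the fp16 loss scale: it sits at 32.0 through batch 4000 and has doubled to 64.0 by batch 4050, the usual dynamic-loss-scaling behaviour of doubling after a stretch of overflow-free steps. A generic PyTorch sketch of that mechanism; icefall's own training step wires this differently, and init_scale and growth_interval here are illustrative:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0, growth_factor=2.0, growth_interval=2000)

    def train_step(model, optimizer, features, targets, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():           # fp16 forward
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()             # scaled backward
        scaler.step(optimizer)                    # unscales; skips step on inf/nan
        scaler.update()                           # grows or shrinks the scale
        return loss.detach(), scaler.get_scale()  # the logged grad_scale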
2023-12-24 03:53:00,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1488240.0, ans=0.125 2023-12-24 03:53:00,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.91 vs. limit=15.0 2023-12-24 03:53:01,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1488240.0, ans=0.125 2023-12-24 03:53:30,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1488440.0, ans=10.0 2023-12-24 03:53:49,966 INFO [train.py:886] (0/4) Epoch 47, batch 4050, loss[loss=0.01021, audio_tagging_loss=0.01021, over 24750.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4947664.83 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:53:59,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1488573.3333333333, ans=0.1 2023-12-24 03:54:03,598 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.451e+01 4.007e+01 4.158e+01 4.362e+01 4.853e+01, threshold=8.315e+01, percent-clipped=0.0 2023-12-24 03:54:03,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1488640.0, ans=0.125 2023-12-24 03:54:16,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1488706.6666666667, ans=0.125 2023-12-24 03:54:20,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.59 vs. limit=22.5 2023-12-24 03:54:35,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2023-12-24 03:54:41,995 INFO [train.py:886] (0/4) Epoch 47, batch 4100, loss[loss=0.01087, audio_tagging_loss=0.01087, over 24750.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4935442.08 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:54:53,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=1488973.3333333333, ans=0.02 2023-12-24 03:55:04,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1489040.0, ans=0.2 2023-12-24 03:55:18,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2023-12-24 03:55:18,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-12-24 03:55:24,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1489173.3333333333, ans=0.1 2023-12-24 03:55:33,576 INFO [train.py:886] (0/4) Epoch 47, batch 4150, loss[loss=0.01123, audio_tagging_loss=0.01123, over 25000.00 frames.
], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4939490.55 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:55:39,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.67 vs. limit=15.0 2023-12-24 03:55:41,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1489240.0, ans=0.125 2023-12-24 03:55:43,756 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:55:47,354 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.657e+01 3.988e+01 4.176e+01 4.439e+01 5.267e+01, threshold=8.351e+01, percent-clipped=0.0 2023-12-24 03:55:59,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1489373.3333333333, ans=0.125 2023-12-24 03:56:00,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-12-24 03:56:25,309 INFO [train.py:886] (0/4) Epoch 47, batch 4200, loss[loss=0.009628, audio_tagging_loss=0.009628, over 24750.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4941459.37 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:56:29,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1489573.3333333333, ans=0.1 2023-12-24 03:56:32,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1489573.3333333333, ans=0.1 2023-12-24 03:56:50,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1489706.6666666667, ans=0.0 2023-12-24 03:56:58,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.62 vs. limit=15.0 2023-12-24 03:57:04,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.08 vs. limit=12.0 2023-12-24 03:57:05,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1489773.3333333333, ans=0.0 2023-12-24 03:57:18,732 INFO [train.py:886] (0/4) Epoch 47, batch 4250, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4947337.88 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:57:23,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1489906.6666666667, ans=0.035 2023-12-24 03:57:31,100 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.425e+01 3.978e+01 4.120e+01 4.273e+01 4.787e+01, threshold=8.239e+01, percent-clipped=0.0 2023-12-24 03:57:31,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1489973.3333333333, ans=0.125 2023-12-24 03:57:32,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.00 vs. 
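
Nearly every scaling.py:213 line records the current value (ans=...) of a ScheduledFloat, a hyperparameter such as a dropout rate or skip probability that is a function of batch_count rather than a constant. A minimal sketch of the logged behaviour, assuming piecewise-linear interpolation between (batch_count, value) breakpoints (the real icefall class carries more machinery; only the interpolation is shown):

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count (a sketch, not icefall's class)."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs, e.g. ((0.0, 0.3), (20000.0, 0.1)).
            self.points = sorted(points)
            self.batch_count = 0.0

        def __float__(self):
            p = self.points
            if self.batch_count <= p[0][0]:
                return float(p[0][1])
            if self.batch_count >= p[-1][0]:
                return float(p[-1][1])
            for (x0, y0), (x1, y1) in zip(p, p[1:]):
                if x0 <= self.batch_count <= x1:
                    w = (self.batch_count - x0) / (x1 - x0)
                    return float(y0 + w * (y1 - y0))

    # Late in training, batch_count is far past the final breakpoint, which is
    # why values like ans=0.1 and ans=0.125 sit flat for thousands of batches.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    dropout_p.batch_count = 1489973.33
    assert float(dropout_p) == 0.1
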
limit=12.0 2023-12-24 03:57:33,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.20 vs. limit=15.0 2023-12-24 03:57:47,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1490106.6666666667, ans=0.125 2023-12-24 03:58:01,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1490173.3333333333, ans=0.125 2023-12-24 03:58:05,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1490173.3333333333, ans=0.125 2023-12-24 03:58:09,476 INFO [train.py:886] (0/4) Epoch 47, batch 4300, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4955237.37 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:58:55,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0 2023-12-24 03:59:01,776 INFO [train.py:886] (0/4) Epoch 47, batch 4350, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4959188.72 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:59:02,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1490573.3333333333, ans=0.125 2023-12-24 03:59:08,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1490573.3333333333, ans=0.0 2023-12-24 03:59:13,255 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:59:14,773 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.601e+01 4.048e+01 4.179e+01 4.362e+01 5.083e+01, threshold=8.358e+01, percent-clipped=0.0 2023-12-24 03:59:19,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1490640.0, ans=0.1 2023-12-24 03:59:35,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1490773.3333333333, ans=0.125 2023-12-24 03:59:40,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1490773.3333333333, ans=0.2 2023-12-24 03:59:47,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=22.5 2023-12-24 03:59:48,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1490840.0, ans=0.125 2023-12-24 03:59:50,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1490840.0, ans=0.125 2023-12-24 03:59:53,615 INFO [train.py:886] (0/4) Epoch 47, batch 4400, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4955687.90 frames. 
], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:00:33,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1491106.6666666667, ans=0.0 2023-12-24 04:00:38,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1491173.3333333333, ans=0.0 2023-12-24 04:00:45,146 INFO [train.py:886] (0/4) Epoch 47, batch 4450, loss[loss=0.009551, audio_tagging_loss=0.009551, over 24750.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4950233.38 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:00:45,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1491240.0, ans=0.0 2023-12-24 04:00:58,833 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.617e+01 3.970e+01 4.177e+01 4.303e+01 4.832e+01, threshold=8.353e+01, percent-clipped=0.0 2023-12-24 04:01:03,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.52 vs. limit=22.5 2023-12-24 04:01:09,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1491373.3333333333, ans=0.125 2023-12-24 04:01:37,666 INFO [train.py:886] (0/4) Epoch 47, batch 4500, loss[loss=0.01139, audio_tagging_loss=0.01139, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4950970.30 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:02:24,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1491840.0, ans=0.1 2023-12-24 04:02:25,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1491840.0, ans=0.1 2023-12-24 04:02:30,051 INFO [train.py:886] (0/4) Epoch 47, batch 4550, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4954133.95 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:02:39,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1491973.3333333333, ans=0.125 2023-12-24 04:02:42,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1491973.3333333333, ans=0.125 2023-12-24 04:02:43,226 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.462e+01 3.976e+01 4.121e+01 4.329e+01 5.064e+01, threshold=8.243e+01, percent-clipped=0.0 2023-12-24 04:03:06,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1492106.6666666667, ans=0.0 2023-12-24 04:03:21,233 INFO [train.py:886] (0/4) Epoch 47, batch 4600, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4953988.61 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:03:40,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.34 vs. 
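
The scaling.py:1022 Whitening entries compare a per-module whiteness statistic against a limit (metric=17.52 vs. limit=22.5 and similar), presumably applying a penalty only once the metric drifts past the limit. One statistic with the right behaviour, equal to 1.0 when the channel covariance within each group is isotropic and approaching the group dimension when one direction dominates, is the ratio mean(eig^2) / mean(eig)^2 of the covariance eigenvalues. A sketch under that assumption (icefall's exact definition may differ):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels); returns a scalar >= 1.0 that equals
        # 1.0 iff the within-group covariance is isotropic ("white").
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = (x - x.mean(dim=0, keepdim=True)).transpose(0, 1)  # (groups, frames, chans)
        cov = x.transpose(1, 2) @ x / num_frames               # per-group covariance
        dim = cov.shape[-1]
        trace = cov.diagonal(dim1=-2, dim2=-1).sum(-1)         # sum of eigenvalues
        trace_sq = (cov * cov).sum(dim=(-2, -1))               # sum of squared eigenvalues
        return ((trace_sq / dim) / (trace / dim) ** 2).mean()

    x = torch.randn(10000, 256)                # nearly decorrelated channels
    print(whitening_metric(x, num_groups=1))   # close to 1.0
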
limit=22.5 2023-12-24 04:04:06,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1492506.6666666667, ans=0.125 2023-12-24 04:04:10,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1492506.6666666667, ans=0.125 2023-12-24 04:04:11,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1492506.6666666667, ans=0.125 2023-12-24 04:04:13,631 INFO [train.py:886] (0/4) Epoch 47, batch 4650, loss[loss=0.008901, audio_tagging_loss=0.008901, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4955778.11 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:04:20,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1492573.3333333333, ans=0.1 2023-12-24 04:04:22,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1492573.3333333333, ans=0.1 2023-12-24 04:04:23,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1492640.0, ans=0.0 2023-12-24 04:04:26,946 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.563e+01 3.987e+01 4.134e+01 4.270e+01 4.832e+01, threshold=8.268e+01, percent-clipped=0.0 2023-12-24 04:04:32,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1492640.0, ans=0.0 2023-12-24 04:05:04,474 INFO [train.py:886] (0/4) Epoch 47, batch 4700, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4954875.42 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:05:11,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1492906.6666666667, ans=0.2 2023-12-24 04:05:18,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1492973.3333333333, ans=0.2 2023-12-24 04:05:39,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.58 vs. limit=15.0 2023-12-24 04:05:51,827 INFO [train.py:886] (0/4) Epoch 47, batch 4750, loss[loss=0.01135, audio_tagging_loss=0.01135, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4950237.18 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:05:54,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1493240.0, ans=0.125 2023-12-24 04:05:54,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. 
limit=15.0 2023-12-24 04:06:02,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1493306.6666666667, ans=0.125 2023-12-24 04:06:03,899 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.583e+01 4.050e+01 4.260e+01 4.432e+01 5.167e+01, threshold=8.521e+01, percent-clipped=0.0 2023-12-24 04:06:04,148 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-224000.pt 2023-12-24 04:06:09,301 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-47.pt 2023-12-24 04:06:27,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1493346.6666666667, ans=0.125 2023-12-24 04:06:28,392 INFO [train.py:886] (0/4) Epoch 48, batch 0, loss[loss=0.02083, audio_tagging_loss=0.02083, over 24035.00 frames. ], tot_loss[loss=0.02083, audio_tagging_loss=0.02083, over 24035.00 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:06:28,393 INFO [train.py:909] (0/4) Computing validation loss 2023-12-24 04:06:49,468 INFO [train.py:917] (0/4) Epoch 48, validation: loss=0.03686, audio_tagging_loss=0.03686, over 3737520.00 frames. 2023-12-24 04:06:49,469 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 04:06:58,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1493413.3333333333, ans=0.125 2023-12-24 04:07:18,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1493480.0, ans=0.125 2023-12-24 04:07:34,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1493613.3333333333, ans=0.0 2023-12-24 04:07:41,194 INFO [train.py:886] (0/4) Epoch 48, batch 50, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.01716, audio_tagging_loss=0.01716, over 1117828.17 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:08:00,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2023-12-24 04:08:02,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1493813.3333333333, ans=0.0 2023-12-24 04:08:18,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1493880.0, ans=0.125 2023-12-24 04:08:31,115 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 4.056e+01 4.648e+01 5.127e+01 5.664e+01 9.776e+01, threshold=1.025e+02, percent-clipped=5.0 2023-12-24 04:08:31,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2023-12-24 04:08:33,024 INFO [train.py:886] (0/4) Epoch 48, batch 100, loss[loss=0.0159, audio_tagging_loss=0.0159, over 25000.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 1966640.80 frames. 
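
The two checkpoint.py:75 entries above fire seconds apart at the epoch 47/48 boundary: a rolling batch-indexed save (checkpoint-224000.pt) and an end-of-epoch save (epoch-47.pt). A sketch of that dual naming scheme; the directory and file names are copied from the log, while the saved fields and the 4000-step cadence (which does divide 224000 evenly) are assumptions:

    import torch
    from pathlib import Path

    def save_checkpoints(model, optimizer, batch_idx_train, epoch,
                         exp_dir="zipformer/exp_at_as_full", save_every_n=4000):
        state = {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
            "epoch": epoch,
        }
        exp_dir = Path(exp_dir)
        if batch_idx_train % save_every_n == 0:
            # Rolling save, e.g. zipformer/exp_at_as_full/checkpoint-224000.pt
            torch.save(state, exp_dir / f"checkpoint-{batch_idx_train}.pt")
        # End-of-epoch save, e.g. zipformer/exp_at_as_full/epoch-47.pt
        torch.save(state, exp_dir / f"epoch-{epoch}.pt")
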
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:08:34,195 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:08:43,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1494080.0, ans=0.125 2023-12-24 04:08:46,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1494080.0, ans=0.1 2023-12-24 04:09:00,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1494146.6666666667, ans=0.5 2023-12-24 04:09:05,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1494213.3333333333, ans=0.125 2023-12-24 04:09:11,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1494213.3333333333, ans=0.0 2023-12-24 04:09:22,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1494280.0, ans=0.1 2023-12-24 04:09:24,823 INFO [train.py:886] (0/4) Epoch 48, batch 150, loss[loss=0.01096, audio_tagging_loss=0.01096, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 2633302.52 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:09:31,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1494346.6666666667, ans=0.125 2023-12-24 04:09:34,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1494346.6666666667, ans=0.125 2023-12-24 04:09:49,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1494480.0, ans=0.125 2023-12-24 04:09:50,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1494480.0, ans=0.125 2023-12-24 04:09:52,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1494480.0, ans=0.0 2023-12-24 04:09:59,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1494546.6666666667, ans=0.2 2023-12-24 04:10:14,261 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.608e+01 4.085e+01 4.287e+01 4.458e+01 4.971e+01, threshold=8.574e+01, percent-clipped=0.0 2023-12-24 04:10:16,194 INFO [train.py:886] (0/4) Epoch 48, batch 200, loss[loss=0.01062, audio_tagging_loss=0.01062, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 3145382.25 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:10:16,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1494680.0, ans=0.125 2023-12-24 04:10:45,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1494813.3333333333, ans=0.125 2023-12-24 04:11:08,775 INFO [train.py:886] (0/4) Epoch 48, batch 250, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 3544513.59 frames. 
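
At the start of epoch 48 (batch 0, above), training pauses for a validation pass; the frame count on validation lines is 3737520.00 every time, as expected for a fixed dev set. A generic sketch of such a pass; the batch keys and model call are assumptions, not the icefall dataset's actual interface:

    import torch

    @torch.no_grad()
    def validate(model, valid_loader, criterion, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in valid_loader:
            feats = batch["inputs"].to(device)       # (N, T, 80) fbank features
            targets = batch["targets"].to(device)    # multi-hot event labels
            loss = criterion(model(feats), targets)  # summed over the batch
            tot_loss += float(loss)
            tot_frames += feats.size(0) * feats.size(1)
        model.train()
        return tot_loss / tot_frames                 # per-frame loss, as logged
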
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:11:18,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1495080.0, ans=0.125 2023-12-24 04:11:18,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1495080.0, ans=0.125 2023-12-24 04:11:19,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2023-12-24 04:11:21,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.60 vs. limit=15.0 2023-12-24 04:11:29,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1495146.6666666667, ans=0.0 2023-12-24 04:11:47,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1495213.3333333333, ans=0.125 2023-12-24 04:11:56,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1495280.0, ans=0.2 2023-12-24 04:11:58,317 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.624e+01 3.958e+01 4.151e+01 4.361e+01 5.160e+01, threshold=8.303e+01, percent-clipped=0.0 2023-12-24 04:11:58,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1495280.0, ans=0.0 2023-12-24 04:12:00,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=22.5 2023-12-24 04:12:00,935 INFO [train.py:886] (0/4) Epoch 48, batch 300, loss[loss=0.01026, audio_tagging_loss=0.01026, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 3857892.79 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:12:06,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1495346.6666666667, ans=0.1 2023-12-24 04:12:43,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2023-12-24 04:12:52,531 INFO [train.py:886] (0/4) Epoch 48, batch 350, loss[loss=0.01219, audio_tagging_loss=0.01219, over 22216.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4089090.35 frames. ], batch size: 107, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:12:59,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1495680.0, ans=0.2 2023-12-24 04:13:14,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1495813.3333333333, ans=0.1 2023-12-24 04:13:42,334 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.481e+01 3.929e+01 4.116e+01 4.285e+01 5.525e+01, threshold=8.232e+01, percent-clipped=0.0 2023-12-24 04:13:44,921 INFO [train.py:886] (0/4) Epoch 48, batch 400, loss[loss=0.01011, audio_tagging_loss=0.01011, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4278948.51 frames. 
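
The tot_loss frame counts in the epoch 48 entries above ramp as 1117828 -> 1966640 -> 2633302 -> 3145382 -> ... and flatten just under 5.0e6 later in the epoch. With roughly 25000-frame batches, that matches an exponentially decayed accumulator: successive 50-batch increments shrink by about 0.78 ~= 0.995**50, and the fixed point of a per-batch decay of 0.995 is 25000 / (1 - 0.995) = 5.0e6 frames. A sketch under that assumed decay:

    class RunningLoss:
        # Decay-weighted running average; decay=0.995 is an assumption that
        # reproduces the logged plateau, not a value read from the code.
        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of (per-frame loss * frames)
            self.frames = 0.0     # decayed effective frame count

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    for _ in range(2000):
        tracker.update(batch_loss=0.0107, batch_frames=25000.0)
    print(tracker.frames)  # ~5.0e6, the plateau the 'tot_loss ... frames' figures reach
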
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:13:51,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1496013.3333333333, ans=0.0 2023-12-24 04:13:58,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1496080.0, ans=0.125 2023-12-24 04:13:59,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1496080.0, ans=0.0 2023-12-24 04:14:23,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1496213.3333333333, ans=0.0 2023-12-24 04:14:23,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1496213.3333333333, ans=0.0 2023-12-24 04:14:35,632 INFO [train.py:886] (0/4) Epoch 48, batch 450, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4427802.22 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:14:35,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1496346.6666666667, ans=0.125 2023-12-24 04:15:08,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1496546.6666666667, ans=0.0 2023-12-24 04:15:14,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1496546.6666666667, ans=0.1 2023-12-24 04:15:26,673 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.400e+01 3.891e+01 4.102e+01 4.306e+01 5.682e+01, threshold=8.203e+01, percent-clipped=0.0 2023-12-24 04:15:28,581 INFO [train.py:886] (0/4) Epoch 48, batch 500, loss[loss=0.01115, audio_tagging_loss=0.01115, over 25000.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4544767.90 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:15:46,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1496746.6666666667, ans=0.1 2023-12-24 04:15:49,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1496813.3333333333, ans=0.125 2023-12-24 04:15:57,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1496813.3333333333, ans=0.1 2023-12-24 04:16:03,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1496880.0, ans=0.07 2023-12-24 04:16:03,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1496880.0, ans=0.125 2023-12-24 04:16:03,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.80 vs. 
limit=22.5 2023-12-24 04:16:07,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1496880.0, ans=0.2 2023-12-24 04:16:19,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.33 vs. limit=10.0 2023-12-24 04:16:19,548 INFO [train.py:886] (0/4) Epoch 48, batch 550, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4634096.05 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:16:20,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1497013.3333333333, ans=0.125 2023-12-24 04:16:22,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1497013.3333333333, ans=0.0 2023-12-24 04:16:24,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1497013.3333333333, ans=0.025 2023-12-24 04:16:30,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1497080.0, ans=0.2 2023-12-24 04:16:41,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1497146.6666666667, ans=0.125 2023-12-24 04:16:49,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1497146.6666666667, ans=0.1 2023-12-24 04:17:10,273 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.621e+01 4.004e+01 4.167e+01 4.372e+01 5.263e+01, threshold=8.335e+01, percent-clipped=0.0 2023-12-24 04:17:12,149 INFO [train.py:886] (0/4) Epoch 48, batch 600, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24941.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4702735.53 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:17:27,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1497413.3333333333, ans=0.125 2023-12-24 04:17:46,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-12-24 04:17:48,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1497546.6666666667, ans=0.125 2023-12-24 04:18:03,732 INFO [train.py:886] (0/4) Epoch 48, batch 650, loss[loss=0.009847, audio_tagging_loss=0.009847, over 24750.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4757269.94 frames. 
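
The printed learning rate moves with both the step count and the epoch (2.27e-03 through epoch 47, 2.24e-03 at the start of epoch 48, 2.23e-03 later in this epoch). That shape is consistent with the Eden schedule used in zipformer recipes; a sketch, with base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 assumed as plausible values for this run:

    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Two independent inverse-fourth-root decays, one in optimizer steps
        # and one in (possibly fractional) epochs.
        step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * step_factor * epoch_factor

    # ~224k steps in, around epoch 47/48: about 2.23e-03, in the same range
    # as the lr values printed above.
    print(eden_lr(0.045, 224000, 47.5))
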
], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:18:06,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1497680.0, ans=0.05 2023-12-24 04:18:10,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1497680.0, ans=0.125 2023-12-24 04:18:52,642 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.607e+01 3.971e+01 4.176e+01 4.392e+01 5.928e+01, threshold=8.353e+01, percent-clipped=0.0 2023-12-24 04:18:55,243 INFO [train.py:886] (0/4) Epoch 48, batch 700, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4797394.62 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:19:03,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.48 vs. limit=15.0 2023-12-24 04:19:09,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1498080.0, ans=0.0 2023-12-24 04:19:23,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-12-24 04:19:34,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1498280.0, ans=0.07 2023-12-24 04:19:44,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1498280.0, ans=0.125 2023-12-24 04:19:46,791 INFO [train.py:886] (0/4) Epoch 48, batch 750, loss[loss=0.01092, audio_tagging_loss=0.01092, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4831577.92 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:19:54,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1498346.6666666667, ans=0.125 2023-12-24 04:19:59,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1498413.3333333333, ans=0.0 2023-12-24 04:20:36,085 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.587e+01 3.930e+01 4.066e+01 4.267e+01 5.066e+01, threshold=8.132e+01, percent-clipped=0.0 2023-12-24 04:20:38,016 INFO [train.py:886] (0/4) Epoch 48, batch 800, loss[loss=0.00858, audio_tagging_loss=0.00858, over 24072.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4864877.90 frames. 
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:20:47,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1498680.0, ans=0.0 2023-12-24 04:21:00,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1498813.3333333333, ans=0.125 2023-12-24 04:21:01,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1498813.3333333333, ans=0.2 2023-12-24 04:21:21,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1498946.6666666667, ans=0.5 2023-12-24 04:21:30,304 INFO [train.py:886] (0/4) Epoch 48, batch 850, loss[loss=0.008937, audio_tagging_loss=0.008937, over 25000.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4886396.14 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:21:48,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1499080.0, ans=0.1 2023-12-24 04:21:56,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=12.0 2023-12-24 04:22:08,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.67 vs. limit=22.5 2023-12-24 04:22:13,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1499280.0, ans=0.2 2023-12-24 04:22:20,130 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.702e+01 4.028e+01 4.180e+01 4.366e+01 5.371e+01, threshold=8.359e+01, percent-clipped=0.0 2023-12-24 04:22:22,863 INFO [train.py:886] (0/4) Epoch 48, batch 900, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24750.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4904402.09 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:22:43,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1499480.0, ans=0.0 2023-12-24 04:22:44,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1499480.0, ans=0.025 2023-12-24 04:22:47,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1499480.0, ans=0.125 2023-12-24 04:22:50,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1499480.0, ans=0.1 2023-12-24 04:22:52,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1499546.6666666667, ans=0.07 2023-12-24 04:22:59,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1499546.6666666667, ans=0.125 2023-12-24 04:22:59,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.35 vs. 
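
grad_scale: 32.0 on these epoch 48 batches is fp16 loss-scaling state. The value stays a power of two, halves on an overflow (64.0 at the end of epoch 47, 32.0 at epoch 48 batch 0) and doubles again after a run of clean steps (back to 64.0 by epoch 48 batch 2000), consistent with the default growth interval of 2000 steps in torch.cuda.amp.GradScaler. A generic sketch of the update loop (icefall wraps this differently, but the mechanics are the same):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def fp16_step(model, batch, criterion, optimizer):
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()  # scale up so fp16 grads stay finite
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # halve on overflow, double after 2000 clean steps
        return float(loss.detach()), scaler.get_scale()
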
limit=22.5 2023-12-24 04:23:01,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1499546.6666666667, ans=0.0 2023-12-24 04:23:04,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2023-12-24 04:23:09,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1499613.3333333333, ans=0.125 2023-12-24 04:23:14,541 INFO [train.py:886] (0/4) Epoch 48, batch 950, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24950.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4912095.46 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:23:24,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1499746.6666666667, ans=0.1 2023-12-24 04:23:49,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1499880.0, ans=0.0 2023-12-24 04:24:04,739 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.583e+01 3.993e+01 4.147e+01 4.321e+01 5.221e+01, threshold=8.295e+01, percent-clipped=0.0 2023-12-24 04:24:05,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1499946.6666666667, ans=0.1 2023-12-24 04:24:07,319 INFO [train.py:886] (0/4) Epoch 48, batch 1000, loss[loss=0.01157, audio_tagging_loss=0.01157, over 24750.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4915916.28 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:24:08,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1500013.3333333333, ans=0.09899494936611666 2023-12-24 04:24:09,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1500013.3333333333, ans=0.125 2023-12-24 04:24:20,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.61 vs. limit=10.0 2023-12-24 04:24:42,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1500213.3333333333, ans=0.125 2023-12-24 04:24:55,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1500280.0, ans=0.1 2023-12-24 04:24:58,994 INFO [train.py:886] (0/4) Epoch 48, batch 1050, loss[loss=0.01172, audio_tagging_loss=0.01172, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4922551.69 frames. 
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:25:03,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1500346.6666666667, ans=0.125 2023-12-24 04:25:09,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1500413.3333333333, ans=0.015 2023-12-24 04:25:10,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1500413.3333333333, ans=0.125 2023-12-24 04:25:15,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2023-12-24 04:25:30,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1500546.6666666667, ans=0.07 2023-12-24 04:25:40,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1500613.3333333333, ans=0.125 2023-12-24 04:25:48,713 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.950e+01 4.096e+01 4.313e+01 4.903e+01, threshold=8.193e+01, percent-clipped=0.0 2023-12-24 04:25:50,617 INFO [train.py:886] (0/4) Epoch 48, batch 1100, loss[loss=0.009583, audio_tagging_loss=0.009583, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4922811.95 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:25:52,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1500680.0, ans=0.0 2023-12-24 04:25:55,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1500680.0, ans=0.125 2023-12-24 04:25:56,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1500680.0, ans=0.125 2023-12-24 04:26:17,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1500813.3333333333, ans=0.2 2023-12-24 04:26:28,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1500880.0, ans=0.0 2023-12-24 04:26:34,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1500946.6666666667, ans=0.0 2023-12-24 04:26:42,957 INFO [train.py:886] (0/4) Epoch 48, batch 1150, loss[loss=0.00985, audio_tagging_loss=0.00985, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4929010.15 frames. 
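
Every training line in this log reports loss and audio_tagging_loss as the same number: the objective is a single multi-label tagging loss, normalized per frame (25000.00 frames for a 100-cut batch is 250 subsampled frames per 10-second clip). A standard criterion for AudioSet-style tagging, assumed here rather than read off the log, is binary cross-entropy with logits over the event vocabulary:

    import torch
    import torch.nn as nn

    num_events = 527                                   # AudioSet event classes
    criterion = nn.BCEWithLogitsLoss(reduction="sum")

    logits = torch.randn(100, num_events)                    # one row per cut
    targets = (torch.rand(100, num_events) < 0.01).float()   # sparse multi-hot labels
    loss_sum = criterion(logits, targets)
    per_frame_loss = loss_sum / 25000.0  # normalize by total frames, as logged
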
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:26:52,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1501080.0, ans=0.0 2023-12-24 04:26:56,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1501080.0, ans=0.125 2023-12-24 04:27:01,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1501080.0, ans=0.1 2023-12-24 04:27:28,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1501280.0, ans=0.125 2023-12-24 04:27:30,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.76 vs. limit=10.0 2023-12-24 04:27:32,815 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.629e+01 3.995e+01 4.171e+01 4.338e+01 4.792e+01, threshold=8.343e+01, percent-clipped=0.0 2023-12-24 04:27:34,745 INFO [train.py:886] (0/4) Epoch 48, batch 1200, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24941.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4938711.69 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:28:03,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2023-12-24 04:28:14,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-12-24 04:28:20,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1501613.3333333333, ans=0.125 2023-12-24 04:28:26,243 INFO [train.py:886] (0/4) Epoch 48, batch 1250, loss[loss=0.0116, audio_tagging_loss=0.0116, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4942734.41 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:28:37,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1501746.6666666667, ans=0.1 2023-12-24 04:28:54,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.84 vs. limit=22.5 2023-12-24 04:28:56,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1501880.0, ans=0.2 2023-12-24 04:29:00,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1501880.0, ans=0.125 2023-12-24 04:29:16,865 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.731e+01 4.024e+01 4.196e+01 4.446e+01 5.087e+01, threshold=8.392e+01, percent-clipped=0.0 2023-12-24 04:29:18,762 INFO [train.py:886] (0/4) Epoch 48, batch 1300, loss[loss=0.01112, audio_tagging_loss=0.01112, over 21854.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4938218.05 frames. 
], batch size: 107, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:29:50,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1502213.3333333333, ans=0.2 2023-12-24 04:29:54,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=22.5 2023-12-24 04:29:59,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2023-12-24 04:30:10,948 INFO [train.py:886] (0/4) Epoch 48, batch 1350, loss[loss=0.01118, audio_tagging_loss=0.01118, over 24750.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4939286.89 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:30:21,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1502413.3333333333, ans=0.125 2023-12-24 04:30:25,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.61 vs. limit=22.5 2023-12-24 04:30:27,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-12-24 04:30:44,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1502546.6666666667, ans=0.5 2023-12-24 04:31:00,585 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.655e+01 3.929e+01 4.129e+01 4.400e+01 5.128e+01, threshold=8.257e+01, percent-clipped=0.0 2023-12-24 04:31:02,526 INFO [train.py:886] (0/4) Epoch 48, batch 1400, loss[loss=0.009412, audio_tagging_loss=0.009412, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4948037.39 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:31:23,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1502813.3333333333, ans=0.2 2023-12-24 04:31:34,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2023-12-24 04:31:37,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2023-12-24 04:31:40,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1502880.0, ans=0.0 2023-12-24 04:31:46,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1502946.6666666667, ans=0.1 2023-12-24 04:31:46,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0 2023-12-24 04:31:53,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1503013.3333333333, ans=0.125 2023-12-24 04:31:54,560 INFO [train.py:886] (0/4) Epoch 48, batch 1450, loss[loss=0.01049, audio_tagging_loss=0.01049, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4953038.15 frames. 
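
The scaling.py:1118 WithLoss entries (loss-sum=0.000e+00 on various self_attn_weights modules) suggest auxiliary penalties attached to intermediate tensors and accounted for separately from the printed loss. One way such a mechanism can be built, sketched as a custom autograd function (the mechanism is an assumption; only the module names come from the log):

    import torch

    class AttachLoss(torch.autograd.Function):
        # Return x unchanged, but give aux_loss a gradient of 1.0 in backward,
        # as if it had been added to the final training objective.
        @staticmethod
        def forward(ctx, x, aux_loss):
            ctx.aux_shape = aux_loss.shape
            return x

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output, torch.ones(ctx.aux_shape, device=grad_output.device)

    def attach_loss(x, aux_loss):
        # A logged loss-sum=0.000e+00 would then just mean the accumulated
        # auxiliary penalty for that module is currently zero.
        return AttachLoss.apply(x, aux_loss)
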
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:32:22,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1503146.6666666667, ans=15.0 2023-12-24 04:32:27,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1503213.3333333333, ans=0.125 2023-12-24 04:32:42,395 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:32:43,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1503280.0, ans=0.125 2023-12-24 04:32:44,961 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.529e+01 3.948e+01 4.171e+01 4.358e+01 4.772e+01, threshold=8.342e+01, percent-clipped=0.0 2023-12-24 04:32:46,901 INFO [train.py:886] (0/4) Epoch 48, batch 1500, loss[loss=0.01036, audio_tagging_loss=0.01036, over 25000.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4957419.12 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:32:47,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1503346.6666666667, ans=0.0 2023-12-24 04:33:05,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1503413.3333333333, ans=0.125 2023-12-24 04:33:08,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1503480.0, ans=0.1 2023-12-24 04:33:36,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1503613.3333333333, ans=0.1 2023-12-24 04:33:40,132 INFO [train.py:886] (0/4) Epoch 48, batch 1550, loss[loss=0.009921, audio_tagging_loss=0.009921, over 24750.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4948070.74 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:33:43,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1503680.0, ans=0.0 2023-12-24 04:33:51,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1503746.6666666667, ans=0.5 2023-12-24 04:33:59,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1503813.3333333333, ans=15.0 2023-12-24 04:33:59,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-12-24 04:34:15,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1503880.0, ans=0.1 2023-12-24 04:34:28,487 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:34:29,208 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.717e+01 4.053e+01 4.191e+01 4.372e+01 4.989e+01, threshold=8.382e+01, percent-clipped=0.0 2023-12-24 04:34:31,128 INFO [train.py:886] (0/4) Epoch 48, batch 1600, loss[loss=0.009249, audio_tagging_loss=0.009249, over 24750.00 frames. 
], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4936839.39 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:34:35,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1504013.3333333333, ans=0.125 2023-12-24 04:34:35,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1504013.3333333333, ans=0.1 2023-12-24 04:35:07,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1504213.3333333333, ans=10.0 2023-12-24 04:35:10,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=12.0 2023-12-24 04:35:22,951 INFO [train.py:886] (0/4) Epoch 48, batch 1650, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4942247.93 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:35:24,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1504346.6666666667, ans=0.0 2023-12-24 04:35:27,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2023-12-24 04:35:32,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1504413.3333333333, ans=0.125 2023-12-24 04:35:35,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1504413.3333333333, ans=0.1 2023-12-24 04:35:43,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1504480.0, ans=0.0 2023-12-24 04:36:09,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1504613.3333333333, ans=0.0 2023-12-24 04:36:09,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1504613.3333333333, ans=0.125 2023-12-24 04:36:11,660 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.638e+01 3.961e+01 4.120e+01 4.349e+01 5.089e+01, threshold=8.240e+01, percent-clipped=0.0 2023-12-24 04:36:14,266 INFO [train.py:886] (0/4) Epoch 48, batch 1700, loss[loss=0.009228, audio_tagging_loss=0.009228, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4943267.69 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:36:16,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=15.0 2023-12-24 04:36:19,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1504680.0, ans=0.0 2023-12-24 04:36:22,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1504680.0, ans=0.0 2023-12-24 04:37:06,798 INFO [train.py:886] (0/4) Epoch 48, batch 1750, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4951853.89 frames. 
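
The 'batch size' field drifts between 99, 100 and 107 cuts because batches are packed to a duration budget rather than a fixed count: a 100-cut batch of 10-second clips covers 25000 subsampled frames, while the 107-cut batches pair with shorter totals (21854 or 22216 frames above). A sketch of duration-based bucketed sampling with lhotse; the manifest path and argument values are illustrative:

    from lhotse import CutSet
    from lhotse.dataset.sampling.dynamic_bucketing import DynamicBucketingSampler

    cuts = CutSet.from_file("cuts_audioset.jsonl.gz")  # placeholder manifest
    sampler = DynamicBucketingSampler(
        cuts,
        max_duration=1000.0,  # seconds of audio per batch, not a cut count
        num_buckets=30,       # batch together cuts of similar length
        shuffle=True,
        drop_last=True,
    )
    for batch_cuts in sampler:
        # Each item is one mini-batch worth of cuts: about 100 ten-second
        # clips, or a few more when a bucket holds shorter cuts.
        pass
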
], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:37:18,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1505080.0, ans=0.015 2023-12-24 04:37:29,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1505146.6666666667, ans=0.2 2023-12-24 04:37:30,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=12.0 2023-12-24 04:37:43,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1505213.3333333333, ans=0.2 2023-12-24 04:37:47,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1505280.0, ans=0.1 2023-12-24 04:37:49,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=15.0 2023-12-24 04:37:55,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1505280.0, ans=0.125 2023-12-24 04:37:56,315 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.626e+01 3.943e+01 4.153e+01 4.308e+01 5.197e+01, threshold=8.305e+01, percent-clipped=0.0 2023-12-24 04:37:59,052 INFO [train.py:886] (0/4) Epoch 48, batch 1800, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4947063.88 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:38:18,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1505480.0, ans=0.2 2023-12-24 04:38:21,723 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:38:23,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-12-24 04:38:32,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1505546.6666666667, ans=0.1 2023-12-24 04:38:32,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-12-24 04:38:50,139 INFO [train.py:886] (0/4) Epoch 48, batch 1850, loss[loss=0.01007, audio_tagging_loss=0.01007, over 24750.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4948283.03 frames. 
], batch size: 99, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:39:10,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1505813.3333333333, ans=0.125 2023-12-24 04:39:14,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1505813.3333333333, ans=0.125 2023-12-24 04:39:36,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1505946.6666666667, ans=0.1 2023-12-24 04:39:40,445 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 4.037e+01 4.214e+01 4.372e+01 5.067e+01, threshold=8.428e+01, percent-clipped=0.0 2023-12-24 04:39:42,329 INFO [train.py:886] (0/4) Epoch 48, batch 1900, loss[loss=0.01209, audio_tagging_loss=0.01209, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4943882.46 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:39:45,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1506013.3333333333, ans=0.07 2023-12-24 04:39:55,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1506080.0, ans=0.125 2023-12-24 04:40:10,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1506146.6666666667, ans=0.125 2023-12-24 04:40:11,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1506146.6666666667, ans=0.0 2023-12-24 04:40:16,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1506213.3333333333, ans=0.1 2023-12-24 04:40:33,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1506346.6666666667, ans=0.125 2023-12-24 04:40:33,857 INFO [train.py:886] (0/4) Epoch 48, batch 1950, loss[loss=0.01076, audio_tagging_loss=0.01076, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4944811.33 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:40:41,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2023-12-24 04:40:42,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1506346.6666666667, ans=0.0 2023-12-24 04:40:43,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2023-12-24 04:41:24,477 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.628e+01 3.985e+01 4.127e+01 4.367e+01 5.324e+01, threshold=8.253e+01, percent-clipped=0.0 2023-12-24 04:41:26,441 INFO [train.py:886] (0/4) Epoch 48, batch 2000, loss[loss=0.01036, audio_tagging_loss=0.01036, over 22214.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4940703.82 frames. 
], batch size: 107, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:41:32,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1506680.0, ans=0.0 2023-12-24 04:41:43,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1506746.6666666667, ans=0.05 2023-12-24 04:41:49,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1506813.3333333333, ans=0.125 2023-12-24 04:42:00,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1506880.0, ans=0.2 2023-12-24 04:42:03,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1506880.0, ans=0.125 2023-12-24 04:42:17,919 INFO [train.py:886] (0/4) Epoch 48, batch 2050, loss[loss=0.008597, audio_tagging_loss=0.008597, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4945688.29 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:42:22,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. limit=6.0 2023-12-24 04:42:46,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1507146.6666666667, ans=0.0 2023-12-24 04:42:56,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1507213.3333333333, ans=0.125 2023-12-24 04:43:00,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1507280.0, ans=0.0 2023-12-24 04:43:07,459 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.609e+01 3.972e+01 4.171e+01 4.415e+01 5.113e+01, threshold=8.342e+01, percent-clipped=0.0 2023-12-24 04:43:09,379 INFO [train.py:886] (0/4) Epoch 48, batch 2100, loss[loss=0.009404, audio_tagging_loss=0.009404, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4946886.64 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:43:09,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1507346.6666666667, ans=0.125 2023-12-24 04:43:09,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1507346.6666666667, ans=0.07 2023-12-24 04:43:17,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1507346.6666666667, ans=0.125 2023-12-24 04:43:18,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.03 vs. 
limit=12.0 2023-12-24 04:43:24,165 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:43:25,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1507413.3333333333, ans=0.2 2023-12-24 04:43:33,114 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:43:35,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1507480.0, ans=0.1 2023-12-24 04:43:49,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1507546.6666666667, ans=0.125 2023-12-24 04:43:53,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1507613.3333333333, ans=0.125 2023-12-24 04:43:57,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1507613.3333333333, ans=0.125 2023-12-24 04:43:59,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=12.0 2023-12-24 04:44:00,790 INFO [train.py:886] (0/4) Epoch 48, batch 2150, loss[loss=0.01018, audio_tagging_loss=0.01018, over 25000.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4949016.35 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:44:14,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1507746.6666666667, ans=0.1 2023-12-24 04:44:50,300 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.605e+01 4.004e+01 4.221e+01 4.415e+01 5.119e+01, threshold=8.442e+01, percent-clipped=0.0 2023-12-24 04:44:52,941 INFO [train.py:886] (0/4) Epoch 48, batch 2200, loss[loss=0.01237, audio_tagging_loss=0.01237, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4940728.66 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:45:01,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1508080.0, ans=0.0 2023-12-24 04:45:03,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1508080.0, ans=0.0 2023-12-24 04:45:05,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-12-24 04:45:28,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.80 vs. 
limit=15.0 2023-12-24 04:45:30,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1508213.3333333333, ans=0.125 2023-12-24 04:45:32,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1508280.0, ans=0.125 2023-12-24 04:45:40,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1508280.0, ans=0.0 2023-12-24 04:45:43,765 INFO [train.py:886] (0/4) Epoch 48, batch 2250, loss[loss=0.01009, audio_tagging_loss=0.01009, over 24750.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4935269.72 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:46:05,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1508480.0, ans=0.1 2023-12-24 04:46:05,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1508480.0, ans=0.0 2023-12-24 04:46:22,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1508613.3333333333, ans=0.1 2023-12-24 04:46:32,555 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.473e+01 4.034e+01 4.155e+01 4.335e+01 5.306e+01, threshold=8.310e+01, percent-clipped=0.0 2023-12-24 04:46:34,437 INFO [train.py:886] (0/4) Epoch 48, batch 2300, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4935864.08 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:46:42,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1508680.0, ans=0.125 2023-12-24 04:46:58,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1508813.3333333333, ans=0.125 2023-12-24 04:46:58,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2023-12-24 04:47:21,179 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:47:25,848 INFO [train.py:886] (0/4) Epoch 48, batch 2350, loss[loss=0.0103, audio_tagging_loss=0.0103, over 25000.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4942702.63 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:47:37,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1509080.0, ans=0.0 2023-12-24 04:47:41,676 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:47:51,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1509146.6666666667, ans=0.0 2023-12-24 04:48:13,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1509280.0, ans=0.125 2023-12-24 04:48:14,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. 
limit=15.0 2023-12-24 04:48:15,762 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.606e+01 3.960e+01 4.074e+01 4.287e+01 4.962e+01, threshold=8.148e+01, percent-clipped=0.0 2023-12-24 04:48:17,720 INFO [train.py:886] (0/4) Epoch 48, batch 2400, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4948500.34 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:49:10,956 INFO [train.py:886] (0/4) Epoch 48, batch 2450, loss[loss=0.01149, audio_tagging_loss=0.01149, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4952223.72 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:49:42,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1509880.0, ans=0.125 2023-12-24 04:50:00,245 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.486e+01 4.002e+01 4.128e+01 4.314e+01 5.407e+01, threshold=8.257e+01, percent-clipped=0.0 2023-12-24 04:50:02,155 INFO [train.py:886] (0/4) Epoch 48, batch 2500, loss[loss=0.01099, audio_tagging_loss=0.01099, over 24750.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4955136.02 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:50:02,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1510013.3333333333, ans=0.025 2023-12-24 04:50:18,438 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:50:45,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.62 vs. limit=22.5 2023-12-24 04:50:54,094 INFO [train.py:886] (0/4) Epoch 48, batch 2550, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4946591.43 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:50:55,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1510346.6666666667, ans=0.125 2023-12-24 04:51:34,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1510613.3333333333, ans=0.0 2023-12-24 04:51:43,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1510613.3333333333, ans=0.125 2023-12-24 04:51:43,839 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.503e+01 4.050e+01 4.211e+01 4.452e+01 5.003e+01, threshold=8.421e+01, percent-clipped=0.0 2023-12-24 04:51:46,458 INFO [train.py:886] (0/4) Epoch 48, batch 2600, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4946771.19 frames. 
], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:51:54,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1510680.0, ans=0.0 2023-12-24 04:51:57,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1510746.6666666667, ans=0.125 2023-12-24 04:52:18,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1510880.0, ans=0.0 2023-12-24 04:52:28,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1510946.6666666667, ans=0.125 2023-12-24 04:52:32,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0 2023-12-24 04:52:37,980 INFO [train.py:886] (0/4) Epoch 48, batch 2650, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4948590.88 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:53:28,179 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.465e+01 3.977e+01 4.124e+01 4.274e+01 5.169e+01, threshold=8.248e+01, percent-clipped=0.0 2023-12-24 04:53:30,069 INFO [train.py:886] (0/4) Epoch 48, batch 2700, loss[loss=0.008938, audio_tagging_loss=0.008938, over 21660.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4948917.78 frames. ], batch size: 107, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:53:38,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5 2023-12-24 04:53:51,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.37 vs. limit=22.5 2023-12-24 04:54:18,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1511613.3333333333, ans=0.125 2023-12-24 04:54:20,678 INFO [train.py:886] (0/4) Epoch 48, batch 2750, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4956020.06 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:54:32,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2023-12-24 04:54:49,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1511813.3333333333, ans=0.125 2023-12-24 04:54:50,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-12-24 04:54:57,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1511880.0, ans=0.125 2023-12-24 04:55:00,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.07 vs. 
limit=15.0 2023-12-24 04:55:02,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1511946.6666666667, ans=0.0 2023-12-24 04:55:10,759 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.456e+01 3.953e+01 4.094e+01 4.279e+01 4.852e+01, threshold=8.188e+01, percent-clipped=0.0 2023-12-24 04:55:11,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.93 vs. limit=22.5 2023-12-24 04:55:12,689 INFO [train.py:886] (0/4) Epoch 48, batch 2800, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24950.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4958592.76 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:55:19,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1512013.3333333333, ans=0.125 2023-12-24 04:55:20,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2023-12-24 04:55:45,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2023-12-24 04:55:49,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1512213.3333333333, ans=0.0 2023-12-24 04:55:53,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1512280.0, ans=0.2 2023-12-24 04:56:04,334 INFO [train.py:886] (0/4) Epoch 48, batch 2850, loss[loss=0.01212, audio_tagging_loss=0.01212, over 22252.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4941940.60 frames. ], batch size: 107, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:56:09,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1512346.6666666667, ans=0.125 2023-12-24 04:56:23,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=22.5 2023-12-24 04:56:24,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1512480.0, ans=0.09899494936611666 2023-12-24 04:56:37,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0 2023-12-24 04:56:49,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1512613.3333333333, ans=0.1 2023-12-24 04:56:53,474 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.637e+01 3.988e+01 4.154e+01 4.396e+01 6.475e+01, threshold=8.307e+01, percent-clipped=0.0 2023-12-24 04:56:55,363 INFO [train.py:886] (0/4) Epoch 48, batch 2900, loss[loss=0.009099, audio_tagging_loss=0.009099, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4942164.19 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:57:00,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. 
limit=15.0 2023-12-24 04:57:03,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1512680.0, ans=0.0 2023-12-24 04:57:13,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1512746.6666666667, ans=0.125 2023-12-24 04:57:16,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1512813.3333333333, ans=0.2 2023-12-24 04:57:30,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1512880.0, ans=0.0 2023-12-24 04:57:47,894 INFO [train.py:886] (0/4) Epoch 48, batch 2950, loss[loss=0.01161, audio_tagging_loss=0.01161, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4938684.63 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:57:49,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2023-12-24 04:58:23,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.64 vs. limit=10.0 2023-12-24 04:58:37,208 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.638e+01 3.916e+01 4.052e+01 4.286e+01 4.882e+01, threshold=8.104e+01, percent-clipped=0.0 2023-12-24 04:58:38,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1513346.6666666667, ans=0.0 2023-12-24 04:58:39,121 INFO [train.py:886] (0/4) Epoch 48, batch 3000, loss[loss=0.01059, audio_tagging_loss=0.01059, over 25000.00 frames. ], tot_loss[loss=0.0106, audio_tagging_loss=0.0106, over 4946681.93 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:58:39,122 INFO [train.py:909] (0/4) Computing validation loss 2023-12-24 04:58:58,262 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.7281, 2.4148, 2.5344, 2.2040, 3.8854, 3.1667, 4.0653, 2.4717], device='cuda:0') 2023-12-24 04:59:00,488 INFO [train.py:917] (0/4) Epoch 48, validation: loss=0.03695, audio_tagging_loss=0.03695, over 3737520.00 frames. 2023-12-24 04:59:00,489 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 04:59:16,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1513413.3333333333, ans=0.1 2023-12-24 04:59:32,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1513546.6666666667, ans=0.0 2023-12-24 04:59:42,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1513613.3333333333, ans=0.0 2023-12-24 04:59:43,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1513613.3333333333, ans=0.2 2023-12-24 04:59:52,935 INFO [train.py:886] (0/4) Epoch 48, batch 3050, loss[loss=0.01102, audio_tagging_loss=0.01102, over 25000.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4940094.58 frames. 
], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:59:56,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1513680.0, ans=0.07 2023-12-24 04:59:57,161 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=12.0 2023-12-24 05:00:02,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2023-12-24 05:00:03,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1513746.6666666667, ans=0.0 2023-12-24 05:00:03,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1513746.6666666667, ans=0.125 2023-12-24 05:00:12,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1513813.3333333333, ans=0.2 2023-12-24 05:00:19,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1513813.3333333333, ans=0.125 2023-12-24 05:00:28,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5 2023-12-24 05:00:40,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1513946.6666666667, ans=0.125 2023-12-24 05:00:41,843 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.634e+01 4.038e+01 4.200e+01 4.350e+01 5.861e+01, threshold=8.401e+01, percent-clipped=0.0 2023-12-24 05:00:42,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0 2023-12-24 05:00:44,476 INFO [train.py:886] (0/4) Epoch 48, batch 3100, loss[loss=0.01028, audio_tagging_loss=0.01028, over 24750.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4945775.79 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:01:36,055 INFO [train.py:886] (0/4) Epoch 48, batch 3150, loss[loss=0.01271, audio_tagging_loss=0.01271, over 24942.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4941650.47 frames. 
], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:01:44,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1514346.6666666667, ans=0.2 2023-12-24 05:01:47,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1514413.3333333333, ans=0.04949747468305833 2023-12-24 05:01:48,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1514413.3333333333, ans=0.1 2023-12-24 05:01:56,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1514480.0, ans=0.125 2023-12-24 05:02:03,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1514480.0, ans=0.09899494936611666 2023-12-24 05:02:20,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1514613.3333333333, ans=0.2 2023-12-24 05:02:22,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1514613.3333333333, ans=0.1 2023-12-24 05:02:26,562 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.591e+01 4.003e+01 4.208e+01 4.385e+01 5.175e+01, threshold=8.416e+01, percent-clipped=0.0 2023-12-24 05:02:26,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2023-12-24 05:02:28,483 INFO [train.py:886] (0/4) Epoch 48, batch 3200, loss[loss=0.01088, audio_tagging_loss=0.01088, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4939369.33 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:02:35,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5 2023-12-24 05:02:38,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1514746.6666666667, ans=0.0 2023-12-24 05:02:52,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0 2023-12-24 05:02:54,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1514813.3333333333, ans=0.125 2023-12-24 05:03:09,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1514946.6666666667, ans=0.1 2023-12-24 05:03:18,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1514946.6666666667, ans=0.1 2023-12-24 05:03:18,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1514946.6666666667, ans=0.125 2023-12-24 05:03:20,321 INFO [train.py:886] (0/4) Epoch 48, batch 3250, loss[loss=0.008561, audio_tagging_loss=0.008561, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4941596.20 frames. 
], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:03:32,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1515080.0, ans=0.2 2023-12-24 05:04:10,063 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.551e+01 3.994e+01 4.173e+01 4.405e+01 4.939e+01, threshold=8.347e+01, percent-clipped=0.0 2023-12-24 05:04:11,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2023-12-24 05:04:12,013 INFO [train.py:886] (0/4) Epoch 48, batch 3300, loss[loss=0.01129, audio_tagging_loss=0.01129, over 25000.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4944618.21 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:04:17,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1515346.6666666667, ans=0.0 2023-12-24 05:04:21,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.66 vs. limit=22.5 2023-12-24 05:04:34,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1515480.0, ans=0.0 2023-12-24 05:04:43,460 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:04:56,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1515613.3333333333, ans=0.125 2023-12-24 05:05:04,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1515680.0, ans=0.125 2023-12-24 05:05:04,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.18 vs. limit=15.0 2023-12-24 05:05:04,986 INFO [train.py:886] (0/4) Epoch 48, batch 3350, loss[loss=0.01314, audio_tagging_loss=0.01314, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4949308.69 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:05:22,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1515746.6666666667, ans=0.0 2023-12-24 05:05:37,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1515880.0, ans=0.1 2023-12-24 05:05:38,866 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2023-12-24 05:05:42,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.84 vs. limit=12.0 2023-12-24 05:05:54,534 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.602e+01 3.981e+01 4.138e+01 4.310e+01 5.201e+01, threshold=8.276e+01, percent-clipped=0.0 2023-12-24 05:05:56,178 INFO [train.py:886] (0/4) Epoch 48, batch 3400, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4951107.35 frames. 
], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:06:13,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1516080.0, ans=0.0 2023-12-24 05:06:14,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1516080.0, ans=0.0 2023-12-24 05:06:20,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2023-12-24 05:06:25,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1516213.3333333333, ans=0.0 2023-12-24 05:06:37,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1516280.0, ans=0.0 2023-12-24 05:06:39,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1516280.0, ans=0.1 2023-12-24 05:06:47,741 INFO [train.py:886] (0/4) Epoch 48, batch 3450, loss[loss=0.01175, audio_tagging_loss=0.01175, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4945076.87 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:07:01,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2023-12-24 05:07:13,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1516480.0, ans=0.0 2023-12-24 05:07:17,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1516546.6666666667, ans=0.5 2023-12-24 05:07:38,220 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 4.055e+01 4.203e+01 4.420e+01 4.923e+01, threshold=8.407e+01, percent-clipped=0.0 2023-12-24 05:07:39,187 INFO [train.py:886] (0/4) Epoch 48, batch 3500, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4940771.20 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:07:45,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1516680.0, ans=0.0 2023-12-24 05:07:51,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.98 vs. limit=15.0 2023-12-24 05:08:12,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1516880.0, ans=0.125 2023-12-24 05:08:17,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. limit=6.0 2023-12-24 05:08:30,193 INFO [train.py:886] (0/4) Epoch 48, batch 3550, loss[loss=0.01027, audio_tagging_loss=0.01027, over 22020.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4943647.13 frames. 
], batch size: 107, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:08:30,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1517013.3333333333, ans=0.5 2023-12-24 05:08:35,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1517013.3333333333, ans=0.125 2023-12-24 05:08:45,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=22.5 2023-12-24 05:08:56,099 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:08:57,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1517146.6666666667, ans=0.125 2023-12-24 05:09:01,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=1517213.3333333333, ans=22.5 2023-12-24 05:09:11,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1517280.0, ans=0.125 2023-12-24 05:09:22,407 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.601e+01 3.929e+01 4.166e+01 4.382e+01 5.169e+01, threshold=8.332e+01, percent-clipped=0.0 2023-12-24 05:09:23,395 INFO [train.py:886] (0/4) Epoch 48, batch 3600, loss[loss=0.00941, audio_tagging_loss=0.00941, over 22097.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4948873.52 frames. ], batch size: 107, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:09:43,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1517480.0, ans=0.125 2023-12-24 05:09:52,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1517480.0, ans=0.125 2023-12-24 05:10:01,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-12-24 05:10:08,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1517613.3333333333, ans=0.07 2023-12-24 05:10:12,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1517613.3333333333, ans=0.125 2023-12-24 05:10:14,231 INFO [train.py:886] (0/4) Epoch 48, batch 3650, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4955003.86 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:10:35,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1517813.3333333333, ans=0.125 2023-12-24 05:11:05,043 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.632e+01 4.020e+01 4.210e+01 4.432e+01 5.110e+01, threshold=8.421e+01, percent-clipped=0.0 2023-12-24 05:11:06,031 INFO [train.py:886] (0/4) Epoch 48, batch 3700, loss[loss=0.01164, audio_tagging_loss=0.01164, over 24750.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4955236.04 frames. 
], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:11:10,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1518013.3333333333, ans=0.1 2023-12-24 05:11:18,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1518080.0, ans=0.0 2023-12-24 05:11:49,249 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:11:58,842 INFO [train.py:886] (0/4) Epoch 48, batch 3750, loss[loss=0.009256, audio_tagging_loss=0.009256, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4955476.12 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:12:02,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1518346.6666666667, ans=0.1 2023-12-24 05:12:14,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1518413.3333333333, ans=0.2 2023-12-24 05:12:25,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1518480.0, ans=0.0 2023-12-24 05:12:31,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1518546.6666666667, ans=0.125 2023-12-24 05:12:39,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1518546.6666666667, ans=0.05 2023-12-24 05:12:49,908 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.681e+01 4.049e+01 4.266e+01 4.434e+01 5.925e+01, threshold=8.531e+01, percent-clipped=0.0 2023-12-24 05:12:50,901 INFO [train.py:886] (0/4) Epoch 48, batch 3800, loss[loss=0.01037, audio_tagging_loss=0.01037, over 22670.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4944264.93 frames. 
], batch size: 107, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:12:52,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1518680.0, ans=0.1 2023-12-24 05:12:58,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1518680.0, ans=0.125 2023-12-24 05:13:00,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1518680.0, ans=0.2 2023-12-24 05:13:01,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1518746.6666666667, ans=0.1 2023-12-24 05:13:09,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1518746.6666666667, ans=0.0 2023-12-24 05:13:11,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1518813.3333333333, ans=0.125 2023-12-24 05:13:25,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1518880.0, ans=0.2 2023-12-24 05:13:37,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1518946.6666666667, ans=0.0 2023-12-24 05:13:37,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1518946.6666666667, ans=0.125 2023-12-24 05:13:42,681 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:13:43,505 INFO [train.py:886] (0/4) Epoch 48, batch 3850, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4946134.93 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:13:49,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.01 vs. limit=10.0 2023-12-24 05:13:53,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1519080.0, ans=0.125 2023-12-24 05:14:05,124 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:14:14,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1519213.3333333333, ans=0.125 2023-12-24 05:14:25,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5 2023-12-24 05:14:29,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1519280.0, ans=0.1 2023-12-24 05:14:29,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1519280.0, ans=0.1 2023-12-24 05:14:33,478 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.582e+01 4.007e+01 4.136e+01 4.360e+01 5.120e+01, threshold=8.272e+01, percent-clipped=0.0 2023-12-24 05:14:35,556 INFO [train.py:886] (0/4) Epoch 48, batch 3900, loss[loss=0.008446, audio_tagging_loss=0.008446, over 25000.00 frames. 
], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4952295.67 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:14:35,835 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:14:40,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1519346.6666666667, ans=0.0 2023-12-24 05:14:42,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1519346.6666666667, ans=0.1 2023-12-24 05:14:48,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1519413.3333333333, ans=0.0 2023-12-24 05:14:51,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.52 vs. limit=10.0 2023-12-24 05:14:53,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1519413.3333333333, ans=0.125 2023-12-24 05:15:05,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1519546.6666666667, ans=0.2 2023-12-24 05:15:17,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1519613.3333333333, ans=0.0 2023-12-24 05:15:21,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1519613.3333333333, ans=0.1 2023-12-24 05:15:24,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1519613.3333333333, ans=0.2 2023-12-24 05:15:27,380 INFO [train.py:886] (0/4) Epoch 48, batch 3950, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4955037.49 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:15:29,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-12-24 05:15:41,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2023-12-24 05:15:43,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1519746.6666666667, ans=0.2 2023-12-24 05:16:15,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1519946.6666666667, ans=0.125 2023-12-24 05:16:16,447 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-228000.pt 2023-12-24 05:16:20,885 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.458e+01 4.031e+01 4.185e+01 4.396e+01 5.576e+01, threshold=8.370e+01, percent-clipped=0.0 2023-12-24 05:16:21,866 INFO [train.py:886] (0/4) Epoch 48, batch 4000, loss[loss=0.00913, audio_tagging_loss=0.00913, over 25000.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4961595.77 frames. 
], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:16:25,734 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:16:26,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1520013.3333333333, ans=0.2 2023-12-24 05:16:28,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1520013.3333333333, ans=0.125 2023-12-24 05:16:29,582 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:16:33,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.98 vs. limit=22.5 2023-12-24 05:16:46,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.72 vs. limit=8.0 2023-12-24 05:16:59,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1520213.3333333333, ans=0.2 2023-12-24 05:17:13,340 INFO [train.py:886] (0/4) Epoch 48, batch 4050, loss[loss=0.0101, audio_tagging_loss=0.0101, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4960805.78 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:17:26,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1520413.3333333333, ans=0.125 2023-12-24 05:17:28,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.79 vs. limit=22.5 2023-12-24 05:17:30,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1520413.3333333333, ans=0.2 2023-12-24 05:17:44,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1520546.6666666667, ans=0.125 2023-12-24 05:18:05,528 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.431e+01 4.000e+01 4.154e+01 4.324e+01 5.032e+01, threshold=8.309e+01, percent-clipped=0.0 2023-12-24 05:18:06,506 INFO [train.py:886] (0/4) Epoch 48, batch 4100, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4951834.29 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:18:11,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1520680.0, ans=0.0 2023-12-24 05:18:11,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1520680.0, ans=0.125 2023-12-24 05:18:13,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1520680.0, ans=0.1 2023-12-24 05:18:46,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. 
limit=15.0 2023-12-24 05:18:48,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1520946.6666666667, ans=0.125 2023-12-24 05:18:55,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1520946.6666666667, ans=0.125 2023-12-24 05:18:57,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1521013.3333333333, ans=0.125 2023-12-24 05:18:58,349 INFO [train.py:886] (0/4) Epoch 48, batch 4150, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4952117.28 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:19:04,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2023-12-24 05:19:38,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1521213.3333333333, ans=0.1 2023-12-24 05:19:44,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1521280.0, ans=0.0 2023-12-24 05:19:49,396 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.487e+01 4.024e+01 4.153e+01 4.360e+01 4.986e+01, threshold=8.306e+01, percent-clipped=0.0 2023-12-24 05:19:50,384 INFO [train.py:886] (0/4) Epoch 48, batch 4200, loss[loss=0.009915, audio_tagging_loss=0.009915, over 24750.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4948162.09 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:19:52,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1521346.6666666667, ans=0.95 2023-12-24 05:20:14,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1521480.0, ans=0.2 2023-12-24 05:20:27,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1521546.6666666667, ans=0.1 2023-12-24 05:20:39,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1521613.3333333333, ans=0.2 2023-12-24 05:20:43,073 INFO [train.py:886] (0/4) Epoch 48, batch 4250, loss[loss=0.008911, audio_tagging_loss=0.008911, over 22092.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4941619.55 frames. ], batch size: 107, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:20:50,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1521680.0, ans=0.0 2023-12-24 05:21:00,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1521746.6666666667, ans=0.1 2023-12-24 05:21:01,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1521746.6666666667, ans=0.125 2023-12-24 05:21:02,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.44 vs. 
limit=22.5 2023-12-24 05:21:06,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1521813.3333333333, ans=0.05 2023-12-24 05:21:29,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1521946.6666666667, ans=0.125 2023-12-24 05:21:33,911 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.572e+01 3.984e+01 4.154e+01 4.315e+01 4.731e+01, threshold=8.307e+01, percent-clipped=0.0 2023-12-24 05:21:34,912 INFO [train.py:886] (0/4) Epoch 48, batch 4300, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4949102.83 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:21:40,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1522013.3333333333, ans=0.125 2023-12-24 05:22:10,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1522213.3333333333, ans=0.125 2023-12-24 05:22:21,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1522280.0, ans=0.1 2023-12-24 05:22:26,760 INFO [train.py:886] (0/4) Epoch 48, batch 4350, loss[loss=0.01146, audio_tagging_loss=0.01146, over 24750.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4948401.36 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:22:31,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1522346.6666666667, ans=0.125 2023-12-24 05:22:32,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1522346.6666666667, ans=0.0 2023-12-24 05:22:35,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.89 vs. limit=10.0 2023-12-24 05:22:50,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=1522480.0, ans=0.02 2023-12-24 05:22:57,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1522546.6666666667, ans=0.1 2023-12-24 05:23:16,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1522613.3333333333, ans=0.0 2023-12-24 05:23:18,155 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.751e+01 4.021e+01 4.163e+01 4.358e+01 4.850e+01, threshold=8.326e+01, percent-clipped=0.0 2023-12-24 05:23:19,154 INFO [train.py:886] (0/4) Epoch 48, batch 4400, loss[loss=0.01379, audio_tagging_loss=0.01379, over 24937.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4945481.66 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:23:48,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1522813.3333333333, ans=0.125 2023-12-24 05:23:57,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. 
limit=6.0 2023-12-24 05:24:00,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1522946.6666666667, ans=0.2 2023-12-24 05:24:03,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1522946.6666666667, ans=0.0 2023-12-24 05:24:10,765 INFO [train.py:886] (0/4) Epoch 48, batch 4450, loss[loss=0.01019, audio_tagging_loss=0.01019, over 24750.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4942988.07 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:24:13,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1523013.3333333333, ans=0.1 2023-12-24 05:24:37,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2023-12-24 05:24:39,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1523146.6666666667, ans=0.0 2023-12-24 05:25:01,879 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 4.053e+01 4.196e+01 4.337e+01 4.887e+01, threshold=8.393e+01, percent-clipped=0.0 2023-12-24 05:25:02,868 INFO [train.py:886] (0/4) Epoch 48, batch 4500, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4941715.87 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:25:11,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1523413.3333333333, ans=0.09899494936611666 2023-12-24 05:25:23,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1523480.0, ans=0.125 2023-12-24 05:25:30,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1523480.0, ans=0.95 2023-12-24 05:25:35,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1523546.6666666667, ans=0.1 2023-12-24 05:25:53,558 INFO [train.py:886] (0/4) Epoch 48, batch 4550, loss[loss=0.01102, audio_tagging_loss=0.01102, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4951236.63 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:26:08,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1523746.6666666667, ans=0.125 2023-12-24 05:26:28,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1523880.0, ans=0.125 2023-12-24 05:26:32,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1523880.0, ans=0.125 2023-12-24 05:26:38,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.07 vs. 
limit=15.0 2023-12-24 05:26:44,557 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.666e+01 4.022e+01 4.224e+01 4.415e+01 4.872e+01, threshold=8.448e+01, percent-clipped=0.0 2023-12-24 05:26:45,540 INFO [train.py:886] (0/4) Epoch 48, batch 4600, loss[loss=0.01032, audio_tagging_loss=0.01032, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4943545.31 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:26:50,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.17 vs. limit=22.5 2023-12-24 05:27:00,613 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.28 vs. limit=10.0 2023-12-24 05:27:12,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1524146.6666666667, ans=0.125 2023-12-24 05:27:38,191 INFO [train.py:886] (0/4) Epoch 48, batch 4650, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4951815.70 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:27:40,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1524346.6666666667, ans=0.0 2023-12-24 05:27:43,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1524346.6666666667, ans=0.0 2023-12-24 05:27:44,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1524346.6666666667, ans=0.125 2023-12-24 05:28:09,735 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:28:23,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0 2023-12-24 05:28:26,940 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.730e+01 4.056e+01 4.273e+01 4.476e+01 5.742e+01, threshold=8.545e+01, percent-clipped=0.0 2023-12-24 05:28:27,900 INFO [train.py:886] (0/4) Epoch 48, batch 4700, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4950519.10 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:28:42,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1524746.6666666667, ans=15.0 2023-12-24 05:28:51,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1524813.3333333333, ans=0.125 2023-12-24 05:28:53,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1524813.3333333333, ans=0.125 2023-12-24 05:29:02,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. 
limit=15.0 2023-12-24 05:29:12,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1524946.6666666667, ans=0.125 2023-12-24 05:29:15,581 INFO [train.py:886] (0/4) Epoch 48, batch 4750, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4949154.07 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:29:27,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1525080.0, ans=0.2 2023-12-24 05:29:30,728 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-48.pt 2023-12-24 05:29:50,898 INFO [train.py:886] (0/4) Epoch 49, batch 0, loss[loss=0.02108, audio_tagging_loss=0.02108, over 25000.00 frames. ], tot_loss[loss=0.02108, audio_tagging_loss=0.02108, over 25000.00 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 32.0 2023-12-24 05:29:50,899 INFO [train.py:909] (0/4) Computing validation loss 2023-12-24 05:30:12,075 INFO [train.py:917] (0/4) Epoch 49, validation: loss=0.03671, audio_tagging_loss=0.03671, over 3737520.00 frames. 2023-12-24 05:30:12,076 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 05:30:23,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1525186.6666666667, ans=0.0 2023-12-24 05:30:34,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1525253.3333333333, ans=0.1 2023-12-24 05:30:37,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1525253.3333333333, ans=0.125 2023-12-24 05:30:43,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1525320.0, ans=0.125 2023-12-24 05:30:47,304 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.515e+01 4.166e+01 4.419e+01 5.711e+01 1.124e+02, threshold=8.838e+01, percent-clipped=6.0 2023-12-24 05:30:58,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1525386.6666666667, ans=0.0 2023-12-24 05:31:03,284 INFO [train.py:886] (0/4) Epoch 49, batch 50, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 1114332.56 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:31:21,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1525520.0, ans=0.2 2023-12-24 05:31:21,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1525520.0, ans=0.05 2023-12-24 05:31:54,141 INFO [train.py:886] (0/4) Epoch 49, batch 100, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 1968970.02 frames. 
], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:31:56,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1525786.6666666667, ans=0.125 2023-12-24 05:32:01,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1525786.6666666667, ans=0.1 2023-12-24 05:32:04,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1525853.3333333333, ans=0.2 2023-12-24 05:32:19,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1525920.0, ans=0.125 2023-12-24 05:32:28,988 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.922e+01 4.393e+01 4.587e+01 4.980e+01 5.717e+01, threshold=9.174e+01, percent-clipped=0.0 2023-12-24 05:32:43,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1526053.3333333333, ans=0.125 2023-12-24 05:32:44,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1526120.0, ans=0.125 2023-12-24 05:32:44,819 INFO [train.py:886] (0/4) Epoch 49, batch 150, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 2636930.99 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:32:57,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1526186.6666666667, ans=0.125 2023-12-24 05:33:28,992 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:33:37,122 INFO [train.py:886] (0/4) Epoch 49, batch 200, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 3154528.81 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:33:39,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1526453.3333333333, ans=0.125 2023-12-24 05:34:03,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1526586.6666666667, ans=0.1 2023-12-24 05:34:12,029 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.501e+01 4.050e+01 4.227e+01 4.381e+01 4.955e+01, threshold=8.454e+01, percent-clipped=0.0 2023-12-24 05:34:15,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1526653.3333333333, ans=0.0 2023-12-24 05:34:21,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1526720.0, ans=0.0 2023-12-24 05:34:28,161 INFO [train.py:886] (0/4) Epoch 49, batch 250, loss[loss=0.01094, audio_tagging_loss=0.01094, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 3556231.23 frames. 
], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:34:53,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1526920.0, ans=0.125 2023-12-24 05:35:05,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2023-12-24 05:35:14,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1527053.3333333333, ans=0.125 2023-12-24 05:35:16,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5 2023-12-24 05:35:19,252 INFO [train.py:886] (0/4) Epoch 49, batch 300, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 3864215.10 frames. ], batch size: 99, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:35:39,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.07 vs. limit=22.5 2023-12-24 05:35:53,616 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.671e+01 4.036e+01 4.211e+01 4.378e+01 5.277e+01, threshold=8.421e+01, percent-clipped=0.0 2023-12-24 05:36:10,938 INFO [train.py:886] (0/4) Epoch 49, batch 350, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4103296.70 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 16.0 2023-12-24 05:36:12,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1527453.3333333333, ans=0.0 2023-12-24 05:36:17,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=15.0 2023-12-24 05:36:21,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.29 vs. limit=22.5 2023-12-24 05:36:23,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-12-24 05:36:28,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.81 vs. limit=10.0 2023-12-24 05:36:41,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2023-12-24 05:36:50,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-12-24 05:37:01,188 INFO [train.py:886] (0/4) Epoch 49, batch 400, loss[loss=0.01173, audio_tagging_loss=0.01173, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4288634.69 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:37:24,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1527920.0, ans=0.125 2023-12-24 05:37:28,130 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:37:36,420 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.568e+01 3.880e+01 4.116e+01 4.327e+01 4.784e+01, threshold=8.231e+01, percent-clipped=0.0 2023-12-24 05:37:36,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1527986.6666666667, ans=0.0 2023-12-24 05:37:37,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1527986.6666666667, ans=0.125 2023-12-24 05:37:37,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=12.0 2023-12-24 05:37:53,096 INFO [train.py:886] (0/4) Epoch 49, batch 450, loss[loss=0.01236, audio_tagging_loss=0.01236, over 24903.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4441555.55 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:38:01,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1528120.0, ans=0.1 2023-12-24 05:38:14,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=1528253.3333333333, ans=0.2 2023-12-24 05:38:37,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1528386.6666666667, ans=0.0 2023-12-24 05:38:44,055 INFO [train.py:886] (0/4) Epoch 49, batch 500, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4556476.07 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:39:04,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1528586.6666666667, ans=0.125 2023-12-24 05:39:05,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1528586.6666666667, ans=0.0 2023-12-24 05:39:16,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1528653.3333333333, ans=0.0 2023-12-24 05:39:18,318 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.604e+01 3.968e+01 4.140e+01 4.300e+01 5.100e+01, threshold=8.280e+01, percent-clipped=0.0 2023-12-24 05:39:22,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1528653.3333333333, ans=0.05 2023-12-24 05:39:28,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1528720.0, ans=0.1 2023-12-24 05:39:34,825 INFO [train.py:886] (0/4) Epoch 49, batch 550, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4642691.84 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:39:35,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1528786.6666666667, ans=0.125 2023-12-24 05:39:36,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1528786.6666666667, ans=0.0 2023-12-24 05:39:46,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0 2023-12-24 05:39:52,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1528853.3333333333, ans=0.125 2023-12-24 05:39:53,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.57 vs. limit=10.0 2023-12-24 05:40:08,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1528986.6666666667, ans=0.1 2023-12-24 05:40:25,968 INFO [train.py:886] (0/4) Epoch 49, batch 600, loss[loss=0.008287, audio_tagging_loss=0.008287, over 24750.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4711486.56 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:40:33,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1529120.0, ans=0.2 2023-12-24 05:40:57,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1529320.0, ans=0.0 2023-12-24 05:40:59,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1529320.0, ans=0.1 2023-12-24 05:41:01,065 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.737e+01 4.021e+01 4.199e+01 4.428e+01 6.437e+01, threshold=8.399e+01, percent-clipped=0.0 2023-12-24 05:41:01,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1529320.0, ans=0.0 2023-12-24 05:41:03,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1529320.0, ans=0.1 2023-12-24 05:41:03,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1529320.0, ans=0.0 2023-12-24 05:41:04,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1529320.0, ans=0.2 2023-12-24 05:41:09,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1529386.6666666667, ans=0.07 2023-12-24 05:41:13,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1529386.6666666667, ans=0.0 2023-12-24 05:41:18,327 INFO [train.py:886] (0/4) Epoch 49, batch 650, loss[loss=0.01294, audio_tagging_loss=0.01294, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4756094.24 frames. 
], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:41:41,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1529586.6666666667, ans=0.1 2023-12-24 05:42:01,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1529720.0, ans=0.1 2023-12-24 05:42:06,405 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:42:09,758 INFO [train.py:886] (0/4) Epoch 49, batch 700, loss[loss=0.01068, audio_tagging_loss=0.01068, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4796270.59 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:42:12,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-12-24 05:42:13,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1529786.6666666667, ans=0.0 2023-12-24 05:42:14,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1529786.6666666667, ans=0.125 2023-12-24 05:42:23,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1529853.3333333333, ans=0.07 2023-12-24 05:42:32,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1529920.0, ans=0.2 2023-12-24 05:42:35,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-12-24 05:42:37,620 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:42:43,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2023-12-24 05:42:44,609 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.582e+01 3.993e+01 4.206e+01 4.464e+01 5.068e+01, threshold=8.413e+01, percent-clipped=0.0 2023-12-24 05:42:56,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1530053.3333333333, ans=0.1 2023-12-24 05:43:00,361 INFO [train.py:886] (0/4) Epoch 49, batch 750, loss[loss=0.0092, audio_tagging_loss=0.0092, over 25000.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4832722.23 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:43:01,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-12-24 05:43:42,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1530386.6666666667, ans=0.125 2023-12-24 05:43:52,125 INFO [train.py:886] (0/4) Epoch 49, batch 800, loss[loss=0.01173, audio_tagging_loss=0.01173, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4857090.90 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:44:02,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1530520.0, ans=0.125 2023-12-24 05:44:18,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1530586.6666666667, ans=0.125 2023-12-24 05:44:27,100 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.579e+01 3.966e+01 4.159e+01 4.355e+01 5.694e+01, threshold=8.318e+01, percent-clipped=0.0 2023-12-24 05:44:37,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1530720.0, ans=0.125 2023-12-24 05:44:39,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1530720.0, ans=0.2 2023-12-24 05:44:40,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-12-24 05:44:43,592 INFO [train.py:886] (0/4) Epoch 49, batch 850, loss[loss=0.01073, audio_tagging_loss=0.01073, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4877300.59 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:44:48,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1530786.6666666667, ans=0.0 2023-12-24 05:45:16,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1530986.6666666667, ans=0.125 2023-12-24 05:45:27,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1531053.3333333333, ans=0.0 2023-12-24 05:45:34,374 INFO [train.py:886] (0/4) Epoch 49, batch 900, loss[loss=0.008148, audio_tagging_loss=0.008148, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4893726.64 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:45:38,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.91 vs. limit=10.0 2023-12-24 05:45:38,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0 2023-12-24 05:46:08,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1531320.0, ans=0.125 2023-12-24 05:46:08,734 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.704e+01 4.050e+01 4.200e+01 4.404e+01 5.257e+01, threshold=8.399e+01, percent-clipped=0.0 2023-12-24 05:46:12,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1531320.0, ans=0.125 2023-12-24 05:46:24,524 INFO [train.py:886] (0/4) Epoch 49, batch 950, loss[loss=0.009606, audio_tagging_loss=0.009606, over 24002.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4896541.77 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:46:35,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. 
limit=15.0 2023-12-24 05:46:49,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1531586.6666666667, ans=0.0 2023-12-24 05:46:49,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1531586.6666666667, ans=0.125 2023-12-24 05:46:56,299 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:47:16,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1531786.6666666667, ans=0.125 2023-12-24 05:47:17,029 INFO [train.py:886] (0/4) Epoch 49, batch 1000, loss[loss=0.009068, audio_tagging_loss=0.009068, over 24750.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4904316.91 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:47:17,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1531786.6666666667, ans=0.125 2023-12-24 05:47:30,464 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:47:37,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1531920.0, ans=0.125 2023-12-24 05:47:50,839 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.669e+01 3.990e+01 4.143e+01 4.294e+01 8.017e+01, threshold=8.286e+01, percent-clipped=0.0 2023-12-24 05:48:00,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1532053.3333333333, ans=0.125 2023-12-24 05:48:06,658 INFO [train.py:886] (0/4) Epoch 49, batch 1050, loss[loss=0.009062, audio_tagging_loss=0.009062, over 23995.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4916958.92 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:48:35,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1532320.0, ans=0.0 2023-12-24 05:48:49,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1532386.6666666667, ans=0.125 2023-12-24 05:48:51,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1532386.6666666667, ans=0.0 2023-12-24 05:48:56,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1532386.6666666667, ans=0.0 2023-12-24 05:48:58,205 INFO [train.py:886] (0/4) Epoch 49, batch 1100, loss[loss=0.008861, audio_tagging_loss=0.008861, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4925770.27 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:48:58,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1532453.3333333333, ans=0.125 2023-12-24 05:49:03,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1532453.3333333333, ans=0.0 2023-12-24 05:49:29,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1532653.3333333333, ans=0.125 2023-12-24 05:49:30,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1532653.3333333333, ans=0.125 2023-12-24 05:49:33,436 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.655e+01 3.973e+01 4.201e+01 4.376e+01 5.038e+01, threshold=8.402e+01, percent-clipped=0.0 2023-12-24 05:49:35,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1532653.3333333333, ans=0.1 2023-12-24 05:49:50,808 INFO [train.py:886] (0/4) Epoch 49, batch 1150, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4934231.08 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:49:53,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-12-24 05:50:32,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1533053.3333333333, ans=0.0 2023-12-24 05:50:42,341 INFO [train.py:886] (0/4) Epoch 49, batch 1200, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4937825.15 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:50:55,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1533186.6666666667, ans=0.125 2023-12-24 05:50:56,359 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:51:04,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0 2023-12-24 05:51:06,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1533253.3333333333, ans=0.125 2023-12-24 05:51:08,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1533253.3333333333, ans=0.2 2023-12-24 05:51:12,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1533320.0, ans=0.0 2023-12-24 05:51:16,882 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.561e+01 4.054e+01 4.177e+01 4.431e+01 4.907e+01, threshold=8.353e+01, percent-clipped=0.0 2023-12-24 05:51:25,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1533386.6666666667, ans=0.0 2023-12-24 05:51:34,927 INFO [train.py:886] (0/4) Epoch 49, batch 1250, loss[loss=0.01138, audio_tagging_loss=0.01138, over 24750.00 frames. 
], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4938773.01 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:51:36,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1533453.3333333333, ans=0.0 2023-12-24 05:51:40,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-12-24 05:51:46,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1533520.0, ans=0.1 2023-12-24 05:51:49,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1533520.0, ans=0.0 2023-12-24 05:51:54,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=12.0 2023-12-24 05:52:05,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1533653.3333333333, ans=0.0 2023-12-24 05:52:10,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1533653.3333333333, ans=0.2 2023-12-24 05:52:16,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1533720.0, ans=0.0 2023-12-24 05:52:19,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1533720.0, ans=0.125 2023-12-24 05:52:26,485 INFO [train.py:886] (0/4) Epoch 49, batch 1300, loss[loss=0.01077, audio_tagging_loss=0.01077, over 24018.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4939326.94 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:52:33,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1533786.6666666667, ans=0.125 2023-12-24 05:52:51,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1533920.0, ans=0.0 2023-12-24 05:52:55,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1533920.0, ans=0.0 2023-12-24 05:53:01,777 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.636e+01 4.042e+01 4.229e+01 4.423e+01 4.890e+01, threshold=8.459e+01, percent-clipped=0.0 2023-12-24 05:53:18,433 INFO [train.py:886] (0/4) Epoch 49, batch 1350, loss[loss=0.01156, audio_tagging_loss=0.01156, over 23964.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4938078.58 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:53:31,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1534186.6666666667, ans=0.125 2023-12-24 05:53:31,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. 
limit=15.0 2023-12-24 05:53:32,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1534186.6666666667, ans=0.125 2023-12-24 05:53:47,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1534253.3333333333, ans=0.2 2023-12-24 05:53:54,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=1534320.0, ans=22.5 2023-12-24 05:54:10,946 INFO [train.py:886] (0/4) Epoch 49, batch 1400, loss[loss=0.01315, audio_tagging_loss=0.01315, over 25000.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4941641.09 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:54:16,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1534453.3333333333, ans=0.125 2023-12-24 05:54:19,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1534520.0, ans=0.125 2023-12-24 05:54:28,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1534520.0, ans=0.0 2023-12-24 05:54:45,410 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.570e+01 3.938e+01 4.141e+01 4.265e+01 4.875e+01, threshold=8.281e+01, percent-clipped=0.0 2023-12-24 05:54:55,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1534720.0, ans=0.1 2023-12-24 05:54:59,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1534786.6666666667, ans=0.0 2023-12-24 05:55:00,593 INFO [train.py:886] (0/4) Epoch 49, batch 1450, loss[loss=0.0115, audio_tagging_loss=0.0115, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4949654.93 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:55:06,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1534786.6666666667, ans=0.2 2023-12-24 05:55:11,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1534853.3333333333, ans=0.125 2023-12-24 05:55:53,552 INFO [train.py:886] (0/4) Epoch 49, batch 1500, loss[loss=0.009766, audio_tagging_loss=0.009766, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4957767.92 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:56:05,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1535186.6666666667, ans=0.1 2023-12-24 05:56:21,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.93 vs. limit=15.0 2023-12-24 05:56:21,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. 
limit=15.0 2023-12-24 05:56:28,576 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.582e+01 4.106e+01 4.257e+01 4.421e+01 5.889e+01, threshold=8.514e+01, percent-clipped=0.0 2023-12-24 05:56:29,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1535320.0, ans=0.0 2023-12-24 05:56:40,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1535386.6666666667, ans=0.0 2023-12-24 05:56:45,301 INFO [train.py:886] (0/4) Epoch 49, batch 1550, loss[loss=0.01044, audio_tagging_loss=0.01044, over 24750.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4951396.49 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:57:19,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1535653.3333333333, ans=0.125 2023-12-24 05:57:37,135 INFO [train.py:886] (0/4) Epoch 49, batch 1600, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4946192.49 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:57:41,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1535786.6666666667, ans=0.1 2023-12-24 05:57:42,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1535786.6666666667, ans=0.125 2023-12-24 05:57:42,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1535786.6666666667, ans=0.1 2023-12-24 05:57:51,571 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:57:57,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1535920.0, ans=0.0 2023-12-24 05:58:08,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2023-12-24 05:58:12,892 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.640e+01 4.058e+01 4.221e+01 4.400e+01 5.973e+01, threshold=8.442e+01, percent-clipped=0.0 2023-12-24 05:58:15,108 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:58:27,865 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0 2023-12-24 05:58:30,227 INFO [train.py:886] (0/4) Epoch 49, batch 1650, loss[loss=0.00823, audio_tagging_loss=0.00823, over 24750.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4948122.74 frames. 
], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:58:54,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1536253.3333333333, ans=0.1 2023-12-24 05:59:06,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1536320.0, ans=0.035 2023-12-24 05:59:06,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1536320.0, ans=0.1 2023-12-24 05:59:15,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1536386.6666666667, ans=0.04949747468305833 2023-12-24 05:59:21,934 INFO [train.py:886] (0/4) Epoch 49, batch 1700, loss[loss=0.009593, audio_tagging_loss=0.009593, over 25000.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4947091.91 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:59:35,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.16 vs. limit=10.0 2023-12-24 05:59:46,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1536586.6666666667, ans=0.2 2023-12-24 05:59:54,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1536653.3333333333, ans=0.0 2023-12-24 05:59:57,643 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.681e+01 4.027e+01 4.185e+01 4.353e+01 5.164e+01, threshold=8.370e+01, percent-clipped=0.0 2023-12-24 05:59:58,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1536653.3333333333, ans=0.125 2023-12-24 05:59:59,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1536653.3333333333, ans=0.0 2023-12-24 05:59:59,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1536653.3333333333, ans=0.1 2023-12-24 06:00:11,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1536720.0, ans=0.125 2023-12-24 06:00:13,567 INFO [train.py:886] (0/4) Epoch 49, batch 1750, loss[loss=0.01063, audio_tagging_loss=0.01063, over 25000.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4953558.25 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:00:14,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1536786.6666666667, ans=0.125 2023-12-24 06:00:16,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1536786.6666666667, ans=0.125 2023-12-24 06:00:42,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.14 vs. 
limit=12.0 2023-12-24 06:00:43,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1536920.0, ans=0.125 2023-12-24 06:00:43,174 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:00:50,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1536986.6666666667, ans=0.2 2023-12-24 06:01:05,328 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:01:05,969 INFO [train.py:886] (0/4) Epoch 49, batch 1800, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4954287.63 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:01:31,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1537253.3333333333, ans=0.1 2023-12-24 06:01:33,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1537253.3333333333, ans=0.125 2023-12-24 06:01:41,125 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.734e+01 4.057e+01 4.201e+01 4.362e+01 5.500e+01, threshold=8.403e+01, percent-clipped=0.0 2023-12-24 06:01:57,698 INFO [train.py:886] (0/4) Epoch 49, batch 1850, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4956137.85 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:02:05,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1537453.3333333333, ans=0.125 2023-12-24 06:02:27,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1537586.6666666667, ans=0.125 2023-12-24 06:02:28,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1537653.3333333333, ans=0.0 2023-12-24 06:02:43,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1537720.0, ans=0.125 2023-12-24 06:02:47,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.91 vs. limit=15.0 2023-12-24 06:02:49,912 INFO [train.py:886] (0/4) Epoch 49, batch 1900, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24750.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4949898.40 frames. 
], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:02:51,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1537786.6666666667, ans=0.0 2023-12-24 06:03:08,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1537853.3333333333, ans=0.125 2023-12-24 06:03:13,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1537920.0, ans=0.2 2023-12-24 06:03:24,921 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.654e+01 4.070e+01 4.198e+01 4.398e+01 6.870e+01, threshold=8.397e+01, percent-clipped=0.0 2023-12-24 06:03:34,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1538053.3333333333, ans=0.1 2023-12-24 06:03:41,668 INFO [train.py:886] (0/4) Epoch 49, batch 1950, loss[loss=0.0103, audio_tagging_loss=0.0103, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4946995.07 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:03:48,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1538120.0, ans=0.125 2023-12-24 06:04:03,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1538253.3333333333, ans=0.04949747468305833 2023-12-24 06:04:07,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1538253.3333333333, ans=0.0 2023-12-24 06:04:18,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1538320.0, ans=0.0 2023-12-24 06:04:19,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2023-12-24 06:04:27,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1538386.6666666667, ans=0.05 2023-12-24 06:04:32,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1538453.3333333333, ans=0.2 2023-12-24 06:04:33,141 INFO [train.py:886] (0/4) Epoch 49, batch 2000, loss[loss=0.01098, audio_tagging_loss=0.01098, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4950842.55 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:04:59,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1538586.6666666667, ans=0.05 2023-12-24 06:05:02,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1538586.6666666667, ans=0.125 2023-12-24 06:05:08,349 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.984e+01 4.128e+01 4.387e+01 6.325e+01, threshold=8.257e+01, percent-clipped=0.0 2023-12-24 06:05:26,320 INFO [train.py:886] (0/4) Epoch 49, batch 2050, loss[loss=0.008182, audio_tagging_loss=0.008182, over 20942.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4944752.15 frames. 
], batch size: 107, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:05:27,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1538786.6666666667, ans=0.125 2023-12-24 06:05:28,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1538786.6666666667, ans=0.125 2023-12-24 06:05:38,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1538853.3333333333, ans=0.2 2023-12-24 06:05:48,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1538920.0, ans=0.1 2023-12-24 06:05:55,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1538920.0, ans=0.0 2023-12-24 06:06:03,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1538986.6666666667, ans=0.0 2023-12-24 06:06:03,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1538986.6666666667, ans=0.2 2023-12-24 06:06:10,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1539053.3333333333, ans=0.0 2023-12-24 06:06:11,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1539053.3333333333, ans=15.0 2023-12-24 06:06:12,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2023-12-24 06:06:16,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1539120.0, ans=0.125 2023-12-24 06:06:17,051 INFO [train.py:886] (0/4) Epoch 49, batch 2100, loss[loss=0.01078, audio_tagging_loss=0.01078, over 25000.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4953723.78 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:06:43,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1539253.3333333333, ans=0.0 2023-12-24 06:06:45,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1539253.3333333333, ans=0.125 2023-12-24 06:06:52,077 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.619e+01 3.997e+01 4.198e+01 4.409e+01 5.519e+01, threshold=8.397e+01, percent-clipped=0.0 2023-12-24 06:06:52,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1539320.0, ans=0.0 2023-12-24 06:07:01,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1539386.6666666667, ans=0.5 2023-12-24 06:07:09,403 INFO [train.py:886] (0/4) Epoch 49, batch 2150, loss[loss=0.01166, audio_tagging_loss=0.01166, over 24750.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4953760.76 frames. 
], batch size: 99, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:07:09,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1539453.3333333333, ans=0.0 2023-12-24 06:07:27,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1539520.0, ans=0.1 2023-12-24 06:07:30,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1539586.6666666667, ans=0.125 2023-12-24 06:07:34,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1539586.6666666667, ans=0.0 2023-12-24 06:08:00,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.13 vs. limit=10.0 2023-12-24 06:08:01,527 INFO [train.py:886] (0/4) Epoch 49, batch 2200, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4949761.85 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:08:03,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1539786.6666666667, ans=0.09899494936611666 2023-12-24 06:08:07,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1539786.6666666667, ans=0.125 2023-12-24 06:08:10,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1539853.3333333333, ans=0.2 2023-12-24 06:08:14,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.02 vs. limit=10.0 2023-12-24 06:08:19,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1539853.3333333333, ans=0.125 2023-12-24 06:08:35,893 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.719e+01 4.113e+01 4.276e+01 4.515e+01 5.433e+01, threshold=8.552e+01, percent-clipped=0.0 2023-12-24 06:08:37,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1539986.6666666667, ans=0.025 2023-12-24 06:08:39,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1539986.6666666667, ans=0.125 2023-12-24 06:08:44,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1540053.3333333333, ans=0.125 2023-12-24 06:08:45,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1540053.3333333333, ans=0.07 2023-12-24 06:08:45,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.99 vs. 
limit=15.0 2023-12-24 06:08:46,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1540053.3333333333, ans=0.1 2023-12-24 06:08:51,168 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:08:51,810 INFO [train.py:886] (0/4) Epoch 49, batch 2250, loss[loss=0.01047, audio_tagging_loss=0.01047, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4946689.71 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:09:24,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1540320.0, ans=0.04949747468305833 2023-12-24 06:09:25,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1540320.0, ans=0.5 2023-12-24 06:09:31,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1540320.0, ans=0.125 2023-12-24 06:09:38,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1540386.6666666667, ans=0.2 2023-12-24 06:09:42,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1540386.6666666667, ans=0.025 2023-12-24 06:09:45,121 INFO [train.py:886] (0/4) Epoch 49, batch 2300, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4950870.34 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:09:55,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1540520.0, ans=0.125 2023-12-24 06:09:55,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.86 vs. limit=10.0 2023-12-24 06:09:57,794 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:10:00,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1540520.0, ans=0.125 2023-12-24 06:10:00,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.79 vs. limit=22.5 2023-12-24 06:10:04,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1540586.6666666667, ans=0.1 2023-12-24 06:10:16,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-12-24 06:10:19,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.27 vs. limit=10.0 2023-12-24 06:10:21,203 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.695e+01 3.977e+01 4.114e+01 4.288e+01 4.900e+01, threshold=8.227e+01, percent-clipped=0.0 2023-12-24 06:10:36,361 INFO [train.py:886] (0/4) Epoch 49, batch 2350, loss[loss=0.00926, audio_tagging_loss=0.00926, over 24750.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4951208.66 frames. 
], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:10:47,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1540853.3333333333, ans=10.0 2023-12-24 06:11:06,326 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:11:14,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1540986.6666666667, ans=0.125 2023-12-24 06:11:26,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1541053.3333333333, ans=0.0 2023-12-24 06:11:27,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.88 vs. limit=15.0 2023-12-24 06:11:28,813 INFO [train.py:886] (0/4) Epoch 49, batch 2400, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4953781.43 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:11:35,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1541120.0, ans=0.125 2023-12-24 06:11:37,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1541186.6666666667, ans=0.125 2023-12-24 06:11:52,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1541253.3333333333, ans=0.125 2023-12-24 06:12:02,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1541320.0, ans=0.125 2023-12-24 06:12:03,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1541320.0, ans=0.07 2023-12-24 06:12:04,395 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.621e+01 3.987e+01 4.152e+01 4.367e+01 5.469e+01, threshold=8.304e+01, percent-clipped=0.0 2023-12-24 06:12:20,348 INFO [train.py:886] (0/4) Epoch 49, batch 2450, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4955700.63 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:12:42,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.42 vs. limit=22.5 2023-12-24 06:12:47,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.12 vs. limit=6.0 2023-12-24 06:13:11,139 INFO [train.py:886] (0/4) Epoch 49, batch 2500, loss[loss=0.01404, audio_tagging_loss=0.01404, over 24750.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4950013.39 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:13:13,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1541786.6666666667, ans=0.1 2023-12-24 06:13:35,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1541920.0, ans=0.1 2023-12-24 06:13:39,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1541920.0, ans=0.125 2023-12-24 06:13:47,842 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.799e+01 4.107e+01 4.264e+01 4.424e+01 5.486e+01, threshold=8.528e+01, percent-clipped=0.0 2023-12-24 06:13:51,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=15.0 2023-12-24 06:14:04,222 INFO [train.py:886] (0/4) Epoch 49, batch 2550, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24750.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4948274.80 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:14:11,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1542120.0, ans=0.125 2023-12-24 06:14:19,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2023-12-24 06:14:22,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2023-12-24 06:14:39,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1542320.0, ans=0.125 2023-12-24 06:14:53,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2023-12-24 06:14:55,085 INFO [train.py:886] (0/4) Epoch 49, batch 2600, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4944784.05 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:14:57,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1542453.3333333333, ans=0.125 2023-12-24 06:15:04,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1542453.3333333333, ans=15.0 2023-12-24 06:15:10,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.53 vs. limit=22.5 2023-12-24 06:15:15,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1542586.6666666667, ans=0.125 2023-12-24 06:15:20,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.62 vs. 
limit=22.5 2023-12-24 06:15:32,853 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.739e+01 4.055e+01 4.229e+01 4.404e+01 4.899e+01, threshold=8.458e+01, percent-clipped=0.0 2023-12-24 06:15:36,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1542720.0, ans=0.1 2023-12-24 06:15:47,947 INFO [train.py:886] (0/4) Epoch 49, batch 2650, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4951734.46 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:15:52,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1542786.6666666667, ans=0.2 2023-12-24 06:16:08,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5 2023-12-24 06:16:30,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1543053.3333333333, ans=0.04949747468305833 2023-12-24 06:16:40,426 INFO [train.py:886] (0/4) Epoch 49, batch 2700, loss[loss=0.009855, audio_tagging_loss=0.009855, over 24750.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4959636.96 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:16:42,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.01 vs. limit=15.0 2023-12-24 06:16:56,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1543186.6666666667, ans=0.02 2023-12-24 06:17:08,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2023-12-24 06:17:14,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1543320.0, ans=0.125 2023-12-24 06:17:16,718 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.523e+01 3.964e+01 4.171e+01 4.415e+01 4.994e+01, threshold=8.341e+01, percent-clipped=0.0 2023-12-24 06:17:21,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1543386.6666666667, ans=0.0 2023-12-24 06:17:22,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1543386.6666666667, ans=15.0 2023-12-24 06:17:26,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1543386.6666666667, ans=0.125 2023-12-24 06:17:27,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.00 vs. limit=15.0 2023-12-24 06:17:31,771 INFO [train.py:886] (0/4) Epoch 49, batch 2750, loss[loss=0.008746, audio_tagging_loss=0.008746, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4960031.10 frames. 
], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:17:35,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1543453.3333333333, ans=0.125 2023-12-24 06:17:49,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1543520.0, ans=0.125 2023-12-24 06:18:10,673 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:18:17,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1543720.0, ans=0.0 2023-12-24 06:18:24,108 INFO [train.py:886] (0/4) Epoch 49, batch 2800, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4963994.61 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:18:34,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1543853.3333333333, ans=0.1 2023-12-24 06:18:38,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1543853.3333333333, ans=0.0 2023-12-24 06:18:39,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1543853.3333333333, ans=0.125 2023-12-24 06:18:46,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0 2023-12-24 06:18:53,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1543920.0, ans=0.125 2023-12-24 06:19:00,877 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 4.058e+01 4.176e+01 4.407e+01 5.903e+01, threshold=8.351e+01, percent-clipped=0.0 2023-12-24 06:19:04,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1544053.3333333333, ans=0.1 2023-12-24 06:19:16,495 INFO [train.py:886] (0/4) Epoch 49, batch 2850, loss[loss=0.01147, audio_tagging_loss=0.01147, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4957168.64 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:19:26,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1544186.6666666667, ans=0.125 2023-12-24 06:20:03,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1544386.6666666667, ans=0.125 2023-12-24 06:20:07,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1544453.3333333333, ans=0.0 2023-12-24 06:20:08,188 INFO [train.py:886] (0/4) Epoch 49, batch 2900, loss[loss=0.01021, audio_tagging_loss=0.01021, over 24750.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4950567.95 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:20:08,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1544453.3333333333, ans=0.125 2023-12-24 06:20:42,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1544653.3333333333, ans=0.125 2023-12-24 06:20:44,086 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.584e+01 4.050e+01 4.220e+01 4.394e+01 4.987e+01, threshold=8.439e+01, percent-clipped=0.0 2023-12-24 06:20:55,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.97 vs. limit=15.0 2023-12-24 06:20:56,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=22.5 2023-12-24 06:20:56,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1544720.0, ans=0.0 2023-12-24 06:21:00,370 INFO [train.py:886] (0/4) Epoch 49, batch 2950, loss[loss=0.008621, audio_tagging_loss=0.008621, over 25000.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4956525.03 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:21:20,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1544920.0, ans=0.125 2023-12-24 06:21:32,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1544986.6666666667, ans=0.125 2023-12-24 06:21:35,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1544986.6666666667, ans=0.125 2023-12-24 06:21:38,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1544986.6666666667, ans=0.125 2023-12-24 06:21:52,347 INFO [train.py:886] (0/4) Epoch 49, batch 3000, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4958741.54 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:21:52,348 INFO [train.py:909] (0/4) Computing validation loss 2023-12-24 06:22:03,929 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2254, 3.5183, 3.9266, 3.9398], device='cuda:0') 2023-12-24 06:22:13,827 INFO [train.py:917] (0/4) Epoch 49, validation: loss=0.03737, audio_tagging_loss=0.03737, over 3737520.00 frames. 
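The train.py:886 and train.py:917 records above report each loss two ways: per batch (loss[...]) and as a running figure weighted by frame count (tot_loss[..., over N frames]). As a rough illustration of that bookkeeping only -- a minimal sketch under the assumption that tot_loss is a frame-weighted running average, not the actual icefall train.py code, with all names below hypothetical -- the aggregation could look like this:

# Hypothetical sketch of a frame-weighted running loss, mirroring the
# "tot_loss[loss=..., over N frames]" lines in this log. Not icefall code.
class RunningLoss:
    """Accumulate per-frame losses as a weighted average over frames."""

    def __init__(self) -> None:
        self.loss_sum = 0.0    # sum of (per-frame loss * num_frames) per batch
        self.num_frames = 0.0  # total frames seen so far

    def update(self, batch_loss: float, batch_frames: float) -> None:
        # batch_loss is assumed to already be averaged per frame,
        # as in the per-batch loss[...] entries above.
        self.loss_sum += batch_loss * batch_frames
        self.num_frames += batch_frames

    @property
    def value(self) -> float:
        # Guard against division by zero before the first update.
        return self.loss_sum / max(self.num_frames, 1.0)


tracker = RunningLoss()
tracker.update(0.01177, 25000.0)  # e.g. the batch 1800 record in this excerpt
tracker.update(0.01038, 24750.0)  # e.g. the batch 1850 record
print(f"tot_loss={tracker.value:.5f}, over {tracker.num_frames:.2f} frames")

Fed the per-batch values from the records at the top of this excerpt, this produces the same kind of "tot_loss=..., over ... frames" summary that the logger emits; the validation record above would correspond to one such average taken over the whole 3737520-frame validation set.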
2023-12-24 06:22:13,827 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 06:22:15,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1545120.0, ans=0.0 2023-12-24 06:22:19,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1545120.0, ans=0.125 2023-12-24 06:22:43,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1545320.0, ans=0.2 2023-12-24 06:22:44,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1545320.0, ans=0.0 2023-12-24 06:22:50,397 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.477e+01 3.989e+01 4.185e+01 4.456e+01 5.215e+01, threshold=8.370e+01, percent-clipped=0.0 2023-12-24 06:23:01,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1545386.6666666667, ans=0.1 2023-12-24 06:23:06,469 INFO [train.py:886] (0/4) Epoch 49, batch 3050, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4956795.03 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:23:12,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1545453.3333333333, ans=0.1 2023-12-24 06:23:20,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1545520.0, ans=0.125 2023-12-24 06:23:25,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1545520.0, ans=0.0 2023-12-24 06:23:37,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1545653.3333333333, ans=0.035 2023-12-24 06:23:41,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1545653.3333333333, ans=0.125 2023-12-24 06:23:43,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2023-12-24 06:23:44,576 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:23:46,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1545720.0, ans=0.125 2023-12-24 06:23:57,216 INFO [train.py:886] (0/4) Epoch 49, batch 3100, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4954250.67 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:24:08,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.89 vs. 
limit=12.0 2023-12-24 06:24:13,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1545853.3333333333, ans=0.125 2023-12-24 06:24:27,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1545986.6666666667, ans=0.125 2023-12-24 06:24:33,098 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.660e+01 4.061e+01 4.253e+01 4.429e+01 4.827e+01, threshold=8.507e+01, percent-clipped=0.0 2023-12-24 06:24:35,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=12.0 2023-12-24 06:24:35,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1545986.6666666667, ans=0.125 2023-12-24 06:24:37,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1546053.3333333333, ans=0.125 2023-12-24 06:24:47,929 INFO [train.py:886] (0/4) Epoch 49, batch 3150, loss[loss=0.01227, audio_tagging_loss=0.01227, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4954024.19 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:25:01,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1546186.6666666667, ans=0.125 2023-12-24 06:25:05,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1546186.6666666667, ans=0.125 2023-12-24 06:25:30,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.87 vs. limit=22.5 2023-12-24 06:25:31,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1546386.6666666667, ans=0.125 2023-12-24 06:25:40,629 INFO [train.py:886] (0/4) Epoch 49, batch 3200, loss[loss=0.009532, audio_tagging_loss=0.009532, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4954575.73 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:25:43,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1546453.3333333333, ans=0.0 2023-12-24 06:26:13,024 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-232000.pt 2023-12-24 06:26:16,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.78 vs. 
limit=15.0 2023-12-24 06:26:17,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1546653.3333333333, ans=0.125 2023-12-24 06:26:18,569 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.708e+01 4.064e+01 4.235e+01 4.462e+01 5.298e+01, threshold=8.470e+01, percent-clipped=0.0 2023-12-24 06:26:20,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1546653.3333333333, ans=0.125 2023-12-24 06:26:29,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1546720.0, ans=0.1 2023-12-24 06:26:33,560 INFO [train.py:886] (0/4) Epoch 49, batch 3250, loss[loss=0.009684, audio_tagging_loss=0.009684, over 21452.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4950902.68 frames. ], batch size: 107, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:27:10,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1546986.6666666667, ans=0.1 2023-12-24 06:27:15,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1547053.3333333333, ans=0.1 2023-12-24 06:27:26,089 INFO [train.py:886] (0/4) Epoch 49, batch 3300, loss[loss=0.01065, audio_tagging_loss=0.01065, over 24750.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4951781.26 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:27:36,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2023-12-24 06:27:49,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1547253.3333333333, ans=0.125 2023-12-24 06:28:01,840 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.618e+01 4.007e+01 4.177e+01 4.374e+01 5.032e+01, threshold=8.354e+01, percent-clipped=0.0 2023-12-24 06:28:08,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1547386.6666666667, ans=0.125 2023-12-24 06:28:16,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1547453.3333333333, ans=0.125 2023-12-24 06:28:17,558 INFO [train.py:886] (0/4) Epoch 49, batch 3350, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24002.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4951885.22 frames. 
], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:28:36,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1547520.0, ans=0.125 2023-12-24 06:28:47,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1547653.3333333333, ans=0.2 2023-12-24 06:28:48,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1547653.3333333333, ans=0.125 2023-12-24 06:28:49,206 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:28:52,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1547653.3333333333, ans=0.125 2023-12-24 06:29:01,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1547720.0, ans=10.0 2023-12-24 06:29:08,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1547786.6666666667, ans=0.0 2023-12-24 06:29:09,053 INFO [train.py:886] (0/4) Epoch 49, batch 3400, loss[loss=0.01098, audio_tagging_loss=0.01098, over 25000.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4952797.29 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:29:10,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0 2023-12-24 06:29:21,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1547853.3333333333, ans=0.125 2023-12-24 06:29:45,477 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.515e+01 4.092e+01 4.242e+01 4.462e+01 5.102e+01, threshold=8.484e+01, percent-clipped=0.0 2023-12-24 06:29:46,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-12-24 06:30:02,454 INFO [train.py:886] (0/4) Epoch 49, batch 3450, loss[loss=0.01037, audio_tagging_loss=0.01037, over 24750.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4953574.33 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:30:10,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.29 vs. limit=22.5 2023-12-24 06:30:23,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1548253.3333333333, ans=0.125 2023-12-24 06:30:25,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2023-12-24 06:30:37,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1548320.0, ans=0.125 2023-12-24 06:30:40,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.25 vs. 
limit=15.0 2023-12-24 06:30:49,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1548386.6666666667, ans=0.125 2023-12-24 06:30:49,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1548386.6666666667, ans=0.1 2023-12-24 06:30:52,205 INFO [train.py:886] (0/4) Epoch 49, batch 3500, loss[loss=0.01071, audio_tagging_loss=0.01071, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4948019.44 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:31:05,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1548520.0, ans=0.2 2023-12-24 06:31:06,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1548520.0, ans=0.0 2023-12-24 06:31:17,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1548586.6666666667, ans=0.0 2023-12-24 06:31:29,131 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.463e+01 4.041e+01 4.186e+01 4.358e+01 4.992e+01, threshold=8.372e+01, percent-clipped=0.0 2023-12-24 06:31:39,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1548720.0, ans=0.1 2023-12-24 06:31:44,750 INFO [train.py:886] (0/4) Epoch 49, batch 3550, loss[loss=0.009401, audio_tagging_loss=0.009401, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4943112.47 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:31:55,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1548853.3333333333, ans=0.125 2023-12-24 06:31:57,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0 2023-12-24 06:32:14,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.04 vs. limit=12.0 2023-12-24 06:32:18,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1548986.6666666667, ans=0.125 2023-12-24 06:32:28,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2023-12-24 06:32:35,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1549120.0, ans=0.0 2023-12-24 06:32:36,657 INFO [train.py:886] (0/4) Epoch 49, batch 3600, loss[loss=0.01034, audio_tagging_loss=0.01034, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4942480.34 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:33:13,501 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.476e+01 4.039e+01 4.199e+01 4.373e+01 5.772e+01, threshold=8.398e+01, percent-clipped=0.0 2023-12-24 06:33:14,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.25 vs. 
limit=22.5 2023-12-24 06:33:16,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1549320.0, ans=0.0 2023-12-24 06:33:21,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1549386.6666666667, ans=0.125 2023-12-24 06:33:28,332 INFO [train.py:886] (0/4) Epoch 49, batch 3650, loss[loss=0.01014, audio_tagging_loss=0.01014, over 25000.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4946644.95 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:33:34,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1549453.3333333333, ans=0.1 2023-12-24 06:33:37,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1549520.0, ans=0.0 2023-12-24 06:33:46,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1549520.0, ans=0.125 2023-12-24 06:34:16,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2023-12-24 06:34:20,970 INFO [train.py:886] (0/4) Epoch 49, batch 3700, loss[loss=0.01083, audio_tagging_loss=0.01083, over 25000.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4955346.08 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:34:28,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2023-12-24 06:34:30,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1549853.3333333333, ans=0.125 2023-12-24 06:34:48,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1549920.0, ans=0.2 2023-12-24 06:34:57,825 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.487e+01 4.052e+01 4.232e+01 4.524e+01 5.047e+01, threshold=8.465e+01, percent-clipped=0.0 2023-12-24 06:34:58,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0 2023-12-24 06:34:59,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1549986.6666666667, ans=0.2 2023-12-24 06:35:12,701 INFO [train.py:886] (0/4) Epoch 49, batch 3750, loss[loss=0.00943, audio_tagging_loss=0.00943, over 24750.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4954083.10 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:35:38,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.72 vs. 
limit=22.5 2023-12-24 06:35:52,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1550320.0, ans=0.05 2023-12-24 06:35:53,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1550320.0, ans=0.0 2023-12-24 06:35:56,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1550386.6666666667, ans=0.125 2023-12-24 06:36:04,934 INFO [train.py:886] (0/4) Epoch 49, batch 3800, loss[loss=0.01084, audio_tagging_loss=0.01084, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4948078.41 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:36:10,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.11 vs. limit=12.0 2023-12-24 06:36:24,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2023-12-24 06:36:31,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1550586.6666666667, ans=0.0 2023-12-24 06:36:41,086 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 4.023e+01 4.204e+01 4.391e+01 4.921e+01, threshold=8.409e+01, percent-clipped=0.0 2023-12-24 06:36:47,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=12.0 2023-12-24 06:36:57,912 INFO [train.py:886] (0/4) Epoch 49, batch 3850, loss[loss=0.01099, audio_tagging_loss=0.01099, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4947503.05 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:37:18,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1550920.0, ans=0.1 2023-12-24 06:37:35,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1550986.6666666667, ans=0.125 2023-12-24 06:37:44,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1551053.3333333333, ans=0.125 2023-12-24 06:37:49,539 INFO [train.py:886] (0/4) Epoch 49, batch 3900, loss[loss=0.01122, audio_tagging_loss=0.01122, over 24750.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4951263.11 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:37:49,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1551120.0, ans=0.2 2023-12-24 06:38:03,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1551186.6666666667, ans=0.125 2023-12-24 06:38:03,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1551186.6666666667, ans=0.125 2023-12-24 06:38:10,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1551253.3333333333, ans=0.1 2023-12-24 06:38:23,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1551320.0, ans=0.125 2023-12-24 06:38:25,557 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.639e+01 4.074e+01 4.182e+01 4.374e+01 4.953e+01, threshold=8.363e+01, percent-clipped=0.0 2023-12-24 06:38:29,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1551320.0, ans=0.0 2023-12-24 06:38:41,223 INFO [train.py:886] (0/4) Epoch 49, batch 3950, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4951900.85 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:38:45,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1551453.3333333333, ans=0.05 2023-12-24 06:39:05,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1551586.6666666667, ans=0.2 2023-12-24 06:39:07,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1551586.6666666667, ans=0.125 2023-12-24 06:39:27,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1551720.0, ans=0.025 2023-12-24 06:39:31,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1551720.0, ans=0.1 2023-12-24 06:39:33,398 INFO [train.py:886] (0/4) Epoch 49, batch 4000, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4952251.14 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:39:43,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1551853.3333333333, ans=0.125 2023-12-24 06:40:03,317 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:40:09,394 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.560e+01 4.058e+01 4.250e+01 4.476e+01 5.419e+01, threshold=8.500e+01, percent-clipped=0.0 2023-12-24 06:40:24,540 INFO [train.py:886] (0/4) Epoch 49, batch 4050, loss[loss=0.00971, audio_tagging_loss=0.00971, over 24750.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4953556.67 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:40:32,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.91 vs. limit=22.5 2023-12-24 06:40:43,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1552186.6666666667, ans=0.5 2023-12-24 06:40:52,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1552253.3333333333, ans=0.0 2023-12-24 06:40:56,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1552320.0, ans=0.2 2023-12-24 06:41:17,513 INFO [train.py:886] (0/4) Epoch 49, batch 4100, loss[loss=0.008847, audio_tagging_loss=0.008847, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4949571.69 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:41:18,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.19 vs. limit=10.0 2023-12-24 06:41:19,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1552453.3333333333, ans=0.125 2023-12-24 06:41:26,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1552520.0, ans=0.05 2023-12-24 06:41:42,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1552586.6666666667, ans=0.125 2023-12-24 06:41:46,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1552586.6666666667, ans=0.0 2023-12-24 06:41:49,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1552653.3333333333, ans=0.125 2023-12-24 06:41:51,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1552653.3333333333, ans=0.125 2023-12-24 06:41:53,397 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.555e+01 4.048e+01 4.230e+01 4.426e+01 5.079e+01, threshold=8.460e+01, percent-clipped=0.0 2023-12-24 06:42:03,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1552720.0, ans=0.125 2023-12-24 06:42:07,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1552786.6666666667, ans=0.0 2023-12-24 06:42:08,596 INFO [train.py:886] (0/4) Epoch 49, batch 4150, loss[loss=0.01048, audio_tagging_loss=0.01048, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4951774.73 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:42:08,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1552786.6666666667, ans=0.0 2023-12-24 06:42:26,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1552853.3333333333, ans=0.95 2023-12-24 06:42:32,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=8.0 2023-12-24 06:42:44,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1552986.6666666667, ans=0.0 2023-12-24 06:42:54,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1553053.3333333333, ans=0.0 2023-12-24 06:42:59,751 INFO [train.py:886] (0/4) Epoch 49, batch 4200, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4950330.77 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:43:36,493 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.716e+01 4.005e+01 4.208e+01 4.384e+01 4.988e+01, threshold=8.417e+01, percent-clipped=0.0 2023-12-24 06:43:36,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1553320.0, ans=0.125 2023-12-24 06:43:39,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1553320.0, ans=0.2 2023-12-24 06:43:40,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0 2023-12-24 06:43:47,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1553386.6666666667, ans=0.2 2023-12-24 06:43:47,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1553386.6666666667, ans=0.2 2023-12-24 06:43:51,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1553386.6666666667, ans=0.0 2023-12-24 06:43:52,894 INFO [train.py:886] (0/4) Epoch 49, batch 4250, loss[loss=0.007794, audio_tagging_loss=0.007794, over 24074.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4954995.33 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:44:02,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.15 vs. 
limit=10.0 2023-12-24 06:44:04,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1553520.0, ans=0.0 2023-12-24 06:44:06,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1553520.0, ans=0.09899494936611666 2023-12-24 06:44:10,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1553520.0, ans=0.2 2023-12-24 06:44:13,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1553586.6666666667, ans=0.125 2023-12-24 06:44:29,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1553653.3333333333, ans=0.0 2023-12-24 06:44:33,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1553720.0, ans=0.035 2023-12-24 06:44:44,060 INFO [train.py:886] (0/4) Epoch 49, batch 4300, loss[loss=0.01164, audio_tagging_loss=0.01164, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4955656.10 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 64.0 2023-12-24 06:45:03,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1553853.3333333333, ans=0.125 2023-12-24 06:45:12,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1553920.0, ans=0.125 2023-12-24 06:45:21,779 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.664e+01 3.972e+01 4.193e+01 4.366e+01 5.433e+01, threshold=8.386e+01, percent-clipped=0.0 2023-12-24 06:45:29,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1554053.3333333333, ans=0.2 2023-12-24 06:45:37,138 INFO [train.py:886] (0/4) Epoch 49, batch 4350, loss[loss=0.009844, audio_tagging_loss=0.009844, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4959214.73 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:45:41,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2023-12-24 06:45:45,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1554186.6666666667, ans=0.1 2023-12-24 06:46:04,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1554253.3333333333, ans=0.1 2023-12-24 06:46:13,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-24 06:46:23,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0 2023-12-24 06:46:26,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=12.0 2023-12-24 06:46:28,745 INFO [train.py:886] (0/4) Epoch 49, batch 4400, loss[loss=0.009008, audio_tagging_loss=0.009008, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4952706.61 frames. 
], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:46:51,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1554586.6666666667, ans=0.05 2023-12-24 06:46:52,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1554586.6666666667, ans=0.1 2023-12-24 06:47:04,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1554653.3333333333, ans=0.125 2023-12-24 06:47:06,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2023-12-24 06:47:07,212 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.673e+01 4.187e+01 4.293e+01 4.475e+01 5.860e+01, threshold=8.587e+01, percent-clipped=0.0 2023-12-24 06:47:20,687 INFO [train.py:886] (0/4) Epoch 49, batch 4450, loss[loss=0.009549, audio_tagging_loss=0.009549, over 24750.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4947389.17 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:47:35,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1554853.3333333333, ans=0.125 2023-12-24 06:47:40,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1554853.3333333333, ans=0.0 2023-12-24 06:47:42,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1554920.0, ans=0.125 2023-12-24 06:47:47,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1554920.0, ans=0.2 2023-12-24 06:47:49,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1554920.0, ans=0.2 2023-12-24 06:47:54,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1554986.6666666667, ans=0.125 2023-12-24 06:48:12,847 INFO [train.py:886] (0/4) Epoch 49, batch 4500, loss[loss=0.008879, audio_tagging_loss=0.008879, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4946611.47 frames. 
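
Most lines in this log are ScheduledFloat entries: each names a tunable (a dropout probability, a skip rate, a balancer bound) together with its value at the current batch_count. That the skip rates read 0.0 and the dropout_p values 0.1 this deep into training (batch_count ~ 1.55e6) says the schedules reached their endpoints long ago. A sketch of the underlying idea, assuming piecewise-linear interpolation between (batch_count, value) breakpoints (the real implementation is in icefall's scaling.py and may differ in detail):

    class ScheduledFloat:
        """A float hyperparameter that follows a piecewise-linear
        schedule over the global batch count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs, e.g. (0.0, 0.2), (4000.0, 0.0)
            self.points = sorted(points)
            self.batch_count = 0.0  # advanced by the training loop

        def value(self):
            pts = self.points
            if self.batch_count <= pts[0][0]:
                return pts[0][1]
            if self.batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= self.batch_count <= x1:
                    t = (self.batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    skip = ScheduledFloat((0.0, 0.2), (4000.0, 0.0))
    skip.batch_count = 1553520.0
    print(skip.value())  # 0.0, matching the conv_skip_rate entries above
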
], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:48:15,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1555120.0, ans=0.125 2023-12-24 06:48:23,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1555186.6666666667, ans=0.0 2023-12-24 06:48:28,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1555186.6666666667, ans=0.125 2023-12-24 06:48:28,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1555186.6666666667, ans=0.125 2023-12-24 06:48:44,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1555320.0, ans=0.09899494936611666 2023-12-24 06:48:46,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0 2023-12-24 06:48:49,613 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.358e+01 3.945e+01 4.149e+01 4.337e+01 5.254e+01, threshold=8.299e+01, percent-clipped=0.0 2023-12-24 06:49:02,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1555453.3333333333, ans=0.05 2023-12-24 06:49:03,789 INFO [train.py:886] (0/4) Epoch 49, batch 4550, loss[loss=0.00942, audio_tagging_loss=0.00942, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4951631.02 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:49:13,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1555453.3333333333, ans=0.2 2023-12-24 06:49:34,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1555653.3333333333, ans=0.125 2023-12-24 06:49:38,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1555653.3333333333, ans=0.125 2023-12-24 06:49:55,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1555786.6666666667, ans=0.125 2023-12-24 06:49:56,021 INFO [train.py:886] (0/4) Epoch 49, batch 4600, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4950536.90 frames. ], batch size: 100, lr: 2.17e-03, grad_scale: 32.0 2023-12-24 06:49:56,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0 2023-12-24 06:50:16,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. 
limit=6.0 2023-12-24 06:50:32,530 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.708e+01 4.082e+01 4.227e+01 4.419e+01 5.112e+01, threshold=8.454e+01, percent-clipped=0.0 2023-12-24 06:50:39,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1556053.3333333333, ans=0.1 2023-12-24 06:50:41,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-12-24 06:50:41,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1556053.3333333333, ans=15.0 2023-12-24 06:50:42,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=12.0 2023-12-24 06:50:46,515 INFO [train.py:886] (0/4) Epoch 49, batch 4650, loss[loss=0.01036, audio_tagging_loss=0.01036, over 24940.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4958043.96 frames. ], batch size: 100, lr: 2.17e-03, grad_scale: 32.0 2023-12-24 06:51:08,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1556253.3333333333, ans=0.0 2023-12-24 06:51:08,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1556253.3333333333, ans=0.0 2023-12-24 06:51:20,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1556320.0, ans=0.125 2023-12-24 06:51:20,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1556320.0, ans=0.1 2023-12-24 06:51:31,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1556386.6666666667, ans=0.2 2023-12-24 06:51:32,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1556386.6666666667, ans=0.125 2023-12-24 06:51:37,689 INFO [train.py:886] (0/4) Epoch 49, batch 4700, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4958549.77 frames. 
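
The Whitening lines compare a measured metric against a per-module limit (e.g. metric=4.70 vs. limit=6.0 for whiten_keys with num_groups=4 just above). A consistent reading: the metric measures how anisotropic the channel covariance is within each group (exactly 1.0 for a perfectly white, identity-like covariance, growing with eigenvalue spread), and a corrective gradient is applied only when it exceeds the limit. A hedged sketch of one such metric, computed without an eigendecomposition (a paraphrase of the idea; icefall's scaling.py is the authority):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2
        of the per-group channel covariance, averaged over groups: 1.0 iff
        the covariance is a multiple of the identity."""
        n, c = x.shape
        k = c // num_groups
        xg = x.reshape(n, num_groups, k).transpose(0, 1)  # (groups, n, k)
        # uncentered second-moment "covariance" per group
        cov = torch.matmul(xg.transpose(1, 2), xg) / n    # (groups, k, k)
        tr = cov.diagonal(dim1=1, dim2=2).sum(-1)         # sum of eigenvalues
        tr_sq = (cov * cov.transpose(1, 2)).sum((1, 2))   # sum of eig^2
        return ((tr_sq / k) / (tr / k) ** 2).mean()

Metrics well below their limits, as in most lines here, mean the constraint is dormant; entries like metric=15.00 vs. limit=15.0 further down are where the whitening penalty actually engages.
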
], batch size: 99, lr: 2.17e-03, grad_scale: 32.0
2023-12-24 06:51:44,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1556453.3333333333, ans=10.0
2023-12-24 06:51:46,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=1556520.0, ans=12.0
2023-12-24 06:51:46,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1556520.0, ans=0.125
2023-12-24 06:51:57,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1556586.6666666667, ans=0.125
2023-12-24 06:51:59,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1556586.6666666667, ans=0.1
2023-12-24 06:52:01,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1556586.6666666667, ans=0.125
2023-12-24 06:52:07,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1556653.3333333333, ans=10.0
2023-12-24 06:52:09,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0
2023-12-24 06:52:10,981 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.696e+01 4.105e+01 4.278e+01 4.465e+01 5.122e+01, threshold=8.556e+01, percent-clipped=0.0
2023-12-24 06:52:24,081 INFO [train.py:886] (0/4) Epoch 49, batch 4750, loss[loss=0.01041, audio_tagging_loss=0.01041, over 24750.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4957165.86 frames. ], batch size: 99, lr: 2.17e-03, grad_scale: 32.0
2023-12-24 06:52:29,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1556786.6666666667, ans=0.125
2023-12-24 06:52:29,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=15.0
2023-12-24 06:52:40,124 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-49.pt
2023-12-24 06:52:59,428 INFO [train.py:886] (0/4) Epoch 50, batch 0, loss[loss=0.02516, audio_tagging_loss=0.02516, over 20990.00 frames. ], tot_loss[loss=0.02516, audio_tagging_loss=0.02516, over 20990.00 frames. ], batch size: 107, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 06:52:59,429 INFO [train.py:909] (0/4) Computing validation loss
2023-12-24 06:53:09,922 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4825, 3.1207, 4.0598, 3.7478], device='cuda:0')
2023-12-24 06:53:21,090 INFO [train.py:917] (0/4) Epoch 50, validation: loss=0.03747, audio_tagging_loss=0.03747, over 3737520.00 frames.
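
This is the epoch boundary: epoch-49.pt is written to zipformer/exp_at_as_full/ and epoch 50 opens with a validation pass. For a resumable run, the checkpoint has to capture more than the weights; a sketch of the usual contents (the field names are illustrative, icefall's checkpoint.py defines the real schema):

    import torch

    def save_checkpoint(filename, model, optimizer, scheduler, scaler,
                        sampler, epoch, batch_idx_train):
        """Persist everything needed to resume training: weights,
        optimizer moments, LR-schedule position, the dynamic grad_scale,
        and the sampler's position in the data."""
        torch.save({
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "grad_scaler": scaler.state_dict(),
            "sampler": sampler.state_dict(),
            "epoch": epoch,
            "batch_idx_train": batch_idx_train,
        }, filename)

    # save_checkpoint("zipformer/exp_at_as_full/epoch-49.pt", ...)
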
2023-12-24 06:53:21,091 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 06:53:26,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1556893.3333333333, ans=0.125 2023-12-24 06:53:44,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1557026.6666666667, ans=0.0 2023-12-24 06:53:45,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1557026.6666666667, ans=0.0 2023-12-24 06:53:47,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=1557026.6666666667, ans=12.0 2023-12-24 06:54:01,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1557160.0, ans=0.125 2023-12-24 06:54:01,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1557160.0, ans=0.04949747468305833 2023-12-24 06:54:02,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1557160.0, ans=0.0 2023-12-24 06:54:11,352 INFO [train.py:886] (0/4) Epoch 50, batch 50, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01725, audio_tagging_loss=0.01725, over 1114104.80 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:54:35,196 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.740e+01 4.510e+01 5.078e+01 5.716e+01 1.112e+02, threshold=1.016e+02, percent-clipped=6.0 2023-12-24 06:54:44,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1557426.6666666667, ans=0.1 2023-12-24 06:55:04,531 INFO [train.py:886] (0/4) Epoch 50, batch 100, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 1967604.65 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:55:07,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1557560.0, ans=0.125 2023-12-24 06:55:08,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1557560.0, ans=0.0 2023-12-24 06:55:27,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1557693.3333333333, ans=0.0 2023-12-24 06:55:27,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1557693.3333333333, ans=0.125 2023-12-24 06:55:45,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1557826.6666666667, ans=0.125 2023-12-24 06:55:49,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1557826.6666666667, ans=0.0 2023-12-24 06:55:49,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. 
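
During the validation pass just above, zipformer.py logs attn_weights_entropy = tensor([4.4825, 3.1207, 4.0598, 3.7478]) for one self-attention module. One value per attention head is the natural reading: the mean entropy (in nats) of each head's attention distribution, a cheap collapse detector, since a head that always attends to a single frame shows entropy near 0 while uniform attention over ~100 frames gives log(100) ~= 4.6. A sketch under that per-head assumption:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        """attn: (num_heads, batch, tgt_len, src_len), rows summing to 1.
        Returns the mean attention entropy (nats) per head."""
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, batch, tgt)
        return ent.mean(dim=(1, 2))                       # one value per head

    # Values of 3.1-4.5, as logged above, indicate fairly spread-out heads.
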
limit=15.0 2023-12-24 06:55:53,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1557826.6666666667, ans=0.125 2023-12-24 06:55:54,685 INFO [train.py:886] (0/4) Epoch 50, batch 150, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 2632270.41 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:55:55,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2023-12-24 06:56:18,341 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.960e+01 4.220e+01 4.441e+01 4.666e+01 5.364e+01, threshold=8.881e+01, percent-clipped=0.0 2023-12-24 06:56:30,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0 2023-12-24 06:56:45,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1558160.0, ans=0.0 2023-12-24 06:56:47,106 INFO [train.py:886] (0/4) Epoch 50, batch 200, loss[loss=0.01059, audio_tagging_loss=0.01059, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 3148216.24 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:56:53,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2023-12-24 06:56:59,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1558293.3333333333, ans=0.125 2023-12-24 06:57:12,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1558360.0, ans=0.1 2023-12-24 06:57:14,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1558360.0, ans=0.0 2023-12-24 06:57:15,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1558360.0, ans=0.0 2023-12-24 06:57:17,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2023-12-24 06:57:33,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2023-12-24 06:57:36,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1558560.0, ans=0.125 2023-12-24 06:57:37,401 INFO [train.py:886] (0/4) Epoch 50, batch 250, loss[loss=0.01523, audio_tagging_loss=0.01523, over 24949.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 3551203.23 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:57:46,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. 
limit=15.0 2023-12-24 06:57:57,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1558693.3333333333, ans=0.1 2023-12-24 06:58:00,319 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.630e+01 4.054e+01 4.252e+01 4.416e+01 4.947e+01, threshold=8.505e+01, percent-clipped=0.0 2023-12-24 06:58:03,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2023-12-24 06:58:09,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1558760.0, ans=15.0 2023-12-24 06:58:24,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1558826.6666666667, ans=0.1 2023-12-24 06:58:29,393 INFO [train.py:886] (0/4) Epoch 50, batch 300, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 3860278.44 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:58:29,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1558893.3333333333, ans=0.125 2023-12-24 06:58:40,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1558960.0, ans=0.125 2023-12-24 06:58:55,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1559026.6666666667, ans=0.125 2023-12-24 06:58:56,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1559026.6666666667, ans=0.0 2023-12-24 06:59:09,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1559160.0, ans=0.0 2023-12-24 06:59:10,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1559160.0, ans=0.2 2023-12-24 06:59:20,860 INFO [train.py:886] (0/4) Epoch 50, batch 350, loss[loss=0.01172, audio_tagging_loss=0.01172, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4100504.47 frames. 
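
The "over N frames" count attached to tot_loss is worth decoding: across epoch 50 it climbs 1.114e6 -> 1.968e6 -> 2.632e6 -> 3.148e6 -> ... in ever-smaller steps and saturates near 4.95e6. With roughly 25000 frames per batch, that trajectory matches an exponentially decaying, frame-weighted running average with decay about 0.995 per batch: steady state 25000 / 0.005 ~= 5e6, and after 50 batches 5e6 * (1 - 0.995^50) ~= 1.11e6, close to the logged 1.114e6. The decay constant is inferred from the numbers, not read from train.py, but the fit is tight. A sketch:

    class RunningLoss:
        """Frame-weighted, exponentially decaying average of batch losses,
        reproducing the tot_loss / 'over N frames' fields in this log."""

        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0  # decayed sum of loss * frames
            self.frames = 0.0    # decayed frame count (the logged N)

        def update(self, loss, num_frames):
            self.loss_sum = self.loss_sum * self.decay + loss * num_frames
            self.frames = self.frames * self.decay + num_frames
            return self.loss_sum / self.frames  # the logged tot_loss

This also explains why epoch 50 opens with tot_loss=0.02516 over only 20990 frames: the running stats reset at the epoch boundary, so the first report is a single noisy batch.
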
], batch size: 99, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:59:21,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1559226.6666666667, ans=0.09899494936611666 2023-12-24 06:59:27,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1559226.6666666667, ans=0.1 2023-12-24 06:59:32,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1559293.3333333333, ans=0.1 2023-12-24 06:59:43,072 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.541e+01 4.035e+01 4.228e+01 4.389e+01 4.773e+01, threshold=8.456e+01, percent-clipped=0.0 2023-12-24 06:59:48,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1559360.0, ans=0.125 2023-12-24 07:00:00,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1559426.6666666667, ans=0.1 2023-12-24 07:00:12,429 INFO [train.py:886] (0/4) Epoch 50, batch 400, loss[loss=0.008257, audio_tagging_loss=0.008257, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4288712.55 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:00:18,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1559560.0, ans=0.0 2023-12-24 07:00:29,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1559626.6666666667, ans=0.125 2023-12-24 07:00:39,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=12.0 2023-12-24 07:00:52,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1559760.0, ans=0.0 2023-12-24 07:01:04,251 INFO [train.py:886] (0/4) Epoch 50, batch 450, loss[loss=0.01015, audio_tagging_loss=0.01015, over 24750.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4442961.57 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:01:28,041 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.735e+01 4.049e+01 4.184e+01 4.376e+01 4.940e+01, threshold=8.368e+01, percent-clipped=0.0 2023-12-24 07:01:28,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1560026.6666666667, ans=0.0 2023-12-24 07:01:32,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.69 vs. limit=15.0 2023-12-24 07:01:33,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1560026.6666666667, ans=0.125 2023-12-24 07:01:34,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2023-12-24 07:01:57,592 INFO [train.py:886] (0/4) Epoch 50, batch 500, loss[loss=0.008489, audio_tagging_loss=0.008489, over 24077.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4555560.45 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:02:03,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=12.0 2023-12-24 07:02:05,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2023-12-24 07:02:12,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1560293.3333333333, ans=0.125 2023-12-24 07:02:15,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1560293.3333333333, ans=0.1 2023-12-24 07:02:27,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=12.0 2023-12-24 07:02:38,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1560493.3333333333, ans=0.125 2023-12-24 07:02:49,010 INFO [train.py:886] (0/4) Epoch 50, batch 550, loss[loss=0.009856, audio_tagging_loss=0.009856, over 25000.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4645374.69 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:03:12,061 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.758e+01 4.103e+01 4.273e+01 4.476e+01 5.412e+01, threshold=8.546e+01, percent-clipped=0.0 2023-12-24 07:03:14,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1560693.3333333333, ans=0.125 2023-12-24 07:03:19,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2023-12-24 07:03:41,684 INFO [train.py:886] (0/4) Epoch 50, batch 600, loss[loss=0.009549, audio_tagging_loss=0.009549, over 22021.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4708669.98 frames. ], batch size: 107, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:04:15,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1561093.3333333333, ans=0.1 2023-12-24 07:04:16,937 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:04:17,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1561093.3333333333, ans=0.0 2023-12-24 07:04:28,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1561160.0, ans=0.1 2023-12-24 07:04:34,492 INFO [train.py:886] (0/4) Epoch 50, batch 650, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4758635.83 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:04:46,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. 
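
The balancer entries scattered through this log (min_positive=0.025/0.05, max_positive=0.95, max_abs=10.0, prob=0.125, ...) name per-channel constraints on activation statistics: bounds on the fraction of positive values a channel may take, or on its typical magnitude, with prob plausibly the probability that the constraint is enforced on any given batch. A simplified sketch of gradient-based enforcement through a straight-through autograd function (a paraphrase of the idea, not icefall's actual Balancer):

    import torch

    class BalancerFn(torch.autograd.Function):
        """Identity in the forward pass; in backward, nudges channels
        whose fraction of positive values falls outside
        [min_positive, max_positive]."""

        @staticmethod
        def forward(ctx, x, min_positive, max_positive, grad_eps):
            # fraction of positive entries per channel (last dim)
            pos = (x > 0).float().mean(dim=tuple(range(x.dim() - 1)))
            # +1 where a channel is positive too rarely, -1 where too often
            direction = ((pos < min_positive).float()
                         - (pos > max_positive).float())
            ctx.save_for_backward(direction)
            ctx.grad_eps = grad_eps
            return x

        @staticmethod
        def backward(ctx, grad):
            (direction,) = ctx.saved_tensors
            # push offending channels back toward the allowed range
            nudge = ctx.grad_eps * direction * grad.abs().mean()
            return grad - nudge, None, None, None

    # x = BalancerFn.apply(x, 0.05, 0.95, 0.01)
    # Applying it only when torch.rand(()) < prob would match the logged
    # prob=0.125 entries.
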
limit=15.0 2023-12-24 07:04:53,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1561360.0, ans=0.0 2023-12-24 07:04:56,133 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.774e+01 4.125e+01 4.275e+01 4.523e+01 5.661e+01, threshold=8.549e+01, percent-clipped=0.0 2023-12-24 07:05:23,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0 2023-12-24 07:05:25,709 INFO [train.py:886] (0/4) Epoch 50, batch 700, loss[loss=0.009408, audio_tagging_loss=0.009408, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4794368.31 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:05:26,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1561560.0, ans=0.125 2023-12-24 07:05:44,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1561626.6666666667, ans=0.0 2023-12-24 07:05:49,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2023-12-24 07:05:56,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1561760.0, ans=0.0 2023-12-24 07:05:59,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1561760.0, ans=0.125 2023-12-24 07:06:18,171 INFO [train.py:886] (0/4) Epoch 50, batch 750, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4831271.20 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:06:18,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1561893.3333333333, ans=0.0 2023-12-24 07:06:22,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1561893.3333333333, ans=0.125 2023-12-24 07:06:29,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1561960.0, ans=0.125 2023-12-24 07:06:37,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1562026.6666666667, ans=0.0 2023-12-24 07:06:40,457 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.722e+01 4.000e+01 4.156e+01 4.359e+01 5.451e+01, threshold=8.313e+01, percent-clipped=0.0 2023-12-24 07:06:42,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1562026.6666666667, ans=15.0 2023-12-24 07:06:57,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1562093.3333333333, ans=0.2 2023-12-24 07:07:09,173 INFO [train.py:886] (0/4) Epoch 50, batch 800, loss[loss=0.01188, audio_tagging_loss=0.01188, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4860383.45 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:07:10,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.24 vs. limit=22.5 2023-12-24 07:07:11,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2023-12-24 07:07:26,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1562293.3333333333, ans=0.1 2023-12-24 07:07:35,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1562360.0, ans=0.125 2023-12-24 07:07:36,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0 2023-12-24 07:07:40,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1562426.6666666667, ans=0.125 2023-12-24 07:08:00,081 INFO [train.py:886] (0/4) Epoch 50, batch 850, loss[loss=0.01062, audio_tagging_loss=0.01062, over 25000.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4886551.37 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:08:13,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1562626.6666666667, ans=0.125 2023-12-24 07:08:23,902 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.686e+01 3.999e+01 4.249e+01 4.473e+01 4.944e+01, threshold=8.498e+01, percent-clipped=0.0 2023-12-24 07:08:25,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0 2023-12-24 07:08:26,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=15.0 2023-12-24 07:08:32,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1562760.0, ans=15.0 2023-12-24 07:08:42,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.38 vs. limit=5.0 2023-12-24 07:08:51,882 INFO [train.py:886] (0/4) Epoch 50, batch 900, loss[loss=0.01102, audio_tagging_loss=0.01102, over 24750.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4902217.32 frames. 
], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:09:16,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1563026.6666666667, ans=0.0 2023-12-24 07:09:27,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1563093.3333333333, ans=0.125 2023-12-24 07:09:33,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1563160.0, ans=0.04949747468305833 2023-12-24 07:09:37,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1563160.0, ans=0.125 2023-12-24 07:09:42,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1563160.0, ans=0.2 2023-12-24 07:09:43,834 INFO [train.py:886] (0/4) Epoch 50, batch 950, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4909599.55 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:09:44,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1563226.6666666667, ans=0.1 2023-12-24 07:10:03,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1563293.3333333333, ans=0.025 2023-12-24 07:10:07,343 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.705e+01 4.079e+01 4.256e+01 4.403e+01 5.816e+01, threshold=8.513e+01, percent-clipped=0.0 2023-12-24 07:10:12,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1563360.0, ans=0.0 2023-12-24 07:10:13,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1563360.0, ans=0.125 2023-12-24 07:10:17,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1563426.6666666667, ans=0.1 2023-12-24 07:10:36,927 INFO [train.py:886] (0/4) Epoch 50, batch 1000, loss[loss=0.009646, audio_tagging_loss=0.009646, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4915278.29 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:10:40,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1563560.0, ans=0.05 2023-12-24 07:10:48,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1563626.6666666667, ans=0.125 2023-12-24 07:10:56,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1563693.3333333333, ans=0.125 2023-12-24 07:11:13,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2023-12-24 07:11:28,040 INFO [train.py:886] (0/4) Epoch 50, batch 1050, loss[loss=0.01112, audio_tagging_loss=0.01112, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4928991.67 frames. 
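
Two families of schedules above control stochastic depth. attention_skip_rate, conv_skip_rate and ff*_skip_rate all read 0.0 by now, i.e. sub-modules are no longer randomly dropped. bypass.skip_rate=0.0495 and bypass.scale_min=0.2 suggest each layer's output is blended with its input through a learned, clamped per-channel scale, with a small chance during training that the clamp floor is lifted so the whole layer can be bypassed. The exact semantics are a guess from the names; one plausible form:

    import torch

    def bypass(x_in, x_out, scale, scale_min=0.2, scale_max=1.0,
               skip_rate=0.0495, training=True):
        """Blend a layer's input and output: y = x_in + s * (x_out - x_in),
        with learnable per-channel s clamped to [scale_min, scale_max].
        With probability skip_rate the floor drops to 0, so the layer's
        contribution can vanish entirely (stochastic depth)."""
        floor = 0.0 if (training and torch.rand(()) < skip_rate) else scale_min
        s = scale.clamp(min=floor, max=scale_max)
        return x_in + s * (x_out - x_in)
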
], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:11:39,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1563960.0, ans=0.1 2023-12-24 07:11:51,236 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.709e+01 4.047e+01 4.197e+01 4.433e+01 5.378e+01, threshold=8.395e+01, percent-clipped=0.0 2023-12-24 07:11:54,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1564026.6666666667, ans=0.1 2023-12-24 07:11:55,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1564026.6666666667, ans=0.125 2023-12-24 07:12:00,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1564093.3333333333, ans=0.1 2023-12-24 07:12:12,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1564160.0, ans=0.1 2023-12-24 07:12:20,614 INFO [train.py:886] (0/4) Epoch 50, batch 1100, loss[loss=0.009166, audio_tagging_loss=0.009166, over 24750.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4938665.18 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:12:21,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1564226.6666666667, ans=0.125 2023-12-24 07:12:55,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1564426.6666666667, ans=0.2 2023-12-24 07:13:08,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-12-24 07:13:10,939 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:13:12,696 INFO [train.py:886] (0/4) Epoch 50, batch 1150, loss[loss=0.009778, audio_tagging_loss=0.009778, over 24750.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4944456.10 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:13:28,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1564626.6666666667, ans=0.125 2023-12-24 07:13:30,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1564626.6666666667, ans=0.125 2023-12-24 07:13:34,236 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.707e+01 4.052e+01 4.234e+01 4.430e+01 4.895e+01, threshold=8.468e+01, percent-clipped=0.0 2023-12-24 07:13:49,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1564760.0, ans=0.0 2023-12-24 07:13:55,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.38 vs. limit=15.0 2023-12-24 07:14:03,669 INFO [train.py:886] (0/4) Epoch 50, batch 1200, loss[loss=0.009869, audio_tagging_loss=0.009869, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4950966.07 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:14:13,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1564960.0, ans=0.0 2023-12-24 07:14:27,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1565026.6666666667, ans=0.0 2023-12-24 07:14:30,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1565026.6666666667, ans=0.125 2023-12-24 07:14:55,931 INFO [train.py:886] (0/4) Epoch 50, batch 1250, loss[loss=0.01085, audio_tagging_loss=0.01085, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4948460.91 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:14:59,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1565226.6666666667, ans=0.125 2023-12-24 07:15:01,506 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:15:12,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1565293.3333333333, ans=0.95 2023-12-24 07:15:12,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1565293.3333333333, ans=0.125 2023-12-24 07:15:15,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1565360.0, ans=0.04949747468305833 2023-12-24 07:15:19,850 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.704e+01 4.120e+01 4.290e+01 4.537e+01 5.051e+01, threshold=8.580e+01, percent-clipped=0.0 2023-12-24 07:15:45,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1565493.3333333333, ans=0.2 2023-12-24 07:15:45,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1565493.3333333333, ans=0.125 2023-12-24 07:15:46,255 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:15:46,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1565493.3333333333, ans=0.125 2023-12-24 07:15:47,907 INFO [train.py:886] (0/4) Epoch 50, batch 1300, loss[loss=0.009589, audio_tagging_loss=0.009589, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4943818.29 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:16:13,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1565693.3333333333, ans=0.125 2023-12-24 07:16:15,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1565693.3333333333, ans=0.1 2023-12-24 07:16:18,487 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:16:19,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1565760.0, ans=0.2 2023-12-24 07:16:32,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1565826.6666666667, ans=0.025 2023-12-24 07:16:37,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1565826.6666666667, ans=0.95 2023-12-24 07:16:39,854 INFO [train.py:886] (0/4) Epoch 50, batch 1350, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4942127.64 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:16:47,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1565893.3333333333, ans=0.125 2023-12-24 07:17:01,927 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.679e+01 4.124e+01 4.286e+01 4.481e+01 5.287e+01, threshold=8.572e+01, percent-clipped=0.0 2023-12-24 07:17:02,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1566026.6666666667, ans=0.125 2023-12-24 07:17:02,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1566026.6666666667, ans=0.125 2023-12-24 07:17:04,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1566026.6666666667, ans=0.0 2023-12-24 07:17:07,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1566026.6666666667, ans=0.125 2023-12-24 07:17:08,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=22.5 2023-12-24 07:17:12,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2023-12-24 07:17:30,532 INFO [train.py:886] (0/4) Epoch 50, batch 1400, loss[loss=0.01044, audio_tagging_loss=0.01044, over 25000.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4944015.68 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:17:31,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1566226.6666666667, ans=0.125 2023-12-24 07:17:35,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1566226.6666666667, ans=0.0 2023-12-24 07:17:46,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1566293.3333333333, ans=0.5 2023-12-24 07:18:05,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1566426.6666666667, ans=0.125 2023-12-24 07:18:13,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1566493.3333333333, ans=0.125 2023-12-24 07:18:14,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1566493.3333333333, ans=0.125 2023-12-24 07:18:16,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1566493.3333333333, ans=0.1 2023-12-24 07:18:18,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-12-24 07:18:19,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1566493.3333333333, ans=0.125 2023-12-24 07:18:20,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5 2023-12-24 07:18:21,454 INFO [train.py:886] (0/4) Epoch 50, batch 1450, loss[loss=0.00984, audio_tagging_loss=0.00984, over 25000.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4941418.49 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:18:31,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1566560.0, ans=0.0 2023-12-24 07:18:44,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-12-24 07:18:44,688 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.709e+01 4.091e+01 4.245e+01 4.456e+01 5.361e+01, threshold=8.489e+01, percent-clipped=0.0 2023-12-24 07:19:08,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1566826.6666666667, ans=0.125 2023-12-24 07:19:14,050 INFO [train.py:886] (0/4) Epoch 50, batch 1500, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4947632.00 frames. 
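
Every progress record reports audio_tagging_loss as the entire training loss. For multi-label audio tagging the conventional objective is a per-class binary cross-entropy against multi-hot event labels, and per-frame values around 0.0105 late in training are in the range one expects when summing small per-class BCE terms over a few hundred mostly-absent classes. A sketch under that assumption (the reduction and the multi-hot target format are assumptions, not read from this log):

    import torch.nn.functional as F

    def audio_tagging_loss(logits, targets):
        """logits, targets: (num_frames, num_events); targets are 0/1
        multi-hot event labels. Per-frame BCE summed over event classes,
        averaged over frames."""
        return F.binary_cross_entropy_with_logits(
            logits, targets, reduction="sum") / logits.shape[0]
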
], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:19:41,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1567026.6666666667, ans=0.2 2023-12-24 07:19:47,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1567093.3333333333, ans=0.1 2023-12-24 07:19:52,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1567093.3333333333, ans=0.125 2023-12-24 07:19:52,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1567093.3333333333, ans=0.125 2023-12-24 07:19:54,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1567160.0, ans=0.0 2023-12-24 07:19:55,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1567160.0, ans=0.0 2023-12-24 07:20:06,499 INFO [train.py:886] (0/4) Epoch 50, batch 1550, loss[loss=0.01103, audio_tagging_loss=0.01103, over 24955.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4941597.02 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:20:18,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1567293.3333333333, ans=10.0 2023-12-24 07:20:22,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.56 vs. limit=15.0 2023-12-24 07:20:27,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1567360.0, ans=0.125 2023-12-24 07:20:27,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2023-12-24 07:20:28,749 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.759e+01 4.163e+01 4.336e+01 4.479e+01 4.983e+01, threshold=8.671e+01, percent-clipped=0.0 2023-12-24 07:20:33,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1567360.0, ans=0.0 2023-12-24 07:20:39,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1567426.6666666667, ans=0.05 2023-12-24 07:20:47,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=12.0 2023-12-24 07:20:54,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1567493.3333333333, ans=0.0 2023-12-24 07:20:57,206 INFO [train.py:886] (0/4) Epoch 50, batch 1600, loss[loss=0.009852, audio_tagging_loss=0.009852, over 24750.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4932708.21 frames. 
], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:20:58,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1567560.0, ans=0.1 2023-12-24 07:21:19,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1567693.3333333333, ans=0.1 2023-12-24 07:21:34,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-12-24 07:21:40,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1567826.6666666667, ans=0.125 2023-12-24 07:21:43,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1567826.6666666667, ans=0.0 2023-12-24 07:21:49,916 INFO [train.py:886] (0/4) Epoch 50, batch 1650, loss[loss=0.01093, audio_tagging_loss=0.01093, over 21976.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4936241.59 frames. ], batch size: 107, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:22:00,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1567960.0, ans=0.1 2023-12-24 07:22:04,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1567960.0, ans=15.0 2023-12-24 07:22:05,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2023-12-24 07:22:14,135 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.627e+01 4.076e+01 4.290e+01 4.470e+01 5.188e+01, threshold=8.579e+01, percent-clipped=0.0 2023-12-24 07:22:19,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1568026.6666666667, ans=0.2 2023-12-24 07:22:21,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1568093.3333333333, ans=0.125 2023-12-24 07:22:42,264 INFO [train.py:886] (0/4) Epoch 50, batch 1700, loss[loss=0.01097, audio_tagging_loss=0.01097, over 25000.00 frames. ], tot_loss[loss=0.01046, audio_tagging_loss=0.01046, over 4937713.39 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:22:48,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1568226.6666666667, ans=0.2 2023-12-24 07:23:06,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1568360.0, ans=0.125 2023-12-24 07:23:08,641 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:23:22,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1568426.6666666667, ans=0.1 2023-12-24 07:23:25,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.21 vs. limit=6.0 2023-12-24 07:23:34,031 INFO [train.py:886] (0/4) Epoch 50, batch 1750, loss[loss=0.01171, audio_tagging_loss=0.01171, over 21885.00 frames. 
], tot_loss[loss=0.01036, audio_tagging_loss=0.01036, over 4940587.69 frames. ], batch size: 107, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:23:41,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1568560.0, ans=0.0 2023-12-24 07:23:52,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1568626.6666666667, ans=0.1 2023-12-24 07:23:57,032 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 3.982e+01 4.200e+01 4.347e+01 4.919e+01, threshold=8.401e+01, percent-clipped=0.0 2023-12-24 07:24:06,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1568760.0, ans=0.0 2023-12-24 07:24:20,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1568826.6666666667, ans=0.0 2023-12-24 07:24:26,388 INFO [train.py:886] (0/4) Epoch 50, batch 1800, loss[loss=0.009087, audio_tagging_loss=0.009087, over 25000.00 frames. ], tot_loss[loss=0.01034, audio_tagging_loss=0.01034, over 4948399.04 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:24:28,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2023-12-24 07:24:38,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2023-12-24 07:24:40,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1568960.0, ans=0.2 2023-12-24 07:24:52,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1569026.6666666667, ans=0.125 2023-12-24 07:25:01,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1569093.3333333333, ans=0.07 2023-12-24 07:25:11,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.57 vs. limit=22.5 2023-12-24 07:25:16,809 INFO [train.py:886] (0/4) Epoch 50, batch 1850, loss[loss=0.00814, audio_tagging_loss=0.00814, over 24750.00 frames. ], tot_loss[loss=0.01042, audio_tagging_loss=0.01042, over 4949800.05 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:25:25,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.70 vs. limit=22.5 2023-12-24 07:25:32,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1569293.3333333333, ans=0.1 2023-12-24 07:25:40,870 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.702e+01 4.153e+01 4.266e+01 4.443e+01 5.377e+01, threshold=8.532e+01, percent-clipped=0.0 2023-12-24 07:26:10,197 INFO [train.py:886] (0/4) Epoch 50, batch 1900, loss[loss=0.01064, audio_tagging_loss=0.01064, over 24750.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4945057.54 frames. 
], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:26:14,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1569560.0, ans=0.125 2023-12-24 07:26:17,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1569560.0, ans=0.125 2023-12-24 07:26:57,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2023-12-24 07:26:57,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1569826.6666666667, ans=0.125 2023-12-24 07:27:01,924 INFO [train.py:886] (0/4) Epoch 50, batch 1950, loss[loss=0.009873, audio_tagging_loss=0.009873, over 24750.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4939327.79 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:27:11,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1569960.0, ans=0.1 2023-12-24 07:27:17,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1569960.0, ans=0.125 2023-12-24 07:27:23,191 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.684e+01 4.036e+01 4.252e+01 4.490e+01 5.188e+01, threshold=8.504e+01, percent-clipped=0.0 2023-12-24 07:27:24,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1570026.6666666667, ans=0.125 2023-12-24 07:27:26,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1570026.6666666667, ans=0.125 2023-12-24 07:27:30,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=15.0 2023-12-24 07:27:34,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1570093.3333333333, ans=0.125 2023-12-24 07:27:36,131 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2023-12-24 07:27:45,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1570160.0, ans=0.05 2023-12-24 07:27:50,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1570226.6666666667, ans=10.0 2023-12-24 07:27:51,714 INFO [train.py:886] (0/4) Epoch 50, batch 2000, loss[loss=0.01073, audio_tagging_loss=0.01073, over 24750.00 frames. ], tot_loss[loss=0.0104, audio_tagging_loss=0.0104, over 4936426.91 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
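Each train.py:886 entry above reports two figures: loss[...] for the current batch and tot_loss[...] accumulated over a few million frames. Below is a minimal sketch of that kind of frame-weighted bookkeeping; the class and field names are illustrative assumptions, not icefall's actual tracker, and since the logged frame totals hover near 4.9M rather than growing, the real implementation presumably uses a decayed or windowed average instead of this unbounded one.

```python
# Hedged sketch of a frame-weighted running loss like the
# "loss[...] ... tot_loss[... over N frames]" pairs logged above.

class RunningLoss:
    """Frame-weighted running average of a per-frame loss."""

    def __init__(self) -> None:
        self.loss_sum = 0.0  # sum of loss weighted by frame counts
        self.frames = 0.0    # total frames accumulated

    def update(self, batch_loss: float, batch_frames: float) -> None:
        # batch_loss is the mean per-frame loss for one batch, so it
        # is weighted by that batch's frame count before accumulation.
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(batch_loss=0.01073, batch_frames=24750.0)  # batch 2000 above
print(f"tot_loss[loss={tracker.value:.4}, over {tracker.frames:.2f} frames. ]")
```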
2023-12-24 07:28:26,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1570426.6666666667, ans=0.2 2023-12-24 07:28:29,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1570426.6666666667, ans=0.125 2023-12-24 07:28:44,773 INFO [train.py:886] (0/4) Epoch 50, batch 2050, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01035, audio_tagging_loss=0.01035, over 4942877.99 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:28:54,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1570626.6666666667, ans=0.125 2023-12-24 07:29:07,140 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.692e+01 4.007e+01 4.186e+01 4.409e+01 4.904e+01, threshold=8.372e+01, percent-clipped=0.0 2023-12-24 07:29:18,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1570760.0, ans=0.0 2023-12-24 07:29:23,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1570760.0, ans=0.025 2023-12-24 07:29:25,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0 2023-12-24 07:29:31,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1570826.6666666667, ans=0.125 2023-12-24 07:29:35,802 INFO [train.py:886] (0/4) Epoch 50, batch 2100, loss[loss=0.01151, audio_tagging_loss=0.01151, over 25000.00 frames. ], tot_loss[loss=0.01033, audio_tagging_loss=0.01033, over 4949687.97 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:29:54,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1570960.0, ans=0.125 2023-12-24 07:30:19,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=6.0 2023-12-24 07:30:25,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1571160.0, ans=0.1 2023-12-24 07:30:28,616 INFO [train.py:886] (0/4) Epoch 50, batch 2150, loss[loss=0.01134, audio_tagging_loss=0.01134, over 25000.00 frames. ], tot_loss[loss=0.01039, audio_tagging_loss=0.01039, over 4955660.99 frames.
], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:30:49,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1571360.0, ans=0.125 2023-12-24 07:30:51,640 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.676e+01 4.091e+01 4.279e+01 4.499e+01 5.273e+01, threshold=8.558e+01, percent-clipped=0.0 2023-12-24 07:30:59,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1571426.6666666667, ans=0.05 2023-12-24 07:31:01,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1571426.6666666667, ans=0.125 2023-12-24 07:31:03,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-12-24 07:31:11,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1571493.3333333333, ans=0.1 2023-12-24 07:31:20,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1571560.0, ans=0.0 2023-12-24 07:31:21,069 INFO [train.py:886] (0/4) Epoch 50, batch 2200, loss[loss=0.01314, audio_tagging_loss=0.01314, over 24750.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4946022.15 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:31:21,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1571560.0, ans=0.0 2023-12-24 07:31:45,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-12-24 07:31:54,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1571760.0, ans=0.0 2023-12-24 07:32:12,248 INFO [train.py:886] (0/4) Epoch 50, batch 2250, loss[loss=0.01071, audio_tagging_loss=0.01071, over 25000.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4942409.41 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:32:13,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1571893.3333333333, ans=0.05 2023-12-24 07:32:20,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1571893.3333333333, ans=0.125 2023-12-24 07:32:24,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1571960.0, ans=0.07 2023-12-24 07:32:25,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2023-12-24 07:32:35,509 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.741e+01 4.093e+01 4.254e+01 4.470e+01 6.173e+01, threshold=8.508e+01, percent-clipped=0.0 2023-12-24 07:32:59,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.43 vs. 
limit=5.0 2023-12-24 07:33:04,676 INFO [train.py:886] (0/4) Epoch 50, batch 2300, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4940538.36 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:33:07,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.99 vs. limit=15.0 2023-12-24 07:33:25,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1572360.0, ans=0.0 2023-12-24 07:33:32,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1572360.0, ans=0.0 2023-12-24 07:33:39,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1572426.6666666667, ans=0.125 2023-12-24 07:33:49,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1572493.3333333333, ans=0.5 2023-12-24 07:33:56,435 INFO [train.py:886] (0/4) Epoch 50, batch 2350, loss[loss=0.009241, audio_tagging_loss=0.009241, over 24750.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4943806.32 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:34:16,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1572693.3333333333, ans=0.125 2023-12-24 07:34:18,580 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.700e+01 4.055e+01 4.217e+01 4.418e+01 5.746e+01, threshold=8.434e+01, percent-clipped=0.0 2023-12-24 07:34:30,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1572760.0, ans=0.125 2023-12-24 07:34:48,217 INFO [train.py:886] (0/4) Epoch 50, batch 2400, loss[loss=0.009918, audio_tagging_loss=0.009918, over 25000.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4953840.66 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:34:59,707 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=15.0 2023-12-24 07:35:07,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1572960.0, ans=0.2 2023-12-24 07:35:12,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1573026.6666666667, ans=0.0 2023-12-24 07:35:16,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1573026.6666666667, ans=0.125 2023-12-24 07:35:36,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1573160.0, ans=0.2 2023-12-24 07:35:40,405 INFO [train.py:886] (0/4) Epoch 50, batch 2450, loss[loss=0.01139, audio_tagging_loss=0.01139, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4960161.60 frames. 
], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:35:43,445 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:35:45,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1573226.6666666667, ans=0.0 2023-12-24 07:35:55,612 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/checkpoint-236000.pt 2023-12-24 07:36:04,786 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.654e+01 4.083e+01 4.271e+01 4.433e+01 5.085e+01, threshold=8.543e+01, percent-clipped=0.0 2023-12-24 07:36:05,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1573360.0, ans=0.0 2023-12-24 07:36:25,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1573493.3333333333, ans=0.2 2023-12-24 07:36:30,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1573493.3333333333, ans=0.125 2023-12-24 07:36:33,464 INFO [train.py:886] (0/4) Epoch 50, batch 2500, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4954168.90 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:36:39,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1573560.0, ans=0.125 2023-12-24 07:36:40,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2023-12-24 07:36:56,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5 2023-12-24 07:36:59,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1573693.3333333333, ans=0.125 2023-12-24 07:37:03,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1573760.0, ans=0.0 2023-12-24 07:37:07,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1573760.0, ans=0.125 2023-12-24 07:37:08,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1573760.0, ans=0.0 2023-12-24 07:37:19,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1573826.6666666667, ans=0.0 2023-12-24 07:37:25,230 INFO [train.py:886] (0/4) Epoch 50, batch 2550, loss[loss=0.01066, audio_tagging_loss=0.01066, over 24026.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4942572.64 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:37:26,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1573893.3333333333, ans=0.0 2023-12-24 07:37:31,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1573893.3333333333, ans=0.125 2023-12-24 07:37:36,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0
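The checkpoint.py:75 entry above writes zipformer/exp_at_as_full/checkpoint-236000.pt in the middle of epoch 50: the filename encodes the global training-batch index rather than the epoch. A hedged sketch of that pattern follows; the function name and the save interval are illustrative assumptions, torch.save is the only real API used, and only rank 0 would write, matching the (0/4) prefix on every entry.

```python
import torch

def maybe_save_checkpoint(model, optimizer, rank: int,
                          batch_idx_train: int, save_every_n: int,
                          exp_dir: str = "zipformer/exp_at_as_full") -> None:
    # Save on a fixed global-batch schedule, independent of epoch
    # boundaries, so files like checkpoint-236000.pt appear mid-epoch.
    if rank != 0 or batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
    )
```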
2023-12-24 07:37:40,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1573960.0, ans=0.125 2023-12-24 07:37:49,907 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.655e+01 4.126e+01 4.329e+01 4.523e+01 5.381e+01, threshold=8.659e+01, percent-clipped=0.0 2023-12-24 07:37:50,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1574026.6666666667, ans=0.1 2023-12-24 07:38:04,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1574093.3333333333, ans=0.04949747468305833 2023-12-24 07:38:05,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2023-12-24 07:38:18,380 INFO [train.py:886] (0/4) Epoch 50, batch 2600, loss[loss=0.009827, audio_tagging_loss=0.009827, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4945313.10 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:38:29,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2023-12-24 07:38:41,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1574360.0, ans=0.125 2023-12-24 07:38:50,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1574426.6666666667, ans=0.125 2023-12-24 07:38:51,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1574426.6666666667, ans=22.5 2023-12-24 07:39:03,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1574493.3333333333, ans=0.125 2023-12-24 07:39:09,500 INFO [train.py:886] (0/4) Epoch 50, batch 2650, loss[loss=0.0097, audio_tagging_loss=0.0097, over 25000.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4950325.42 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:39:13,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1574560.0, ans=0.0 2023-12-24 07:39:16,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0
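The scaling.py:1022 Whitening entries above each compare a measured metric against a (sometimes scheduled) limit, e.g. metric=14.04 vs. limit=15.0 for a 256-channel feed-forward output. One plausible form of such a metric is sketched below, under the assumption that it measures how far the feature covariance is from a multiple of the identity: exactly 1.0 for perfectly whitened features, growing as channels become correlated or unevenly scaled. This mirrors the idea in icefall's scaling.py but is a reconstruction, not a copy.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels); channels are split into groups,
    # as in the "num_groups=4, num_channels=128" entries above.
    n, c = x.shape
    d = c // num_groups  # channels per group; assumes c % num_groups == 0
    x = x.reshape(n, num_groups, d).permute(1, 0, 2)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n  # (groups, d, d)
    # Ratio E[lambda^2] / E[lambda]^2 over the eigenvalues of cov,
    # computed without an eigendecomposition: trace(cov @ cov) equals
    # the sum of squared eigenvalues for a symmetric matrix.
    tr = cov.diagonal(dim1=1, dim2=2).sum(dim=1)
    tr_sq = (cov * cov).sum(dim=(1, 2))
    metric = (tr_sq / d) / (tr / d) ** 2
    return metric.mean()

x = torch.randn(1000, 256)
print(whitening_metric(x))  # near 1 for i.i.d. features; sampling noise
                            # pushes it slightly above 1
```

A module guarding activations this way can add a small penalty gradient whenever the metric drifts above its limit, which is presumably what these periodic metric-vs-limit reports track.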
2023-12-24 07:39:23,302 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:39:33,612 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.680e+01 4.070e+01 4.297e+01 4.488e+01 5.436e+01, threshold=8.593e+01, percent-clipped=0.0 2023-12-24 07:39:33,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1574693.3333333333, ans=0.1 2023-12-24 07:39:35,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1574693.3333333333, ans=0.5 2023-12-24 07:39:42,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1574760.0, ans=0.0 2023-12-24 07:39:57,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1574826.6666666667, ans=22.5 2023-12-24 07:40:01,871 INFO [train.py:886] (0/4) Epoch 50, batch 2700, loss[loss=0.009311, audio_tagging_loss=0.009311, over 25000.00 frames. ], tot_loss[loss=0.01042, audio_tagging_loss=0.01042, over 4946642.31 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:40:15,230 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:40:17,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1574960.0, ans=0.125 2023-12-24 07:40:50,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1575160.0, ans=0.035 2023-12-24 07:40:53,312 INFO [train.py:886] (0/4) Epoch 50, batch 2750, loss[loss=0.01188, audio_tagging_loss=0.01188, over 25000.00 frames. ], tot_loss[loss=0.01043, audio_tagging_loss=0.01043, over 4953093.22 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:40:56,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1575226.6666666667, ans=0.1 2023-12-24 07:41:08,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1575293.3333333333, ans=0.1 2023-12-24 07:41:16,433 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.715e+01 4.042e+01 4.295e+01 4.516e+01 5.122e+01, threshold=8.590e+01, percent-clipped=0.0 2023-12-24 07:41:20,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=15.0 2023-12-24 07:41:28,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=12.0 2023-12-24 07:41:40,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0
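In every optim.py:484 warning above, the reported threshold is Clipping_scale times the middle quartile of the recent gradient norms: for instance 2.0 x 4.297e+01 = 8.594e+01, matching the logged threshold=8.593e+01 up to rounding. So these lines describe clipping against an adaptive threshold of twice the median gradient norm. A minimal sketch of that scheme follows; the class and method names are illustrative, not icefall's ScaledAdam internals.

```python
import torch

class AdaptiveGradClipper:
    """Clip to a multiple of the median of recent gradient norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.window = window          # how many recent norms to remember
        self.norms: list[float] = []
        self.num_clipped = 0

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        # Overall L2 norm across all parameter gradients.
        norm = torch.linalg.vector_norm(
            torch.stack([p.grad.norm() for p in params])
        ).item()
        self.norms = (self.norms + [norm])[-self.window:]
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)  # rescale in place
        return norm
```

The five logged quartiles would then just be the min/25%/median/75%/max of the same window, and percent-clipped the share of recent batches whose norm exceeded the threshold.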
2023-12-24 07:41:42,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1575493.3333333333, ans=10.0 2023-12-24 07:41:43,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1575493.3333333333, ans=0.125 2023-12-24 07:41:45,217 INFO [train.py:886] (0/4) Epoch 50, batch 2800, loss[loss=0.01113, audio_tagging_loss=0.01113, over 24950.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4955816.22 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:41:46,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1575560.0, ans=0.2 2023-12-24 07:41:52,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1575560.0, ans=0.2 2023-12-24 07:42:11,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1575693.3333333333, ans=0.2 2023-12-24 07:42:15,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1575760.0, ans=0.125 2023-12-24 07:42:17,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1575760.0, ans=0.0 2023-12-24 07:42:23,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1575760.0, ans=0.035 2023-12-24 07:42:32,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1575826.6666666667, ans=0.125 2023-12-24 07:42:38,595 INFO [train.py:886] (0/4) Epoch 50, batch 2850, loss[loss=0.008076, audio_tagging_loss=0.008076, over 23982.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4951733.79 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:42:50,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1575960.0, ans=0.125 2023-12-24 07:42:52,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1575960.0, ans=0.0 2023-12-24 07:42:55,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0 2023-12-24 07:43:01,386 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.781e+01 4.104e+01 4.361e+01 4.546e+01 5.152e+01, threshold=8.721e+01, percent-clipped=0.0 2023-12-24 07:43:04,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5 2023-12-24 07:43:09,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1576093.3333333333, ans=0.1 2023-12-24 07:43:23,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1576160.0, ans=0.2 2023-12-24 07:43:27,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0
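The scaling.py:213 entries that dominate this log each print a ScheduledFloat: a hyperparameter (dropout_p, skip rates, balancer probs and limits) whose value "ans" is a function of batch_count. One common way to realize this is piecewise-linear interpolation between breakpoints, sketched below; the breakpoints are invented for illustration and the actual schedules in icefall may differ.

```python
def scheduled_float(batch_count: float,
                    schedule: list[tuple[float, float]]) -> float:
    """Piecewise-linear schedule over (batch_count, value) breakpoints."""
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)  # interpolate linearly
            return y0 + t * (y1 - y0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint, hold the final value

# e.g. a conv_skip_rate that decays from 0.5 to 0.0 over the first
# 20k batches (breakpoints made up): this deep into training
# (batch_count=1575760.0 above) it has long since reached ans=0.0.
print(scheduled_float(1575760.0, [(0.0, 0.5), (20000.0, 0.0)]))
```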
2023-12-24 07:43:28,289 INFO [train.py:886] (0/4) Epoch 50, batch 2900, loss[loss=0.01019, audio_tagging_loss=0.01019, over 25000.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4953212.17 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:43:52,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2023-12-24 07:43:54,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1576360.0, ans=0.2 2023-12-24 07:44:11,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0 2023-12-24 07:44:20,091 INFO [train.py:886] (0/4) Epoch 50, batch 2950, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4949269.39 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:44:24,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1576560.0, ans=0.2 2023-12-24 07:44:25,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1576560.0, ans=0.05 2023-12-24 07:44:27,999 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:44:35,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1576626.6666666667, ans=0.0 2023-12-24 07:44:39,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.75 vs. limit=10.0 2023-12-24 07:44:44,664 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.679e+01 4.048e+01 4.207e+01 4.410e+01 5.096e+01, threshold=8.415e+01, percent-clipped=0.0 2023-12-24 07:45:12,380 INFO [train.py:886] (0/4) Epoch 50, batch 3000, loss[loss=0.01063, audio_tagging_loss=0.01063, over 24750.00 frames. ], tot_loss[loss=0.01038, audio_tagging_loss=0.01038, over 4950964.54 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:45:12,382 INFO [train.py:909] (0/4) Computing validation loss 2023-12-24 07:45:33,534 INFO [train.py:917] (0/4) Epoch 50, validation: loss=0.03799, audio_tagging_loss=0.03799, over 3737520.00 frames. 2023-12-24 07:45:33,535 INFO [train.py:918] (0/4) Maximum memory allocated so far is 14759MB 2023-12-24 07:45:40,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0
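The train.py:909-918 entries above show the periodic validation pass: training pauses at batch 3000, a validation loss (0.03799 over 3737520.00 frames) is computed over the whole dev set, the peak GPU memory is reported, and training resumes. A minimal sketch follows, assuming a compute_loss helper that returns a summed loss and frame count per batch; both names are illustrative, while torch.no_grad and torch.cuda.max_memory_allocated are the real APIs.

```python
from typing import Callable, Iterable, Tuple
import torch

def validate(model: torch.nn.Module,
             valid_loader: Iterable,
             compute_loss: Callable[[object], Tuple[float, float]]) -> float:
    """Frame-weighted validation loss over the whole dev set."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():  # no gradients needed for validation
        for batch in valid_loader:
            loss_sum, num_frames = compute_loss(batch)
            tot_loss += loss_sum
            tot_frames += num_frames
    model.train()  # training resumes immediately, as in the log
    return tot_loss / tot_frames

# The "Maximum memory allocated" line corresponds to
# torch.cuda.max_memory_allocated(), converted to MB.
```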
2023-12-24 07:45:45,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1576960.0, ans=0.2 2023-12-24 07:46:14,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1577160.0, ans=0.125 2023-12-24 07:46:15,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1577160.0, ans=0.125 2023-12-24 07:46:18,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1577160.0, ans=0.0 2023-12-24 07:46:25,132 INFO [train.py:886] (0/4) Epoch 50, batch 3050, loss[loss=0.009244, audio_tagging_loss=0.009244, over 24026.00 frames. ], tot_loss[loss=0.01031, audio_tagging_loss=0.01031, over 4950448.85 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:46:31,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1577226.6666666667, ans=0.05 2023-12-24 07:46:43,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1577293.3333333333, ans=0.2 2023-12-24 07:46:49,326 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.806e+01 4.000e+01 4.179e+01 4.391e+01 4.830e+01, threshold=8.357e+01, percent-clipped=0.0 2023-12-24 07:46:52,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1577360.0, ans=0.125 2023-12-24 07:46:56,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1577426.6666666667, ans=0.1 2023-12-24 07:47:08,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2023-12-24 07:47:16,862 INFO [train.py:886] (0/4) Epoch 50, batch 3100, loss[loss=0.01072, audio_tagging_loss=0.01072, over 24750.00 frames. ], tot_loss[loss=0.01033, audio_tagging_loss=0.01033, over 4949621.74 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:47:18,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1577560.0, ans=0.125 2023-12-24 07:47:33,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1577626.6666666667, ans=0.07 2023-12-24 07:47:38,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1577693.3333333333, ans=0.1 2023-12-24 07:47:50,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-12-24 07:47:57,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1577826.6666666667, ans=0.125 2023-12-24 08:48:06,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2023-12-24 07:48:07,470 INFO [train.py:886] (0/4) Epoch 50, batch 3150, loss[loss=0.01119, audio_tagging_loss=0.01119, over 24750.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4950404.52 frames.
], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:48:10,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1577893.3333333333, ans=0.125 2023-12-24 07:48:22,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1577960.0, ans=0.0 2023-12-24 07:48:24,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1577960.0, ans=0.025 2023-12-24 07:48:31,861 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.715e+01 4.159e+01 4.326e+01 4.547e+01 5.411e+01, threshold=8.653e+01, percent-clipped=0.0 2023-12-24 07:48:41,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1578093.3333333333, ans=0.0 2023-12-24 07:48:42,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1578093.3333333333, ans=0.125 2023-12-24 07:48:52,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1578160.0, ans=0.0 2023-12-24 07:48:53,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1578160.0, ans=0.0 2023-12-24 07:48:53,463 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2023-12-24 07:49:00,289 INFO [train.py:886] (0/4) Epoch 50, batch 3200, loss[loss=0.0113, audio_tagging_loss=0.0113, over 24750.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4950787.71 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:49:06,173 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:49:09,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1578293.3333333333, ans=0.125 2023-12-24 07:49:31,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1578426.6666666667, ans=0.0 2023-12-24 07:49:39,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1578426.6666666667, ans=0.125 2023-12-24 07:49:52,079 INFO [train.py:886] (0/4) Epoch 50, batch 3250, loss[loss=0.008633, audio_tagging_loss=0.008633, over 25000.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4951436.21 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:49:54,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.59 vs. 
limit=10.0 2023-12-24 07:49:55,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1578560.0, ans=0.5 2023-12-24 07:50:11,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1578626.6666666667, ans=0.125 2023-12-24 07:50:14,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1578693.3333333333, ans=0.2 2023-12-24 07:50:15,400 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.606e+01 4.040e+01 4.194e+01 4.403e+01 5.112e+01, threshold=8.389e+01, percent-clipped=0.0 2023-12-24 07:50:34,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1578826.6666666667, ans=0.0 2023-12-24 07:50:38,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1578826.6666666667, ans=0.1 2023-12-24 07:50:41,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1578826.6666666667, ans=0.2 2023-12-24 07:50:41,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1578826.6666666667, ans=0.025 2023-12-24 07:50:44,524 INFO [train.py:886] (0/4) Epoch 50, batch 3300, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4954394.31 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:50:44,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1578893.3333333333, ans=0.125 2023-12-24 07:50:51,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1578893.3333333333, ans=0.2 2023-12-24 07:51:06,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-12-24 07:51:22,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1579093.3333333333, ans=0.125 2023-12-24 07:51:27,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1579160.0, ans=0.125 2023-12-24 07:51:36,607 INFO [train.py:886] (0/4) Epoch 50, batch 3350, loss[loss=0.009932, audio_tagging_loss=0.009932, over 25000.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4957505.95 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:51:42,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1579226.6666666667, ans=0.125 2023-12-24 07:51:46,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1579293.3333333333, ans=0.2 2023-12-24 07:51:59,929 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.696e+01 4.063e+01 4.248e+01 4.414e+01 5.248e+01, threshold=8.495e+01, percent-clipped=0.0 2023-12-24 07:52:22,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.52 vs. 
limit=15.0 2023-12-24 07:52:25,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1579493.3333333333, ans=0.125 2023-12-24 07:52:27,568 INFO [train.py:886] (0/4) Epoch 50, batch 3400, loss[loss=0.009376, audio_tagging_loss=0.009376, over 25000.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4961149.25 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:52:41,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1579626.6666666667, ans=0.125 2023-12-24 07:52:54,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.47 vs. limit=12.0 2023-12-24 07:53:01,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-12-24 07:53:15,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2023-12-24 07:53:20,142 INFO [train.py:886] (0/4) Epoch 50, batch 3450, loss[loss=0.01029, audio_tagging_loss=0.01029, over 22287.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4949804.48 frames. ], batch size: 107, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:53:24,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1579893.3333333333, ans=0.0 2023-12-24 07:53:45,038 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.673e+01 4.041e+01 4.285e+01 4.464e+01 5.704e+01, threshold=8.570e+01, percent-clipped=0.0 2023-12-24 07:54:05,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1580160.0, ans=0.2 2023-12-24 07:54:13,384 INFO [train.py:886] (0/4) Epoch 50, batch 3500, loss[loss=0.01091, audio_tagging_loss=0.01091, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4941247.82 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:54:19,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1580226.6666666667, ans=0.125 2023-12-24 07:54:43,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1580426.6666666667, ans=0.2 2023-12-24 07:54:55,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1580493.3333333333, ans=0.0 2023-12-24 07:55:04,409 INFO [train.py:886] (0/4) Epoch 50, batch 3550, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4942657.82 frames. 
], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:55:04,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1580560.0, ans=0.1 2023-12-24 07:55:28,354 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.681e+01 4.036e+01 4.198e+01 4.391e+01 5.355e+01, threshold=8.396e+01, percent-clipped=0.0 2023-12-24 07:55:28,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1580693.3333333333, ans=0.125 2023-12-24 07:55:36,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1580760.0, ans=0.125 2023-12-24 07:55:38,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.31 vs. limit=22.5 2023-12-24 07:55:41,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1580760.0, ans=0.2 2023-12-24 07:55:56,990 INFO [train.py:886] (0/4) Epoch 50, batch 3600, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.01037, audio_tagging_loss=0.01037, over 4944035.71 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:56:11,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1580960.0, ans=0.0 2023-12-24 07:56:21,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.63 vs. limit=10.0 2023-12-24 07:56:27,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1581093.3333333333, ans=0.0 2023-12-24 07:56:34,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2023-12-24 07:56:41,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=22.5 2023-12-24 07:56:43,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1581160.0, ans=0.2 2023-12-24 07:56:45,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1581160.0, ans=0.1 2023-12-24 07:56:48,313 INFO [train.py:886] (0/4) Epoch 50, batch 3650, loss[loss=0.01147, audio_tagging_loss=0.01147, over 24910.00 frames. ], tot_loss[loss=0.01034, audio_tagging_loss=0.01034, over 4950636.19 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:56:49,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1581226.6666666667, ans=0.0 2023-12-24 07:56:53,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=22.5 2023-12-24 07:56:56,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1581226.6666666667, ans=0.2 2023-12-24 07:57:08,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.86 vs. 
limit=15.0 2023-12-24 07:57:11,802 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.587e+01 3.982e+01 4.158e+01 4.360e+01 5.165e+01, threshold=8.317e+01, percent-clipped=0.0 2023-12-24 07:57:14,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1581360.0, ans=0.0 2023-12-24 07:57:18,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1581426.6666666667, ans=0.1 2023-12-24 07:57:35,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-12-24 07:57:40,461 INFO [train.py:886] (0/4) Epoch 50, batch 3700, loss[loss=0.01082, audio_tagging_loss=0.01082, over 24750.00 frames. ], tot_loss[loss=0.01032, audio_tagging_loss=0.01032, over 4948169.93 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 16.0 2023-12-24 07:57:43,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1581560.0, ans=0.125 2023-12-24 07:58:13,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1581760.0, ans=0.2 2023-12-24 07:58:16,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1581760.0, ans=0.1 2023-12-24 07:58:33,651 INFO [train.py:886] (0/4) Epoch 50, batch 3750, loss[loss=0.009254, audio_tagging_loss=0.009254, over 21576.00 frames. ], tot_loss[loss=0.01041, audio_tagging_loss=0.01041, over 4940235.82 frames. ], batch size: 107, lr: 2.13e-03, grad_scale: 16.0 2023-12-24 07:58:34,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1581893.3333333333, ans=0.125 2023-12-24 07:58:37,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1581893.3333333333, ans=15.0 2023-12-24 07:58:39,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1581893.3333333333, ans=0.0 2023-12-24 07:58:40,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2023-12-24 07:58:45,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1581960.0, ans=0.1 2023-12-24 07:58:51,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1581960.0, ans=0.0 2023-12-24 07:58:54,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.81 vs. 
limit=22.5 2023-12-24 07:58:55,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1582026.6666666667, ans=0.0 2023-12-24 07:58:57,809 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.781e+01 4.161e+01 4.351e+01 4.483e+01 8.860e+01, threshold=8.701e+01, percent-clipped=1.0 2023-12-24 07:58:58,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1582026.6666666667, ans=0.0 2023-12-24 07:59:08,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1582093.3333333333, ans=0.0 2023-12-24 07:59:19,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=12.0 2023-12-24 07:59:24,590 INFO [train.py:886] (0/4) Epoch 50, batch 3800, loss[loss=0.01013, audio_tagging_loss=0.01013, over 24750.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4939746.55 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 16.0 2023-12-24 07:59:38,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1582293.3333333333, ans=0.04949747468305833 2023-12-24 07:59:51,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1582360.0, ans=0.125 2023-12-24 07:59:56,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1582426.6666666667, ans=0.125 2023-12-24 08:00:13,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1582493.3333333333, ans=0.1 2023-12-24 08:00:15,254 INFO [train.py:886] (0/4) Epoch 50, batch 3850, loss[loss=0.01036, audio_tagging_loss=0.01036, over 24036.00 frames. ], tot_loss[loss=0.01038, audio_tagging_loss=0.01038, over 4939066.18 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 16.0 2023-12-24 08:00:38,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1582693.3333333333, ans=0.125 2023-12-24 08:00:40,106 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.778e+01 4.086e+01 4.269e+01 4.439e+01 5.542e+01, threshold=8.539e+01, percent-clipped=0.0 2023-12-24 08:00:42,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=22.5 2023-12-24 08:00:44,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1582693.3333333333, ans=0.0 2023-12-24 08:00:47,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2023-12-24 08:01:06,012 INFO [train.py:886] (0/4) Epoch 50, batch 3900, loss[loss=0.008645, audio_tagging_loss=0.008645, over 24750.00 frames. ], tot_loss[loss=0.01041, audio_tagging_loss=0.01041, over 4939811.62 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 16.0 2023-12-24 08:01:13,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. 
limit=15.0 2023-12-24 08:01:20,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1582960.0, ans=0.5 2023-12-24 08:01:28,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1583026.6666666667, ans=0.125 2023-12-24 08:01:56,777 INFO [train.py:886] (0/4) Epoch 50, batch 3950, loss[loss=0.009233, audio_tagging_loss=0.009233, over 25000.00 frames. ], tot_loss[loss=0.0104, audio_tagging_loss=0.0104, over 4945967.66 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 16.0 2023-12-24 08:01:56,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1583226.6666666667, ans=0.125 2023-12-24 08:02:19,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1583360.0, ans=0.125 2023-12-24 08:02:21,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2023-12-24 08:02:22,394 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.534e+01 4.014e+01 4.237e+01 4.371e+01 9.981e+01, threshold=8.474e+01, percent-clipped=1.0 2023-12-24 08:02:29,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1583426.6666666667, ans=0.05 2023-12-24 08:02:39,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2023-12-24 08:02:50,039 INFO [train.py:886] (0/4) Epoch 50, batch 4000, loss[loss=0.009863, audio_tagging_loss=0.009863, over 24750.00 frames. ], tot_loss[loss=0.01043, audio_tagging_loss=0.01043, over 4952795.11 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:02:50,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1583560.0, ans=0.125 2023-12-24 08:02:55,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1583560.0, ans=0.1 2023-12-24 08:03:06,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1583626.6666666667, ans=0.125 2023-12-24 08:03:07,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. limit=10.0 2023-12-24 08:03:20,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1583760.0, ans=0.125 2023-12-24 08:03:40,184 INFO [train.py:886] (0/4) Epoch 50, batch 4050, loss[loss=0.01302, audio_tagging_loss=0.01302, over 24750.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4954452.99 frames. 
], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:03:53,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1583960.0, ans=0.1 2023-12-24 08:03:54,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1583960.0, ans=0.125 2023-12-24 08:03:59,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1583960.0, ans=0.2 2023-12-24 08:04:05,078 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.755e+01 4.142e+01 4.284e+01 4.513e+01 5.002e+01, threshold=8.568e+01, percent-clipped=0.0 2023-12-24 08:04:19,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1584093.3333333333, ans=0.125 2023-12-24 08:04:31,971 INFO [train.py:886] (0/4) Epoch 50, batch 4100, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4949882.99 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:04:41,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1584293.3333333333, ans=0.125 2023-12-24 08:04:42,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1584293.3333333333, ans=0.125 2023-12-24 08:04:48,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1584293.3333333333, ans=0.0 2023-12-24 08:05:23,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2023-12-24 08:05:24,702 INFO [train.py:886] (0/4) Epoch 50, batch 4150, loss[loss=0.008309, audio_tagging_loss=0.008309, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4937753.74 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:05:26,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1584560.0, ans=0.0 2023-12-24 08:05:32,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1584560.0, ans=0.125 2023-12-24 08:05:33,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1584560.0, ans=0.125 2023-12-24 08:05:48,084 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.703e+01 4.057e+01 4.232e+01 4.457e+01 4.913e+01, threshold=8.465e+01, percent-clipped=0.0 2023-12-24 08:05:51,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1584693.3333333333, ans=0.125 2023-12-24 08:06:10,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1584826.6666666667, ans=0.0 2023-12-24 08:06:13,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1584826.6666666667, ans=0.0 2023-12-24 08:06:15,655 INFO [train.py:886] (0/4) Epoch 50, batch 4200, loss[loss=0.01071, audio_tagging_loss=0.01071, over 25000.00 frames. 
2023-12-24 08:06:24,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1584893.3333333333, ans=0.125
2023-12-24 08:06:35,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=12.0
2023-12-24 08:06:35,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1584960.0, ans=0.125
2023-12-24 08:06:50,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1585093.3333333333, ans=0.0
2023-12-24 08:06:52,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1585093.3333333333, ans=0.1
2023-12-24 08:07:01,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1585160.0, ans=0.1
2023-12-24 08:07:08,512 INFO [train.py:886] (0/4) Epoch 50, batch 4250, loss[loss=0.006782, audio_tagging_loss=0.006782, over 22644.00 frames. ], tot_loss[loss=0.0104, audio_tagging_loss=0.0104, over 4944035.60 frames. ], batch size: 107, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:07:25,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1585293.3333333333, ans=0.125
2023-12-24 08:07:32,775 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.753e+01 4.073e+01 4.229e+01 4.382e+01 5.254e+01, threshold=8.458e+01, percent-clipped=0.0
2023-12-24 08:07:37,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1585360.0, ans=0.125
2023-12-24 08:07:39,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1585426.6666666667, ans=0.1
2023-12-24 08:07:57,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1585493.3333333333, ans=0.125
2023-12-24 08:07:58,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1585560.0, ans=0.2
2023-12-24 08:07:58,895 INFO [train.py:886] (0/4) Epoch 50, batch 4300, loss[loss=0.01108, audio_tagging_loss=0.01108, over 25000.00 frames. ], tot_loss[loss=0.01043, audio_tagging_loss=0.01043, over 4946166.51 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:08:04,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1585560.0, ans=0.125
2023-12-24 08:08:09,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0
2023-12-24 08:08:33,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1585760.0, ans=0.125
2023-12-24 08:08:52,023 INFO [train.py:886] (0/4) Epoch 50, batch 4350, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01045, audio_tagging_loss=0.01045, over 4951327.38 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0
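The Whitening lines compare a per-module metric against a scheduled limit; modules whose metric exceeds the limit are nudged toward whiter (less correlated) activations. The following is a plausible reconstruction of such a metric, assuming it measures eigenvalue dispersion of the feature covariance (exactly 1.0 for perfectly white features); this formula is inferred from the module's name and logged behavior, not copied from scaling.py.

```python
# Hypothetical whitening metric: mean(eigenvalue^2) / mean(eigenvalue)^2 of
# the feature covariance. It is 1.0 when all eigenvalues are equal (white)
# and grows as variance concentrates in a few directions, which would match
# the logged metric rising toward its limit on attention/conv outputs.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into groups
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)               # center per group
    cov = torch.matmul(x.transpose(1, 2), x) / n      # (groups, d, d)
    d = cov.shape[-1]
    trace = cov.diagonal(dim1=-2, dim2=-1).sum(-1)    # sum of eigenvalues
    trace_sq = (cov * cov).sum(dim=(-2, -1))          # sum of eigenvalues^2
    metric = (trace_sq / d) / (trace / d) ** 2        # mean(l^2) / mean(l)^2
    return metric.mean().item()

x = torch.randn(1000, 256)                            # roughly white features
print(whitening_metric(x))                            # ~1.0
print(whitening_metric(x * torch.linspace(0.1, 3.0, 256)))  # > 1.0
```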
2023-12-24 08:09:00,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1585960.0, ans=0.125
2023-12-24 08:09:16,918 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.615e+01 4.119e+01 4.307e+01 4.459e+01 5.187e+01, threshold=8.614e+01, percent-clipped=0.0
2023-12-24 08:09:17,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1586026.6666666667, ans=0.1
2023-12-24 08:09:19,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1586026.6666666667, ans=0.2
2023-12-24 08:09:33,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1586160.0, ans=0.1
2023-12-24 08:09:35,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0
2023-12-24 08:09:44,431 INFO [train.py:886] (0/4) Epoch 50, batch 4400, loss[loss=0.01343, audio_tagging_loss=0.01343, over 24944.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4950441.69 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:09:54,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1586293.3333333333, ans=0.125
2023-12-24 08:10:01,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1586293.3333333333, ans=22.5
2023-12-24 08:10:11,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1586360.0, ans=0.1
2023-12-24 08:10:35,822 INFO [train.py:886] (0/4) Epoch 50, batch 4450, loss[loss=0.01188, audio_tagging_loss=0.01188, over 21650.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4942123.23 frames. ], batch size: 107, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:10:38,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1586560.0, ans=0.125
2023-12-24 08:10:39,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1586560.0, ans=0.2
2023-12-24 08:11:01,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.07 vs. limit=10.0
2023-12-24 08:11:01,844 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.798e+01 4.098e+01 4.310e+01 4.508e+01 5.882e+01, threshold=8.619e+01, percent-clipped=0.0
2023-12-24 08:11:05,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=15.0
2023-12-24 08:11:25,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1586826.6666666667, ans=0.1
2023-12-24 08:11:28,296 INFO [train.py:886] (0/4) Epoch 50, batch 4500, loss[loss=0.009627, audio_tagging_loss=0.009627, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4941873.46 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
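The grad_scale field in the per-batch lines (16.0 at batch 3950, 32.0 afterwards) is the dynamic loss scale of fp16 training: it doubles after a stretch of overflow-free steps and halves on inf/nan gradients, which is why it moves in powers of two. A minimal sketch using PyTorch's stock GradScaler, which implements this dynamic scaling; the model and loss below are placeholders, not the recipe's actual code.

```python
# Minimal fp16 training step with dynamic loss scaling. GradScaler grows
# its scale after `growth_interval` successful steps and shrinks it when it
# sees inf/nan grads, producing the power-of-two grad_scale in the log.
import torch

model = torch.nn.Linear(80, 527).cuda()           # placeholder model
optimizer = torch.optim.AdamW(model.parameters())
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def train_step(features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            model(features), targets)              # multi-label tagging loss
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # unscales grads, skips step on inf/nan
    scaler.update()                 # grow or shrink the scale
    return loss.item(), scaler.get_scale()
```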
2023-12-24 08:11:31,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=12.0
2023-12-24 08:11:34,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=15.0
2023-12-24 08:11:52,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1587026.6666666667, ans=0.5
2023-12-24 08:12:04,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0
2023-12-24 08:12:08,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1587160.0, ans=0.2
2023-12-24 08:12:10,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1587160.0, ans=0.1
2023-12-24 08:12:19,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0
2023-12-24 08:12:20,253 INFO [train.py:886] (0/4) Epoch 50, batch 4550, loss[loss=0.01001, audio_tagging_loss=0.01001, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4944500.97 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:12:25,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1587226.6666666667, ans=0.5
2023-12-24 08:12:36,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1587293.3333333333, ans=0.0
2023-12-24 08:12:39,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1587293.3333333333, ans=0.035
2023-12-24 08:12:39,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1587293.3333333333, ans=0.125
2023-12-24 08:12:40,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1587360.0, ans=0.0
2023-12-24 08:12:44,697 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.783e+01 4.050e+01 4.235e+01 4.395e+01 5.112e+01, threshold=8.470e+01, percent-clipped=0.0
2023-12-24 08:12:55,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1587426.6666666667, ans=0.0
2023-12-24 08:13:11,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1587560.0, ans=0.1
2023-12-24 08:13:12,186 INFO [train.py:886] (0/4) Epoch 50, batch 4600, loss[loss=0.01094, audio_tagging_loss=0.01094, over 25000.00 frames. ], tot_loss[loss=0.01046, audio_tagging_loss=0.01046, over 4945452.72 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
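In the train.py:886 lines, loss[...] is the current batch and tot_loss[...] a frame-weighted running average whose frame total hovers near 4.95M rather than growing without bound. A sketch of one way to maintain such an average, assuming a per-batch decay of the accumulated totals; the decay constant is an assumption (0.995 gives a ~5M-frame steady state for 25k-frame batches), and only the weighting by frame counts is implied by the log.

```python
# Frame-weighted running loss, decayed each batch so recent batches
# dominate. With decay=0.995 and ~25000 frames per batch, tot_frames
# settles near 25000 / (1 - 0.995) = 5e6, close to the logged ~4.95e6.
class RunningLoss:
    def __init__(self, decay: float = 0.995):  # assumed decay factor
        self.decay = decay
        self.tot_loss = 0.0    # decayed sum of (loss * frames)
        self.tot_frames = 0.0  # decayed sum of frames

    def update(self, loss: float, num_frames: float) -> float:
        self.tot_loss = self.decay * self.tot_loss + loss * num_frames
        self.tot_frames = self.decay * self.tot_frames + num_frames
        return self.tot_loss / self.tot_frames  # reported as tot_loss

tracker = RunningLoss()
print(tracker.update(0.01094, 25000.0))  # batch 4600 values from the log
print(tracker.update(0.01068, 25000.0))
```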
2023-12-24 08:13:17,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1587560.0, ans=0.125
2023-12-24 08:13:23,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0
2023-12-24 08:13:32,528 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-24 08:13:37,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1587693.3333333333, ans=0.2
2023-12-24 08:13:45,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1587760.0, ans=0.1
2023-12-24 08:13:47,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1587760.0, ans=0.1
2023-12-24 08:13:56,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1587826.6666666667, ans=0.125
2023-12-24 08:14:03,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5
2023-12-24 08:14:04,477 INFO [train.py:886] (0/4) Epoch 50, batch 4650, loss[loss=0.01068, audio_tagging_loss=0.01068, over 25000.00 frames. ], tot_loss[loss=0.01043, audio_tagging_loss=0.01043, over 4947531.32 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:14:24,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1588026.6666666667, ans=0.1
2023-12-24 08:14:28,735 WARNING [optim.py:484] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.571e+01 4.073e+01 4.249e+01 4.503e+01 5.611e+01, threshold=8.499e+01, percent-clipped=0.0
2023-12-24 08:14:28,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1588026.6666666667, ans=0.0
2023-12-24 08:14:54,258 INFO [train.py:886] (0/4) Epoch 50, batch 4700, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4951055.14 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:14:56,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1588226.6666666667, ans=0.025
2023-12-24 08:15:01,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.50 vs. limit=22.5
2023-12-24 08:15:05,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1588293.3333333333, ans=0.125
2023-12-24 08:15:15,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0
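The learning rate is pinned at 2.13e-03 throughout these final batches, the flat tail of a schedule that decays in both batches and epochs. The sketch below follows the Eden-style formula of icefall's zipformer recipes, quoted from memory of upstream optim.py, so treat the exact form as an assumption; with the startup hyperparameters of this run it lands close to the logged value.

```python
# Eden-style LR schedule: smooth power-law decay in both the batch and the
# epoch dimension. base_lr / lr_batches / lr_epochs default to this run's
# startup settings; the formula is recalled from icefall's optim.py and
# should be checked against the source before reuse.
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# ~4750 batches/epoch over 50 epochs puts the scheduler near batch 237500:
print(eden_lr(0.045, batch=237_500, epoch=50))  # ~2.1e-03 vs. logged 2.13e-03
```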
2023-12-24 08:15:35,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1588493.3333333333, ans=0.1
2023-12-24 08:15:42,033 INFO [train.py:886] (0/4) Epoch 50, batch 4750, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4949968.83 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:15:49,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0
2023-12-24 08:15:57,245 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_as_full/epoch-50.pt
2023-12-24 08:15:58,943 INFO [train.py:1099] (0/4) Done!
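Training finishes by writing epoch-50.pt and logging Done!. For reference, a minimal sketch of what an end-of-epoch checkpoint like this typically bundles; the key names are illustrative, and the real layout lives in icefall's checkpoint.py (which also stores sampler and grad-scaler state).

```python
# Sketch of saving an end-of-epoch checkpoint. Key names are illustrative;
# consult icefall's checkpoint.py for the actual format it writes.
import torch

def save_epoch_checkpoint(filename, model, optimizer, scheduler,
                          epoch, batch_idx_train):
    checkpoint = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict() if scheduler else None,
        "epoch": epoch,
        "batch_idx_train": batch_idx_train,
    }
    torch.save(checkpoint, filename)

# e.g. save_epoch_checkpoint("zipformer/exp_at_as_full/epoch-50.pt",
#                            model, optimizer, scheduler, 50, 237500)
```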